Postgres-XC 1.2.1 Documentation
Chapter 1. Getting Started
Note: The following description applies only to Postgres-XC.
Before we proceed, you should understand the basic Postgres-XC system architecture. Understanding how the parts of Postgres-XC interact will make this chapter somewhat clearer.
Postgres-XC, in short, is a collection of PostgreSQL database clusters which act as if the whole collection is a single database cluster. Based on your database design, each table is replicated or distributed among member databases.
To provide this capability, Postgres-XC is composed of three major components called GTM, Coordinator and Datanode. GTM is responsible for providing the ACID properties of transactions. A Datanode stores table data and handles SQL statements locally. A Coordinator handles each SQL statement from applications, determines which Datanodes are involved, and decomposes the statement into local SQL statements for each Datanode.
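The difference between a replicated table and a distributed table, and the Coordinator's job of deciding which Datanode a row belongs to, can be pictured with a small Python sketch. This is a toy model, not Postgres-XC code: the node names, the `nodes_for_row` helper, and the use of `crc32` as a hash are all assumptions made for illustration.

```python
import zlib

# Hypothetical Datanode names; a real cluster defines its own.
DATANODES = ["datanode1", "datanode2", "datanode3"]


def nodes_for_row(strategy, key, datanodes=DATANODES):
    """Return the Datanodes that would hold a row under a toy placement rule."""
    if strategy == "replicated":
        # A replicated table keeps a full copy on every member database.
        return list(datanodes)
    if strategy == "distributed":
        # A distributed table places each row on exactly one member database.
        # crc32 is a stand-in hash, not the function Postgres-XC uses.
        index = zlib.crc32(str(key).encode("utf-8")) % len(datanodes)
        return [datanodes[index]]
    raise ValueError(f"unknown strategy: {strategy!r}")


if __name__ == "__main__":
    for key in (1, 2, 3):
        print(key, nodes_for_row("distributed", key))
    print("replicated:", nodes_for_row("replicated", 1))
```

In this model a read of a replicated table can be served by any single Datanode, while a read of a distributed table may have to visit several of them.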
You should run GTM on a separate server because GTM has to handle transaction requests from all the Coordinators and Datanodes. To group multiple requests and responses from the Coordinators and Datanodes running on the same server, you can configure GTM-Proxy. GTM-Proxy reduces the number of interactions with GTM and the amount of data sent to it. GTM-Proxy also helps in handling GTM failure.
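The grouping idea behind GTM-Proxy can be sketched as follows. This is only an illustration under stated assumptions: the classes `ToyGTM` and `ToyGTMProxy` are hypothetical, and requests are modeled as simple transaction-ID requests rather than the actual GTM protocol.

```python
class ToyGTM:
    """Stand-in for GTM that counts how many round trips it serves."""

    def __init__(self):
        self.next_xid = 1
        self.round_trips = 0

    def allocate_xids(self, count):
        # Serve `count` requests in a single round trip.
        self.round_trips += 1
        first = self.next_xid
        self.next_xid += count
        return list(range(first, first + count))


class ToyGTMProxy:
    """Collects requests from local components and forwards them as one batch."""

    def __init__(self, gtm):
        self.gtm = gtm
        self.pending = []

    def request_xid(self, requester):
        # Queue the request instead of contacting GTM immediately.
        self.pending.append(requester)

    def flush(self):
        # Send all queued requests to GTM together and map each reply
        # back to the component that asked for it.
        requesters, self.pending = self.pending, []
        if not requesters:
            return {}
        xids = self.gtm.allocate_xids(len(requesters))
        return dict(zip(requesters, xids))


if __name__ == "__main__":
    gtm = ToyGTM()
    proxy = ToyGTMProxy(gtm)
    for name in ("coord1", "datanode1", "datanode2"):
        proxy.request_xid(name)
    print(proxy.flush(), "round trips:", gtm.round_trips)
```

Three local requests reach the toy GTM as one interaction, which is the effect the paragraph above describes.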
It is a good convention to run both a Coordinator and a Datanode on the same server because then you don't have to worry about workload balance between the two. You can have any number of servers where these two components are running. Because both Coordinator and Datanode are essentially PostgreSQL database servers, you should configure them to avoid resource conflicts. It is very important to assign them different working directories and port numbers.
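A quick way to catch the resource conflicts mentioned above is to check a planned layout before starting anything. The sketch below is a hypothetical helper, not part of Postgres-XC; the dictionary keys, host names, port numbers, and directory paths are all made up for illustration.

```python
def find_conflicts(nodes):
    """Return human-readable conflicts among planned node definitions.

    Each node is a dict with the hypothetical keys
    'name', 'host', 'port', and 'data_dir'.
    """
    conflicts = []
    ports_in_use = {}
    dirs_in_use = {}
    for node in nodes:
        port_key = (node["host"], node["port"])
        dir_key = (node["host"], node["data_dir"])
        # Two components on the same host must not listen on the same port.
        if port_key in ports_in_use:
            conflicts.append(
                f"{node['name']} and {ports_in_use[port_key]} share port "
                f"{node['port']} on {node['host']}"
            )
        else:
            ports_in_use[port_key] = node["name"]
        # Two components on the same host must not share a working directory.
        if dir_key in dirs_in_use:
            conflicts.append(
                f"{node['name']} and {dirs_in_use[dir_key]} share directory "
                f"{node['data_dir']} on {node['host']}"
            )
        else:
            dirs_in_use[dir_key] = node["name"]
    return conflicts


if __name__ == "__main__":
    plan = [
        {"name": "coord1", "host": "server1", "port": 5432, "data_dir": "/data/coord1"},
        {"name": "datanode1", "host": "server1", "port": 5432, "data_dir": "/data/datanode1"},
    ]
    for problem in find_conflicts(plan):
        print(problem)
```

The same port or directory on different hosts is not reported, since the conflict only arises when both components run on one server.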
Postgres-XC allows multiple Coordinators, which accept statements from applications independently but in an integrated way. Any write made through one Coordinator is visible from every other Coordinator. They act as if they were a single database. The Coordinator's role is to accept statements, find which Datanodes are involved, decompose incoming statements for each Datanode if needed, proxy the statements to the target Datanodes, collect the results, and write them back to the applications.
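The accept, fan out, and collect cycle described above can be sketched with a toy model. The classes `ToyDatanode` and `ToyCoordinator` are hypothetical and exist only for this illustration; a Python predicate stands in for a SQL statement, and no real query planning is attempted.

```python
from dataclasses import dataclass, field


@dataclass
class ToyDatanode:
    """Holds a slice of the table data and answers local queries."""

    name: str
    rows: list = field(default_factory=list)

    def select_where(self, predicate):
        # Each Datanode evaluates the local statement against its own rows only.
        return [row for row in self.rows if predicate(row)]


class ToyCoordinator:
    """Holds no user data; it only knows which Datanodes exist."""

    def __init__(self, datanodes):
        self.datanodes = list(datanodes)

    def select_where(self, predicate):
        # Send the statement to every Datanode and merge the partial results.
        results = []
        for node in self.datanodes:
            results.extend(node.select_where(predicate))
        return results


if __name__ == "__main__":
    dn1 = ToyDatanode("dn1", [{"id": 1, "city": "Tokyo"}])
    dn2 = ToyDatanode("dn2", [{"id": 2, "city": "Osaka"}, {"id": 3, "city": "Tokyo"}])
    coord_a = ToyCoordinator([dn1, dn2])
    coord_b = ToyCoordinator([dn1, dn2])
    # A write that lands on a Datanode is visible through either Coordinator.
    dn1.rows.append({"id": 4, "city": "Tokyo"})
    print(coord_a.select_where(lambda r: r["city"] == "Tokyo"))
    print(coord_b.select_where(lambda r: r["city"] == "Tokyo"))
```

Because both toy Coordinators read from the same Datanodes, they return the same rows, which mirrors the single-database behavior described in the text.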
A Coordinator does not store any user data. It stores only catalog data, which it uses to determine how to decompose a statement and where the target Datanodes are, among other things. Therefore, you don't have to worry much about Coordinator failure. When a Coordinator fails, you can simply switch to another one.
GTM could be a single point of failure (SPOF). To prevent this, you can run another GTM as GTM-Standby to back up GTM's status. When GTM fails, GTM-Proxy can switch to the standby on the fly. This will be described in detail in the high-availability sections.
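The switch from a failed GTM to its standby can be pictured with a short sketch. Everything here is hypothetical: `ToyGTMServer`, `FailoverProxy`, and `GTMUnavailable` are invented names, and the sketch ignores the steps a real deployment needs to promote a standby. It shows only the idea that callers keep talking to the proxy while the proxy changes its target.

```python
class GTMUnavailable(Exception):
    """Raised by the toy server when it is marked as failed."""


class ToyGTMServer:
    def __init__(self, name):
        self.name = name
        self.alive = True

    def begin_transaction(self):
        if not self.alive:
            raise GTMUnavailable(self.name)
        return f"handled by {self.name}"


class FailoverProxy:
    """Forwards requests to the active server and falls back to a standby."""

    def __init__(self, active, standby=None):
        self.active = active
        self.standby = standby

    def begin_transaction(self):
        try:
            return self.active.begin_transaction()
        except GTMUnavailable:
            if self.standby is None:
                # No standby configured: the failure reaches the caller.
                raise
            # Switch targets once; the old active server is dropped.
            self.active, self.standby = self.standby, None
            return self.active.begin_transaction()


if __name__ == "__main__":
    gtm = ToyGTMServer("gtm")
    standby = ToyGTMServer("gtm-standby")
    proxy = FailoverProxy(gtm, standby)
    print(proxy.begin_transaction())
    gtm.alive = False
    print(proxy.begin_transaction())
```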
As described above, Coordinator and Datanode of Postgres-XC are essentially PostgreSQL database servers. In database jargon, PostgreSQL uses a client/server model. A PostgreSQL session consists of the following cooperating processes (programs):
Note: Unless stated otherwise, the following description applies to both Postgres-XC and PostgreSQL. You can read PostgreSQL as Postgres-XC, except for the version number, which is specific to each product.
A server process, which manages the database files, accepts connections to the database from client applications, and performs database actions on behalf of the clients. The database server program is called postgres.
The user's client (frontend) application that wants to perform database operations. Client applications can be very diverse in nature: a client could be a text-oriented tool, a graphical application, a web server that accesses the database to display web pages, or a specialized database maintenance tool. Some client applications are supplied with the Postgres-XC distribution; most are developed by users.
As is typical of client/server applications, the client and the server can be on different hosts. In that case they communicate over a TCP/IP network connection. You should keep this in mind, because the files that can be accessed on a client machine might not be accessible (or might only be accessible using a different file name) on the database server machine.
The Postgres-XC server can handle multiple concurrent connections from clients. To achieve this it starts ("forks") a new process for each connection. From that point on, the client and the new server process communicate without intervention by the original postgres process. Thus, the master server process is always running, waiting for client connections, whereas client and associated server processes come and go. (All of this is of course invisible to the user. We only mention it here for completeness.)
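The process-per-connection model can be demonstrated with Python's standard `socketserver` module. This is an analogy on a Unix-like system, not how the postgres program is implemented; `PidHandler` and `demo` are names invented for this sketch.

```python
import os
import socket
import socketserver


class PidHandler(socketserver.StreamRequestHandler):
    def handle(self):
        # Runs in the forked child: report which process serves this client.
        self.wfile.write(f"{os.getpid()}\n".encode("ascii"))


def demo():
    """Serve one connection and return (listener_pid, handler_pid)."""
    # Port 0 asks the operating system for any free port.
    with socketserver.ForkingTCPServer(("127.0.0.1", 0), PidHandler) as server:
        with socket.create_connection(server.server_address, timeout=5) as client:
            # Accept the pending connection and fork a child to handle it.
            server.handle_request()
            reply = client.makefile("r").readline()
    return os.getpid(), int(reply)


if __name__ == "__main__":
    listener_pid, handler_pid = demo()
    print("listener:", listener_pid, "handler:", handler_pid)
```

The two process IDs differ: the listening process stays free to accept further connections while a separate child process talks to the client.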