3. PGCluster

PGCluster is a multi-master replication system designed for the open source PostgreSQL database. PostgreSQL has no standard or default replication system; instead, various third-party software packages provide replication for it. Large systems built on PostgreSQL require replication and load distribution, and many replication systems have been designed for PostgreSQL. We selected PGCluster and evaluated its features and performance.

There are two official PGCluster Web sites:
http://pgfoundry.org/projects/pgcluster/
http://pgcluster.projects.postgresql.org/

We chose PGCluster for the following reasons.

(1) High-functionality Replication System
PGCluster is a multi-master system in which every database can act as the master of replication. It is a synchronous, high-performance replication system that guarantees the same result no matter which database is accessed or when.

(2) Requires No Special Hardware
Many people think that a cluster system requires special hardware. PGCluster does not: it needs only ordinary servers that run PostgreSQL. As a result, PGCluster does not let existing investments go to waste.

(3) A Replication Solution from Japan
As mentioned above, there are many replication solutions for PostgreSQL. PGCluster was developed in Japan by Atsushi Mitani and is one of the replication systems receiving attention. We believe PGCluster is widely used in Japan because Japanese documentation is available. It also has a very large number of downloads from pgFoundry, the Web site that hosts third-party software for PostgreSQL, and it has come to international attention.
3.1 Functional Overview

This section describes the functions of PGCluster.

3.1.1 Basic Configuration

PGCluster consists of the three components listed in Table 3.1-1.

Table 3.1-1: PGCluster Components
Component            Function
Load balancer        Receives connections from clients and distributes the load when necessary. (This is a software implementation, not a hardware load balancer.)
Replication server   Propagates queries that change data, such as INSERT, UPDATE, and DELETE, to all cluster servers.
Cluster server       Stores the data and executes queries.

The load balancer, replication server, and cluster server are logical units, not physical ones. The following table lists the number of nodes required to realize the full functionality of PGCluster.

Component            Number of Nodes
Load balancer        1
Replication server   1
Cluster server       3
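To make the division of labor in Table 3.1-1 concrete, the following Python sketch classifies an incoming SQL statement and lists the logical components it passes through. It is a hypothetical illustration, not PGCluster code: the function names are ours, and checking only the leading INSERT, UPDATE, or DELETE keyword is a simplification (a real system would need a SQL parser to catch comments, CTEs, or functions with side effects).

```python
import re

# Hypothetical simplification: only these leading keywords count as changes.
UPDATE_KEYWORDS = {"INSERT", "UPDATE", "DELETE"}


def first_keyword(sql: str) -> str:
    """Return the first word of the statement in upper case ('' if none)."""
    match = re.match(r"\s*([A-Za-z]+)", sql)
    return match.group(1).upper() if match else ""


def route(sql: str) -> list:
    """List the logical components a statement passes through."""
    # Every statement enters through the load balancer and reaches one
    # cluster server.
    hops = ["load balancer", "one cluster server"]
    if first_keyword(sql) in UPDATE_KEYWORDS:
        # Changes must reach every replica, so the replication server
        # propagates them to all cluster servers.
        hops += ["replication server", "all cluster servers"]
    return hops


if __name__ == "__main__":
    for statement in ("SELECT * FROM items", "update items SET qty = 0"):
        print(statement, "->", " -> ".join(route(statement)))
```

Searches stop at a single cluster server, while changes continue through the replication server; this is the distinction that load balancing and replication in the following sections rely on.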
It is preferable for each server to be physically independent. Figure 3.1-1 illustrates a typical PGCluster configuration.

Figure 3.1-1: PGCluster Configuration (client, load balancer, replication server, and cluster servers)

Description of Figure 3.1-1
In the figure, the daemon in the load balancer balances the load, the daemon in each cluster server is the database daemon, and the daemon in the replication server performs replication. The disk symbol indicates a disk that stores data.
3.1.2 Replication

PGCluster keeps a replica (copy) of the database on every cluster server, so a change made to one database must be propagated to all of them. Replication satisfies this requirement, and the replication server performs it. Replication is the mechanism that reflects the changes made by a query, sent from the load balancer to one cluster server, on all the other cluster servers. Figure 3.1-2 illustrates how the replication mechanism operates.

Figure 3.1-2: Operation of Replication

Description of Figure 3.1-2
(1) Sends a query.
(2) Sends a query (to an arbitrary cluster server).
(3) Sends a query.
(4) Sends a query (which is then executed in the cluster server).
(5) Sends a response after executing the query.
(6) Sends a response after executing the query.
(7) Sends the result of the query.
(8) Sends the result of the query.
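The synchronous behavior described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not PGCluster's implementation: a dict stands in for each database, a write is a key/value pair rather than SQL, and the class and method names are hypothetical. What it shows is that the replication server reports success only after every cluster server has applied the change, which is why any replica returns the same result afterward.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterServer:
    """Toy cluster server: a dict stands in for the database."""
    name: str
    data: dict = field(default_factory=dict)

    def execute(self, key, value):
        # Apply one write (standing in for an INSERT or UPDATE) and acknowledge it.
        self.data[key] = value
        return "ok"


class ReplicationServer:
    """Hypothetical synchronous replicator: reply only after all servers apply."""

    def __init__(self, servers):
        self.servers = list(servers)

    def replicate(self, key, value):
        # Send the write to every cluster server and wait for each acknowledgement.
        acks = [server.execute(key, value) for server in self.servers]
        # Report success to the originating cluster server only if all succeeded.
        return all(ack == "ok" for ack in acks)


servers = [ClusterServer(f"cluster{i}") for i in (1, 2, 3)]
repl = ReplicationServer(servers)
repl.replicate("item:1", 100)
print([s.data for s in servers])  # [{'item:1': 100}, {'item:1': 100}, {'item:1': 100}]
```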
3.1.3 Load Balancing

In PGCluster, a search needs to be sent to only one cluster server; it is not necessary to query all servers. PGCluster provides a load distribution function for searches, implemented in the load balancer. A configuration file can be used to specify the rate of queries sent to each database server. Figure 3.1-3 illustrates a search operation.

Figure 3.1-3: Search Operation

Description of Figure 3.1-3
(1) Sends a query.
(2) Sends a query (to an arbitrary cluster server for execution).
(3) Sends the result of the query.
(4) Sends the result of the query.
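Rate-based distribution can be illustrated with smooth weighted round-robin, a common technique for giving each server a share of requests proportional to its weight. We use it here only as an assumption; the text above does not say which algorithm PGCluster's load balancer uses, and the server names and rates below are hypothetical stand-ins for values in the configuration file.

```python
from collections import Counter


class WeightedBalancer:
    """Smooth weighted round-robin over cluster servers (illustrative sketch)."""

    def __init__(self, rates):
        # rates: hypothetical mapping of server name -> relative share of searches.
        self.rates = dict(rates)
        self.total = sum(self.rates.values())
        self.current = {name: 0 for name in self.rates}

    def pick(self):
        # Raise each server's credit by its rate and choose the largest credit;
        # on ties, max() keeps the first server in insertion order.
        for name, rate in self.rates.items():
            self.current[name] += rate
        chosen = max(self.current, key=self.current.get)
        # Charge the chosen server the total so the others catch up.
        self.current[chosen] -= self.total
        return chosen


lb = WeightedBalancer({"cluster1": 2, "cluster2": 1, "cluster3": 1})
picks = [lb.pick() for _ in range(8)]
print(picks)
print(Counter(picks))
```

Over these 8 searches, cluster1 is chosen 4 times and cluster2 and cluster3 twice each, matching the 2:1:1 rates.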
3.1.4 Degraded Operation

If one of the three cluster servers in operation goes down, the failed server is automatically isolated and database operation continues with the remaining two servers. Figure 3.1-4 illustrates degraded operation, using an example in which a cluster server goes down while it is processing a query sent from the replication server.

Figure 3.1-4: Degraded Operation

Description of Figure 3.1-4
(1) Sends a query.
(2) Sends a query (to an arbitrary cluster server).
(3) Sends a query.
(4) Sends a query. (While the query is being executed, a failure occurs on one of the cluster servers.)
(5) Sends a query execution error. (Alternatively, the server does not respond, and the cluster server is isolated.)
(6) Sends a response after executing the query.
(7) Sends a response after executing the query.
(8) Sends the result of the query.
(9) Sends the result of the query.
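A hypothetical sketch of the isolation step follows. It assumes, for illustration only, that a failed cluster server surfaces as a ConnectionError or TimeoutError; the replication server then removes that server from the active set, and the write still completes on the survivors, mirroring steps (4) through (7) of Figure 3.1-4. The `fail` flag and all class names are invented for the example.

```python
class ClusterServer:
    """Toy cluster server; `fail` is a hypothetical switch that simulates a crash."""

    def __init__(self, name, fail=False):
        self.name = name
        self.fail = fail
        self.data = {}

    def execute(self, key, value):
        if self.fail:
            raise ConnectionError(f"{self.name} is down")
        self.data[key] = value


class ReplicationServer:
    """Propagates each write and isolates servers that fail (sketch only)."""

    def __init__(self, servers):
        self.active = list(servers)
        self.isolated = []

    def replicate(self, key, value):
        # Iterate over a copy, because failed servers are removed in the loop.
        for server in list(self.active):
            try:
                server.execute(key, value)
            except (ConnectionError, TimeoutError):
                # An execution error or no response: isolate the server.
                self.active.remove(server)
                self.isolated.append(server)
        if not self.active:
            raise RuntimeError("no cluster server left in operation")
        return len(self.active)  # servers that applied the write


servers = [ClusterServer("cluster1"), ClusterServer("cluster2", fail=True),
           ClusterServer("cluster3")]
repl = ReplicationServer(servers)
print(repl.replicate("item:1", 100))       # 2
print([s.name for s in repl.isolated])     # ['cluster2']
```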
3.1.5 Online Recovery

While the remaining two cluster servers are in degraded operation, the user can recover the failed server online. Figure 3.1-5 illustrates the online recovery operation.

Figure 3.1-5: Online Recovery Operation

Description of Figure 3.1-5
(1) Starts online recovery. (Locks one of the cluster servers, leaving only one in operation.)
(2) Sends a query.
(3) Sends a query.
(4) Sends a query. (The cluster server is not in operation, so the execution log is saved in the buffer.)
(5) Sends a query execution error.
(6) Sends the result of the query.
(7) Sends the result of the query.
(8) Sends a query. (After data is copied between cluster servers, the queries saved in the replication server are executed, completing online recovery.)
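The buffering and replay in Figure 3.1-5 can be modeled as follows. This Python sketch is a simplification under assumptions of our own: one surviving server is locked as the copy source, writes that arrive during the copy are applied to the server still in operation and also saved in a buffer, and after the copy the buffer is replayed on both the source and the recovered server. Names such as `start_recovery` and `finish_recovery` are hypothetical.

```python
class ClusterServer:
    """Toy cluster server: a dict stands in for the database."""

    def __init__(self, name, data=None):
        self.name = name
        self.data = dict(data or {})

    def execute(self, key, value):
        self.data[key] = value


class ReplicationServer:
    """Sketch of online recovery with a query buffer (hypothetical names)."""

    def __init__(self, servers):
        self.active = list(servers)
        self.buffer = []         # writes saved while recovery is running
        self.recovering = None   # (source, target) during recovery

    def replicate(self, key, value):
        for server in self.active:
            server.execute(key, value)
        if self.recovering is not None:
            # The locked source and the failed target miss this write,
            # so keep it for replay.
            self.buffer.append((key, value))

    def start_recovery(self, source, target):
        # Lock one surviving server as the copy source; the rest keep running.
        self.active.remove(source)
        self.recovering = (source, target)

    def finish_recovery(self):
        source, target = self.recovering
        # Copy the source's data to the failed server...
        target.data = dict(source.data)
        # ...then replay the buffered writes on both before reactivating them.
        for key, value in self.buffer:
            source.execute(key, value)
            target.execute(key, value)
        self.active.extend([source, target])
        self.buffer.clear()
        self.recovering = None


c1, c2 = ClusterServer("cluster1", {"a": 1}), ClusterServer("cluster2", {"a": 1})
c3 = ClusterServer("cluster3")  # the failed server, now empty
repl = ReplicationServer([c1, c2])
repl.start_recovery(source=c2, target=c3)
repl.replicate("b", 2)        # reaches cluster1 only; buffered for the others
repl.finish_recovery()
print(c1.data == c2.data == c3.data)  # True
```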
3.2 Range of Application

3.2.1 Restrictions

PGCluster adds the following restrictions to the operation of standalone PostgreSQL. (Excerpted from the official PGCluster Web site.)

PGCluster 1.0
(1) Record OIDs are assigned automatically and independently within each cluster database, so consistency of OIDs between cluster database servers is not guaranteed.
(2) Replication of large objects is not supported.

PGCluster 1.1
(1) To replicate large objects, they must be placed in a directory that all cluster servers can access.

PGCluster 1.3
(1) To replicate large objects, they must be placed in a directory that all cluster servers can access.
(2) Cluster database recovery is not supported in environments that use TABLESPACE.
(3) The Windows operating system is not supported.

3.2.2 Range of Application in Availability

PGCluster is effective for systems that require uninterrupted operation, because a failure in a single server does not stop the entire system.

3.2.3 Range of Application in Performance

PostgreSQL with PGCluster outperforms standalone PostgreSQL when the workload contains many search queries, which load distribution spreads across the cluster servers. Updates, by contrast, must be executed on every cluster server through replication, so they do not benefit from load distribution. When considering whether to introduce PGCluster, it is therefore necessary to examine the ratio of updates to searches in the real system.
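Why the update-to-search ratio matters can be shown with a rough model. The following sketch assumes, purely for illustration, that a search costs one unit on one of N cluster servers, while an update costs (1 + overhead) units on every server because it is replicated, giving a relative throughput of 1 / ((1 - u) / N + u (1 + overhead)) for update ratio u. The overhead value is hypothetical, not a measurement from this evaluation, and the model ignores the load balancer and replication server as possible bottlenecks.

```python
def relative_throughput(update_ratio, cluster_servers, overhead=0.0):
    """Toy model: cluster throughput divided by standalone throughput.

    Assumptions (illustrative only): a search costs 1 unit on ONE cluster
    server; an update costs (1 + overhead) units on EVERY cluster server.
    """
    if not 0.0 <= update_ratio <= 1.0:
        raise ValueError("update_ratio must be between 0 and 1")
    # Average work each cluster server does per client query.
    per_server = (1 - update_ratio) / cluster_servers + update_ratio * (1 + overhead)
    return 1.0 / per_server


def break_even_update_ratio(cluster_servers, overhead):
    """Update ratio at which the toy model predicts no gain (throughput 1.0)."""
    inv_n = 1.0 / cluster_servers
    return (1 - inv_n) / ((1 + overhead) - inv_n)


for u in (0.0, 0.2, 0.5, 0.8, 1.0):
    print(f"update ratio {u:.1f}: {relative_throughput(u, 3, overhead=0.2):.2f}x")
print(f"break-even update ratio: {break_even_update_ratio(3, 0.2):.2f}")
```

Under this model, a cluster with zero replication overhead never falls below standalone throughput, while any positive overhead produces a break-even update ratio above which standalone PostgreSQL would be faster.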
3.3 Procedure and Key Points of Evaluating PGCluster

This section describes some evaluation viewpoints specific to PGCluster, based on the criteria in Chapter 2, MS Cluster Evaluation Criteria.

3.3.1 Difference in PGCluster Versions

The major versions of PGCluster follow those of PostgreSQL. Currently, three versions of PGCluster (1.0, 1.1, and 1.3) have been released. The following table lists the correspondence between PGCluster and PostgreSQL versions.

PGCluster Version    Base PostgreSQL Version
1.0                  7.3
1.1                  7.4
1.3                  8.0

Each version of PGCluster introduced new functions in addition to support for the corresponding PostgreSQL version. At the time of this writing, PGCluster 1.1 and 1.3 are development versions and are not recommended for use in production.

This evaluation was performed with the basic five-server configuration (one load balancer, one replication server, and three cluster servers). It verified that the basic functions of PGCluster operated properly and that there is merit in using PGCluster. The evaluation items in 3.3.2 and later were therefore not covered; in the following descriptions, items that were not evaluated are marked "not applicable".