Building a High-Availability PostgreSQL Cluster
Presenter: Devon Mizelle, System Administrator
Co-Author: Steven Bambling, System Administrator
ARIN: critical internet infrastructure
2 What is ARIN?
The Regional Internet Registry for North America and parts of the Caribbean.
Distributes IPv4 and IPv6 addresses and Autonomous System Numbers (Internet number resources) in the region.
Provides authoritative WHOIS services for number resources in the region.
3 ARIN's Internal Data
Our database holds all of the IPv4 and IPv6 networks that we manage, the organizations they belong to, and the contacts at those organizations. This means that data integrity and how we store that data are extremely important.
4 Requirements
Multi-member
Automatic failover
Prevent a tainted master from coming online
Needs to be ACID-compliant
5 Why Not Slony or pgpool-II?
Slony replaces pgsql's replication. Why do this? Why not let pgsql handle it?
Pgpool is not ACID-compliant: it doesn't confirm writes to multiple nodes.
6 Our Solution
CMAN / Corosync: Red Hat's open-source solution for cross-node communication.
Pacemaker: Red Hat and Novell's solution for service management and fencing.
Both are under active development by ClusterLabs, which is a large part of why we were interested in using them.
7 CMAN / Corosync
Provides a messaging framework between nodes.
Handles a heartbeat between nodes: are you up and available?
Does not provide the status of services; Pacemaker does. Pacemaker uses Corosync to send messages between nodes.
CMAN has the ability to do more, but we use it only as a messaging framework.
8 CMAN / Corosync
Builds a cluster ring using a configuration file.
Used by Pacemaker to pass status messages between the nodes.
Simply a framework for communication; no heavy lifting in our implementation.
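The configuration file mentioned above is CMAN's cluster.conf. A minimal sketch of a three-node ring follows; the cluster name and hostnames are placeholders, not ARIN's actual configuration:

```xml
<?xml version="1.0"?>
<!-- /etc/cluster/cluster.conf : hypothetical three-node ring -->
<cluster name="pgcluster" config_version="1">
  <!-- CMAN is used only as the messaging layer here;
       service state and quorum policy live in Pacemaker -->
  <clusternodes>
    <clusternode name="db1.example.net" nodeid="1"/>
    <clusternode name="db2.example.net" nodeid="2"/>
    <clusternode name="db3.example.net" nodeid="3"/>
  </clusternodes>
</cluster>
```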
9 About Pacemaker
Developed and maintained by Red Hat and Novell.
Scalable: anywhere from a two-node to a 16-node setup.
Scriptable: resource scripts can be written in any language.
Monitoring: watches for service state changes.
Fencing: disables a box and switches roles when failures occur.
Shares a database between nodes about the status of services and nodes.
10 Pacemaker
An XML database (known as the CIB, or cluster information base) is generated with the status of each resource and passed between nodes.
The state of pgsql is controlled by Pacemaker itself. Pacemaker uses a resource script to interact with pgsql and can determine the state of the service (Master / Sync / Async).
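In crm shell syntax, a pgsql master/slave resource along these lines would let Pacemaker track and control those states. This is a sketch only: the paths, node names, vip address, and intervals are placeholder assumptions, not ARIN's configuration:

```
primitive pgsql ocf:heartbeat:pgsql \
    params pgctl="/usr/bin/pg_ctl" psql="/usr/bin/psql" \
           pgdata="/var/lib/pgsql/data" rep_mode="sync" \
           node_list="db1 db2 db3" master_ip="192.0.2.10" \
           restart_on_promote="true" \
    op monitor interval="4s" role="Master" \
    op monitor interval="3s" role="Slave"

# Multi-state wrapper: one master, clones on all three nodes
ms ms_pgsql pgsql \
    meta master-max="1" master-node-max="1" clone-max="3" \
         clone-node-max="1" notify="true"
```

The distinct monitor intervals for the Master and Slave roles are required so Pacemaker can tell the two monitor operations apart.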
11 Other Pacemaker Resources
Besides pgsql, Pacemaker also handles the following resources:
* Fencing of resources
* IP address colocation
12 From the Bottom Up: How does it all tie together?
13 Pacemaker
All slaves in the cluster point to a replication vip. This interface moves to whichever node is the master; this is called a colocation constraint.
Another vip, which our application servers connect to, follows the master as well.
(Diagram: Master, Sync, and Async nodes; the slaves replicate via the replication vip, and the app reaches the master via the client vip.)
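The two vips and their colocation constraints might be expressed like this in crm shell syntax (addresses and names are placeholders; ms_pgsql is the assumed name of the pgsql master/slave resource):

```
primitive vip-rep ocf:heartbeat:IPaddr2 \
    params ip="192.0.2.10" nic="eth0" cidr_netmask="24" \
    op monitor interval="10s"
primitive vip-client ocf:heartbeat:IPaddr2 \
    params ip="192.0.2.11" nic="eth0" cidr_netmask="24" \
    op monitor interval="10s"

# Both vips follow whichever node holds the pgsql Master role
colocation col-vip-rep    inf: vip-rep    ms_pgsql:Master
colocation col-vip-client inf: vip-client ms_pgsql:Master
order ord-vip inf: ms_pgsql:promote vip-client:start
```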
14 Event Scenario
In the event that a node becomes unavailable, cman notifies Pacemaker to fence the node, shutting off communication to it via SNMP to the switch.
The SYNC slave becomes the master, and the ASYNC slave becomes the SYNC slave. Upon manual recovery, the old master becomes the ASYNC slave.
If any resource inside Pacemaker on the master fails its monitoring check, fencing occurs as well. These resources include both the replication and client vips.
(Diagram: the failed master is fenced; Sync is promoted to Master and Async becomes Sync.)
15 PostgreSQL
Still in charge of replicating data. The state of the service, and how it starts, is controlled by Pacemaker.
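The PostgreSQL side is ordinary streaming replication. A hypothetical PostgreSQL 9.x configuration is sketched below; the addresses, user, and retention settings are assumptions, and note that with the pgsql resource agent in rep_mode="sync", synchronous_standby_names is managed by the agent itself rather than set by hand:

```
# postgresql.conf on the master
wal_level = hot_standby
max_wal_senders = 5
wal_keep_segments = 256

# recovery.conf on each slave, pointing at the replication vip
standby_mode = 'on'
primary_conninfo = 'host=192.0.2.10 port=5432 user=replicator application_name=db2'
```

Because the slaves connect to the replication vip rather than a fixed hostname, they automatically follow the master after a failover.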
16 Layout
(Diagram: a client connects to a three-node cluster, one master and two slaves, with cman running on each node.)
17 Introspection: Using Tools to Look Deeper
18 # crm_mon -i 1 -Arf
We disable quorum within the Pacemaker HA cluster to allow the cluster to keep running down to a single node in the event that multiple nodes fail.
8 resources configured.
ocf:heartbeat:IPaddr2 is the resource used to create the vips; resource scripts can be shell, Ruby, etc.
Primitive vs. multi-state: a primitive runs on only one node in the cluster (vips, fencing), while a multi-state resource runs on multiple nodes (pgsql).
The vips are colocated. If anything happens to either of them, the entire node fails and the roles move to the next master.
There is a specific check interval for each resource; stonith is used for fencing.
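The quorum and fencing behavior described above corresponds to cluster-wide properties along these lines (a sketch in crm shell syntax; the SNMP fence device and its parameters are hypothetical, chosen to match the "fence via SNMP to the switch" approach):

```
property no-quorum-policy="ignore" stonith-enabled="true"
rsc_defaults resource-stickiness="INFINITY" migration-threshold="1"

# Example SNMP-based fencing primitive that shuts a node's switch port
primitive fence-db1 stonith:fence_ifmib \
    params ipaddr="switch1.example.net" port="Gi1/0/1" \
           pcmk_host_list="db1" \
    op monitor interval="60s"
```

no-quorum-policy="ignore" is what lets the cluster degrade to a single surviving node; migration-threshold="1" makes a single failed monitor check trigger a move.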
19 # crm_mon -i 1 -Arf (cont.)
All of the status comes from the pgsql Pacemaker resource script.
receiver-status shows error because the resource is written to monitor and check for cascading replication; we don't use cascading and haven't invested cycles there.
Master-postgresql is the weight. Pacemaker uses the weight to determine who should be promoted next in line, which is why async has INFINITY (its replication state shows as STREAMING).
20 Questions?
Devon Mizelle