HAOSCAR 2.0: an open source HA-enabling framework for mission critical systems




Rajan Sharma, Thanadech Thanakornworakij {tth010, rsh018}@latech.edu

High availability is essential in mission critical computing systems to enable breakthrough science and to advance economic and business developments, especially in today's digital world. HA systems are increasingly vital because of their ability to sustain critical services for users, and to stay competitive most companies need more reliable systems to support their daily business work. We therefore foresee the critical importance of enabling the cyber infrastructure with HA. HAOSCAR 2.0 is independent of OSCAR and now supports both Debian and Red Hat based Linux systems; we verified HAOSCAR 2.0 by testing it on Ubuntu 9.10 Server Edition for Debian support and on CentOS 5.6 for Red Hat support. HAOSCAR enhances HA by adopting component redundancy to eliminate single points of failure; it also incorporates a self-healing mechanism, failure detection, automatic synchronization, fail-over, and fail-back. In the next release, we plan to provide an API for advanced users through which developers and administrators can extend the functionality of HAOSCAR using the provided hooks. The API will allow users to create event notification services and powerful rule-based systems, and it can also be used to determine the state of the monitored services. In this article, we also provide examples of existing and new systems where HAOSCAR 2.0 is applied to improve HA: a web application and an exchange server that exchanges patient data between hospitals.

The need for high availability systems is increasing dramatically. Every company or organization needs more reliable systems to support its daily business work, and we see growing demand for high availability in other kinds of systems as well. Initially, HAOSCAR was tied to the OSCAR packages. HAOSCAR eliminated the single point of failure of OSCAR clusters by duplicating the head node: if the head node fails, all applications in the cluster may fail. To alleviate such a failure, HAOSCAR provides a self-healing mechanism, failure detection and recovery, fail-over, and fail-back in OSCAR clusters. HAOSCAR adds a secondary head node to the system; when the primary node fails, the secondary head node takes over its responsibilities.

The HAOSCAR team sees the need for computer systems to use high availability in scientific discovery as well as in critical business problems, so we developed a new version of HAOSCAR that is not tied to OSCAR. HAOSCAR now supports OSCAR clusters as well as general Debian and Red Hat based systems, and we plan to support Rocks in the future. The main goal of the new HAOSCAR 2.0 project is to improve flexibility, provide an open solution, and combine the power of HA and high performance computing in one solution. HAOSCAR should support most Linux-based IT infrastructure (such as web servers and clusters) by providing the much-needed redundancy for mission critical grade applications. To achieve high availability, component redundancy is adopted to eliminate single points of failure. Our enhancement incorporates a self-healing mechanism, failure detection, automatic synchronization, fail-over, and fail-back. HAOSCAR provides a simple high availability solution for users; the installation process consists of a few steps that ask for information from the user.
HAOSCAR includes a feature to clone the system during the installation step to keep the data and software stacks consistent: if the primary component fails, the cloned node takes over the head node's responsibilities. HAOSCAR also has a feature to monitor services with a flexible, event-driven, rule-based system.

Moreover, HAOSCAR provides data synchronization between the primary and secondary systems. All of these features are enabled during the installation process.

HAOSCAR 2.0 Hardware Architecture

Figure 1 illustrates the architecture of HA-OSCAR. The beta release supports a set of private network interfaces with failover (as shown in Figure 1) and reliable external storage. Users can manually add and configure more NICs, switches, and external storage.

Figure 1: HAOSCAR Hardware Architecture.

HAOSCAR consists of the following major system components:

1. Primary server: responsible for receiving requests and distributing them to the specified clients. Each server has at least two NICs: one is connected to the Internet via a public network address, one is connected to the local LAN to which both head nodes are connected, and additional optional NICs may be connected as needed.

2. Standby primary server: activates its services, monitors the primary server, and takes over when a failure in the primary server is detected.

3. Multiple clients (optional): dedicated to computation.

4. Local LAN switches: provide local connectivity among the head and client/compute nodes.

Each head node should have at least two NICs: one is a public interface to an outside network and the other is a private interface to its local LAN and computing nodes. The configuration depends on how a user wants to connect their NICs to the public or private network; our illustrations assume eth0 is the private interface and eth1 is the public interface. Figure 2 shows a sample HA-OSCAR head node network configuration.

Figure 2: Sample of HAOSCAR head node network configuration.
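As a purely illustrative sketch of such a head node configuration on a Debian-style system (the addresses, netmasks, and gateway below are placeholders chosen for this example, not values prescribed by HAOSCAR), the network interfaces might be defined as follows:

    # /etc/network/interfaces -- hypothetical example for a head node
    # eth0: private interface to the cluster LAN
    auto eth0
    iface eth0 inet static
        address 192.168.1.1
        netmask 255.255.255.0

    # eth1: public interface to the outside network
    auto eth1
    iface eth1 inet static
        address 10.0.0.11
        netmask 255.255.255.0
        gateway 10.0.0.1

On a Red Hat based system the same assignment would instead be expressed in the ifcfg-eth0 and ifcfg-eth1 files.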

HAOSCAR 2.0 Software Architecture

Figure 3: HAOSCAR Software Architecture.

HAOSCAR combines many existing technology packages to provide an HA solution. The HAOSCAR software architecture has three components, as shown in Figure 3. The first component is IP monitoring using Heartbeat. Heartbeat is a service designed to detect failure of physical components, such as the network and IP service. When the primary node is not healthy, Heartbeat handles the IP fail-over and fail-back mechanism. The second component is the service monitoring, MONIT. MONIT is a small and lightweight service used to monitor the health of important services and make those services highly available. MONIT will attempt to restart a failing service four times by default (this is tunable). If the service cannot be successfully restarted, MONIT sends a message to Heartbeat to trigger fail-over. The third component, data synchronization, is provided by a service called hafilemon. During a fail-back event, data is synchronized from the secondary server to the primary server; this backwards synchronization propagates changes made on the secondary server while it acted as the head node. By default, hafilemon invokes rsync two minutes after it detects the first change in the watched files, so that groups of changes are transmitted together. Users can change this time according to the needs of their applications.
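The exact rsync call is built internally by hafilemon, but as a rough, hypothetical sketch of the backwards synchronization performed during fail-back (assuming /srv/data is a watched directory and "primary" resolves to the repaired primary server; both names are made up for this example), it would resemble:

    # hypothetical sketch only; the real daemon constructs this call itself
    rsync -az --delete /srv/data/ primary:/srv/data/

The --delete option is what makes the destination mirror the source, including removals, which matches the synchronization behaviour described in this article.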
SystemImager is used for cloning the primary node during the installation process: it creates a standby head node image from the primary server. Moreover, when users need to replace the secondary node, they simply run a script that invokes SystemImager to clone the primary system onto a new system.

Improvements on the Beta Release over Earlier Versions of HAOSCAR

In this new beta version of HAOSCAR 2.0, several enhancements and new features have been introduced compared to the earlier versions. The HAOSCAR 2.0 project's primary goal is to remove the OSCAR dependencies in HAOSCAR and reintegrate those functions into its core, making it a truly standalone high availability solution for any cluster or server platform. The new features in this beta release include:

1. Head node redundancy: this beta version of HAOSCAR 2.0 supports an active/hot-standby head node, whereas active/warm-standby was the model of choice in the earlier versions. We also plan to support an active-active multi-head architecture in a future release. The active-active model enables both performance and availability, because both head nodes can provide services simultaneously; however, its implementation is quite complicated and can lead to data inconsistency when failures occur.

2. Heartbeat integration: Heartbeat provides primary node outage detection and the fail-over and fail-back mechanism through serial line and UDP connectivity. In this beta release, Heartbeat uses a set of virtual IP addresses to make the fail-back mechanism possible, whereas in the earlier versions Heartbeat was set up to use the primary server's IP address, which prevented fail-back from occurring. The current version of Heartbeat only supports a pair of nodes. In a future release we plan to use HAL-R2 (HA Linux Release 2), a major revision of the Linux-HA (Heartbeat) system, which extends Heartbeat's functionality to support multiple nodes and to monitor resources for correct operation. We also plan to integrate HAL-R2 with the active-active multi-node architecture so that this model enables high performance and high availability for efficient computing.

3. Networking interfaces: in this beta version of HA-OSCAR 2.0, multiple virtual IPs can be selected on the local and external networks to support multiple networking interfaces. Multiple IPs can be assigned when setting up the primary server during installation, whereas the earlier versions could only assign a single IP during installation and therefore did not support multiple interfaces.

4. Services: in this beta version, a service called hafilemon, a daemon, monitors changes made in the given directory trees and calls rsync accordingly. To prevent calling rsync too frequently, a user can set a minimum synchronization time interval (a threshold), so that the hafilemon daemon will not call rsync if the current time minus the last sync time is less than the threshold; it will call rsync later, once the threshold criterion is met. Hafilemon has been implemented using Sys::Gamin for conceptual testing and will be reimplemented using inotify in C in the near future. This module is new; the hafilemon service was not available in earlier versions of HA-OSCAR.

5. Cross-platform support: earlier versions of HAOSCAR 2.0 only supported Debian based systems, so fewer members of the Linux community were able to use HAOSCAR. This beta release of HAOSCAR 2.0 supports both Debian and Red Hat based systems, making it available to the whole Linux community.

HAOSCAR 2.0 Applications

Web Application

HAOSCAR provides high availability to mission critical applications that run on a web server. A number of factors can make a web application running on a web server unavailable, such as hardware/software failure, power issues, and routine maintenance of the web server. HAOSCAR supports many key features of high availability that help a web server achieve maximum uptime. During the installation of HA-OSCAR 2.0 on a web server, it makes a clone of the primary web server to act as a standby server, and we have to specify the paths of the folders where the web application files and the MySQL database tables are kept for data synchronization between the primary and standby server. The primary and standby servers should have homogeneous hardware, and the network cards should support PXE boot. The primary web server receives requests from clients and serves them; when a failure occurs on the primary server, the standby web server takes over and configures the same IP address as the primary web server so that all requests are redirected to the standby server. Any changes made to the database while the standby server is active are synchronized back to the primary server to ensure data consistency. The synchronization is done with rsync: rsync synchronizes directories by copying the content of one directory so that it looks exactly like the other one. Rsync works by getting a list of files in the source and destination directories, comparing them according to specified criteria (file size, creation/modification date, or checksum), and then making the destination directory reflect all the changes that happened to the source since the last synchronization session. When the primary web server is available again and the repair is completed, by default this server becomes the standby server; if users need the fixed server to be the primary server, they have to run the fail-back script to let the fixed server be the primary server again. An illustrative hafilemon invocation for this setup is sketched below.
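As a purely illustrative sketch (the directory and host names below are assumptions for this example, not values prescribed by HAOSCAR), the $DAEMON line of the hafilemon init script shown later in this article might expand as follows when the web document root and MySQL data directory are the watched directories:

    # hypothetical expansion of the $DAEMON line from /etc/init.d/ha-oscar-filemon;
    # the watched paths and host names are placeholders, not defaults
    $DAEMON --recursive --period 120 \
        --primary=primary-web --secondary=standby-web \
        /var/www /var/lib/mysql

Here --period 120 means that changes are batched and pushed at most once every two minutes, matching the default synchronization interval described earlier.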

Patient Data Exchange Server

An application currently in development to receive HA capabilities is the Patient Data Exchange Server. It is an application used to exchange and manage patients' records and data among health care providers through a secure and reliable process. It acts as a broker among health care providers and provides secure and reliable services such as health care provider registration, patient data exchange brokerage, patient data transfer in push and pull mode, and data exchange support via various open standards such as HL7, DICOM, and future standard protocols, as shown in Figure 4. Backed by the high availability infrastructure from HAOSCAR, it would not only have maximum uptime, but in cases of downtime it would seamlessly synchronize all data from the primary server to the standby server, which would in turn take over the primary server's IP address, thus making the Patient Data Exchange Server highly available. To make this an easy, one-time process for the system administrator dealing with the exchange server, we have decided that it would be more helpful if HAOSCAR and the exchange server were combined and integrated into one single package. Doing so would not only install the exchange server but also make it HA enabled. The package would first install all the components related to the exchange server in the appropriate destinations and then, after completion, start the HAOSCAR installation; during the installation process we specify the web application directories and databases utilized by the exchange server, which are then automatically picked up for synchronization by HAOSCAR.

Figure 4: Patient Data Exchange Server with HA.

Cybertools Petashare

Petashare is a project in the Louisiana Optical Network Initiative (LONI) that supports the need to collaborate on and share large scale scientific data across seven Louisiana campuses. The Petashare project provides data management, scheduling, and storage tools to support the collaborative research of scientists. Petashare storage is managed by the Integrated Rule-Oriented Data System (iRODS). Every campus has iRODS for sharing and handling the data across the network. iRODS is composed of two servers, a Data server and a Metadata server. The Data server stores the large scale scientific data, while the Metadata server keeps the location of every file on the Data servers in the network; Metadata servers are replicated across the network. To make Petashare highly available, HAOSCAR 2.0 creates a secondary server for the Metadata server. If the Metadata server fails, the secondary server takes over the responsibility of the primary server.

Users can still access the Petashare storage across the network. In the case of a Data server failure, the system does not yet support replicating the data within the same site, so if the files a user needs are on the failed server, the user cannot retrieve them.

Rocks Cluster Distribution

The previous versions of HAOSCAR supported only OSCAR clusters, but the new HAOSCAR supports Rocks as well. After the user finishes setting up the Rocks cluster, HAOSCAR clones the Rocks head node to create a standby head node. When the primary head node fails, the secondary head node takes over its responsibility. After the primary head node is fixed, it becomes the standby node; if a user wants to make the fixed head node the primary node again, HAOSCAR provides a manual script to do so.

High Availability Tools Configuration and Installation (HATCI) Customization

Service Monitoring (Monit) Configuration

HAOSCAR 2.0 uses Monit to monitor and maintain services that need to be highly available. If users are familiar with Monit, they may configure it manually. Listed below are some basic configuration options.

In /etc/monit/monitrc:

    set daemon 120
    set httpd port 2188 and use address localhost
        allow localhost
        # allow admin:monit

"set daemon 120" sets Monit to run as a service daemon that polls each watched process/file every 120 seconds. "set httpd port 2188 and use address localhost / allow localhost" spawns an httpd daemon that Monit may use to access its daemon functionality during runtime operation. "allow admin:monit" (shown commented out above) allows remote login to the httpd service as "admin" with the password "monit"; this allows remote maintenance of the monitored services.

For each system critical service, Monit needs a pair of checks. The first is a process check that Monit uses to maintain the process. The second is a pid file check that is used to coordinate between Monit and Heartbeat fail-over. In the example below, HAOSCAR 2.0 monitors the sshd service. If the service is not running or cannot communicate via port 22, Monit will restart the service up to five times; if the service still does not work, HAOSCAR will run the fail-over script and give up trying to restart the service. For example:

    check process sshd with pidfile /var/run/sshd.pid
        start program = "/etc/init.d/ssh start"
        stop program = "/etc/init.d/ssh stop"
        if 5 restarts within 5 cycles then timeout
        if failed port 22 protocol ssh then restart

    check file sshdpid with path /var/run/sshd.pid
        if changed timestamp for 5 cycles then exec "/bin/sh /usr/bin/fail-over"

Please refer to http://mmonit.com/wiki/monit/configurationexamples for additional examples of service configurations, but be sure to include the pid file check for the Monit-Heartbeat interaction.

IP Availability (Heartbeat) Configuration

In /etc/ha.d/ha.cf:

    logfile /var/log/haoscar/heartbeat.log
    udpport 694
    logfacility local0
    keepalive 2
    deadtime 30
    initdead 120
    bcast eth0

"udpport" defines the port over which the primary and secondary servers communicate. "keepalive" specifies the polling interval the primary server uses to reassert itself to the secondary server. "deadtime" is how long the secondary server will wait without reassertion from the primary server before taking over the IPs. "initdead" is the deadtime used when first bringing the system online. "bcast" defines which NIC Heartbeat uses to communicate between nodes.

In /etc/ha.d/haresources:

    Primary-Server 192.168.0.9 192.168.1.9

This file provides the virtual IPs to be used by Heartbeat and declares which system they belong to by default. Both files must remain synchronized across both servers.

Data Synchronization (HA-OSCAR filemon) Configuration

In /etc/init.d/ha-oscar-filemon:

    start-stop-daemon --start --pidfile $PID_FILE --background --make-pidfile \
        --exec $DAEMON -- --recursive --period 120 \
        --primary=$primary --secondary=$secondary $watch_dirs

"--period" defines how long the daemon waits before transmitting a new set of changes to the secondary server.
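To illustrate the threshold behaviour described in the Services section above, the following is a rough shell sketch of the decision hafilemon makes before each synchronization. It is not the actual implementation (which uses Sys::Gamin and, in the future, inotify in C), and the variable names and watched path are made up for this example:

    # hypothetical sketch of hafilemon's rate-limiting check
    now=$(date +%s)                                  # current time in seconds
    if [ $((now - last_sync)) -ge "$PERIOD" ]; then  # sync at most once per --period
        rsync -az --delete "$watch_dir/" "$secondary:$watch_dir/"
        last_sync=$now                               # remember when we last synced
    fi

In other words, a burst of file changes results in at most one rsync run per --period seconds, so groups of changes are transmitted together.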