HAVmS: Highly Available Virtual machine System. A fault-tolerant computer system with automatic failback and close-to-zero downtime. Memmo Federici (INAF - IAPS), Bruno Martino (CNR - IASI)
The basics. Highly available (HA) computer systems aim at continuity of service: in a fault-tolerant system, when a fault occurs, service provision must be restored in a very short time (tending to 0). A properly designed hardware and software architecture can reduce both the probability of faults and their effects.
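The "restore time tending to 0" goal can be read through the standard steady-state availability formula (not stated on the slides, added here for illustration), where MTBF is the mean time between failures and MTTR the mean time to restore service:

```latex
A = \frac{\mathrm{MTBF}}{\mathrm{MTBF} + \mathrm{MTTR}}, \qquad \lim_{\mathrm{MTTR} \to 0} A = 1
```

For a given fault rate, shrinking the restore time is what drives availability toward 100%.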
System design. Design requirements for our HA computing system: general purpose; multitask; multiuser; cheap; easy to manage; highly recyclable. In case of faults, the system guarantees: a very short time (tending to 0) to restore service provision; preservation of service functionality and data consistency; automatic failback (once the fault has been repaired).
The context. Acquisition and processing of satellite data. We need to: manage uninterrupted data acquisition and analysis processes; keep the intermediate results of the computation; keep the final results for long periods and share them with the relevant scientific community; minimize the human resources needed to manage the system.
The alternatives. Windows Server Failover Clustering: expensive software licenses are required (about $900 per processor); identical hardware and software are required for the primary and secondary servers; discrete-time mirroring every 5 to 10 minutes introduces a potential loss of data; automatic failback is not supported. VMware vSphere: expensive software licenses are required (starting from $4000); certified hardware is required; the management node requires a dedicated server; it is not easy to set up.
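To put the mirroring interval in perspective, a quick back-of-the-envelope comparison, assuming the worst case is a fault just before the next synchronization (so almost one full interval of writes exists only on the failed server). The 40 ms figure is the HAVmS checkpoint interval given on the DRBD and Remus slides; the names in the sketch are illustrative.

```python
# Worst-case window of writes held only by the failed server: a fault just
# before the next synchronization loses (almost) one full interval.
# All values in milliseconds, taken from the slides.

CHECKPOINT_MS = 40                               # HAVmS (Remus/DRBD) checkpoint
MIRROR_INTERVALS_MS = (5 * 60_000, 10 * 60_000)  # WSFC mirroring: 5 and 10 minutes


def window_ratio(interval_ms: int, checkpoint_ms: int = CHECKPOINT_MS) -> int:
    """How many HAVmS checkpoint windows fit in one mirroring interval."""
    return interval_ms // checkpoint_ms


for interval in MIRROR_INTERVALS_MS:
    print(f"{interval // 60_000} min mirroring: worst-case window "
          f"{window_ratio(interval)}x larger than {CHECKPOINT_MS} ms")
```

Integer milliseconds keep the arithmetic exact: the exposure window is 7500 to 15000 times larger than with 40 ms checkpoints.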
The alternatives. Red Hat Cluster Manager (Red Hat Enterprise Linux Server): software licenses are required ($500 per year); in case of faults, services are restored from snapshots; the system has to be restarted after a fault; automatic failback is not supported.
Our solution: automatic failover and failback. HAVmS is an HA system that guarantees continuity of service through effective automatic failover and failback mechanisms. It is a two-server system based on Xen VMs kept synchronized with each other through DRBD (protocol D) and Remus. Applications, e.g.: Matlab, IDL, compilers, web servers, etc. From the users' point of view, the two servers appear as a single server; each VM has its own IP address.
Server software and hardware architecture. Software (entirely open source): Linux Ubuntu Server 10.04 64-bit (kernel customized to support Xen); Xen hypervisor; DRBD (protocol D); Remus; our custom scripts for automatic failback. Hardware: Intel i7 processor (8 cores); 8 GB DDR3 RAM; 2 TB hard disk; 2 network interfaces. [Figure: software stack with the layers services and applications, virtual machines, Remus, Xen, DRBD, Ubuntu Server.]
Xen hypervisor. Xen: a widely used virtualization platform, packaged in all major Linux distributions and also adopted by commercial software.
Distributed Replicated Block Device (DRBD). What it is: a distributed mass storage system for the GNU/Linux platform. Among other things, it makes it possible to implement HA storage systems consisting of two (or more) servers connected by a dedicated link. Operating logic: DRBD assigns the primary role to one of the two servers and the secondary role to the other; only the primary server is authorized to write to the disks; DRBD synchronizes the data between the active server and the sleeping server at every checkpoint (40 ms).
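The operating logic above can be illustrated with a toy model. This is not DRBD's code or API: it is a minimal sketch, assuming a disk is just a list of blocks and that the secondary's copy is updated only when a checkpoint is taken. The class and method names are hypothetical.

```python
# Toy model of checkpoint-synchronized block replication between a primary
# and a secondary (sleeping) server. Hypothetical names; not DRBD itself.
from typing import Dict


class ReplicatedDisk:
    def __init__(self, n_blocks: int) -> None:
        self.primary = [b""] * n_blocks    # disk of the active server
        self.secondary = [b""] * n_blocks  # disk of the sleeping server
        self.pending: Dict[int, bytes] = {}  # writes since the last checkpoint

    def write(self, role: str, block: int, data: bytes) -> None:
        # Only the primary is authorized to write.
        if role != "primary":
            raise PermissionError("only the primary may write")
        self.primary[block] = data
        self.pending[block] = data

    def checkpoint(self) -> int:
        """Apply this epoch's writes to the secondary; return how many."""
        for block, data in self.pending.items():
            self.secondary[block] = data
        shipped = len(self.pending)
        self.pending.clear()
        return shipped
```

Between checkpoints the secondary's disk lags behind, but it always matches the primary as of the last checkpoint, which is the property the failover relies on.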
Remus. What it does: implements an HA system by controlling the virtual machines (VMs) managed by Xen. It performs a checkpoint between the two machines every 40 ms and triggers the DRBD synchronization. The primary server keeps running its active VMs; the sleeping server is ready to take over in case of failure. The VMs on the sleeping server are an exact copy of the VMs running on the active server as of the last checkpoint. In case of failure of the active server, the VMs continue to run on the sleeping server as if the failure had never occurred (thanks to the TCP protocol, the data flow is kept alive without losses).
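Part of what keeps client connections consistent is that Remus's design buffers each epoch's outgoing network packets until the backup has acknowledged that epoch's checkpoint ("output commit"). The sketch below is a toy model of that idea only; the class names and the integer stand-in for VM state are hypothetical simplifications.

```python
# Toy model of Remus-style checkpointing with output commit (hypothetical
# names; not the Remus code). During an epoch (40 ms in HAVmS) the primary
# runs speculatively; its outgoing packets are buffered and released only
# after the backup has acknowledged the checkpoint for that epoch.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Backup:
    committed_state: int = 0

    def receive(self, state: int) -> bool:
        self.committed_state = state  # store the checkpoint
        return True                   # acknowledge it


@dataclass
class Primary:
    state: int = 0                                        # stand-in for VM memory/CPU state
    out_buffer: List[str] = field(default_factory=list)  # held-back packets
    released: List[str] = field(default_factory=list)    # packets clients have seen

    def run_epoch(self, requests: List[str]) -> None:
        for r in requests:
            self.state += 1
            self.out_buffer.append(f"reply:{r}")

    def checkpoint(self, backup: Backup) -> None:
        if backup.receive(self.state):  # commit only after the ack
            self.released.extend(self.out_buffer)
            self.out_buffer.clear()
```

If the primary fails before the acknowledgment, the buffered replies are never sent, so clients never observe state that the backup cannot reproduce; TCP retransmission then carries the connection over once the backup takes over.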
Fault
Failover. On both servers: DRBD stops synchronizing the storage systems. On the active server: Remus stops executing. On the sleeping server: the VMs resume execution with a state and storage consistent with the last committed checkpoint. As a result of the failover: all applications that were running on the VMs of the active server continue to run on the sleeping server without any interruption; all active connections from outside to the VMs are preserved.
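The slides do not say how the fault is detected. A minimal sketch, assuming the sleeping server treats several missed checkpoints as a failure of the active server; the timeout value and all names below are hypothetical.

```python
# Hypothetical failure detector on the sleeping server. If no checkpoint
# arrives within a timeout, the server promotes itself and resumes from
# the last committed checkpoint.
from typing import Optional

CHECKPOINT_S = 0.040
TIMEOUT_S = 5 * CHECKPOINT_S  # assumed threshold: five missed checkpoints


class SleepingServer:
    def __init__(self) -> None:
        self.role = "secondary"
        self.last_committed: Optional[dict] = None
        self.last_seen = 0.0

    def on_checkpoint(self, snapshot: dict, now: float) -> None:
        self.last_committed = snapshot
        self.last_seen = now

    def poll(self, now: float) -> Optional[dict]:
        """Return the state to resume from if failover happens now, else None."""
        if self.role == "secondary" and now - self.last_seen > TIMEOUT_S:
            self.role = "primary"  # stop replicating, run the VMs here
            return self.last_committed
        return None
```

Using a multiple of the checkpoint interval avoids promoting on a single late checkpoint, at the cost of a slightly longer detection time.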
Failback. The problem: failback is not managed automatically by Remus. The critical point: avoiding potential inconsistencies between the state of the VMs and their storage. The solution: once the fault has been repaired, a set of custom scripts, without any action by the system manager, creates the conditions for a live migration of the VMs back to the original active server and re-establishes the configuration that existed before the fault.
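The HAVmS scripts themselves are not shown. As an illustration only, a failback can be organized as an ordered sequence of checked steps that aborts at the first failure, so the VMs never move to a server whose storage is not yet consistent. The step names below are assumptions, and each step is a placeholder callable rather than a real Xen, DRBD, or Remus command.

```python
# Hypothetical outline of an automatic failback: ordered, checked steps.
# Each step is a callable returning True on success; the real commands
# used by HAVmS are not given on the slides.
from typing import Callable, List, Tuple

Step = Tuple[str, Callable[[], bool]]


def failback(steps: List[Step]) -> List[str]:
    """Run steps in order; abort at the first failure."""
    done: List[str] = []
    for name, action in steps:
        if not action():
            raise RuntimeError(f"failback aborted at step: {name}")
        done.append(name)
    return done


plan: List[Step] = [
    ("repaired server reachable", lambda: True),
    ("storage resynchronized onto repaired server", lambda: True),
    ("VMs live-migrated back", lambda: True),
    ("Remus/DRBD protection re-enabled", lambda: True),
]
print(failback(plan))
```

Putting storage resynchronization before migration is the point of the ordering: it addresses the VM state vs. storage inconsistency named above.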
The data storage of INTEGRAL in HA. INTEGRAL (INTErnational Gamma-Ray Astrophysics Laboratory) is a European satellite dedicated to observation from space in the energy range between 15 keV and 10 MeV. AVES: cluster for INTEGRAL data analysis (IAPS distributed computing laboratory).
The data storage of INTEGRAL in HA. The data storage subsystem (DSS): downloads data from the ISDC; backs up these data in real time; provides these data to AVES. In case of a fault: the data download is interrupted; the backup is no longer synchronized; running scientific analyses are interrupted. HAVmS guarantees users the continuous availability of the data (16 TB).
Continuity of service at IASI. Services provided by the IASI systems: attendance control; centralized computing; centralized storage; centralized backup; printing services management. The solution: the VMs are managed by HAVmS, with each service running on a dedicated VM. HAVmS guarantees users the continuity of IASI's services.
Potentiality: high-end solution. Costs: motherboard $1500; processors $1600; RAM $8 per GB. Hypothesis: 32 cores and 120 GB RAM, about 5000 Euro. With $10000 it is possible to buy the computing power needed to provide highly reliable services for the operation of a hospital. Reference hardware: Supermicro motherboard, MP series, Xeon (X9Qxxx).
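The itemized figures can be totalled directly. This sketch only sums the components listed on the slide for the 32-core / 120 GB hypothesis; parts that are not itemized (chassis, disks, power supply, etc.) are deliberately left out, so the totals are a lower bound rather than a quote.

```python
# Component costs as listed on the slide (USD) for the 32-core / 120 GB
# hypothesis. Non-itemized parts are intentionally excluded.
MOTHERBOARD_USD = 1500
PROCESSORS_USD = 1600
RAM_USD_PER_GB = 8
RAM_GB = 120


def server_cost(ram_gb: int = RAM_GB) -> int:
    return MOTHERBOARD_USD + PROCESSORS_USD + RAM_USD_PER_GB * ram_gb


def ha_pair_cost() -> int:
    # HAVmS needs two servers: the hardware is doubled.
    return 2 * server_cost()


print(server_cost(), ha_pair_cost())  # 4060 8120
```

The itemized pair comes to $8120, which leaves headroom within the $10000 figure for the components the slide does not list.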
Further work. Implementing a 3-node HA system (one active, one sleeping, and a third reactivable via Wake on LAN): in case of a fault, the reactivable server is automatically awakened (Wake on LAN) and replaces the damaged server. This allows the repair to be carried out on a more relaxed schedule. Migrating the system to Ubuntu 12.04 LTS, with recent versions of Xen and DRBD (Xen 4.1.4, DRBD 8.3.11), which do not require recompiling the OS kernel.
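Waking the reactivable server relies on the standard Wake-on-LAN "magic packet": 6 bytes of 0xFF followed by the target's MAC address repeated 16 times, usually sent as a UDP broadcast (port 9 is the conventional choice). A minimal sketch; the function names are hypothetical and the MAC address must be that of the third server.

```python
# Build and send a Wake-on-LAN magic packet over UDP broadcast.
import socket


def magic_packet(mac: str) -> bytes:
    """6 x 0xFF followed by the 6-byte MAC repeated 16 times (102 bytes)."""
    digits = mac.replace(":", "").replace("-", "")
    if len(digits) != 12:
        raise ValueError(f"bad MAC address: {mac!r}")
    return b"\xff" * 6 + bytes.fromhex(digits) * 16


def wake(mac: str, broadcast: str = "255.255.255.255", port: int = 9) -> None:
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))
```

Usage: `wake("00:11:22:33:44:55")`, with a placeholder MAC. The target's network card and BIOS must have Wake on LAN enabled.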
A little gem: a 3-node HAVmS system. An unattended, low-power measurement system for extreme environments, using a three-server configuration (active, sleeper, Wake on LAN). Each node is a Raspberry Pi (700 MHz low-power ARM1176JZ-F applications processor, dual-core GPU; single-unit cost: $35). [Figure components: solar PU; Wake on LAN; sensors; low-power RF unit (3 W); network switch.]
Conclusions. HAVmS: provides active redundancy without interruption of service, with completely automatic failover and failback and intervention times close to zero; is very cheap, since to carry out its HA functions it only needs the hardware to be doubled; is extremely versatile: it supports any operating system installable on Xen (including Mac OS X) and the execution of multiple VMs, limited only by the number of cores and the RAM size; is general purpose and recyclable.
Thanks to: Franco Giovannelli; Pietro Ubertini (IAPS); Giuliano Sabatino (IAPS); Fabio Guglietta (IASI); Carlo Gaibisso (IASI).