Highly Reliable Linux HPC Clusters: Self-awareness Approach

Chokchai Leangsuksun 1, Tong Liu 2, Yudan Liu 1, Stephen L. Scott 3, Richard Libby 4 and Ibrahim Haddad 5

1 Computer Science Department, Louisiana Tech University; 2 Enterprise Platforms Group, Dell Corp.; 3 Oak Ridge National Laboratory; 4 Intel Corporation; 5 Ericsson Research

1 {box, yli010}@latech.edu, 2 Tong_Liu@dell.com, 3 scottsl@ornl.gov, 4 rml@hpc.intel.com, 5 ibrahim.haddad@ericsson.com

Abstract. Current solutions for fault tolerance in HPC systems focus on dealing with the result of a failure. However, most are unable to handle runtime system configuration changes caused by transient failures and require a complete restart of the entire machine. The recently released HA-OSCAR software stack is one effort making inroads here. This paper discusses detailed solutions for the high-availability and serviceability enhancement of clusters by HA-OSCAR via multi-head-node failover and a service-level fault tolerance mechanism. Our solution employs self-configuration and introduces Adaptive Self-Healing (ASH) techniques. An HA-OSCAR availability improvement analysis was also conducted with various sensitivity factors. Finally, the paper details the system layering strategy, dependability modeling, and the analysis of an actual experimental system with a Petri net-based model, the Stochastic Reward Net (SRN).

1 Introduction

One of the challenges in a clustered environment is to keep system failures to a minimum and to provide the highest possible level of system availability. If not resolved in a timely fashion, such failures can result in service unavailability or outages that may impact businesses, productivity, national security, and our everyday lives. High-Availability (HA) computing strives to avoid the problems of unexpected failures through redundancy and preemptive measures. Systems that can hot-swap hardware components can be kept alive by an OS runtime environment that understands the concept of dynamic system configuration. Furthermore, multiple head nodes (sometimes called service nodes) can be used to distribute workload while consistently replicating the OS runtime, critical services, and configuration information for fault tolerance. As long as one or more head nodes survive, the system can be kept consistent, accessible, and manageable.

1 Research supported by the Center for Entrepreneurship and Information Technology, Louisiana Tech University.
3 Research supported by a U.S. Department of Energy grant, under contract No. DE-AC05-00OR22725 with UT-Battelle, LLC.

2. HA-OSCAR Architecture

A typical Beowulf cluster consists of two node types: a head server node and multiple identical client nodes. The server or head node is responsible for serving user requests and distributing them to clients via scheduling/queuing software. Clients, or compute nodes, are normally dedicated to computation [4]. However, this single-head-node architecture has a single point of failure: a head-node outage can render the entire cluster unusable.

Figure 1: HA-OSCAR architecture

There are various techniques for implementing a high-availability cluster architecture, including active/active, active/hot-standby, and active/cold-standby. In the active/active configuration, both head nodes simultaneously serve external requests; when one head node goes down, the other takes over total control. A hot-standby head node, by contrast, monitors system health and only takes over control when there is an outage at the primary head node. The cold-standby architecture is very similar to hot-standby, except that the backup head node is activated from a cold start. Our key effort focused on simplicity by supporting self-cloning of the cluster master (redundancy and automatic failover). Although the aforementioned failover concepts are not new, HA-OSCAR's simple installation and combined HA and HPC architecture are unique, and its 1.0 beta release is the first known field-grade HA-Beowulf cluster release.

3. HA-OSCAR Serviceability Core

Our existing HA-OSCAR and OSCAR installation and deployment mechanism employs Self-build, a self-configuration approach that uses an open source OS image capture and configuration tool, SystemImager, to clone and build images for both the compute nodes and the standby head node. Cloned images can be stored on a separate image server (see Figure 1), which facilitates upgrades and improves reliability with potential rollback and disaster recovery.
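The actual capture and deployment are performed by SystemImager, whose commands are documented in its manual [3]. Purely as a conceptual stand-in, the sketch below shows the capture-then-deploy flow over rsync (the transport SystemImager itself builds on); the hostnames, image path, and exclusion list are hypothetical.

    #!/usr/bin/env python3
    """Conceptual image capture/deploy flow. HA-OSCAR actually uses
    SystemImager [3]; this stand-in only illustrates the idea."""
    import subprocess

    IMAGE_DIR = "/var/lib/images/oscar-head"  # hypothetical image store

    def capture_image(golden_client: str = "primary-head") -> None:
        # Pull the golden client's filesystem to the image server,
        # skipping volatile paths; SystemImager's capture plays this role.
        subprocess.run(
            ["rsync", "-aHx", "--numeric-ids",
             "--exclude", "/proc", "--exclude", "/sys", "--exclude", "/tmp",
             f"root@{golden_client}:/", f"{IMAGE_DIR}/"],
            check=True)

    def deploy_image(target: str = "standby-head") -> None:
        # Push the stored image to the standby head node; per Section 3.1,
        # only the network configuration is adjusted afterwards.
        subprocess.run(
            ["rsync", "-aHx", "--numeric-ids", "--delete",
             f"{IMAGE_DIR}/", f"root@{target}:/"],
            check=True)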

3.1 Head-Node Cloning

As stated in Section 2, our approach to removing the head-node single point of failure is to provide a hot-standby for the head node. The hot-standby head node is a mirror of the primary and will process user requests when the primary head node fails. To simplify the construction of this environment, we begin with a standard OSCAR cluster build on the primary node and then clone, or duplicate, that node to the hot-standby head node. As shown in Figure 3, the network configuration differs between the primary and the hot-standby by the public IP address, so they are not exact duplicates of one another. Presently, the hardware must be identical between the primary and standby machines. Once cloned, both head nodes contain the Linux operating system and all the necessary components for an OSCAR cluster [4], including C3, LAM/MPI, LUI, the Maui PBS Scheduler, MPICH, OpenSSH, OpenSSL, PBS, PVM, and the System Installation Suite (SIS).

Figure 2: Adaptive Self-Healing state diagram (from the working state, a detected failure raises an alert that records the previous state and a retry counter; once the retry threshold is reached, failover switches over and takes control at the standby; after the primary is repaired, an optional fallback returns control)

3.2 Adaptive Self-Healing (ASH) Technique

HA-OSCAR addresses service-level faults via the ASH technique, whereby some service-level failures are handled in a more graceful fashion. Figure 2 illustrates the ASH state-transition diagram used to achieve service-level fault tolerance. Under perfect conditions, all services function correctly. The ASH MON daemon monitors service availability and health at a tunable interval and triggers alerts upon failure detection. Our current monitoring implementation is a polling approach with a default interval of 5 seconds; this polling interval is tunable for faster detection time. Section 6 details the impact and analysis of different polling intervals.
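The MON daemon itself is not reproduced here. The following minimal sketch, with a hypothetical TCP health check standing in for real service probes, illustrates the polling pattern of Figure 2: a tunable interval (the 5-second default), a retry counter, and a failover trigger once the threshold is reached. It is not HA-OSCAR's actual monitor.

    #!/usr/bin/env python3
    """Minimal sketch of an ASH-style polling monitor; not HA-OSCAR's
    actual MON daemon. States mirror Figure 2: working -> alert on a
    detected failure -> failover once the retry threshold is reached."""
    import socket
    import time

    POLL_INTERVAL = 5    # seconds; the paper's default, tunable
    RETRY_THRESHOLD = 3  # consecutive failures before failover (assumed value)

    def service_alive(host: str, port: int, timeout: float = 2.0) -> bool:
        # Hypothetical health check: can we open a TCP connection?
        try:
            with socket.create_connection((host, port), timeout=timeout):
                return True
        except OSError:
            return False

    def trigger_failover() -> None:
        # Placeholder: here the standby would clone the primary's network
        # configuration (see Section 3.3).
        print("retry threshold reached: failing over to the standby")

    def monitor(host: str, port: int) -> None:
        failures = 0  # the retry counter of the state diagram
        while True:
            if service_alive(host, port):
                failures = 0  # back to the working state
            else:
                failures += 1
                print(f"alert: check failed ({failures}/{RETRY_THRESHOLD})")
                if failures >= RETRY_THRESHOLD:
                    trigger_failover()
                    break
            time.sleep(POLL_INTERVAL)

    if __name__ == "__main__":
        monitor("10.0.0.200", 22)  # e.g., poll sshd on the primary head node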

3.3 Server Failover

Figure 3 shows an example of the initial HA-OSCAR network interface configuration during normal operation. In our example, HA-OSCAR assigns the primary server's public IP address (Eth0p) as 138.47.51.126 and its private IP address (Eth1p) as 10.0.0.200. We then configure an alias private IP address (Eth1:0p) as 10.0.0.222. For the standby server, the public IP address (Eth0s) is initially unassigned and the private IP address (Eth1s) is 10.0.0.150.

Figure 3: Network interface configuration during normal operation and after failover. Normal (left): primary head node Eth0p = 138.47.51.126, Eth1p = 10.0.0.200, Eth1:0p = 10.0.0.222; standby Eth0s unassigned, Eth1s = 10.0.0.150. After a primary head-node outage (right): standby Eth0s = 138.47.51.126, Eth1s = 10.0.0.200.

When a primary server failure occurs, all of its network interfaces drop. The standby server takes over and clones the primary server's network configuration, as shown on the right of Figure 3: it automatically configures its public network interface (Eth0s) to the original public IP address, 138.47.51.126, and its private network interface (Eth1s) to 10.0.0.200. This IP cloning takes only 3-5 seconds.
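The takeover amounts to re-plumbing the standby's interfaces with the primary's addresses and announcing the change on the wire. Below is a minimal sketch of that step using the standard Linux ip and arping tools with the Figure 3 example addresses (the /24 prefixes and interface names are assumptions); HA-OSCAR's actual failover scripts are more involved.

    #!/usr/bin/env python3
    """Sketch of the standby's IP takeover (illustrative only; HA-OSCAR's
    real failover scripts differ). Run as root on the standby head node
    once ASH declares the primary dead."""
    import subprocess

    # Figure 3 example addresses; prefix lengths are assumed.
    PUBLIC_IP = "138.47.51.126/24"
    PRIVATE_IP = "10.0.0.200/24"

    def take_over(public_if: str = "eth0", private_if: str = "eth1") -> None:
        for iface, addr in ((public_if, PUBLIC_IP), (private_if, PRIVATE_IP)):
            # Adopt the primary's address on the standby's interface.
            subprocess.run(["ip", "addr", "add", addr, "dev", iface],
                           check=True)
            # Send gratuitous ARP so switches and compute nodes relearn
            # the address-to-MAC mapping quickly.
            subprocess.run(["arping", "-U", "-c", "3", "-I", iface,
                            addr.split("/")[0]], check=True)

    if __name__ == "__main__":
        take_over()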

4. Implementation and Experimental Results

HA-OSCAR should support any Linux Beowulf cluster. We have successfully verified an HA-OSCAR cluster system test with OSCAR release 3.0 and Red Hat 9.0. The experimental cluster consisted of two dual-Xeon server head nodes, each with 1 GB RAM, a 40 GB hard drive with at least 2 GB of free disk space, and two network interface cards. There were 16 client nodes, also Intel dual-Xeon servers, each with 512 MB RAM and a 40 GB hard drive. Each client node was equipped with at least one network interface card. Head and client nodes are connected to dual switches, as shown in Figure 1.

5. Modeling and Availability Prediction

In order to gauge the availability improvement of the experimental cluster, we evaluated our architecture and its system failure and recovery behavior with a modeling approach. We also studied the overall cluster uptime and the impact of different polling interval sizes in our fault-monitoring mechanism. The Stochastic Reward Net (SRN) technique has been used successfully for availability evaluation and prediction of complicated systems, especially when time-dependent behavior is of great interest [5]. We utilized the Stochastic Petri Net Package (SPNP) [6] to build and solve our SRN model.

Figure 4: HA-OSCAR vs. a traditional Beowulf cluster: total availability (planned and unplanned downtime) as a function of node-wise mean time to failure. Model assumptions: scheduled downtime = 200 hrs, nodal MTTR = 24 hrs, failover time = 10 s; during maintenance on the primary head node, the standby acts as primary.

  Node-wise MTTF (hr):  1000      2000      4000      6000      8000      10000
  Beowulf:              0.905797  0.915751  0.920810  0.922509  0.923361  0.923873
  HA-OSCAR:             0.999684  0.999896  0.999951  0.999962  0.999966  0.999968

We calculated the instantaneous availabilities of the system and its parameters; details can be found in [1]. We obtained a steady-state system availability of 99.993%, a significant improvement over the 99.65% of a similar Beowulf cluster with a single head node. Furthermore, higher serviceability, such as the ability to incrementally upgrade and hot-swap the cluster OS, services, applications, and hardware, will further reduce planned downtime, which undoubtedly benefits the overall aggregate performance. Figure 4 illustrates the total availability improvement of our dual-head HA-OSCAR cluster over a single-service-node Beowulf cluster when redundant service nodes are exploited for both fault tolerance and hot, incremental upgrades.
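For intuition about why the hot standby pays off, consider the crude closed-form comparison below. It is not the SRN model and will not reproduce Figure 4's exact numbers (the SRN also captures service and compute-node behavior), but it uses the figure's stated assumptions and shows the qualitative gap: with failover measured in seconds rather than repairs measured in hours, head-node unavailability shrinks by several orders of magnitude.

    #!/usr/bin/env python3
    """Back-of-envelope availability comparison under the Figure 4
    assumptions; a simplified sketch, not the SPNP-solved SRN model."""

    HOURS_PER_YEAR = 8760.0
    MTTR = 24.0                # hours to repair a failed head node
    SCHEDULED = 200.0          # planned downtime, hours per year
    FAILOVER = 10.0 / 3600.0   # 10-second failover, in hours

    def single_head(mttf: float) -> float:
        # Every head failure costs a full repair, and all scheduled
        # maintenance is downtime.
        unplanned = mttf / (mttf + MTTR)
        planned = 1.0 - SCHEDULED / HOURS_PER_YEAR
        return unplanned * planned

    def dual_head(mttf: float) -> float:
        # A head failure costs only the failover window; during maintenance
        # the standby acts as primary, so planned downtime is hidden.
        # Double outages are neglected here (the SRN accounts for them).
        return mttf / (mttf + FAILOVER)

    for mttf in (1000, 2000, 4000, 6000, 8000, 10000):
        print(f"MTTF={mttf:>5} h: single={single_head(mttf):.6f}, "
              f"dual={dual_head(mttf):.7f}")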

6. Conclusions and Future Work

Our proof-of-concept implementation and the experimental and analysis results [1, 2, and here] suggest that HA-OSCAR is a significant enhancement and a promising solution for providing a high-availability Beowulf-class cluster architecture. The availability of our experimental system improves substantially, from 99.65% to 99.9935%. The polling interval for ASH failure detection shows a linear relationship to total cluster availability. The introduction of the hot-standby server is clearly a cost-effective measure compared with the investment in a typical cluster: since a double outage of both servers is very unlikely, clusters with HA-OSCAR are likely to be much more available than typical Beowulf clusters. We have extended our investigation from outage detection to prediction techniques [10], multi-head failover, and a grid-enabled HA-OSCAR. In addition, we have recently investigated a job-queue migration mechanism based on the Sun Grid Engine (SGE) and experimented with Berkeley Lab Checkpoint/Restart (BLCR) and LAM/MPI [7] for an automated checkpoint-and-restart mechanism to enhance fault tolerance in the HA-OSCAR framework. Our initial findings [9] are very promising. We will extend the transparent failure recovery mechanisms with more options, including sophisticated rule-based recovery and integration with Open MPI (a unified MPI platform development effort from the LA-MPI, FT-MPI [8], and LAM/MPI [7] teams).

7. References

1. C. Leangsuksun, L. Shen, T. Liu, H. Song, and S. Scott, "Availability Prediction and Modeling of High Availability OSCAR Cluster," IEEE International Conference on Cluster Computing (Cluster 2003), Hong Kong, December 2-4, 2003.
2. C. Leangsuksun, L. Shen, T. Liu, H. Song, and S. Scott, "Dependability Prediction of High Availability OSCAR Cluster Server," The 2003 International Conference on Parallel and Distributed Processing Techniques and Applications (PDPTA'03), Las Vegas, Nevada, USA, June 23-26, 2003.
3. B. Finley, D. Frazier, A. Gonyou, A. Jort, et al., SystemImager v3.0.x Manual, February 19, 2003.
4. M. J. Brim, T. G. Mattson, and S. L. Scott, "OSCAR: Open Source Cluster Application Resources," Ottawa Linux Symposium 2001, Ottawa, Canada, 2001.
5. J. Muppala, G. Ciardo, and K. S. Trivedi, "Stochastic Reward Nets for Reliability Prediction," Communications in Reliability, Maintainability and Serviceability: An International Journal, SAE International, Vol. 1, No. 2, pp. 9-20, July 1994.
6. G. Ciardo, J. Muppala, and K. Trivedi, "SPNP: Stochastic Petri Net Package," Proc. Int. Workshop on Petri Nets and Performance Models, pp. 142-150, Los Alamitos, CA, December 1989, IEEE Computer Society Press.
7. S. Sankaran, J. M. Squyres, B. Barrett, A. Lumsdaine, J. Duell, P. Hargrove, and E. Roman, "The LAM/MPI Checkpoint/Restart Framework: System-Initiated Checkpointing," LACSI Symposium, Santa Fe, NM, October 27-29, 2003.
8. G. E. Fagg, E. Gabriel, Z. Chen, T. Angskun, G. Bosilca, A. Bukovsky, and J. J. Dongarra, "Fault Tolerant Communication Library and Applications for High Performance Computing," LACSI Symposium 2003, Santa Fe, NM, October 27-29, 2003.
9. C. V. Kottapalli, "Intelligence-Based Checkpoint Placement for Parallel MPI Programs on Linux Clusters," Master's thesis, Computer Science Program, Louisiana Tech University, August 2004 (in preparation).
10. C. Leangsuksun et al., "A Failure Predictive and Policy-Based High Availability Strategy for Linux High Performance Computing Cluster," The 5th LCI International Conference on Linux Clusters: The HPC Revolution 2004, Austin, TX, May 18-20, 2004.