Asymmetric Active-Active High Availability for High-end Computing
C. Leangsuksun, V. K. Munganuru
Computer Science Department, Louisiana Tech University, P.O. Box 10348, Ruston, LA 71272, USA

T. Liu
Dell Inc.

S. L. Scott, C. Engelmann
Computer Science and Mathematics Division, Oak Ridge National Laboratory

ABSTRACT

Linux clusters have become very popular for scientific computing at research institutions world-wide, because they can be easily deployed at a fairly low cost. However, the most pressing issues of today's cluster solutions are availability and serviceability. The conventional Beowulf cluster architecture has a single head node connected to a group of compute nodes. This head node is a typical single point of failure and control, which severely limits availability and serviceability by effectively cutting off healthy compute nodes from the outside world upon overload or failure. In this paper, we describe a paradigm that addresses this issue using asymmetric active-active high availability. Our framework comprises n + 1 head nodes, where n head nodes are active in the sense that they serve simultaneously incoming user requests. One standby server monitors all active servers and performs a fail-over in case of a detected outage. We present a prototype implementation based on HA-OSCAR and discuss initial results.

Keywords: Scientific computing, clusters, high availability, asymmetric active-active, hot-standby, fail-over

This work was supported by the U.S. Department of Energy under Contract No. DE-FG02-05ER. This research is sponsored by the Mathematical, Information, and Computational Sciences Division; Office of Advanced Scientific Computing Research; U.S. Department of Energy. The work was performed at the Oak Ridge National Laboratory, which is managed by UT-Battelle, LLC under Contract No. DE-AC05-00OR.

1. INTRODUCTION

With their competitive price/performance ratio, COTS computing solutions have become a serious challenge to traditional MPP-style supercomputers. Today, Beowulf-type clusters are being used to drive the race for scientific discovery at research institutions and universities around the world. Clusters are popular because they can be easily deployed at a fairly low cost. Furthermore, cluster management systems (CMSs), such as OSCAR [6, 18] and ROCKS [21], allow uncomplicated system installation and management without the need to configure each component individually. HPC cluster computing is undoubtedly an eminent stepping-stone for future ultra-scale high-end computing systems.

However, the most pressing issues of today's cluster architectures are availability and serviceability. The single head node architecture of Beowulf-type systems itself is an origin of these predicaments. Because of the single head node setup, clusters are vulnerable, as the head node represents a single point of failure affecting the availability of its services. Furthermore, the head node also represents a single point of control. This severely limits access to healthy compute nodes in case of a head node failure. In fact, the entire system is inaccessible as long as the head node is down. Moreover, a single head node also impacts system throughput performance as it becomes a bottleneck. Overloading the head node can become a serious issue for high-throughput-oriented systems.
To a large extent, the single point of failure issue is addressed by active/hot-standby turnkey tools like HA-OSCAR [4, 5, 10], which minimize unscheduled downtime due to head node outages. However, sustaining throughput at large scale remains an issue, because active/hot-standby solutions still run only one active head node. In this paper, we describe a paradigm that addresses this issue using asymmetric active-active high availability. Unlike typical Beowulf architectures (with or without hot-standby node(s)), our framework comprises n + 1 servers: n head nodes are active in the sense that they serve simultaneously incoming user requests, and one hot-standby server
monitors all the active servers and performs a fail-over in case of a detected outage. Since our research also focuses on the batch job scheduler that typically runs on the head node, our proposed high availability architecture effectively transforms a single-scheduler system into a cluster running multiple schedulers in parallel without maintaining global knowledge. These schedulers run their jobs on the same compute nodes or on individual partitions, and fail over to a hot-standby server in case of a single server outage. Sharing compute nodes among multiple schedulers without coordination is a very uncommon practice in cluster computing, but it has its value for high throughput computing.

Furthermore, our solution paves the way towards the more versatile paradigm of symmetric active-active high availability for high-end scientific computing using the virtual synchrony model for head node services [3]. In this model, the same service is provided by all head nodes using group communication at the back-end for coordination. Head nodes may be added, removed or fail at any time, while no processing power is wasted on keeping an idle backup server and no service interruption occurs during recoveries. Symmetric active-active high availability is an ongoing research effort, and our asymmetric solution will help to understand the concept of running the same service on multiple nodes and the necessary coordination. However, in contrast to the symmetric active-active high availability paradigm with its consistent symmetric replication of global knowledge among all participating head nodes, our asymmetric paradigm maintains backups of all active head nodes only on one standby head node.

This paper is organized as follows: First, we provide a review of related past and ongoing research activities. Second, we describe our asymmetric active-active high availability solution in more detail and show how the system handles multiple jobs simultaneously while enforcing high availability. Third, we present some initial results from our prototype implementation. Finally, we conclude with a short summary of the presented research and a brief description of future work.

2. RELATED WORK

Related past and ongoing research activities include cluster management systems as well as active/hot-standby cluster high availability solutions. Cluster management systems allow uncomplicated system installation and management, thus improving availability and serviceability by reducing scheduled downtimes for system management. Examples are OSCAR and Rocks.

The Open Source Cluster Application Resources (OSCAR [6, 18]) toolkit is a turnkey option for building and maintaining a high performance computing cluster. OSCAR is a fully integrated software bundle, which includes all components that are needed to build, maintain, and manage a medium-sized Beowulf cluster. OSCAR was developed by the Open Cluster Group, a collaboration of major research institutions and technology companies led by Oak Ridge National Laboratory (ORNL), the National Center for Supercomputing Applications (NCSA), IBM, Indiana University, Intel, and Louisiana Tech University. OSCAR has significantly reduced the complexity of building and managing a Beowulf cluster by using a user-friendly graphical installation wizard as front-end and by providing the necessary management tools at the back-end.

Similar to OSCAR, NPACI Rocks [21] is a complete "cluster on a CD" solution for x86 and IA64 Red Hat Linux COTS clusters.
Building a Rocks cluster does not require any experience in clustering, yet a cluster architect will find a flexible and programmatic way to redesign the entire software stack just below the surface (appropriately hidden from the majority of users). The NPACI Rocks toolkit was designed by the National Partnership for Advanced Computational Infrastructure (NPACI). The NPACI facilitates collaboration between universities and research institutions to build cutting-edge computational environments for future scientific research. The organization is led by the University of California, San Diego (UCSD) and the San Diego Supercomputer Center (SDSC).

Numerous ongoing high availability computing projects, such as LifeKeeper [11], Kimberlite [8], Linux FailSafe [12] and Mission Critical Linux [16], focus their research on solutions for clusters. However, they do not reflect the Beowulf cluster architecture model and fail to provide availability and serviceability support for scientific computing, such as a highly available job scheduler. Most solutions provide highly available business services, such as data storage and databases. They use clusters of servers to provide high availability locally, and enterprise-grade wide-area disaster recovery solutions with geographically distributed server cluster farms.

HA-OSCAR tries to bridge the gap between scientific cluster computing and traditional high availability computing. High Availability Open Source Cluster Application Resources (HA-OSCAR [4, 5, 10]) is production-quality clustering software that aims toward non-stop services for Linux HPC environments. In contrast to the previously discussed HA applications, HA-OSCAR combines both the high availability and performance aspects, making it the first field-grade HA Beowulf cluster solution that provides high availability as well as critical failure prediction and analysis capability. The project's main objectives focus on Reliability, Availability and Serviceability (RAS) for the HPC environment. In addition, the HA-OSCAR approach provides a flexible and extensible interface for customizable fault management, policy-based failover operation, and alert management.

An active/hot-standby high availability variant of Rocks has been proposed [14] and is currently under development. Similar to HA-OSCAR, HA-Rocks is sensitive to the level of failure and aims to provide mechanisms for graceful recovery to a standby master node. Active/hot-standby solutions for essential services in scientific high-end computing include resource management systems, such as the Portable Batch System Professional (PBSPro [19]) and the Simple Linux Utility for Resource
Management (SLURM [22]). While the commercial PBSPro service can be found in the Cray RAS and Management System (CRMS [20]) of the Cray XT3 [24] computer system, the Open Source SLURM is freely available for AIX, Linux and even Blue Gene [1, 7] platforms. The asymmetric active-active architecture presented in this paper is an extension of the HA-OSCAR framework developed at Louisiana Tech University.

3. ASYMMETRIC ACTIVE-ACTIVE ARCHITECTURE

The conventional Beowulf cluster architecture (see Figure 1) has a single head node connected to a group of compute nodes. The fundamental building block of the Beowulf architecture is the head node, usually referred to as the primary server, which serves user requests and distributes submitted computational jobs to the compute nodes, aided by job launching, scheduling and queuing software components [2]. Compute nodes, usually referred to as clients, are simply dedicated to running these computational jobs. We also note that preemptive measures for application fault tolerance, such as checkpointing, can introduce significant overhead even during normal system operation. Such overhead should be counted as down time as well, since compute nodes are not efficiently utilized.

We implemented a prototype of an asymmetric active-active high availability solution [13] that consists of three different layers (see Figure 2). The top layer has two identical active head nodes and one redundant hot-standby node, which simultaneously monitors both active nodes. The middle layer is equipped with two network switches to provide redundant connectivity between head nodes and compute nodes. A set of compute nodes installed at the bottom layer is dedicated to running computational jobs. In this configuration, each active head node is required to have at least two network interface cards (NICs). One NIC is used for public network access to allow users to schedule jobs. The other NIC is connected to the respective redundant private local network providing communication between head and compute nodes. The hot-standby server uses three NICs to connect to the outside world and to both redundant networks. Compute nodes need two NICs for both redundant networks.

We initially implemented our prototype using an asymmetric active-active HA-OSCAR solution that consists of two different job managers (see Figure 3), the Open Portable Batch System (OpenPBS [17]) and the Sun Grid Engine (SGE [23]), independently running on multiple identical head nodes at the same time. Additionally, one identical head node is configured as a hot-standby server, ready to take over when one of the two active head node servers fails.

Figure 1: Conventional Beowulf Architecture

The overall availability of a cluster computing system depends solely on the health of its primary node. Furthermore, this head node may also become a bottleneck in large-scale high throughput use-case scenarios. In high availability terms, the single head node of a Beowulf-type cluster computing system is a typical single point of failure and control, which severely limits availability and serviceability by effectively cutting off healthy compute nodes from the outside world upon overload or failure. Running two or more primary servers simultaneously, together with an additional monitoring hot-standby server for fail-over purposes, i.e. asymmetric active-active high availability, is a promising solution that can be deployed to improve system throughput and to reduce system down times.
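To make the monitoring role of the hot-standby server more concrete, the following sketch outlines one way such a monitor could be structured: it polls each active head node over both redundant private networks and declares an outage only after several consecutive failed rounds. This is an illustration only, not the actual HA-OSCAR monitoring code; the host names, addresses, thresholds and the on_outage callback are hypothetical.

#!/usr/bin/env python3
# Illustrative sketch (not the actual HA-OSCAR code): a hot-standby monitor
# that periodically checks each active head node over both redundant private
# networks and reports an outage only when a node is unreachable on all of
# its addresses for several consecutive polling rounds.
# All host names, addresses and thresholds below are hypothetical.

import subprocess
import time

# Each active head node is reachable via one address per redundant network.
ACTIVE_HEAD_NODES = {
    "head-a": ["10.0.1.11", "10.0.2.11"],   # OpenPBS head (hypothetical)
    "head-b": ["10.0.1.12", "10.0.2.12"],   # SGE head (hypothetical)
}
POLL_INTERVAL = 5          # seconds between polling rounds
FAILURE_THRESHOLD = 3      # consecutive failed rounds before declaring an outage


def reachable(address: str) -> bool:
    """Return True if a single ICMP echo request to 'address' succeeds."""
    result = subprocess.run(
        ["ping", "-c", "1", "-W", "2", address],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0


def monitor(on_outage) -> None:
    """Poll all active head nodes forever; call on_outage(node) once per detected failure."""
    missed = {node: 0 for node in ACTIVE_HEAD_NODES}
    failed = set()
    while True:
        for node, addresses in ACTIVE_HEAD_NODES.items():
            if any(reachable(addr) for addr in addresses):
                missed[node] = 0
                failed.discard(node)        # node is (back) in service
            else:
                missed[node] += 1
                if missed[node] >= FAILURE_THRESHOLD and node not in failed:
                    failed.add(node)
                    on_outage(node)         # hand over to the fail-over logic
        time.sleep(POLL_INTERVAL)


if __name__ == "__main__":
    monitor(lambda node: print(f"outage detected on {node}, starting fail-over"))

Polling each head node over both redundant private networks avoids triggering a fail-over when only a single switch or NIC has failed.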
Figure 3: Normal Operation of the Asymmetric Active-Active HA-OSCAR Solution

Under normal system operating conditions, head node A runs OpenPBS and head node B runs SGE, both simultaneously serving user requests in tandem. Both active head nodes effectively employ the same compute nodes using redundant interconnects. Each active head node creates a separate home environment for its resource manager, which prevents conflicts during job submission. Upon failure (see Figure 4) of one active head node, the hot-standby head node will assume the IP address and host name of the failed head node. Additionally, the same set
of services will be transferred to the standby node, i.e. the job management tool that was active on the failed node is started on the standby, masking users from this failure.

Figure 4: Fail-Over Procedure of the Asymmetric Active-Active HA-OSCAR Solution

The asymmetric active-active HA-OSCAR prototype is capable of masking one failure at a time. As long as one head node is down, a second head node failure will result in a degraded operating mode. Even under this rare condition of two simultaneous head node failures, our high availability solution provides the same capability as a regular Beowulf-type cluster without high availability features. To ensure that the system operates correctly without unfruitful fail-overs, the system administrator must define a fail-over policy in the PBRM (Policy Based Recovery Management [9]) module, which allows selecting a critical head node (A or B). The critical head node has a higher priority and will be handled first by the hot-standby head node in case the DGAE (Data Gathering and Analysis Engine) detects any failures. In the rare double head node outage event, there will not be a service fail-over from the lower priority server to the hot-standby head node. This policy ensures that critical services will not be disrupted by failures on the high priority head node. For example, if OpenPBS job management is the most critical service, we suggest setting the server running OpenPBS as the higher priority head node.

4. PRELIMINARY RESULTS

In our experimental setup, head node A runs OpenPBS and Maui [15] and is assigned as the critical head node. The services on head node A have priority for fail-over to the hot-standby head node. In case of a failure of head node A, the hot-standby head node takes over as head node A. Once head node A is repaired, the hot-standby head node will disable OpenPBS and Maui to allow those services to fail back to the original head node A. If head node B fails while head node A is in normal operation, the hot-standby head node will simply fail over SGE until head node B is recovered and back in service again.

With our lab-grade prototype setup, we experienced the same, if not better, availability behavior compared to an active/hot-standby HA-OSCAR system, if we consider the degraded operating mode of our prototype with two outages at the same time as downtime. Earlier theoretical assumptions and practical results (see Figure 5) using reliability analysis and tests of an active/hot-standby HA-OSCAR system could be validated for the asymmetric active-active solution. We obtained a steady-state system availability of %, which is a significant improvement when compared to 99.65% from a similar Beowulf Linux cluster with a single head node. If we do not consider the degraded operating mode with two outages at the same time as downtime, the availability of our prototype is even better than that of the standard HA-OSCAR solution. We are currently in the process of validating our results using reliability analysis. Furthermore, we also experienced an improved throughput capacity for scheduling jobs.
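The priority rule enforced by the PBRM module described above can be summarized in a short sketch. It is only an illustration under assumed names: the head node and service labels and the take_over()/release() helpers are hypothetical placeholders, and the real DGAE failure detection and IP/host name takeover steps are reduced to print statements.

#!/usr/bin/env python3
# Illustrative sketch of a policy-based fail-over decision in the spirit of the
# PBRM module described above. This is not the actual HA-OSCAR/PBRM code; the
# node names, service names and the take_over()/release() helpers are hypothetical.

from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HeadNode:
    name: str
    service: str          # job manager owned by this head node
    priority: int         # lower number = more critical
    failed: bool = False


# Fail-over policy from the example above: the OpenPBS head node is critical.
HEADS = [
    HeadNode("head-a", "openpbs+maui", priority=0),
    HeadNode("head-b", "sge", priority=1),
]


@dataclass
class HotStandby:
    hosting: Optional[HeadNode] = None    # at most one failed head is masked

    def take_over(self, node: HeadNode) -> None:
        # In a real system this would assume the failed node's IP address and
        # host name and start its job manager on the standby.
        self.hosting = node
        print(f"standby assumes identity of {node.name}, starts {node.service}")

    def release(self, node: HeadNode) -> None:
        # Fail-back once the original head node has been repaired.
        if self.hosting is node:
            self.hosting = None
            print(f"standby stops {node.service}, {node.name} resumes service")

    def on_failures(self, failed_nodes: List[HeadNode]) -> None:
        if self.hosting is not None or not failed_nodes:
            return                        # only one failure can be masked at a time
        # Double outage: only the highest-priority (critical) head fails over.
        critical = min(failed_nodes, key=lambda n: n.priority)
        self.take_over(critical)


if __name__ == "__main__":
    standby = HotStandby()
    HEADS[0].failed = HEADS[1].failed = True               # rare double outage
    standby.on_failures([n for n in HEADS if n.failed])    # only head-a is masked
    HEADS[0].failed = False
    standby.release(HEADS[0])                              # fail-back after repair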
The experience gained during implementation will be applied to future work on symmetric active-active high availability. 5. CONCLUSIONS Our lab grade prototype of an asymmetric active-active HA- OSCAR variant showed some promising results and showed that the presented architecture is a significant enhancement to the standard Beowulf-type cluster architecture in satisfying requirements of high availability and serviceability. We currently support only asymmetric active-active high availability. However, ongoing work is currently investigating the extension of the implementation to n+1 active-active architectures. Future work will be more focused on symmetric active-active high availability for high-end scientific computing using the virtual synchrony model for head node services [3]. In this architecture, services on multiple head nodes maintain common global knowledge among participating processes. If one head node fails, the surviving ones continue to provide services without interruption. Head nodes may be added or removed at any time for maintenance or repair. As long as one head node is alive, the system is accessible and manageable. While the virtual synchrony model is well understood, its application to individual services on head (and service) nodes in scientific high-end computing environments still remains an open research question. 6. REFERENCES [1] ASCII Blue Gene/L Computing Platform at Lawrence Livermore National Laboratory, Livermore, CA, USA. [2] C. Bookman. Linux Clustering: Building and Maintaining Linux Cluster. New Readers Publishing, Boston, 2003.
[3] C. Engelmann and S. L. Scott. High availability for ultra-scale high-end scientific computing. Proceedings of the 2nd International Workshop on Operating Systems, Programming Environments and Management Tools for High-Performance Computing on Clusters (COSET-2).
[4] HA-OSCAR at Louisiana Tech University, Ruston, LA, USA.
[5] I. Haddad, C. Leangsuksun, and S. Scott. HA-OSCAR: Towards highly available Linux clusters. Linux World Magazine, March.
[6] J. Hsieh, T. Leng, and Y. C. Fang. OSCAR: A turnkey solution for cluster computing. Dell Power Solutions.
[7] IBM Blue Gene/L Computing Platform at IBM Research.
[8] Kimberlite at Mission Critical Linux.
[9] C. Leangsuksun, T. Liu, T. Rao, S. L. Scott, and R. Libby. A failure predictive and policy-based high availability strategy for Linux high performance computing cluster. Proceedings of the 5th LCI International Conference on Linux Clusters.
[10] C. Leangsuksun, L. Shen, T. Liu, H. Song, and S. L. Scott. Availability prediction and modeling of high availability OSCAR cluster. Proceedings of IEEE Cluster Computing (Cluster).
[11] LifeKeeper at SteelEye Technology, Inc., Palo Alto, CA, USA.
[12] Linux FailSafe at Silicon Graphics, Inc., Mountain View, CA, USA.
[13] T. Liu. High availability and performance Linux cluster. Master's thesis, Louisiana Tech University, Ruston, LA, USA.
[14] T. Liu, S. Iqbal, Y. C. Fang, O. Celebioglu, V. Masheyakhi, and R. Rooholamini. HA-Rocks: A cost-effective high-availability system for Rocks-based Linux HPC cluster, April.
[15] Maui at Cluster Resources, Inc., Spanish Fork, UT, USA.
[16] Mission Critical Linux.
[17] OpenPBS at Altair Engineering, Troy, MI, USA.
[18] OSCAR at Sourceforge.net.
[19] PBSPro at Altair Engineering, Inc., Troy, MI, USA.
[20] PBSPro for the Cray XT3 at Altair Engineering, Inc., Troy, MI, USA.
[21] ROCKS at National Partnership for Advanced Computational Infrastructure (NPACI), University of California, San Diego, CA, USA.
[22] SLURM at Lawrence Livermore National Laboratory, Livermore, CA, USA.
[23] Sun Grid Engine Project at Sun Microsystems, Inc., Santa Clara, CA, USA.
[24] XT3 Computing Platform at Cray Inc., Seattle, WA, USA.
Figure 2: Asymmetric Active-Active High Availability Architecture

Figure 5: Total Availability Improvement Analysis (Planned and Unplanned Downtime): Comparison of HA-OSCAR with a Traditional Beowulf-type Linux HPC Cluster
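As background for the availability figures quoted in Section 4 and in Figure 5, the usual first-order relation between steady-state availability, mean time to failure (MTTF) and mean time to repair (MTTR) is:

\[
  A = \frac{\mathrm{MTTF}}{\mathrm{MTTF} + \mathrm{MTTR}},
  \qquad \text{e.g.}\quad
  A = \frac{2000\ \mathrm{h}}{2000\ \mathrm{h} + 7\ \mathrm{h}} \approx 0.9965 = 99.65\,\%.
\]

The numeric values in this example are purely illustrative and are not measurements from the paper; they merely show how a figure in the region of the reported 99.65% single-head-node availability can arise.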
