P03 Linux on POWER OSS High Availability



Similar documents
Open Source High Availability on Linux

High Availability Low Dollar Clustered Storage

High Availability Solutions for MySQL. Lenz Grimmer DrupalCon 2008, Szeged, Hungary

High Availability with DRBD & Heartbeat. Chris Barber

Red Hat Cluster Suite

Red Hat Enterprise linux 5 Continuous Availability

High Availability Databases based on Oracle 10g RAC on Linux

Open-Xchange Server High Availability

MySQL High Availability Solutions. Lenz Grimmer OpenSQL Camp St. Augustin Germany

High availability infrastructures for TYPO3 Websites

Multiple Public IPs (virtual service IPs) are supported either to cover multiple network segments or to increase network performance.

Scalable Linux Clusters with LVS

High Availability with Postgres Plus Advanced Server. An EnterpriseDB White Paper

Synology High Availability (SHA)

Implementing the SUSE Linux Enterprise High Availability Extension on System z

High Availability Storage

High Availability & Disaster Recovery Development Project. Concepts, Design and Implementation

Tushar Joshi Turtle Networks Ltd

short introduction to linux high availability description of problem and solution possibilities linux tools

Migration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module

Comparing TCO for Mission Critical Linux and NonStop

High Availability Low Dollar Load Balancing

Migration and Building of Data Centers in IBM SoftLayer with the RackWare Management Module

OVERVIEW. CEP Cluster Server is Ideal For: First-time users who want to make applications highly available

3 Red Hat Enterprise Linux 6 Consolidation

High Availability Solutions for the MariaDB and MySQL Database

Achieving High Availability

Integrated Application and Data Protection. NEC ExpressCluster White Paper

Red Hat Enterprise Linux as a

Linux Cluster. Administration

Availability Digest. Redundant Load Balancing for High Availability July 2013

HP Serviceguard Cluster Configuration for HP-UX 11i or Linux Partitioned Systems April 2009

EMC VPLEX FAMILY. Continuous Availability and Data Mobility Within and Across Data Centers

DISASTER RECOVERY WITH AWS

LISTSERV in a High-Availability Environment DRAFT Revised

Twin Peaks Software High Availability and Disaster Recovery Solution For Linux Server

EMC VPLEX FAMILY. Continuous Availability and data Mobility Within and Across Data Centers

Parallels. Clustering in Virtuozzo-Based Systems

How to Choose your Red Hat Enterprise Linux Filesystem

High-Availability Using Open Source Software

WHITE PAPER. Best Practices to Ensure SAP Availability. Software for Innovative Open Solutions. Abstract. What is high availability?

Red Hat Global File System for scale-out web services

Database High Availability. Solutions 2010

Veritas Cluster Server from Symantec

Enterprise Linux Business Continuity Solutions for Critical Applications

Migration and Disaster Recovery Underground in the NEC / Iron Mountain National Data Center with the RackWare Management Module

Quorum DR Report. Top 4 Types of Disasters: 55% Hardware Failure 22% Human Error 18% Software Failure 5% Natural Disasters

Implementing the SUSE Linux Enterprise High Availability Extension on System z Mike Friesenegger

SAN Conceptual and Design Basics

Load Balancing and Clustering in EPiServer

Scalability of web applications. CSCI 470: Web Science Keith Vertanen

IBM Virtualization Engine TS7700 GRID Solutions for Business Continuity

Synology High Availability (SHA)

Clustering in Parallels Virtuozzo-Based Systems

IBM Security QRadar SIEM Version High Availability Guide IBM

Cisco Active Network Abstraction Gateway High Availability Solution

SUSE Linux Enterprise High Availability Extension. Piotr Szewczuk konsultant

How To Run A Web Farm On Linux (Ahem) On A Single Computer (For Free) On Your Computer (With A Freebie) On An Ipv4 (For Cheap) Or Ipv2 (For A Free) (For

HP Serviceguard Cluster Configuration for Partitioned Systems

PostgreSQL Clustering with Red Hat Cluster Suite

Registry Update. John Dickinson. Nominet UK

Building Reliable, Scalable AR System Solutions. High-Availability. White Paper

SAP HANA Operation Expert Summit BUILD - High Availability & Disaster Recovery

STORAGE CENTER. The Industry s Only SAN with Automated Tiered Storage STORAGE CENTER

Delivering High Availability Solutions with Red Hat Cluster Suite

SysAid Cloud Architecture Including Security and Disaster Recovery Plan

SQL Server 2012/2014 AlwaysOn Availability Group

TSMCluster. High Availability for TSM Server on UNIX. April Version 5.2 Subtle distinction

A SURVEY OF POPULAR CLUSTERING TECHNOLOGIES

TIBCO StreamBase High Availability Deploy Mission-Critical TIBCO StreamBase Applications in a Fault Tolerant Configuration

A High Availability Clusters Model Combined with Load Balancing and Shared Storage Technologies for Web Servers

IBM Storwize Rapid Application Storage solutions

Veritas InfoScale Availability

Symantec Cluster Server powered by Veritas

Big data management with IBM General Parallel File System

Integration of PRIMECLUSTER and Mission- Critical IA Server PRIMEQUEST

Open-Xchange Server High availability Daniel Halbe, Holger Achtziger

HAOSCAR 2.0: an open source HA-enabling framework for mission critical systems

Data Protection with IBM TotalStorage NAS and NSI Double- Take Data Replication Software

Availability and Disaster Recovery: Basic Principles

The ECHO - Avaya Connection ECHO, and how it interacts with Avaya s Telephony Solutions

Cloud Computing Disaster Recovery (DR)

EXTENDED ORACLE RAC with EMC VPLEX Metro

WebGUI Load Balancing

IBM Global Technology Services September NAS systems scale out to meet growing storage demand.

Configuring Failover

IBM TSM DISASTER RECOVERY BEST PRACTICES WITH EMC DATA DOMAIN DEDUPLICATION STORAGE

Solution for private cloud computing

VMware vsphere Data Protection 5.8 TECHNICAL OVERVIEW REVISED AUGUST 2014

Business Continuity: Choosing the Right Technology Solution

29/07/2010. Copyright 2010 Hewlett-Packard Development Company, L.P.

Astaro Deployment Guide High Availability Options Clustering and Hot Standby

Transcription:

IBM GLOBAL SERVICES P03 Linux on POWER OSS High Availability Alan Robertson IBM Linux Technology Center alanr@unix.sh pse r i e s a nd Li nux Sept. 20-24, 2004 Las Vegas, Nevada

Agenda HA On Linux HA Basics The Linux-HA project DRBD disk replication The Linux Virtual Server Project A Short Linux-HA tutorial

The Desire for HA Systems Who wants low- availability systems? Why are so few systems High- Availability?

Cost Barriers to HA Systems Complexity

What would be the result? Increased hardware, software, services opportunities Drastically multiplying customers multiplies experience - products mature faster (especially in OSS model) OSS developers grow with customers Low-end customers are less prone to sue than high-end customers OSS Clustering is a disruptive technology

IBM's HA Strategy for Linux IBM provides and supports two HA solutions on Linux: Tivoli System Automation for Linux (closed source) New product based on HACMP and mainframe technologies Sophisticated rule-based configuration Linux-HA (open source) Available on Linux for ~ 5 years Many thousands of installations Active Open Community effort coordinated by IBM Freely available

High-Availability Means Redundancy All HA products take the same basic approach: configure redundancy to eliminate single points of failure reduces cost of both planned and unplanned downtime

Workload Failover on Failure All HA products take basically the same approach: Configure redundancy to eliminate single points of failure Automatically detect failures in application software, server hardware, storage or networks Manage failover to backup/alternate components

IP Address Failover

What is the Linux-HA project? Provides basic HA failover capabilities Supports pseries especially well (STONITH) Very simple to install, manage, understand Active, open development community Thousands of production sites worldwide In production for mission-critical sites > 3 years Wide variety of industries, applications Shipped with most Linux distributions Supports DB2, MQ-Series, ServeRAID, etc. Easily extended to support nearly anything

IBM/Linux-HA Successes Emageon medical imaging services (with xseries hardware, and Tivoli SW) ADC - telco provisioning manager product (w/ x330/335) Oxford county schools 300+ clusters for authentication, file serving, web proxy, filtering, etc. Contraloria General de la Republica (Colombian government) Incredimail bases their mail service on Linux-HA on IBM hardware The Rose F. Kennedy Center for Research in Mental Retardation and Developmental Disabilities uses Linux-HA on x340s with 2 TB of storage Bavarian Radio Station (Munich) used Linux-HA and xseries for coverage of 2002 Olympics in Salt Lake City Citysavings Bank in Munich (infrastructure) University of Toledo (Ohio, US) (online education distance learning ) Autostrada (Italy) (currently installing 230 clusters - custom applications)

Linux-HA Successes The Weather Channel (weather.com) Sony (manufacturing) Intuit (firewalls) ISO New England manages the New England power grid using 20 Linux-HA clusters See also: http://linux-ha.org/heartbeat/users.html and http://linuxha.trick.ca/successstories

User Comments "Setting up heartbeat was cake and pie (piece of cake, easy as pie). I can't believe how easy it is to set up and use. Oh yeah, it works perfectly too." "... it works very well indeed! Fast and simple, it is a far cry from the lumbering cluster software I was using under [my old OS] "...I aggressively removed the power cables of an inproduction x330 running DB2. Then I watched customer's smile getting bigger and bigger seeing DB2 and all resources becoming available in the second machine." (IBM Brazil) "... just yesterday I got a chance to test it when someone pulled the power cable from the active box - worked flawlessly."

Designing A Swing :-) As Proposed by Marketing As Architected Beta Version

Designing a Swing (cont'd) Production Version As Installed What The Customer Really Wanted

Current Developments Release 2 due out 4Q2004 Integrated resource monitoring Larger clusters (16-32 nodes) Implementation of SAF AIS APIs

DRBD RAID over the LAN Block-device (filesystem) level replication Clever synchronization methods make resyncs faster, decrease latency, preserve integrity Useful for both HA and Disaster Recovery NO single point of failure Extremely cost-effective $200 (max) instead of $30,000 (min) ($USD) Probably not suitable for some high-end writeintensive applications Supportable by IBM Support Line

LVS The Linux Virtual Server Project LVS is the standard Linux Load Balancer Called "ipvs" in the standard Linux kernel Stable, fast, flexible Especially suitable for large "server farms"

LVS In Action

"Plays Well With Others" Each of these independent services can work together to scale to large systems All single points of failure can be eliminated High-Availability, Load Balancing work together nicely

Linux Virtual Server, Linux-HA and DRBD

HA Clustering Tutorial What is HA Clustering? What can you achieve with HA Clustering? How is HA Clustering different from Disaster Recovery? Basic Principles of HA Clustering (SPOFs 3 R's ) Redundant Communications Fencing Data Sharing Configurations Security Considerations

What Can HA Clustering Do For You? It cannot achieve 100% availability nothing can. HA Clustering designed to recover from single faults It can make your outages very short From about a second to a few minutes It is like a Magician's (Illusionist's) trick: When it goes well, the hand is faster than the eye When it goes not-so-well, it can be reasonably visible A good HA clustering system adds a 9 to your base availability 99->99.9, 99.9->99.99, 99.99->99.999, etc. Complexity is the enemy of reliability!

Lies, Damn Lies, and Statistics Availability percentage Yearly downtime 100% 0 99.99999% 3s 99.9999% 30 sec 99.999% 5 min 99.99% 52 min 99.9% 9 hr 99% 3.5 day

How is HA Clustering Different from Disaster Recovery? HA: Failover is cheap Failover times measured in seconds Reliable inter-node communication DR: Failover is expensive Failover times often measured in hours Unreliable inter-node communication assumed

Non-Obvious SPOFs Replication links are rarely single points of failure The system may fail if another failure occurs Some disk controllers have SPOFs inside them which aren't obvious without schematics Redundant links buried in the same wire run have a common SPOF Non-Obvious SPOFs can require deep expertise to spot

The Three R's of High-Availability Redundancy Redundancy Redundancy If this sounds redundant, that's probably appropriate... Most SPOFs are eliminated by redundancy HA Clustering is a good way of providing and managing redundancy

Redundant Communications Intra-cluster communication is critical to HA system operation Most HA clustering systems provide mechanisms for redundant internal communication for heartbeats, etc. External communications is usually essential to provision of service Exernal communication redundancy is usually accomplished through routing tricks Having an expert in BGP or OSPF is a help

Fencing Guarantees resource integrity in the case of certain difficult cases Three Common Methods: FiberChannel Switch lockouts SCSI Reserve/Release (difficult to make reliable) Self-Fencing (like IBM ServeRAID) STONITH Shoot The Other Node In The Head Linux-HA currently supports the last two models

Data Sharing - None Strangely enough, some HA configurations don't need any formal disk data sharing Firewalls Load Balancers (Caching) Proxy Servers Static web servers whose content is copied from a single source

Data Sharing Replication Some applications provide their own replication DNS, DHCP, LDAP, DB2, etc. Linux has excellent disk replication methods available DRBD is my favorite DRBD-based HA clusters are shockingly cheap Some environments can live with less precise replication methods rsync, etc. Generally does not support parallel access Fencing usually required EXTREMELY cost effective

Data Sharing ServeRAID IBM ServeRAID disk is self-fencing This helps integrity in failover environments This makes cluster filesystems, etc. impossible No Oracle RAC, no GPFS, etc. ServeRAID failover requires a script to perform volume handover Linux-HA provides such a script in open source Linux-HA is ServerProven with ServeRAID

Data Sharing FiberChannel The most classic data sharing mechanism Allows for failover mode Allows for true parallel access Oracle RAC, Cluster filesystems, etc. Fencing always required with FiberChannel

Data Sharing Back-End Network Attached Storage can act as a data sharing method Existing Back End databases can also act as a data sharing mechanism Both make reliable and redundant data sharing Somebody Else's Problem (SEP). If they did a good job, you can benefit from them. Beware SPOFs in your local network

Security Considerations A cluster is a computer whose backplane is the Internet If this isn't frightening, you don't understand... You may think you have a secure cluster network You're probably mistaken now You will be in the future

Secure Networks are Difficult Because... Security is not often well-understood by admins Security is well-understood by black hats Network security is easy to breach accidentally Users bypass it Hardware installers don't fully understand it Most security breaches come from trusted staff Staff turnover is often a big issue Virus/Worm/P2P technologies will create new holes especially for Windows machines

IBM GLOBAL SERVICES Security Advice Good HA software should be designed to assume insecure networks Not all HA software assumes insecure networks Good HA installation architects use dedicated (secure?) networks for intra-cluster HA communication Crossover cables are reasonably secure all else is suspect

Other HA-related Technologies Journalling filesytems EXT3 (can also journal data) ReiserFS JFS XFS Logical Volume Managers LVM EVMS LVM-2

Volume Management Capabilities Online volume growth Online volume movement Volume Snapshots

Sample DRBD / Linux-HA Configuration

Basic ha.cf configuration file keepalive 200ms warntime 500ms deadtime 1s # One second failure detection initdeadtime 30 auto_failback off baud 38400 serial /dev/ttys0 mcast eth0 225.0.0.1 694 1 0 bcast eth1 stonith_host node1 rps10 /dev/ttys1 node2 0 stonith_host node2 rps10 /dev/ttys1 node1 0 node node1 node2 ping 10.10.10.254 respawn hacluster /usr/lib/heartbeat/ipfail

haresources and authkeys example haresources node1 10.10.10.11 datadisk::drbd0 \ Filesystem::/dev/nb0::/ha/apache::reis erfs apache node2 10.10.10.12 datadisk::drbd1 \ Filesystem::/dev/nb1::/ha/mysql::ext3 mysql authkeys must be root:root, mode 0600! auth 1 1 sha1 MySeCrritPassword

drbd.conf (pre-0.7) Resource drbd0 { protocol = C fsckcmd = /bin/true inittimeout=0 disk { do-panic disk-size = 4194304k } net { sync-min = 40M sync-max = 90M tl-size = 5000 timeout = 60 connect-int = 10 ping-int = 1 } on node1 { device = /dev/nb0 disk = /dev/hdc1 address = 192.168.1.1 port = 7788 } on node2 { device = /dev/nb0 disk = /dev/hdc1 address = 192.168.1.2 port = 7788 } } resource drbd1 {...

Conclusions Both OSS and Closed HA options are available on Linux Open Source HA is alive and well on Linux Open Source HA will have very significant volumes Opportunities abound for using it IBM has a leadership position in HA on Linux

References! " # $ % %'& (! $ *) %,+.- $ ( / 0 10 2 2 3 465 0 2

Legal Statements IBM is a trademark of International Business Machines Corporation. Linux is a registered trademark of Linus Torvalds. Other company, product, and service names may be trademarks or service marks of others. This work represents the views of the author and does not necessarily reflect the views of the IBM Corporation.