SHARE Orlando, August 2011
SuSE Linux High Availability Extensions Hands-on Workshop
Richard F. Lewis
IBM Corp
rflewis@us.ibm.com

What is High Availability
- A system design approach
  - Implementation that removes or reduces single points of failure
  - Typically involves redundancy
  - Achieves an agreed upon level of operational performance
  - Addresses planned and unplanned outages
- A comprehensive set of processes and controls
  - Adequate testing
  - Change control and documentation
  - Monitoring and alerts
- Many times involves computer clusters
  - Multiple computers or nodes
  - Cluster management software to make nodes work in concert
  - Applications or resources run on nodes

What is SuSE Linux Enterprise High Availability Extension
- An affordable, integrated suite of robust open source clustering technologies that enables you to implement highly available physical and virtual Linux clusters
- Used with SuSE Linux Enterprise Server, it helps you maintain business continuity, protect data integrity, and reduce unplanned downtime for your mission-critical Linux workloads
- System z
  - Bundled with base SuSE Linux Enterprise Server at no additional charge
  - Support level inherited from base SuSE Linux Enterprise Server
  - Delivered via a separate .iso image
- Based upon open source technologies
  - Pacemaker: cluster resource manager designed to achieve maximum availability of cluster resources by detecting and recovering from node and resource-level failures
  - Corosync: cluster infrastructure for application high availability, maintaining a redundant copy of state on every server (distributed state machine)
  - OpenAIS: plugin using Corosync to implement failover and restarting of resources

Pacemaker Clustering
- Supports nearly any redundancy configuration
  - Active/Passive: fully redundant nodes, where the backup is brought online when the primary fails
  - Active/Active: all nodes process work; traffic is routed to the remaining nodes when one fails
  - N+1: a single extra node is brought online to take over the role of a node that failed, usually in clusters with multiple different services running
  - N+M: same as N+1, except with multiple standby servers
  - N to 1: a standby node becomes active temporarily until the original node is restored, at which point services are failed back to the original node
  - N to N: combination of Active/Active and N+M clusters
  - Shared Failover: several active/passive clusters are combined and share a common backup node

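The difference between the standby-based configurations (Active/Passive, N+1, N+M) and Active/Active can be illustrated with a toy placement model. This is only a sketch under stated assumptions: the function `fail_over`, the node names, and the service names are hypothetical, and the logic is far simpler than what Pacemaker's policy engine actually does.

```python
from itertools import cycle


def fail_over(placement, failed_node, standbys=()):
    """Toy model (not Pacemaker's algorithm): relocate services after a node failure.

    placement   -- dict mapping service name -> node currently running it
    failed_node -- name of the node that went down
    standbys    -- idle nodes available to take over (empty for Active/Active)

    Returns (new_placement, remaining_standbys).
    """
    spares = [n for n in standbys if n != failed_node]
    survivors = sorted({n for n in placement.values() if n != failed_node})

    if spares:
        # Active/Passive, N+1, N+M: one standby takes over the failed node's role.
        targets = cycle([spares.pop(0)])
    elif survivors:
        # Active/Active: spread the failed node's work over the remaining nodes.
        targets = cycle(survivors)
    else:
        raise RuntimeError("no node left to host the services")

    new_placement = {
        service: (next(targets) if node == failed_node else node)
        for service, node in placement.items()
    }
    return new_placement, spares


if __name__ == "__main__":
    # Active/Passive: node2 is idle until node1 fails.
    print(fail_over({"web": "node1"}, "node1", ["node2"]))
    # N+1: three services on three nodes, one shared spare.
    print(fail_over({"web": "a", "db": "b", "mail": "c"}, "b", ["spare"]))
    # Active/Active: no spare, the surviving node absorbs the work.
    print(fail_over({"web1": "a", "web2": "b"}, "a"))
```

With two standbys (N+M), the second standby stays in the returned list and is available for a later failure.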
Terminology
- Node: system that is part of the cluster and a potential host for resources
- Primitive resource: a single entity managed by the cluster
  - Virtual IP address (VIPA)
  - Web server
  - File system
  - Distributed lock manager
  - STONITH (Shoot The Other Node In The Head): a method of fencing to ensure integrity
- Collections of resources
  - Group: multiple related primitive resources
    - E.g. web server plus a VIPA
  - Clone: primitive resource or Group instantiated on all N nodes of the cluster simultaneously
    - Distributed lock manager daemon for cluster file system
  - Master-slave: superset of Clone where instances of a primitive resource or group can be in one of two states (state definition is specific to the resource)
- Constraints: policies that define resource location, start order and collocation with other resources

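To make the idea of start-order constraints concrete, the following sketch derives a valid start sequence from a list of "start A before B" pairs. It is an illustration only: the function `start_order` and the resource names (`dlm`, `ocfs2-fs`, `vip`, `apache`) are hypothetical, and the cluster software evaluates constraints in its own way. It assumes Python 3.9 or later for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter


def start_order(resources, order_constraints):
    """Return one start sequence that satisfies every (first, then) constraint.

    resources         -- iterable of resource names
    order_constraints -- list of (first, then) pairs: `first` must start before `then`
    """
    # Map each resource to the set of resources that must be started before it.
    predecessors = {name: set() for name in resources}
    for first, then in order_constraints:
        predecessors[then].add(first)
    # static_order() raises graphlib.CycleError if the constraints contradict each other.
    return list(TopologicalSorter(predecessors).static_order())


if __name__ == "__main__":
    # Hypothetical resources, loosely modelled on a web server backed by a cluster file system.
    order = start_order(
        ["dlm", "ocfs2-fs", "vip", "apache"],
        [("dlm", "ocfs2-fs"), ("ocfs2-fs", "apache"), ("vip", "apache")],
    )
    print(order)
```

More than one sequence can satisfy the same constraints, so the exact position of unrelated resources (here `vip` relative to `dlm`) is not fixed.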
Lab Environment
- Each team will have two SuSE SLES 11 SP1 systems running in z/VM virtual machines on the zEnterprise located in the Technology Exchange
  - LINLABnn (where nn is the team number on your tent card)
  - LINLABnn+15 (where nn+15 is your team number plus 15)
- We will implement the following resources
  - snipl STONITH: primitive resource that allows for activate and deactivate of an LPAR or z/VM virtual machine
  - OCFS2 cluster file system: each node will share a single ECKD DASD that will be managed by the OCFS2 software and mounted r/w on each node via the cluster management software
  - Web server with VIPA: a simple Apache server that can run on either node. It will obtain its configuration and content from the OCFS2 cluster file system. We will not be running active/active or distributing traffic across two instances, although we could easily do that

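Lab Environment Cont.
[Network diagram: two Linux guests in a z/VM LPAR, attached to tcpip (9.82.56.1) and to a separate cluster management network]
- Example, team 01: LINLAB01 at 9.82.56.91 (cluster management network 10.10.10.1), LINLAB16 at 9.82.56.106 (cluster management network 10.10.10.16), plus address 9.82.56.61
- General pattern: LINLABnn at 9.82.56.(nn+90), LINLABnn+15 at 9.82.56.(nn+90+15), plus address 9.82.56.(nn+60)
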
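The addressing pattern from the diagram can be worked out per team with a small helper. This is a sketch with several assumptions: the function `team_addresses` is hypothetical, team numbers are assumed to run from 1 to 15 (inferred from the +15 offset), the cluster management addresses are assumed to follow the team 01 example (10.10.10.nn and 10.10.10.nn+15), and the role of the 9.82.56.(nn+60) address is not stated on the slide, so it is labelled only as an additional address (presumably the one used for the VIPA).

```python
def team_addresses(nn):
    """Derive lab host names and addresses for team number nn.

    Assumptions (not confirmed by the slides): nn ranges from 1 to 15, and the
    cluster management network follows the 10.10.10.nn / 10.10.10.(nn+15)
    pattern shown for team 01.
    """
    if not 1 <= nn <= 15:
        raise ValueError("team number is assumed to be between 1 and 15")
    partner = nn + 15
    return {
        "node1": {
            "host": f"LINLAB{nn:02d}",
            "address": f"9.82.56.{nn + 90}",
            "cluster_address": f"10.10.10.{nn}",
        },
        "node2": {
            "host": f"LINLAB{partner:02d}",
            "address": f"9.82.56.{nn + 90 + 15}",
            "cluster_address": f"10.10.10.{partner}",
        },
        # Purpose not stated on the slide; presumably the virtual IP address.
        "additional_address": f"9.82.56.{nn + 60}",
    }


if __name__ == "__main__":
    for key, value in team_addresses(1).items():
        print(key, value)
```

For team 01 this reproduces the values shown in the diagram: LINLAB01 at 9.82.56.91, LINLAB16 at 9.82.56.106, and the additional address 9.82.56.61.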
Resources
- Implementing the SuSE Linux Enterprise High Availability Extension on System z
  - Monday August 8, 2011, 11:00 AM - 12:00 PM, Room Ocean 6, Walt Disney World Dolphin Resort
  - Speaker: Michael Friesenegger (Novell, Inc.)
- SuSE Linux Enterprise High Availability Extension
  - http://www.novell.com/products/highavailability
  - http://www.novell.com/documentation/sle_ha/pdfdoc/book_sleha/book_sleha.pdf
- Open Source Projects
  - Pacemaker: http://www.clusterlabs.org
  - OpenAIS: http://www.openais.org
  - Corosync: http://www.corosync.org
  - Linux HA project: http://www.linux-ha.org