Pacemaker. A Scalable Cluster Resource Manager for Highly Available Services. Owen Le Blanc. I T Services University of Manchester



Similar documents
Overview: Clustering MySQL with DRBD & Pacemaker

OpenLDAP in High Availability Environments

PRM and DRBD tutorial. Yves Trudeau October 2012

Implementing the SUSE Linux Enterprise High Availability Extension on System z

High-Availability Using Open Source Software

Highly Available NFS Storage with DRBD and Pacemaker

ZCP trunk (build 50384) Zarafa Collaboration Platform. Zarafa HA Manual

Open Source High Availability Writing Resource Agents for your own services. Lars Marowsky-Brée Team Lead SUSE Labs

Agenda Zabawę 2 czas zacząć Co to jest corosync? Co to jest pacemaker? Zastosowania Podsumowanie

SUSE Linux Enterprise High Availability Extension. Piotr Szewczuk konsultant

Zen and The Art of High-Availability Clustering Lars Marowsky-Brée

SAP HANA System Replication on SUSE Linux Enterprise Server for SAP Applications

Administration Guide. SUSE Linux Enterprise High Availability Extension 12 SP1

High Availability Low Dollar Clustered Storage

HighlyavailableiSCSIstoragewith DRBDandPacemaker

Building a High-Availability PostgreSQL Cluster

How To Use Suse Linux High Availability Extension 11 Sp (Suse Linux)

SAP HANA System Replication on SLES for SAP Applications. 11 SP3

NOCTUA by init.at THE FLEXIBLE MONITORING WEB FRONTEND

Open Source High Availability on Linux

Best Current Practices for SUSE Linux Enterprise High Availability 12

High Availability Solutions for MySQL. Lenz Grimmer DrupalCon 2008, Szeged, Hungary

PVFS High Availability Clustering using Heartbeat 2.0

Pacemaker Remote. Scaling High Availablity Clusters. David Vossel

High Availability Storage

MySQL High Availability Solutions. Lenz Grimmer OpenSQL Camp St. Augustin Germany

Implementing the SUSE Linux Enterprise High Availability Extension on System z Mike Friesenegger

An Efficient Failover Enabling Mechanism in OpenStack

Clusters from Scratch - Apache, DRBD and GFS2

Peter Ruissen Marju Jalloh

Red Hat Enterprise Linux 7 High Availability Add-On Administration. Configuring and Managing the High Availability Add-On

Achieving High Availability on Linux for System z with Linux-HA Release 2

CentOS Cluster Server. Ryan Matteson

Chapter 7. Using Hadoop Cluster and MapReduce

Introduction to Computer Administration. System Administration

High Availability Solutions for the MariaDB and MySQL Database

SuSE Linux High Availability Extensions Hands-on Workshop

A new Cluster Resource Manager for heartbeat

Red Hat Enterprise Linux 7 High Availability Add-On Overview

SUSE Linux Enterprise Server

SYNNEFO: A COMPLETE CLOUD PLATFORM OVER GOOGLE GANETI WITH OPENSTACK APIs VANGELIS KOUKIS, TECH LEAD, SYNNEFO

High availability infrastructures for TYPO3 Websites

Lustre failover experience

IBRIX Fusion 3.1 Release Notes

SkySQL Data Suite. A New Open Source Approach to MySQL Distributed Systems. Serge Frezefond V

High Availability and Backup Strategies for the Lustre MDS Server

Architecture and Mode of Operation

Parallels Virtuozzo Containers 4.7 for Linux Readme

High Availability Low Dollar Load Balancing

Linux High Availability

Acronis Backup & Recovery 10 Server for Linux. Quick Start Guide

High Availability Solutions with MySQL

Red Hat Cluster Suite

SanDisk ION Accelerator High Availability

Architectures Haute-Dispo Joffrey MICHAÏE Consultant MySQL

Pro Puppet. Jeffrey McCune. James TurnbuII. Apress* m in

PARALLELS SERVER BARE METAL 5.0 README

Unit 10 : An Introduction to Linux OS

FleSSR Project: Installing Eucalyptus Open Source Cloud Solution at Oxford e- Research Centre

CRM CLI (command line interface) tool

High Availability with PostgreSQL and Pacemaker

Week Overview. Installing Linux Linux on your Desktop Virtualization Basic Linux system administration

Scalable Linux Clusters with LVS

The deployment of OHMS TM. in private cloud

A SURVEY OF POPULAR CLUSTERING TECHNOLOGIES

Zend Server 4.0 Beta 2 Release Announcement What s new in Zend Server 4.0 Beta 2 Updates and Improvements Resolved Issues Installation Issues

PARALLELS SERVER 4 BARE METAL README

Work Environment. David Tur HPC Expert. HPC Users Training September, 18th 2015

Acronis Backup & Recovery 10 Server for Linux. Installation Guide

INUVIKA TECHNICAL GUIDE

IdP Clustering. You want to prevent service outages. High Availability and Load Balancing. Possible problems: HW failures

Measurably reducing risk through collaboration, consensus & practical security management CIS Security Benchmarks 1

Non-Stop for Apache HBase: Active-active region server clusters TECHNICAL BRIEF

Cloud UT. Pay-as-you-go computing explained

Cluster Configuration Manual Cluster configuration on Database Servers

Open-Xchange Server High availability Daniel Halbe, Holger Achtziger

TUT8155 Best Practices: Linux High Availability with VMware Virtual Machines

NAS Storage needs to be purchased; Will not be offered IAAS - Utility SMTP Per SMTP account Per server

Architecture and Mode of Operation

Analyzing Network Servers. Disk Space Utilization Analysis. DiskBoss - Data Management Solution

Acronis Backup & Recovery 10 Server for Linux. Installation Guide

Delivering High Availability Solutions with Red Hat Cluster Suite

Acronis Backup & Recovery 10 Server for Linux. Update 5. Installation Guide

Resource Manager Corosync/DRBD HA Installation Guide

short introduction to linux high availability description of problem and solution possibilities linux tools

Delivering High Availability Solutions with Red Hat Cluster Suite

Sistemi Operativi e Reti. Clusters

How To Install Eucalyptus (Cont'D) On A Cloud) On An Ubuntu Or Linux (Contd) Or A Windows 7 (Cont') (Cont'T) (Bsd) (Dll) (Amd)

KonyOne Server Installer - Linux Release Notes

Parallels Cloud Server 6.0 Readme

OpenLDAP Configuration and Tuning in the Enterprise

High Availability & Disaster Recovery Development Project. Concepts, Design and Implementation

Highly Available NFS Storage with DRBD and Pacemaker

Red Hat Enterprise Linux 7 High Availability Add-On Overview

How To Manage A Network Backup On An Arkeia Network Suite Version 8

Transcription:

Pacemaker A Scalable Cluster Resource Manager for Highly Available Services Owen Le Blanc I T Services University of Manchester

C V 1980, U of Manchester since 1985 CAI, CDC Cyber 170/730, Prime 9955 HP 300, HP 700, Xenix, FreeBSD, NetBSD, AIX, IRIX, Minix Linux 1991, mostly Debian

Outline What is pacemaker? How do you get it? How does it work? Where did it come from? Documentation How do you configure it? How do you manage it? Real examples

What is Pacemaker? If a cluster is a group of machines of any size, and If a resource is any application or process that can be started by a script, Then pacemaker allows you to manage that resource across that cluster in a way that makes high availability possible

Examples IPVS Load Balancer DRBD network disk DB server Web server

Features Does not require particular storage type. Quorate and resource-driven clusters Supports various types of redundancy: Active/Passive, Active/Active, Master/Slave, Multiple Can manage any resource controlled by shell scripts. Supports applications which must run on multiple machines.

More Features Detects and recovers from machine and application failures. Scriptable shell (command line) Graphical Interface(s) Fencing Ordering, location constraints

Active/Passive Machine 1 Machine 2 Service X Service Y Pacemaker

Multiple Services M 1 M 2 M 3 M 4 W W X X Y Y Y Z Pacemaker

Distributions Fedora, RHEL, Centos OpenSUSE, SLES, SLES HA Debian, Ubuntu Build from source (BSD, MacOS X, etc.)

resource scripts lrmd pengine cib crmd stonithd cluster abstraction layer openais / corosync

Internals, explained LMRD Local Resource Mgmt Daemon CIB Cluster Information Base CRMD Cluster Resource Mgmt Daemon Pengine Policy Engine STONITHD Fencing subsystem Corosync Communications, including cluster membership

Parts History Heartbeat Old Linux-HA cluster manager Alan Robertson, 1998 2007 OpenAIS Application Interface Spec 2005 ongoing; Corosync Cluster Engine Pacemaker Cluster Resource Manager Lars Marowsky-Brée, 2003 ongoing Other bits E.g., DRBD support from LINBIT Philipp Reisner, Lars Ellenberg, 1998 ongoing

Old Heartbeat Maximum of 2 nodes Highly coupled design and implementation Simplistic group-based resource model Inability to detect and recover from resource-level failures

New Pacemaker Much more configurable Much more flexible Much more powerful Much easier to control You can update pacemaker without interrupting the managed service(s)

Documentation www.clusterlabs.org/wiki Clusterbau: Hochverfügbarkeit mit pacemaker, OpenAIS, heartbeat und LVS by Michael Schwartzkopff, O'Reilly A Practical Guide to XEN High Availability by Sander von Vugt www.drbd.org/users-guide/

Configuration CRM shell Command line File based configuration GUI Easier use, syntax Not suited to copying whole configuration

Example Configuration primitive res Apache ocf:heartbeat:apache \ meta is managed="true" \ operations $id="res apache operations" \ op monitor interval="10" timeout="20s" \ params \ configfile="/etc/apache2/apache2.conf" \ httpd="/usr/sbin/apache2"

Mysql Cluster

crm status (1) Last updated: Fri Jan 21 11:25:59 2011 Stack: openais Current DC: punnet - partition with quorum Version: 1.0.9-unknown 2 Nodes configured, 2 expected votes 3 Resources configured.

crm status (2) Online: [ pannier punnet ] Resource Group: grp-ip-fs-msyql res-wmysql3-ip (ocf::heartbeat:ipaddr2): Started punnet res-wmysql3-fs (ocf::heartbeat:filesystem): Started punnet res-mysql (ocf::heartbeat:mysql): Started punnet Master/Slave Set: ms-drbd-r0 Masters: [ punnet ] Slaves: [ pannier ] Clone Set: clo-ping Started: [ punnet pannier ]

Experiences (Heartbeat) 0.4.6 (2000) solid, but very limited Later versions became less stable, as our usage moved away from what Alan planned Not easy to manage (down one node, update software, etc.) Good at ensuring at least one node up Poor at ensuring at most one node up

Experiences (corosync) The corosync layer and basic pacemaker seems rock solid totem token configuration wrong in Debian Complexity: using multicast addresses Our problematic networking Once working, it goes on Need to enable encryption for security Major improvements over heartbeat

Experiences (resources) 75 agents (at least) in Debian install Some duplication Varying quality Includes Dummy scripts Combining and tuning resources is an art Many ways to do things, sometimes (apparently) not quite equivalent Schwartzkopff's book

Experiences (STONITH) STONITH is not configurable enough Where possible, we use application level fencing For some reason, we have problems with false positives See http://www.ourobengr.com/ha

Experiences (new RA script) Dummy resource very helpful Debugging difficult For some errors, logging is helpful Insert debugging commands into script

Managment CRM crm_gui