
High Availability with Linux / Hepix October 2004 / Karin Miers

Slide 1: High Availability with Linux Using DRBD and Heartbeat
- short introduction to Linux high availability
- description of the problem and possible solutions
- Linux tools: heartbeat, drbd, mon
- implementation at GSI
- experiences during test operation

Slide 2: High Availability
- reduction of downtime of critical services (name service, file service, ...)
- hot standby: automatic failover
- cold standby: manual exchange of hardware
- reliable / special hardware components (shared storage, redundant power supplies, ...)
- special software, commercial and open source (FailSafe, LifeKeeper/SteelEye Inc., heartbeat, ...)

Slide 3: Problem
- central NFS service and administration: all Linux clients mount the directory /usr/local from one central server
- central administration, including scripts, config files, ...
[Diagram: clients lxg0???, lxb0?? and lxdv?? mount /usr/local via NFS from the central NFS server lxfs01]

Slide 4: In Case of Failure
If the central NFS server is down:
- no access to /usr/local
- most clients cannot work anymore
- administration tasks are delayed or hang
- after work continues: stale NFS mounts

Slide 5: Solution
- hot standby / shared nothing: two identical servers with individual storage (instead of shared storage)
- advantage: /usr/local exists twice
- problems: synchronisation of the file systems; information about the NFS mounts
[Diagram: NFS servers A and B each hold a copy of /usr/local; clients 1-3 mount /usr/local via NFS]

Slide 6: Linux Tools
- heartbeat: communication between the two nodes, starts the services
- drbd: synchronisation of the file system (/usr/local)
- mon: system monitoring
All tools are open source, under the GPL or similar licenses.

Slide 7: Heartbeat
How does the slave server know that the master node is dead?
- both nodes are connected by ethernet or a serial line
- both nodes exchange pings at regular intervals
- if all pings are missed for a certain dead time, the slave assumes that the master has failed
- the slave takes over the IP address and starts the service
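The ping interval and dead time described above are set in heartbeat's main configuration file. A minimal sketch of /etc/ha.d/ha.cf for a two-node setup (node names and interface choices are illustrative placeholders, not the GSI values):

```
# /etc/ha.d/ha.cf -- minimal heartbeat (v1) configuration, sketch only
keepalive 2         # send a heartbeat every 2 seconds
deadtime 10         # declare the peer dead after 10 s of silence
serial /dev/ttyS0   # heartbeat over the serial line...
bcast eth1          # ...and over a dedicated ethernet link
auto_failback on    # the master reclaims its resources when it returns
node nodea nodeb    # cluster members, must match `uname -n`
```

With keepalive 2 and deadtime 10, this corresponds to the timing example given later in the talk.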

Slide 8: Heartbeat
[Diagram: normal operation - server 1 and server 2 exchange "hello" messages in both directions. Failure - server 2 fails, the heartbeat ping stops, and server 1 takes over.]

Slide 9: Heartbeat Problems
- heartbeat only checks whether the other node replies to pings
- heartbeat does not check the operability of the services themselves:
  - even if the ping works, the service could be down
  - heartbeat itself could fail while the services still run
To reduce these problems, heartbeat offers special features: stonith, watchdog and monitoring.

Slide 10: Watchdog
Special heartbeat feature: the system reboots as soon as its own heartbeat stops.
[Diagram: a node whose own heartbeat stops is rebooted by the watchdog]

Slide 11: Stonith
"Shoot The Other Node In The Head": when a failover happens, the slave triggers a reboot of the master node, using ssh or special hardware (a remotely controlled power switch).
[Diagram: after the heartbeat ping stops, the surviving node reboots its peer via the stonith device]
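Watchdog and stonith are both enabled through ha.cf directives. A sketch (the device path, node names and the use of the ssh stonith plugin are assumptions for illustration):

```
# /etc/ha.d/ha.cf (excerpt) -- watchdog and stonith, sketch only
watchdog /dev/watchdog        # softdog reboots this node if its own heartbeat stalls
stonith_host nodea ssh nodeb  # nodea may reboot nodeb via ssh
stonith_host nodeb ssh nodea  # and vice versa
```

The ssh stonith method is mainly suitable for testing; as the slide notes, a remotely controlled power switch is the hardware alternative.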

Slide 12: Network Connectivity Check
ipfail checks the network connectivity to a certain PingNode. If the PingNode cannot be reached, the service is switched to the slave.
[Diagram: master and slave each check their connectivity to the PingNode via eth1]
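ipfail is likewise configured in ha.cf: the PingNode is declared, and heartbeat respawns the ipfail process. A sketch (the PingNode address is a placeholder):

```
# /etc/ha.d/ha.cf (excerpt) -- network connectivity check, sketch only
ping 192.168.0.1                             # the PingNode, e.g. the nameserver
respawn hacluster /usr/lib/heartbeat/ipfail  # fail over if the PingNode is lost
```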

Slide 13: DRBD
Distributed Replicated Block Device:
- a kernel patch which forms a layer between the block device (hard disk) and the file system
- through this layer, partitions are mirrored over a network connection
- in principle: RAID-1 over the network

Slide 14: DRBD - How it Works
[Diagram: on each server the stack is file system -> DRBD -> disk driver and NIC driver; DRBD replicates each write over TCP/IP across the network connection to the peer's hard disk as well as to the local one.]

Slide 15: Write Protocols
- protocol A: a write is reported as completed when it has reached the local disk and the local TCP send buffer
- protocol B: a write is reported as completed when it has reached the local disk and the remote buffer cache
- protocol C: a write is reported as completed when it has reached both the local and the remote disk
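The replication layer and the write protocol are configured per resource in /etc/drbd.conf. A minimal sketch (node names and partitions are placeholders; the addresses follow the dedicated eth1 replication link shown later in the talk):

```
# /etc/drbd.conf -- one replicated resource, sketch only
resource r0 {
  protocol C;                    # safest choice: wait for the remote disk
  on nodea {
    device    /dev/drbd0;        # block device the file system is built on
    disk      /dev/sda5;         # underlying local partition
    address   192.168.1.2:7788;  # replication over the dedicated link
    meta-disk internal;
  }
  on nodeb {
    device    /dev/drbd0;
    disk      /dev/sda5;
    address   192.168.1.3:7788;
    meta-disk internal;
  }
}
```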

Slide 16: (Dis-)Advantages of DRBD
Advantages:
- data exist twice
- real-time update on the slave (unlike rsync)
- consistency guaranteed by drbd: data access only on the master - no load balancing
- fast recovery after failover
Disadvantages (overhead of drbd):
- needs CPU power
- write performance is reduced (read performance is not affected)

Slide 17: System Monitoring with Mon
- service monitoring daemon: monitors resource, network and server problems
- monitoring is done with individual scripts
- in case of failure, mon triggers an action (e-mail, reboot, ...)
- works locally and remotely (on the other node and on a monitoring server):
  - are drbd and heartbeat running?
  - is the NFS directory reachable?
  - who is lxha01?
- triggers a reboot and sends information messages
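mon reads its watches from a configuration file, typically /etc/mon/mon.cf. A rough sketch of checks of the kind described above (node names, the alert address and the nfs monitor script are placeholders; fping.monitor and mail.alert ship with mon):

```
# /etc/mon/mon.cf -- sketch only
hostgroup hanodes nodea nodeb

watch hanodes
    service ping
        interval 1m
        monitor fping.monitor       # are the nodes alive?
        period wd {Sun-Sat}
            alert mail.alert admin@example.org
            alertevery 1h
    service nfs
        interval 5m
        monitor nfs.monitor         # placeholder: site script that checks the export
        period wd {Sun-Sat}
            alert mail.alert admin@example.org
```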

Slide 18: Network Configuration
[Diagram: master and slave (lxha02, lxha03) are connected by two dedicated links: eth2 (192.168.10.20 / 192.168.10.30) carries heartbeat, eth1 (192.168.1.2 / 192.168.1.3) carries heartbeat and drbd. Public addresses are 140.181.67.228 and 140.181.67.230; the virtual service IP 140.181.67.76 (lxha01) is raised as an interface alias (:0) on the active node. Clients mount /usr/local from lxha01 via NFS; the PingNode for the connectivity check is the nameserver.]

Slide 19: Configuration Drbd
[Diagram: lxha02 and lxha03 each have a HW RAID-5 array of ~270 GB, partitioned into /, /var, /usr, /tmp and /data (with /data/var/lib/nfs holding the NFS state); file systems: ext3, ext2, xfs. The drbd storage of ~260 GB is replicated over eth1 and exported as /drbd/usr/local; clients mount lxha01:/drbd/usr/local as /usr/local.]
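In a heartbeat v1 setup, the resources taken over together on failover are listed on one line in /etc/ha.d/haresources. A sketch matching the layout above (the service IP and export path follow the slides; the preferred node, the drbd resource name, the mount details and the NFS init script name are assumptions):

```
# /etc/ha.d/haresources -- sketch only: preferred node, then the resource group
# taken over as a whole: service IP, drbd promotion, mount, NFS server
lxha02 IPaddr::140.181.67.76 drbddisk::r0 Filesystem::/dev/drbd0::/drbd::xfs nfs-kernel-server
```

IPaddr, drbddisk and Filesystem are standard heartbeat resource scripts in /etc/ha.d/resource.d; the NFS init script name varies by distribution.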

Slide 20: Experiences in Case of Failure
- in case of failure, the NFS service is taken over by the slave server (test: switch off the master)
- watchdog, stonith (ssh) and ipfail work as designed
- in general, clients only see a short interruption and continue to work without disturbance
- the downtime depends on the heartbeat and drbd configuration
- example: heartbeat interval 2 s, dead time 10 s => interruption of ~20 s

Slide 21: Replication
- a full DRBD sync takes approximately 5 h (for 260 GB)
- a full sync is only necessary during installation or in case of a complete overrun
- the normal sync duration depends on how much the file system changed during the downtime
- example: drbd stopped, 1 GB written - 26 s until start-up, 81 s for synchronisation
- 1 GB deleted - 27 s until start-up, synchronisation time ~0

Slide 22: Write Performance
Measured with iozone, 4 GB file size, xfs file system:
- without drbd, single thread: 28.9 MB/s
- with drbd (connected): 17.4 MB/s (60 %)
- with drbd (unconnected): 24.2 MB/s (84 %)
- with drbd (connected), 4 threads: 15.0 MB/s
- with drbd (connected), but protocol A: 21.4 MB/s (74 %)
- unconnected: 24.2 MB/s (84 %)