The Promise of Virtualization for Availability, High Availability, and Disaster Recovery - Myth or Reality?

B. Jostmeyer, Tivoli System Automation, bjost@de.ibm.com
IBM Systems Management, Virtualisierung und Storage Symposium, 15-17 November 2010, Marriott Hotel Heidelberg

Virtualization technologies play an important role in data centers, especially in service-oriented ("cloud") environments. There is a lot of focus on virtualization technologies for distributed server platforms such as VMware, System p, Sun Solaris Zones, and others. Virtualization provides several benefits; in this presentation, however, we concentrate on availability, high availability, and disaster recovery. In the next hour I want to give an overview of some existing virtualization technologies and of how they increase application availability by reducing planned downtime. I also want to discuss the limitations of virtualization technologies compared with traditional high availability solutions, and how overall availability can be enhanced by combining both worlds.

Agenda
Part I: Introduction
Part II: Usage of Virtualization Technology for Availability and High Availability - Capabilities and Limitations
Part III: Usage of Virtualization Technology for Disaster Recovery - Capabilities and Limitations

Introduction to Virtualization Technologies
Virtualization in simple words: abstracting from the hardware, and empowering a single piece of hardware to run multiple independent systems. The primary value of virtualization is to enhance the overall utilization of hardware.
Different virtualization areas:
Server virtualization: provides multiple virtual machines (VMs) on one physical server as hosts for operating systems.
Storage virtualization and network virtualization: not in the scope of this talk.
Hypervisor is the term for the component that manages the virtual machines and their resources on one physical server.

Virtualization and High Availability
More than 80% of enterprises have adopted server virtualization, but only 20% of all server workload runs on virtual machines. The reason is a lack of confidence in the high availability of virtual infrastructure. Better management tools are predicted to raise the adoption rate to 48% by 2012.
Virtualized landscapes have the same high availability needs as physical ones: stay in business 24x7x365.
Failures causing service outages happen in the hardware as well as in the software stack.
Whenever maintenance is required, avoid service interruption if possible.
Be prepared for the worst: recover the business at another site.

Business Continuity Definitions
High Availability: the ability of a system to provide service during defined periods, at acceptable or agreed-upon levels, and to mask UNPLANNED OUTAGES from end users.
Continuous Operations: the ability of a system to operate continuously and to mask PLANNED OUTAGES from end users.
Continuous Availability: the ability of a system to deliver non-disruptive service to the end user 7 days a week, 24 HOURS A DAY (there are NO outages).

Business Relevance of Availability
Commerce is handled over the internet, and computing centers are growing. At the same time, businesses need to ensure that their systems are available 24/7. Downtime translates directly into loss of revenue. The average cost of one hour of downtime is $42,000, but the cost can be much higher. The overall availability picture has to consider both planned and unplanned outages.

Availability    Downtime per year
90%             36.5 days
95%             18.25 days
99%             3.65 days
99.9%           8.76 hours
99.99%          52.6 minutes
99.9999%        31.5 seconds
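The downtime figures above follow directly from the availability percentage. As a minimal sketch (assuming a 365-day year, which is what the figures imply; the function names are illustrative and not part of the original slides), the conversion and the resulting yearly cost at the quoted average of $42,000 per hour can be computed like this:

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: non-leap year


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Yearly downtime implied by an availability percentage."""
    return (100.0 - availability_percent) / 100.0 * SECONDS_PER_YEAR


def yearly_downtime_cost(availability_percent: float,
                         cost_per_hour: float = 42_000.0) -> float:
    """Downtime cost per year; the default is the average quoted on the slide."""
    hours = downtime_seconds_per_year(availability_percent) / 3600.0
    return hours * cost_per_hour


def format_downtime(seconds: float) -> str:
    """Pick a readable unit for a downtime given in seconds."""
    if seconds >= 86_400:
        return f"{seconds / 86_400:.2f} days"
    if seconds >= 3_600:
        return f"{seconds / 3_600:.2f} hours"
    if seconds >= 60:
        return f"{seconds / 60:.1f} minutes"
    return f"{seconds:.1f} seconds"


if __name__ == "__main__":
    for level in (90, 95, 99, 99.9, 99.99, 99.9999):
        seconds = downtime_seconds_per_year(level)
        print(f"{level}% -> {format_downtime(seconds)}")
```

The cost helper treats every hour of downtime as equally expensive, which is a simplification; as the slide notes, the real cost can be much higher.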

Business Continuity Issues
Causes of downtime (source: Gartner Group, 2007): 40% operations errors; 40% application failures; 20% environmental factors, HW, OS, power, disasters.
Reasons for planned downtime: maintenance, tests.
Reasons for unplanned downtime: operator errors, application failures, environmental failures, OS failures, HW failures, disasters.
Additional challenges are caused by dynamically created services (IaaS, PaaS, SaaS).
Consequences: loss of business; loss of customers (the competition is just a mouse click away); loss of credibility, brand image, and stock value.

Virtualization Marketing Messages...

Part II: Usage of Virtualization Technology for Availability and High Availability - Capabilities and Limitations

Agenda
Part I: Introduction
Part II: Usage of Virtualization Technology for Availability and High Availability - Capabilities and Limitations
    Virtual Server Mobility
    Automatic Restart of Virtual Servers
    Added value through combination of virtualization features with HA software
    Fault Tolerance
Part III: Usage of Virtualization Technology for Disaster Recovery - Capabilities and Limitations

Virtual Server Mobility
Virtual server mobility can move complete, running VM images (the hosted OS and applications) from one physical server to another with no downtime of the service.
Examples: VMware vMotion, POWER Live Partition Mobility, ...
Customer benefit: Mobility can help reduce planned downtime for maintenance steps (e.g. HW maintenance). After the guests have been moved, a server can be shut down.
Limitation: Guest mobility cannot be used in unplanned failure situations (HW/SW).

Automatic Restart of Virtual Server(s)
Hypervisors can detect unplanned VM outages (e.g. an OS failure), unplanned hypervisor outages, or HW failures, and restart the failed images.
In case of a VM failure, the hosting hypervisor detects the failure and restarts the image.
In case of a hypervisor or server HW failure, a backup hypervisor detects the outage and restarts all unavailable virtual servers.
Example: VMware High Availability

Sample Classification for Business Applications
Class 1: Unimportant business application. Unplanned application downtime can be longer than a day (RTO > 1 day). Very long service windows (planned downtime) are accepted. IT configuration: no redundant components, no monitoring.
Class 2: Important business application. Unplanned application downtime can be several hours to a day (RTO < 1 day). Long service windows (planned downtime) are accepted. Configuration: mostly no redundant components; the application is monitored and, in a failure situation, recovered manually.
Class 3: Mission-critical business application. Unplanned application downtime has to be avoided (RTO < x mins). Service windows have to be avoided and should be very short. Configuration: redundant components (HW and SW) and technology for automated recovery (e.g. Tivoli System Automation).
Hypervisor (high) availability features are extremely attractive for class 2 business applications. Reason: they are simple and easy to use ("with one mouse click").
Hypervisor (high) availability features provide significant added value for class 3 business applications, but are not sufficient. (The limitations are explained on the following slides.)

Hypervisor Technology Limitation - Overview
The management scope of a hypervisor is the set of virtual servers. A hypervisor has no knowledge of the business applications hosted inside the virtual servers.
Hypervisor technology limitation: no application awareness.
1. No detection of application failures (SW failures). If the business application within the VM fails, the virtualization technology does not detect it.
2. Automatic restart of a virtual server does not guarantee that the application works properly afterwards. Depending on the application type, a restart can cause problems.
3. No awareness of application dependencies. Virtualization technology is not aware of dependencies between application components running in different VMs.

Limitation 1: Hypervisors do not detect application failures
An unplanned outage of a business application running inside a virtual server results in a business service interruption unless another high availability product has been configured to observe the status of the application.
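The gap described in limitation 1 can be illustrated with a toy model. This is only a sketch under simplifying assumptions: the class, the two check functions, and the action strings are hypothetical and do not correspond to any hypervisor or Tivoli API. It contrasts a monitor that only sees the VM state with one that also probes the application inside the guest.

```python
from dataclasses import dataclass


@dataclass
class VirtualServer:
    """Toy model of a guest: the VM and the application can fail separately."""
    name: str
    vm_running: bool
    app_running: bool


def hypervisor_action(server: VirtualServer) -> str:
    # A hypervisor only sees whether the virtual server itself is up.
    return "none" if server.vm_running else "restart virtual server"


def application_aware_action(server: VirtualServer) -> str:
    # An application-aware monitor additionally checks the hosted application.
    if not server.vm_running:
        return "restart virtual server"
    if not server.app_running:
        return "restart application"
    return "none"


if __name__ == "__main__":
    crashed_app = VirtualServer("vm1", vm_running=True, app_running=False)
    print(hypervisor_action(crashed_app))         # the outage goes unnoticed
    print(application_aware_action(crashed_app))  # the outage is handled
```

With the VM up and the application down, the VM-level check reports nothing to do, which is exactly the undetected service interruption the slide describes.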

Excursus - High Availability for different Application Types
Stateless application (multiple instances): No failover required. Running multiple instances in parallel provides implicit HA. Example: Web servers on Systems I, II, and III.
Warm standby (Type I; stateful application component, single instance): Recommendation: use SA MP for warm standby. Example: DB2 on Systems I and II.
Hot standby (Type II; stateful application component, single instance): Recommendation for existing proprietary hot standby solutions: use SA MP for split-brain resolution and automation. Example: DB2 HADR on Systems I and II.
Active/active (Type III; stateful application component): No failover required, but the implementation requires a very complicated infrastructure to support data integrity and resiliency (e.g. DB2 pureScale on Systems I, II, and III). Recommendation: use SA MP for split-brain resolution and automation.

Limitation 2: Automatic Virtual Server Restart is not always sufficient
Automatic virtual server restart does not work for applications of Type II (hot standby). Sample scenarios for hot standby applications: SAP Central Service, DB2 HADR.
SAP (enqueue LPAR and enqueue replication LPAR): The SAP core component SAP Central Service (SCS), consisting of the enqueue server and the enqueue replication server, will hang after a simple restart of the virtual server running the enqueue server. Automation logic is required to start the enqueue server on the virtual server where the enqueue replication server is already running.
DB2 (primary LPAR and secondary LPAR): To exploit the DB2 HADR feature, the role of the DB2 secondary has to be changed to primary after a failure of the DB2 primary.
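The role change required for hot standby applications can be sketched as follows. This is an illustrative model only: the class and function names are hypothetical, and the reintegration of the failed node as the new standby is an assumption added for completeness, not something the slide states. The point is that recovery starts with a promotion on the surviving node rather than with a restart of the failed one.

```python
from dataclasses import dataclass


@dataclass
class HotStandbyPair:
    """Two virtual servers running a replicated, stateful component."""
    primary: str  # host currently holding the primary role
    standby: str  # host holding the replicated standby


def handle_primary_failure(pair: HotStandbyPair) -> list[str]:
    """Return the ordered recovery actions and swap the roles in place."""
    failed, survivor = pair.primary, pair.standby
    actions = [
        # The step a plain virtual server restart cannot perform:
        f"promote standby on {survivor} to primary",
        # Assumed follow-up steps to restore redundancy:
        f"restart virtual server {failed}",
        f"reintegrate {failed} as new standby",
    ]
    pair.primary, pair.standby = survivor, failed
    return actions


if __name__ == "__main__":
    pair = HotStandbyPair(primary="lpar1", standby="lpar2")
    for step in handle_primary_failure(pair):
        print(step)
```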

Limitation 3: No awareness of application dependencies across virtual servers
Relationships between business application components are not known to any hypervisor, which can leave the application non-functional after recovery.
Sample scenario (Web servers, WAS, and DB2 in separate LPARs): Recovery of the DB2 node via automatic virtual server restart requires a recycle of the J2EE container. Since the virtualization layer is not aware of this application dependency, the application hangs. Recovery of a database after a failure often requires a recycle of the J2EE application.
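What an application-aware tool needs in this scenario is a dependency map it can query after a recovery. A minimal sketch, assuming a hand-written map with hypothetical component names taken from the scenario above (this is not the format of any real automation policy):

```python
# Hypothetical dependency map: each component lists what it depends on.
DEPENDS_ON = {
    "Web": ["WAS"],
    "WAS": ["DB2"],
    "DB2": [],
}


def direct_dependents(component: str, depends_on: dict[str, list[str]]) -> list[str]:
    """Components that depend directly on the given component."""
    return sorted(name for name, deps in depends_on.items() if component in deps)


if __name__ == "__main__":
    # After DB2 has been recovered, these components need a recycle.
    print(direct_dependents("DB2", DEPENDS_ON))
```

Only direct dependents are returned here; whether components further up the stack also need a recycle depends on the application and is left open.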

System Automation for Multiplatforms - an Overview
Tivoli System Automation for Multiplatforms:
Provides a high availability cluster (nodes connected by heartbeat and shared disk).
Automates startup and shutdown of complex, stateful applications in the correct sequence.
Actively monitors all resources and reacts to outages of SW and HW components by automatic restart in the correct context.
Automation policies define the automation scope of System Automation. They describe resources, groups, and relationships, and define the desired target availability situation. There is no need to develop automation scripts, workflows, or actions.

Application Automation & High Availability
Automation and availability are the two major functional aspects provided by the SA product family.
Automation: Automate complex operations and reduce skill requirements (application skills, operating system skills). Focus on dependencies between business-relevant applications (relationships such as Runs On and Depends On). Support changing automation goals. SA monitors applications, systems, file systems, and networks. SA choreographs startup and shutdown of these resources.
High availability for applications: Avoid downtime and keep business-critical applications running 24x7. SA provides an HA cluster (heartbeat, shared disk) for redundancy. SA uses the automation aspect to re-assure availability.

System Automation for Multiplatforms - Usage in Virtualized Environments
Value statement: Collaboration of virtualization technology and classical HA clustering provides the best of both.
Benefits ("best of both"):
SA provides recovery for application failures (hypervisor limitation 1). Example: DB2 in an HA cluster of two SA MP nodes on servers I and II.
SA provides recovery automation for hot standby applications (hypervisor limitation 2). Example: DB2 primary and DB2 secondary in an HA cluster of two SA MP nodes on servers I and II.

Reduced Planned Downtime for Clustered Mission Critical Application
Value statement: Collaboration of virtualization technology and classical HA clustering provides the best of both.
Benefits ("best of both"): Virtualization avoids planned downtime for production workload. SA provides recovery from unplanned HW/SW failures.
Scenario description (servers S1, S2, S3; HA cluster managed by SA):
1. The operator moves the guest running the SAP production system to another server without impacting HA redundancy.
2. System Automation recognizes the guest move and assures application high availability through the application standby/secondary.
3. System Automation detects an application failure and recovers the application on the standby server.

Recovery of unplanned HW/SW failures for Mission Critical Applications
Value statement: Collaboration of virtualization technology and classical HA clustering provides the best of both.
Benefits ("best of both"): Virtualization avoids planned downtime for production workload. SA provides recovery from unplanned HW/SW failures.
Scenario description (servers S1, S2, S3; HA cluster managed by SA):
1. The hardware of the system where the application is running fails.
2. System Automation detects the node failure and recovers the application on the standby server.
3. The guest is moved to a spare server.
4. System Automation re-establishes the cluster.

Tivoli System Automation Application Manager
The problem: Business applications are complex and difficult to manage. The reasons are:
... a multi-tiered SW stack (application components) that makes up the overall business application
... application components running in a heterogeneous platform environment
... start/stop dependencies between application components (which are often not documented)
The solution: Tivoli SA Application Manager lets operators handle a business application as a single instance. Tivoli System Automation Application Manager:
... aggregates a multi-tiered SW stack into a single business application instance
... provides various adapters for the heterogeneous platform environment (e.g. AIX HA clusters, Linux HA clusters)
... knows the start/stop dependencies between the application components
... can automatically restart components after application failures

SA Application Manager: Automation of Multi-tiered Applications
Value statement: Collaboration of virtualization technology and classical HA clustering provides the best of both.
Benefits ("best of both"): SA Application Manager can manage relationships in multi-tiered business applications (hypervisor limitation 3).
SA Application Manager automation policy for the Web Portal application: HTTP (reference) StartsAfter WAS (reference) StartsAfter DB2 (reference). The components run in LPARs on different hypervisors (KVM, VMware, PowerVM).
Recovery of a database after a failure often requires a recycle of the J2EE application.
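The StartsAfter chain in the policy above defines a start order, and the stop order is its reverse. As a sketch of the idea (the dictionary below is a hypothetical stand-in for the policy, not the actual SA Application Manager policy format), the order can be derived with a topological sort from Python's standard library:

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical encoding of the policy: component -> components it StartsAfter.
STARTS_AFTER = {
    "HTTP": {"WAS"},
    "WAS": {"DB2"},
    "DB2": set(),
}


def start_order(starts_after: dict[str, set[str]]) -> list[str]:
    """Order in which components can be started without violating StartsAfter."""
    return list(TopologicalSorter(starts_after).static_order())


def stop_order(starts_after: dict[str, set[str]]) -> list[str]:
    """Shutdown runs in the reverse of the start order."""
    return list(reversed(start_order(starts_after)))


if __name__ == "__main__":
    print("start:", start_order(STARTS_AFTER))
    print("stop: ", stop_order(STARTS_AFTER))
```

`TopologicalSorter` raises `graphlib.CycleError` if the relationships contain a cycle, which is a useful sanity check for a hand-written policy.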

Outlook: Automated Maintenance with no Application Downtime
Value statement: Automation (SA AppMan) will allow a server evacuation to be performed in a single step. Today, best practice is a stepwise evacuation of virtual servers using guest mobility, which is a time-consuming manual operation.
System Automation benefits: Automated, stepwise relocation of virtual servers (SA). Application impact assessment of the evacuation operation.
Scenario description (servers S1, S2, S3; two HA clusters):
1. The operator initiates the server evacuation.
2. System Automation automates the guest mobility of the virtual servers (stepwise guest mobility) and, where necessary, also stops virtual servers.
3. The operator turns server S3 off.

Part III: Usage of Virtualization Technology for Disaster Recovery - Capabilities and Limitations

Disaster Recovery for Virtualized Platforms
Replication of virtual server images across sites (Site I, Site II).
Classical DR solutions replicate application data.
Examples: VMware Site Recovery Manager, Tivoli Storage Productivity Center for Replication.
Limitation: no application awareness.
[Figure: Storage-based replication of server images (Linux A, Linux B, Win C) and data (DB2-A, DB2-B, DB2-C) from Site I to Site II.]

Disaster Recovery with System Automation Application Manager
Disaster recovery environments are multi-site data center setups.
Metro/Global Mirror replication technologies are employed to ensure that all business-relevant data is available at the failover site. The RPO in such an environment is typically in the range of hours.
A DR plan exists that contains instructions for the case of a disaster.
With System Automation Application Manager, a DR solution can be created that integrates replicated storage setups with multi-tiered business applications.
[Figure: Stuttgart (production site) and Böblingen (backup site).]

SA Application Manager & Disaster Recovery - Components View
[Figure: SA Application Manager (operations console, JEE automation framework on WebSphere, automation engine) manages applications through adapters to SA MP and HACMP clusters at Site I and Site II, and manages data replication between the sites through TPC-R.]

Outlook - SA Disaster Recovery Manager
Manage business applications: manage site relocation for multi-tiered applications while controlling different hypervisors.
1. The operator wants to move the multi-tiered, cross-platform SAP production system to Site II.
2. System Automation stops all SAP applications in the correct sequence and starts the virtual guests on Site II.
3. With the help of TPC-R, System Automation switches the replication direction for the SAP DB.
4. System Automation starts SAP again on Site II.
[Figure: SA App Man and TPC-R controlling SAP Prod (Web and WAS on VMware, SAP DB Prod with DB2 in LPARs on PowerVM) and SAP Dev across Site I and Site II, with replication sessions A and B between DSxxxx storage systems.]
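The four steps above amount to an ordered plan that depends on the application's start order. A minimal sketch of such a plan builder, assuming hypothetical component and site names and plain strings in place of real SA Application Manager or TPC-R operations:

```python
def relocation_plan(start_order: list[str], source: str, target: str) -> list[str]:
    """Build the ordered step list for a planned site relocation.

    start_order lists the application components in the order in which
    they must be started; they are stopped in the reverse order.
    """
    steps = [f"stop {name} on {source}" for name in reversed(start_order)]
    steps.append(f"start virtual guests on {target}")
    steps.append(f"switch replication direction: {target} becomes replication source")
    steps += [f"start {name} on {target}" for name in start_order]
    return steps


if __name__ == "__main__":
    # Hypothetical two-tier SAP system: the database starts first.
    for step in relocation_plan(["SAP DB", "SAP application"], "Site I", "Site II"):
        print(step)
```

A real implementation would wait for each step to complete and verify the replication state before starting the database on the target site; the sketch only captures the ordering.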

Outlook: Manage Availability in Hybrid Cloud Environments
[Figure: Operators use System Automation Application Manager to manage resources, applications, and dependencies across HA clusters A, B, and C, spanning physical and virtualized servers on premise and public cloud services (IaaS / PaaS) off premise.]

Summary - Value of HA Clustering in Virtualized Environments
Virtualization in a data center improves utilization and thus reduces HW costs.
Virtualization technologies can help to enhance availability for planned outages.
High availability can only be ensured by redundancy of HW and SW.
Availability of business applications can only be ensured by a true HA cluster, since hypervisors do not manage applications.
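The statement that high availability requires redundancy can be made concrete with a standard reliability formula. As a rough sketch, assuming the redundant components fail independently of each other (an idealization that real clusters only approximate; the function name is illustrative), the combined availability of n components of which one suffices is 1 - (1 - a)^n:

```python
def redundant_availability(single: float, copies: int) -> float:
    """Availability of `copies` redundant components, any one of which suffices.

    Assumes independent failures; `single` is a fraction such as 0.99.
    """
    if not 0.0 <= single <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    if copies < 1:
        raise ValueError("at least one component is required")
    return 1.0 - (1.0 - single) ** copies


if __name__ == "__main__":
    for n in (1, 2, 3):
        print(n, f"{redundant_availability(0.99, n):.6f}")
```

The formula covers only component redundancy. It says nothing about detecting an application failure and switching over, which is the part the slides assign to the HA cluster software.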

Summary - Value of Virtualization & System Automation for Datacenters
Disaster recovery solutions for virtualized environments require the replication of your business-relevant data to a remote site.
SA Application Manager provides a way to perform coordinated site failovers of your business applications, even when they run clustered in virtualized environments.

More about High Availability and System Automation in Cloud Infrastructures
The Promise of Virtualization for High Availability
Cloud Resiliency
Business Continuity for Heterogeneous Infrastructures

Thank you!
Need more information? New wiki on developerWorks: https://www.ibm.com/developerworks/wikis/display/tivoli/tivoli+system+automation
Contact:
Thomas Lumpp, STSM - SA AM / SA MP, thomas.lumpp@de.ibm.com, +49-7031-16-3057
Bernd Jostmeyer, Lead Developer SA AM, bjost@de.ibm.com, +49-7031-16-4106