VMware Business Continuity & Disaster Recovery Solution
Copyright 2009 VMware Inc. All rights reserved.
Agenda
- Introducing vCenter Site Recovery Manager 5.0
- What's New in Site Recovery Manager 5.0
- SRM Architecture & Workflows
- vSphere Replication
- Running DR Drills & Testing with SRM 5
- SRM Recovery & Planned Migration
- SRM Advanced Settings
- SRM Editions & Licensing
Tradeoffs of Traditional Business Continuity Solutions
- Application-level availability silos: complex and expensive
  - Examples: Middleware / Java, Oracle RAC, Oracle DataGuard, MS Clustering, DB Mirroring, DB Access Groups, CCR / SCR, App Server Cluster, Session State Replication
- Data protection services: longer RTOs and RPOs
  - Backup
  - Data replication
VMware Improves Business Continuity at All Levels
[Diagram: vSphere hosts at a Local Site and a Failover Site]
- Local Availability: vSphere High Availability, vSphere Fault Tolerance, vMotion and Storage vMotion
- Data Protection: vSphere Data Recovery, Storage APIs for Data Protection (slide tags: Improved in 2011, Improved in 2011)
- Disaster Recovery: vCenter Site Recovery Manager, includes vSphere Replication (slide tags: New in 2011, Improved in 2011)
Challenges of Traditional Disaster Recovery
- Expensive
  - Software, hosts, storage, facilities
  - >$10K per app
- Complex recovery plans
  - Apps? Storage? Hosts? Network?
- Unreliable failovers
  - Failure to meet business requirements
  - Long RTOs: days to weeks
  - Too much time and resources consumed
vSphere Provides the Best Foundation for Disaster Recovery
- Consolidation: cost-efficient infrastructure
  - Reduced hardware requirements at recovery site
  - Use recovery hardware to run low-priority apps
- Hardware independence: flexible infrastructure
  - Eliminate need for identical hardware across sites
  - Enable waterfalling of equipment to recovery site
- Encapsulation: simple application protection
  - Entire system, including application, OS, and data, is stored as virtual machine files
  - Entire system can be protected with data protection tools
Simple and Reliable DR with vSphere and SRM
vCenter Site Recovery Manager Ensures Simple, Reliable DR
Site Recovery Manager complements vSphere to provide the simplest and most reliable disaster protection and site migration for all applications.
[Diagram: Site A (Primary) and Site B (Recovery), each with VMware vCenter Server, VMware vSphere, Site Recovery Manager, and servers]
- Provide cost-efficient replication
  - Built-in vSphere Replication
  - Broad support for storage-based replication
- Simplify management of recovery and migration plans
  - Replace manual runbooks with centralized recovery plans
  - From weeks to minutes to set up a new plan
- Automate failover and migration processes
  - Enable frequent non-disruptive testing
  - Ensure automated failover and migration
  - Automate failback processes
How SRM Works
[Diagram: Protected Site and Recovery Site, each with a vCenter Server, an SRM Server, and a storage array, linked by array replication. Labels: VMs, Protect, Placeholders Created, Replicated, Invoke Recovery, Storage Activated, VMs Recovered]
Copyright 2009 VMware, Inc. All rights reserved. This product is protected by U.S. and international copyright and intellectual property laws. VMware products are covered by one or more patents listed at http://www.vmware.com/go/patents.
What's New in Site Recovery Manager 5.0?
- vSphere Replication
  - Expand DR coverage to Tier 2 apps and smaller sites
- Automated failback
- Planned migration
  - Streamline planned migrations (for disaster avoidance, planned maintenance, ...)
- Others
  - More granular control over VM startup order
  - Protection-side APIs
  - IPv6 support
Key Components of SRM 5
Required at both protected and recovery sites: vCenter Server, virtual machines, VMware vSphere, servers, storage, Site Recovery Manager
- Site Recovery Manager
  - Manages recovery plans
  - Automates failovers and failbacks
  - Tightly integrated with vCenter and replication
- Choice of replication options
  - vSphere Replication: bundled with SRM; replicates virtual machines between vSphere clusters
  - Storage-based replication (3rd party): provided by the replication vendor; integrated via replication adapters created, certified, and supported by the replication vendor
SRM Provides Broad Choice of Replication Options
[Diagram: Site A (Primary) and Site B (Recovery), each with vCenter Server, Site Recovery Manager, and vSphere; the sites are linked by vSphere Replication and by storage-based replication]
- vSphere Replication: simple, cost-efficient replication for Tier 2 applications and smaller sites
- Storage-based replication: high-performance replication for business-critical applications in larger sites
vSphere Replication Complements Storage-Based Replication
vSphere Replication
- Replication provider: VMware
- Cost: low-end storage supported; no additional replication software
- Management: VM granularity; managed directly in vCenter
- Performance: 15 min RPOs; scales to 500 VMs; file-level consistency; no automated failback, FT, linked clones, physical RDM
Storage-based replication
- Cost: higher-end replicating storage; additional replication software
- Management: LUN/VM layout; storage team coordination
- Performance: synchronous replication; high data volumes; application consistency possible
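
The comparison above can be read as a quick fit check. The sketch below is only an illustration: the function name and the reading of "15 min RPOs" as the lowest achievable RPO are assumptions, not part of any SRM tooling.

```python
# Thresholds read off the comparison slide. Treating "15 min RPOs" as the
# lowest achievable RPO is an assumption of this sketch.
VR_MIN_RPO_MINUTES = 15
VR_MAX_VMS = 500
VR_UNSUPPORTED = {"automated failback", "FT", "linked clones", "physical RDM"}


def vsphere_replication_blockers(rpo_minutes, vm_count, required_features):
    """Return the reasons (possibly none) why vSphere Replication would not fit."""
    blockers = []
    if rpo_minutes < VR_MIN_RPO_MINUTES:
        blockers.append("RPO below 15 minutes")
    if vm_count > VR_MAX_VMS:
        blockers.append("more than 500 VMs")
    # Report unsupported features in a stable, alphabetical order.
    for feature in sorted(set(required_features) & VR_UNSUPPORTED):
        blockers.append(f"requires {feature}")
    return blockers
```

An empty result suggests vSphere Replication is a candidate; any entry points toward storage-based replication for that workload.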
Planned Migrations for App Consistency & No Data Loss
Two workflows can be applied to recovery plans: DR failover and planned migration.
Planned migration overview (Site A to Site B)
1. Shut down production VMs
2. Sync data, stop replication, and present LUNs to vSphere
3. Recover app-consistent VMs
Planned migration ensures application consistency and no data loss during migration
- Graceful shutdown of production VMs in an application-consistent state
- Data sync to complete replication of VMs
- Recover fully replicated VMs
Benefits
- Better support for planned migrations
- No loss of data during migration process
- Recover application-consistent VMs at recovery site
Automated Failback to Streamline Bi-Directional Migrations
Automated failback: reverse the original recovery plan
Overview
- Re-protect VMs from Site B to Site A
  - Reverse replication
  - Apply reverse resource mapping
- Automate failover from Site B to Site A
  - Reverse original recovery plan
Restrictions
- Does not apply if Site A has undergone major changes or been rebuilt
- Not available with vSphere Replication
Benefits
- Simplify failback process
- Automate replication management
- Eliminate need to set up new recovery plan
- Streamline frequent bi-directional migrations
Scalability
Item                                                      Maximum   Enforced
Protected virtual machines total                          1000      No
Protected virtual machines in a single protection group   500       No
Protection groups                                         150       No
Simultaneous running recovery plans                       10        No
vSphere Replicated virtual machines                       500       No
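
Because none of these maximums is enforced by the product, a design review may want to check them by hand. A minimal sketch, assuming a hypothetical `Deployment` record; the names below are illustrative and not an SRM API.

```python
from dataclasses import dataclass
from typing import List

# Maximums copied from the scalability slide; the slide marks none as enforced.
SRM5_LIMITS = {
    "protected_vms_total": 1000,
    "protected_vms_per_group": 500,
    "protection_groups": 150,
    "concurrent_recovery_plans": 10,
    "vsphere_replicated_vms": 500,
}


@dataclass
class Deployment:
    """Hypothetical description of a planned SRM deployment."""
    group_sizes: List[int]            # protected VMs in each protection group
    concurrent_recovery_plans: int
    vsphere_replicated_vms: int


def limit_violations(d: Deployment) -> List[str]:
    """Return one message per documented maximum that the deployment exceeds."""
    observed = {
        "protected_vms_total": sum(d.group_sizes),
        "protected_vms_per_group": max(d.group_sizes, default=0),
        "protection_groups": len(d.group_sizes),
        "concurrent_recovery_plans": d.concurrent_recovery_plans,
        "vsphere_replicated_vms": d.vsphere_replicated_vms,
    }
    return [
        f"{name}: {observed[name]} > {limit}"
        for name, limit in SRM5_LIMITS.items()
        if observed[name] > limit
    ]
```

For example, three protection groups of 500, 400, and 200 VMs stay within the per-group maximum but exceed the total of 1000 protected VMs.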
SRM Architecture
SRM Architecture
[Diagram: the Protected Site and the Recovery Site each have a vSphere Client with the SRM plug-in, an SRM Server and a vCenter Server with their databases, a VRMS with its own database, and ESX hosts on VMFS storage. The diagram also shows VRAs and a VRS alongside the ESX hosts, and storage replication between the two sites' storage.]
Overall Solution Components
- vCenter: must be 5.0, licensed, and running at each site
- vSphere: must be 3.5 or later and running at each site
- SRM Server: requires a 64-bit Windows OS
- Storage replication: must be on the VMware compatibility list, with the snapshot or clone technology licensed for SRM tests
- SRA (Storage Replication Adapter): the connection between VMware and the storage environment
- VRMS: vSphere Replication Management Server
- VRA: vSphere Replication Agent
- VRS: vSphere Replication Server
- ESXi 5.0: mandatory for vSphere Replication
Storage Array Integration
Storage Replication Adapters (SRAs):
- Discover arrays
- Determine which LUNs are replicated
- Assist in initiating tests and recovery
New capabilities in SRAs for version 5.0 include:
- Reprotect
- Synchronization
- Planned Migration
SRM 5 will require new SRAs.
[Diagram: SRM Server containing the Replication Manager and Array Managers; each Array Manager talks through an SRA to a vendor management interface and its array]
SRM Compatibility Matrix: http://www.vmware.com/pdf/srm_storage_partners.pdf
vSphere Replication
vSphere Replication Architecture: Tightly Integrated with SRM, vCenter and ESX
[Diagram: Protected Site and Recovery Site, each with vCenter Server, Site Recovery Manager, and a vSphere Replication Management Server. Protected site: ESX/ESXi hosts with the VSR Agent. Recovery site: vSphere Replication Server and ESXi hosts. Both sites: any storage supported by vSphere.]
vSphere Replication Details
- Replication options may be set per virtual machine
- Can opt to replicate all or a subset of the VM's disks
- You can create the initial copy in any way you want, even via sneakernet
- You have the option to place the replicated disks where you want
- Disks are replicated in a group-consistent manner
Simplified replication management
- User selects destination location for target disks
- User selects Recovery Point Objective (RPO)
- User can supply initial copy to save on bandwidth
Replication specifics
- Changes on the source disks are tracked by ESX
- Deltas are sent to the remote site
- Does not use VMware snapshots
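
The "track changes, send deltas" idea can be pictured with a toy model. This is a conceptual sketch only: `TrackedDisk` and `apply_delta` are hypothetical names, and the real tracking happens inside ESX, not in user code.

```python
class TrackedDisk:
    """Toy virtual disk that remembers which blocks changed since the last sync."""

    def __init__(self, num_blocks, block_size=4):
        self.block_size = block_size
        self.blocks = [bytes(block_size) for _ in range(num_blocks)]
        self.dirty = set()  # indexes of blocks written since the last delta

    def write(self, index, data):
        if len(data) != self.block_size:
            raise ValueError("data must be exactly one block long")
        self.blocks[index] = data
        self.dirty.add(index)

    def take_delta(self):
        """Return only the changed blocks and reset the change tracking."""
        delta = {i: self.blocks[i] for i in sorted(self.dirty)}
        self.dirty.clear()
        return delta


def apply_delta(replica_blocks, delta):
    """Apply a delta produced by take_delta() to the replica's block list."""
    for index, data in delta.items():
        replica_blocks[index] = data
```

The replica starts from an initial full copy (which, per the slide, can be seeded offline); after that only the blocks returned by `take_delta()` need to cross the wire, and a block written twice between syncs is sent once.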
vSphere Replication UI
- Select VMs to replicate from within the vSphere Client via right-click options
- Can configure an individual VM or multiple VMs simultaneously
vSphere Replication 1.0 Limitations
- Focus on virtual disks of powered-on VMs
  - ISOs and floppy images are not replicated
  - Powered-off/suspended VMs are not replicated
  - Non-critical files are not replicated (e.g. logs, stats, swap, dumps)
- VR works at the virtual device layer
  - Independent of disk format specifics
  - Independent of primary-side snapshots
  - Snapshots work with VR: the snapshot is replicated, but the VM is recovered with collapsed snapshots
  - Physical RDMs are not supported
- FT, linked clones, and VM templates are not supported with VR
- Automated failback of VR-protected VMs will come later than the initial 5.0 release, but will be supported in the future
- Virtual Hardware 7 or later is required for VMs to be protected by VR
User Interface
- SRM's interface is new and able to manage the entire SRM framework from one GUI
- Both sides are visible without Linked Mode
User Interface
- Site-specific networking settings for VMs
- New icons for shadow VMs
SRM Use Cases
Use Cases: 3 Typical
Unplanned Failover
- Recover from unexpected site failure
- Full or partial site failure
- The most critical but least frequent use case
  - Unexpected site failures do not happen often
  - When they do, fast recovery is critical to the business
Preventive Failover
- Anticipate potential datacenter outages
  - For example: a forecast hurricane, floods, forced evacuation, etc.
- Initiate preventive failover for smooth migration
  - Graceful shutdown of VMs at protected site
  - Leverage SRM planned migration capability to ensure no data loss
Planned Migration
- Most frequent SRM use case
  - Planned datacenter maintenance
  - Global load balancing
- Ensure smooth site migrations
  - Test to minimize risk
  - Execute partial failovers
  - Use SRM planned migration to minimize data loss
  - Automated failback enables bi-directional migrations
Highly scalable: 500 virtual machines
SRM Reduces Recovery Risk with Frequent Testing
[Two charts of recovery risk over time: Traditional Disaster Recovery, with a testing gap between DR tests, and Site Recovery Manager, with frequent DR testing]
- Testing gap: lack of confidence in the DR process
- During the testing gap, organizations can't be sure that they can recover the current IT environment
- A failover scenario may take days or weeks to complete, leaving the business at extreme risk
- SRM provides assurance that DR objectives will be met
Running a Test Recovery Plan API
Testing
[Diagram labels: Protected Site, Recovery Site, In-Sync, Storage Array Replication, Test, Continue]
Testing and Executing Recovery Plans
[Screenshot callouts]
- Steps in recovery plan
- Status and time stamps
- When to execute
- User confirmation message
Testing a Recovery Plan
VMs are now ready to be used.
Cleaning Up a Test Recovery
- After testing is complete, the environment is easily cleaned up
- Following cleanup, no test resources are in use at the recovery site
- The test or recovery is now ready to be run again
SRM Recovery & Planned Migration
SRM Provides Broad Application Coverage
- RTO: 30 minutes to hours
- RPO: flexible, based on storage replication
[Chart: RTO (Continuous, Hours, Days) against RPO (Synchronous, Hours, Days), positioning app-level geo-clustering / load balancing, Site Recovery Manager, and Tier 1, Tier 2, and Tier 3 apps]
SRM Supports Flexible Topologies
[Diagram labels: Production and Recovery sites for each topology]
Active-Passive Failover
- Most common traditional scenario
- Expensive dedicated resources
Active-Active Failover
- Leverage recovery infrastructure for test, development, training
- Utilize sunk cost of recovery site
Bi-directional Failover
- Production applications at both sites
- Each site acts as the recovery site for the other
Shared Recovery Sites
- Many-to-one failover
- Particularly useful for Remote Office / Branch Office
Simple Setup and Management of Recovery and Migration Plans
From complex runbooks to simple recovery plans
Complex runbooks
- Weeks or months to set up
- Error-prone
- Quickly fall out of sync with app and infrastructure changes
Simple recovery plans
- Set up in minutes
- Fewer steps means far less room for errors
- Simple to keep in sync with changes
Five Simple Steps to Create Recovery and Migration Plans
Create recovery plans in 5 steps and eliminate manual steps of traditional recovery
- Step 1: Map production site resources to recovery site (resource pools, vSwitches, VM folders)
- Step 2: Select virtual machine protection groups to include in recovery
- Step 3: Select low-priority VMs to suspend at recovery site
- Step 4: Specify boot sequence of recovered VMs
- Step 5: Customize IP addresses of recovered VMs
- Optional: Add messages and custom scripts
Manual steps of traditional recovery
- Reconfigure individual hosts
- Recover entire systems including OS and application binaries
- Coordinate storage and replication processes for recovery
- Stop replication and make replicated LUNs writable
- Present data to applications
- Present VMs to vSphere
- Reconfigure physical switching infrastructure
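
The five steps map naturally onto a small record. The sketch below is an assumption-laden illustration: `RecoveryPlanDraft` and its fields are hypothetical names rather than SRM objects, and treating steps 3 and 5 as optional is a choice made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class RecoveryPlanDraft:
    """Hypothetical record mirroring the five plan-creation steps."""
    resource_mappings: Dict[str, str] = field(default_factory=dict)  # step 1
    protection_groups: List[str] = field(default_factory=list)       # step 2
    vms_to_suspend: List[str] = field(default_factory=list)          # step 3
    boot_priority: Dict[str, int] = field(default_factory=dict)      # step 4
    ip_overrides: Dict[str, str] = field(default_factory=dict)       # step 5


def missing_steps(plan: RecoveryPlanDraft) -> List[str]:
    """List the steps that still need input before the draft is usable.

    Steps 3 and 5 are treated as optional here (an assumption of the sketch):
    a plan may suspend no VMs and keep the original IP addresses.
    """
    missing = []
    if not plan.resource_mappings:
        missing.append("step 1: map resources")
    if not plan.protection_groups:
        missing.append("step 2: select protection groups")
    if not plan.boot_priority:
        missing.append("step 4: specify boot sequence")
    return missing
```

A draft with resource mappings and protection groups but no boot priorities would report only step 4 as outstanding.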
Running a Recovery Plan API
Failover
[Diagram labels: Protected Site, Recovery Site, In-Sync, Storage Array Replication, Run]
SRM "Prepare Storage" steps
- Rescan HBAs
- Refresh storage views across ESX hosts
Planned Migration
- Shuts down protected VMs and then synchronizes them
- Stops on errors and lets you fix them
Disaster Recovery
- Shuts down protected VMs and then synchronizes them, if it can
- Does NOT stop on errors to let you fix them
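
The difference between the two workflows comes down to error handling. A minimal sketch, assuming a plan is a list of named callables; `run_recovery_plan` and the mode strings are hypothetical and only illustrate the stop-versus-continue behavior.

```python
def run_recovery_plan(steps, mode):
    """Run (name, action) steps; return (completed_names, errors).

    mode "planned_migration": stop at the first error so it can be fixed.
    mode "disaster_recovery": record the error and keep going.
    """
    if mode not in ("planned_migration", "disaster_recovery"):
        raise ValueError(f"unknown mode: {mode}")
    completed, errors = [], []
    for name, action in steps:
        try:
            action()
        except Exception as exc:  # any step failure is recorded by name
            errors.append((name, str(exc)))
            if mode == "planned_migration":
                break      # halt the plan and leave the rest for a re-run
            continue       # disaster recovery: best effort, move on
        completed.append(name)
    return completed, errors
```

With a failing synchronization step, the planned-migration mode halts before recovering any VMs, while the disaster-recovery mode still recovers them from the last replicated state.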
Running a Recovery Plan: Storage Layer
[Diagram: Protected Site and Recovery Site storage, with replication between them]
Recovery
The production workloads are now running at the recovery site.
Failback
- Failback is a use case that combines other SRM capabilities
- Failback is a failover, a reprotect, and a subsequent failover
- The process starts with a successful planned migration
Failback (continued)
Replication now runs in reverse, back to the original protected site.
Failback (continued)
Following a reprotect, the environment may be failed back to the original primary site.
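
Failback as "failover, reprotect, failover" can be traced with a tiny state model. This is a sketch under assumptions: `SitePair` and the three functions are hypothetical names that model only which site is active and whether replication currently protects it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SitePair:
    """Hypothetical state: who runs the workloads and whether they are protected."""
    active: str             # site currently running the workloads
    standby: str            # the other site
    protected: bool = True  # True when replication runs from active to standby


def failover(pair: SitePair) -> SitePair:
    """Move workloads to the standby site; replication is not yet reversed."""
    if not pair.protected:
        raise RuntimeError("reprotect before failing over again")
    return SitePair(active=pair.standby, standby=pair.active, protected=False)


def reprotect(pair: SitePair) -> SitePair:
    """Reverse replication so the currently active site is protected."""
    return SitePair(active=pair.active, standby=pair.standby, protected=True)


def failback(pair: SitePair) -> SitePair:
    """Failback = failover, then reprotect, then a second failover."""
    return failover(reprotect(failover(pair)))
```

In this model the workloads end up back on the original site but unprotected, so one more reprotect is needed to return to the starting state.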
History Reports
Each workflow operation has an associated history report.
History Reports (continued)
SRM Advanced Settings
Advanced IP Customization
- The GUI supports manual customization of IP addresses
- IP customization information can now be configured for both protected and recovery sites
- Command-line bulk IP customization supports both IPv6 addresses and dual-site IP information
- Sysprep and Customization Specifications are no longer required
- IP customization is much faster
Advanced IP Customization UI
Advanced VM Dependency Management
- SRM has 5 priority levels
- Within a priority group, all virtual machines start simultaneously
Advanced VM Dependency Management (continued)
- Dependencies may be defined to dictate the start sequence of VMs
- This makes it possible to manage a sophisticated start order of virtual machines, so that multi-tier apps are easier to recover
Advanced VM Dependency Management (continued)
[Diagram: VMs arranged in priority groups 1 through 5. VM labels: Master Database, Database, App Server 1, App Server 2, Apache (x2), Exchange, Mail Sync, Desktop (x4)]
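
One way to picture how priority groups and dependencies combine is to compute start "waves". The sketch below is illustrative: `startup_waves` is a hypothetical helper, and it assumes dependencies only constrain VMs within the same priority group.

```python
def startup_waves(priority, depends_on):
    """Return a list of waves; VMs in the same wave can start together.

    priority:   {vm_name: level} with levels 1 (first) through 5 (last)
    depends_on: {vm_name: set of VM names that must start before it}
    Dependencies pointing outside a VM's own priority group are ignored,
    which is an assumption of this sketch.
    """
    waves = []
    for level in range(1, 6):
        pending = {vm for vm, p in priority.items() if p == level}
        while pending:
            # A VM is ready once none of its dependencies is still pending.
            ready = sorted(
                vm for vm in pending
                if all(dep not in pending for dep in depends_on.get(vm, ()))
            )
            if not ready:
                raise ValueError(f"dependency cycle in priority group {level}")
            waves.append(ready)
            pending -= set(ready)
    return waves
```

Without dependencies each priority group is a single wave; a dependency between two application servers in the same group splits that group into two consecutive waves.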
SRM Edition & Licensing
SRM 5 Editions Lineup
Editions: Standard, Enterprise
Scalability limits
- Maximum protected VMs: Standard, 75 virtual machines (1); Enterprise, unlimited (2)
Features
- Support for storage-based replication
- Centralized recovery plans
- Non-disruptive testing
- Automated DR failover
- vSphere Replication
- Automated failback
- Planned migration
Notes
1. Maximum of 75 VMs per site and per SRM instance
2. Subject to the product's technical scalability limits
Legend: New in SRM 5.0
Purchasing & Licensing Site Recovery Manager 5.0
Site Recovery Manager
- Supported versions and editions: Site Recovery Manager 5.0
- Licensing metric: per VM
- Licensing requirements: one license per protected VM, including powered-off protected VMs
vCenter Server
- Supported versions and editions: vCenter 5.0; vCenter Standard or Foundation
- Licensing metric: per instance
- Licensing requirements: two licenses required, one for the protected site and one for the recovery site
vSphere
- Supported versions and editions: vSphere 4 or 5; vSphere Enterprise Plus, Enterprise, Advanced, or Standard
- Licensing metric: per processor
- Licensing requirements: license all powered-on hosts across both protected and recovery sites
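
The three licensing metrics reduce to simple arithmetic. A minimal sketch, assuming the counts are known up front; `license_counts` and its parameter names are hypothetical, not a VMware tool.

```python
def license_counts(protected_vms, powered_on_host_processors):
    """Estimate license quantities from the metrics on the licensing slide.

    protected_vms: number of protected VMs, including powered-off ones
    powered_on_host_processors: processor count of each powered-on host,
        across both the protected and the recovery site
    """
    return {
        "srm_per_vm": protected_vms,                            # one per protected VM
        "vcenter_instances": 2,                                 # one per site
        "vsphere_per_processor": sum(powered_on_host_processors),
    }
```

For instance, 40 protected VMs on three powered-on hosts with 2, 2, and 4 processors would need 40 SRM licenses, 2 vCenter licenses, and 8 vSphere processor licenses.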
Questions?