VMware Virtualization for Business Continuity and Disaster Recovery. Guy Bowers, Senior Systems Engineer. Q1 2010
Agenda: How VMware Addresses Business Continuity; Considerations for Disaster Recovery; Site Recovery Manager; Next Steps
Business Continuity: The Big Picture. Business continuity = minimizing downtime. Availability expectations continue to increase: RTOs are decreasing from more than 24 hours to less than 12 hours. The cost of downtime continues to rise. Dependence on x86 infrastructure is increasing. [Chart: cost of downtime per hour ($ millions, scale 0.0 to 2.5) by industry: Energy, Telecom, Manufacturing, Average, Financial Services, IT, Insurance, Retail, Pharma. Source: META Group] "Almost 60% of surveyed companies incurred significant financial damage as a result of systems failure in the past year." -- Economist Intelligence Unit
Requirements for Building Business Continuity Solutions. Built on a reliable platform: over 85% of customers use VMware for production applications; no reliance on the OS or arbitrary drivers. Independent of physical infrastructure: hardware-independent protection. Protection across operating systems and applications: application- and OS-independent protection. Protection against a broad spectrum of downtime causes: planned and unplanned downtime; component, server, data, and site failures.
Transforming the Cost and Complexity of Business Continuity. Traditional solutions are costly and complex: point solutions tied to hardware, OS, or applications. VMware reduces cost and complexity at each business continuity level: integrated with the vSphere platform; hardware-, OS-, and application-independent. [Chart: cost vs. uptime, comparing traditional approaches (entry-level server, high-end server, failover cluster, fault-tolerant configurations, mirrored sites) with VMware capabilities (encapsulation, isolation, shared redundancy, VMotion, DRS, VMware HA, VMware FT, Site Recovery Manager)]
VMware Offers Protection at Every Level: protection against hardware failures, planned maintenance with zero downtime, and protection against unplanned downtime and disasters. Component: NIC Teaming, Multipathing. Server: VMware Fault Tolerance, High Availability, DRS Maintenance Mode, VMotion. Storage: Storage VMotion. Data: VMware Data Recovery; VMware Ready data protection solutions from third-party partners. Site: Site Recovery Manager.
Protection Against Planned Downtime. Server maintenance (VMotion and DRS Maintenance Mode): migrate running VMs to other servers in the pool; automatically distribute workloads for optimal performance. Storage maintenance (Storage VMotion): migrate VM disks to other storage targets without disruption. Key benefits: eliminate downtime for common maintenance; no application or end-user impact; freedom to perform maintenance whenever desired.
Protection Against Unplanned Downtime. Component failure: leverage redundant network and storage connections; share redundancy across workloads. Server failure: VMware High Availability automatically restarts VMs on other servers in the pool; VMware Fault Tolerance provides continuous protection. Site failure: automated failover with VMware vCenter Site Recovery Manager.
Transforming Availability. [Chart: hardware failure tolerance (unprotected, manual restart, automated restart, continuous) vs. application coverage (0% to 100%); VMware HA provides automated restart and VMware FT provides continuous protection]
VMware Fault Tolerance (new in vSphere 4!). Identical VMs run in lockstep on separate hosts, providing zero-downtime, zero-data-loss failover for all virtual machines in case of hardware failures. Integrated with VMware HA/DRS. No complex clustering or specialized hardware required. A single common mechanism for all applications and operating systems.
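The lockstep idea can be illustrated with a toy record/replay model: the primary records every nondeterministic input it consumes, and the secondary replays that log, so it ends in exactly the same state and could take over. This is a conceptual sketch under that assumption, not VMware FT's implementation; `ToyVM`, `run_primary`, and `run_secondary` are hypothetical names.

```python
import random
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ToyVM:
    """Hypothetical deterministic VM: its only nondeterminism comes from read_input."""
    state: int = 0
    event_log: List[int] = field(default_factory=list)

    def step(self, read_input: Callable[[], int]) -> None:
        value = read_input()           # nondeterministic event (think: packet arrival, timer)
        self.event_log.append(value)   # recorded so another copy can replay it
        self.state = (self.state * 31 + value) % 1_000_003  # deterministic update


def run_primary(steps: int, seed: int) -> ToyVM:
    """Primary copy: consumes live (here: random) inputs and records them."""
    rng = random.Random(seed)
    vm = ToyVM()
    for _ in range(steps):
        vm.step(lambda: rng.randint(0, 255))
    return vm


def run_secondary(event_log: List[int]) -> ToyVM:
    """Secondary copy: replays the primary's log instead of reading live inputs."""
    replay = iter(event_log)
    vm = ToyVM()
    for _ in event_log:
        vm.step(lambda: next(replay))
    return vm


primary = run_primary(steps=1000, seed=42)
secondary = run_secondary(primary.event_log)
print(primary.state == secondary.state)  # True: the secondary could take over
```

The sketch replays after the fact only to keep it short; a lockstep pair would stream the log continuously from primary to secondary.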
Challenges of Traditional Disaster Recovery. Complex recovery processes and infrastructure, dependent on perfect training, documentation, and execution. Failure to meet recovery requirements: recovery takes days to weeks; recovery tests often fail; significant IT time and resources are consumed.
Reducing the Risk of Disaster Recovery Failures. Drivers of risk to the recovery process: new applications or changing application/infrastructure configuration; the gap between the current configuration and the last revision of the DR plan; human error and manual steps during DR testing and failover; availability of key DR staff; lengthy recovery time; increasing complexity of managing the DR solution. The downside of these risks: lost business and productivity for each hour of downtime; (unpredictable) staff overtime; application end users disrupted by testing and outages; inability to meet SLAs.
Reducing and Managing Recovery Risk. In an IT environment without virtualization and DR automation, recovery risk grows during the testing gap between DR tests: organizations can't be sure that they can recover the current IT environment, and a failover scenario may take days or weeks to complete, leaving the business at extreme risk. With virtualization and DR automation, frequent DR testing keeps recoverability proven. Virtualization and DR automation greatly reduce recovery risk. [Chart: recovery risk over time, with and without virtualization and DR automation, marking DR tests and the testing gap]
Key Features of Virtualization for Disaster Recovery. Hardware independence: reliably recover a virtual machine to any hardware; enable waterfalling of equipment to the recovery site. Encapsulation: all information about a system is stored as data on disk, so entire systems can be protected with data protection tools. Partitioning and consolidation: reduced hardware requirements at the production and DR sites; higher consolidation ratios can be used at the DR site. Resource pooling: transparently share and allocate hardware resources; automatic resource optimization.
Advantages of Virtual Disaster Recovery. VMware is a true enabler for disaster recovery: virtual machines are portable; virtual hardware can be automatically configured; test and failover can be automated (minimizing human error); the need for idle hardware is reduced; costs are lowered and quality of service is raised.
Infrastructure Challenges of Traditional Recovery. The fastest, most reliable recovery requires duplicating infrastructure (same servers, same network configuration, etc.), which requires ongoing management. Idle infrastructure at the recovery site is difficult to share and time-consuming to repurpose. Organizations spend significant time and money on recovery infrastructure that is rarely used.
Reduce Cost and Complexity of Recovery Infrastructure. Eliminate hardware dependencies: reduce the risk of failures during recovery; reduce the ongoing management burden. Reduce infrastructure requirements: consolidate production and recovery; reuse servers from production for recovery. Turn the recovery site into a productive resource: leverage the recovery site for other workloads; resource guarantees ensure predictable resource allocation. [Diagram: production site on VMware fails over to a recovery site on VMware that also runs test/dev workloads]
Improving Data Protection. VMware enables scalable, non-disruptive backup and simple, reliable restore to any hardware. Traditional backup: disruptive to applications and users; a slow, complex process for full restore; hardware dependencies complicate restore. Backup with VMware vSphere (VM snapshots backed up to tape or disk): non-disruptive to applications and users; enables off-host, off-LAN backup with standard backup software via the vStorage APIs for Data Protection; enables image- and file-level backup of virtual machines.
VMware Data Recovery (new in vSphere 4!): agentless, disk-based backup and recovery of your VMs. Backup through vCenter Server: 1. schedule backups via vCenter; 2. snapshots are taken; 3. data is deduplicated and stored. Restore through vCenter Server: 1. a VM goes down; 2. select the VM images or files to recover; 3. restore, and the VM is running in seconds. VM- or file-level restore; incremental backups and data deduplication save disk space. Quick, simple, and complete data protection for your VMs, with centralized management through the VMware Infrastructure Client and cost-effective use of storage for backup data. Copyright 2005 VMware, Inc. All rights reserved.
VMware Data Recovery Key Components. Backup and recovery appliance: a Linux appliance in OVF format; leverages the vStorage APIs for Data Protection to discover VMs and to manage backup and restore; the first backup is a full VM, then incremental forever; VM- or file-level restore. VMware vSphere: VSS support via VMware Tools; changed block tracking makes backups more efficient. Destination storage: any VMFS storage (DAS, NFS, iSCSI, or Fibre Channel) plus CIFS shares as a target; all backed-up virtual machines are stored in a deduplicated datastore. VMware vCenter Server integration: vSphere Client plug-in; wizard-driven backup and restore job creation; automatic import of the virtual machine inventory; awareness of HA/VMotion/DRS; leverages the vCenter licensing engine.
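To make "first backup is full VM, then incremental forever" and the deduplicated datastore concrete, the sketch below stores each unique block once under its SHA-256 digest and describes each restore point as a list of digests. It assumes a fixed-size disk and that a changed-block list (standing in for changed block tracking) is supplied by the caller; `DedupStore`, `backup`, and `restore` are hypothetical names, not the Data Recovery appliance's API.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 4  # tiny blocks keep the example readable; real backup tools use much larger ones


class DedupStore:
    """Hypothetical content-addressed store: every unique block is kept exactly once."""

    def __init__(self) -> None:
        self.blocks: dict[str, bytes] = {}

    def put(self, block: bytes) -> str:
        digest = hashlib.sha256(block).hexdigest()
        self.blocks.setdefault(digest, block)  # identical blocks map to the same digest
        return digest


def split(disk: bytes) -> list[bytes]:
    return [disk[i:i + BLOCK_SIZE] for i in range(0, len(disk), BLOCK_SIZE)]


def backup(store: DedupStore, disk: bytes,
           previous: list[str] | None = None,
           changed: list[int] | None = None) -> list[str]:
    """Return a restore point (manifest of block digests).

    With no previous manifest, every block is read (full backup). Otherwise only
    the blocks listed in `changed` are read; all others reuse the previous digests.
    """
    blocks = split(disk)
    if previous is None:
        return [store.put(b) for b in blocks]
    manifest = list(previous)
    for i in changed or []:
        manifest[i] = store.put(blocks[i])
    return manifest


def restore(store: DedupStore, manifest: list[str]) -> bytes:
    return b"".join(store.blocks[d] for d in manifest)


store = DedupStore()
day1 = b"AAAABBBBAAAACCCC"                                  # blocks 0 and 2 are identical
point1 = backup(store, day1)                                 # full backup
day2 = b"AAAABBBBDDDDCCCC"                                  # only block 2 changed
point2 = backup(store, day2, previous=point1, changed=[2])  # incremental backup
print(len(store.blocks))                                     # 4 unique blocks stored, not 8
```

Each restore point stays complete on its own (it lists every block), while storage only grows by the blocks that are genuinely new.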
Improved Recovery with VMware Data Recovery. Backups and restores can run simultaneously. Highly customizable image-level restore: replace a lost VM; restore to a different location/datastore; select which disks to restore. Fast rollback: use change tracking to roll back a virtual disk or virtual machine to an earlier state, transferring only modified blocks for a fast restore. Restore rehearsal: run a restore of a VM to a different datastore with networking disabled.
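Fast rollback, transferring only the blocks modified since the restore point, reduces to copying a known set of block ranges back over the current disk. The sketch assumes the set of changed block indices is supplied (standing in for change tracking) and is accurate; `roll_back` and the tiny `BLOCK_SIZE` are illustrative only.

```python
from __future__ import annotations

BLOCK_SIZE = 4  # illustrative only


def roll_back(current: bytearray, restore_point: bytes, changed_since_backup: set[int]) -> int:
    """Roll `current` back to `restore_point` by copying only the changed blocks.

    Assumes `changed_since_backup` is complete and accurate; any block missing
    from it would silently keep its newer contents. Returns blocks transferred.
    """
    for i in sorted(changed_since_backup):
        start, end = i * BLOCK_SIZE, (i + 1) * BLOCK_SIZE
        current[start:end] = restore_point[start:end]
    return len(changed_since_backup)


restore_point = b"AAAABBBBCCCCDDDD"
disk = bytearray(restore_point)
disk[4:8] = b"XXXX"          # block 1 was modified after the backup
changed = {1}                # what change tracking would report
sent = roll_back(disk, restore_point, changed)
print(sent, bytes(disk) == restore_point)  # 1 True
```

Only one of the four blocks crosses the wire, which is where the speedup over a full image restore comes from.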
Simplifying the Disaster Recovery Process. Physical recovery (40+ hours): configure hardware, install OS, configure OS, install backup agent, start recovery. Virtual recovery (under 4 hours), a single-step automatic recovery: restore VM, power on VM. Eliminate recovery steps: no operating system reinstall or bare-metal recovery; no time spent reconfiguring hardware. Standardize the recovery process: a consistent process independent of operating system and hardware.
VMware for Disaster Recovery. Customers: 55% of customers use virtualization for BC/DR, the top reason for virtualization after consolidation/resource utilization. Press: VMware Site Recovery Manager, 2008 Gold Award for Backup and Disaster Recovery Software and Services (Storage Magazine); VMware Infrastructure, Best Disaster Recovery Product of 2006 (TechTarget). "Using VMware Infrastructure in our disaster recovery plans, we've been able to reduce the time it takes to recover our critical systems by 50 percent." -- Ted Duncan, Education Datacenter, Florida Department of Education
VMware vCenter Site Recovery Manager. Site Recovery Manager leverages VMware vSphere to deliver advanced disaster recovery management and automation: it simplifies and automates disaster recovery workflows (setup, testing, failover), turns manual recovery runbooks into automated recovery plans, and provides central management of recovery plans from the VMware vSphere Client. It works with VMware vSphere to make disaster recovery rapid, reliable, manageable, and affordable.
Disaster Recovery Scenarios with Site Recovery Manager. Failover to a passive DR site (production to recovery): the most common traditional scenario; a very expensive architecture. Failover to an active DR site (production to recovery): leverage recovery infrastructure for test, development, and training; reduces the sunk cost of the recovery site. Bidirectional failover: production applications at both sites; each site acts as the recovery site for the other. Local failover: a less common scenario; protection against large localized failures in a datacenter.
Deployment Topologies. Standard deployment: 1:1 mapping between each protected site and its recovery site. Shared recovery sites (new in SRM 4.0!): multiple sites can be protected by a single, shared recovery site; suited to remote office/branch office topologies.
Shared Recovery Site Scenario. [Diagram: protected sites Site1, Site2, and Site3, each with its own vCenter Server and SRM server, paired with corresponding SRM servers at a single shared recovery site]
Site Recovery Manager Key Components. Site Recovery Manager: manages and monitors recovery plans; tightly integrated with vCenter Server. vCenter Server: requires a supported version of vCenter Server. VMware vSphere: requires a supported version of ESX. Storage: iSCSI, Fibre Channel, or NFS storage (NFS support new in SRM 4.0!). Storage partner replication: integrated via replication adapters created, certified, and supported by the replication vendor. [Diagram: at each site, virtual machines run on VMware vSphere servers managed by vCenter Server and Site Recovery Manager, with storage replicated between the sites by partner replication]
Integrated Management of Disaster Recovery. Managed through a vCenter plug-in. Key configuration steps.
Disaster Recovery Setup (Site A to Site B). Integrate with replication: identify which virtual machines are protected by the replication configuration. Map recovery resources: server resources, network resources, management objects. Create recovery plans for virtual machines, applications, and business units: convert a manual runbook into a pre-programmed response, customizable with scripting and callouts. Benefits: simplified configuration of the recovery infrastructure and process; simplified coordination of replication with the virtual environment.
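Converting a manual runbook into a pre-programmed response amounts to turning prose steps into an ordered list of executable actions, with callouts as just another step. The sketch below is a hypothetical model of that idea, not SRM's recovery plan format or API; `RecoveryPlan`, `Step`, and the VM names `db01`/`app01` are illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Step:
    name: str
    action: Callable[[], None]


@dataclass
class RecoveryPlan:
    """Hypothetical model of a recovery plan: a runbook turned into ordered, executable steps."""
    name: str
    steps: list[Step] = field(default_factory=list)

    def add_step(self, name: str, action: Callable[[], None]) -> "RecoveryPlan":
        self.steps.append(Step(name, action))
        return self  # allow chaining

    def run(self) -> list[str]:
        """Execute steps strictly in order and return a simple execution log."""
        log = []
        for number, step in enumerate(self.steps, start=1):
            step.action()
            log.append(f"{number}. {step.name}: done")
        return log


powered_on: list[str] = []
plan = (
    RecoveryPlan("Payroll")
    .add_step("Power on database VM db01", lambda: powered_on.append("db01"))
    .add_step("Callout: custom script checks the database", lambda: None)
    .add_step("Power on application VM app01", lambda: powered_on.append("app01"))
)
for line in plan.run():
    print(line)
```

Because the plan is data, it can be reviewed, versioned, and executed identically for tests and real failovers, which is the point of replacing a manual runbook.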
Creating and Editing Recovery Plans: recovery plan editor; recovery plans for failure scenarios.
Testing (Site B). Create an isolated test environment: snapshot replicated LUNs before the test; change all virtual machines to a test port group before powering them on. Automate test execution: use the recovery plan created during setup, customizable for testing with extra breakpoints and callouts; log test execution. Reset the environment after the test: power off and delete any test VMs; delete snapshots of replicated LUNs. Non-disruptive testing of recovery plans; testing can incorporate existing or non-virtual DR tools and processes.
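The test workflow above has a natural setup/teardown shape: snapshot, isolate, power on, run, then always reset. A context manager captures that guarantee. This is a hypothetical in-memory model, not SRM's API; `RecoverySite`, `ShadowVM`, `recovery_test`, and the port group name `test-bubble` are all assumptions.

```python
from __future__ import annotations

from contextlib import contextmanager
from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class ShadowVM:
    """A recovered copy of a protected VM, used only for the test."""
    name: str
    port_group: str = "production"
    powered_on: bool = False


@dataclass
class RecoverySite:
    """Hypothetical in-memory stand-in for the recovery site."""
    protected: list[str]
    lun_snapshots: set[str] = field(default_factory=set)
    test_vms: list[ShadowVM] = field(default_factory=list)


@contextmanager
def recovery_test(site: RecoverySite, test_net: str = "test-bubble") -> Iterator[list[ShadowVM]]:
    snapshot = f"snap-{len(site.lun_snapshots)}"
    site.lun_snapshots.add(snapshot)          # 1. snapshot replicated LUNs; replicas stay untouched
    site.test_vms = [ShadowVM(n) for n in site.protected]
    try:
        for vm in site.test_vms:
            vm.port_group = test_net          # 2. isolate each VM BEFORE powering it on
            vm.powered_on = True
        yield site.test_vms                   # 3. run the test steps and callouts here
    finally:                                  # 4. reset, even if the test raised
        for vm in site.test_vms:
            vm.powered_on = False
        site.test_vms = []                    #    delete test VMs
        site.lun_snapshots.discard(snapshot)  #    delete the LUN snapshot


site = RecoverySite(protected=["db01", "app01"])
with recovery_test(site) as vms:
    isolated = all(vm.port_group == "test-bubble" and vm.powered_on for vm in vms)
print(isolated, site.test_vms, site.lun_snapshots)  # True [] set()
```

The `finally` clause is the design choice that matters: a failed test still leaves the recovery site clean for the next run.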
Testing and Executing Recovery Plans: steps in the recovery plan; status and time stamps; when to execute; user confirmation message.
Failover Automation (Site A to Site B). Detect site failures: raise an alert when the heartbeat is lost. Initiate failover: user confirmation of the outage; granular failover initiation. Manage replication failover: break replication; make the replica visible to recovery hosts. Execute the recovery process: use the pre-programmed plan; provide visibility into progress. Result: automation of the failover process, with real-time, step-by-step visibility into execution progress.
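The detect-then-confirm sequence can be sketched as a heartbeat timeout that only raises an alert, with failover gated on a human confirming the outage. The timeout value, `HeartbeatMonitor`, `check_site`, and the injectable clock are assumptions for illustration, not SRM internals.

```python
from __future__ import annotations

import time
from typing import Callable


class HeartbeatMonitor:
    """Hypothetical sketch: tracks when the peer site was last heard from."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic) -> None:
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen = clock()

    def heartbeat(self) -> None:
        self.last_seen = self.clock()

    def peer_lost(self) -> bool:
        return self.clock() - self.last_seen > self.timeout_s


def check_site(monitor: HeartbeatMonitor, confirm: Callable[[str], bool]) -> str:
    """Alert on heartbeat loss, but start failover only after a human confirms the outage."""
    if not monitor.peer_lost():
        return "ok"
    if not confirm("Protected site unreachable. Run the recovery plan?"):
        return "alert raised; awaiting operator"
    return "failover started"


class FakeClock:
    """Manually advanced clock so the example is deterministic."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now


clock = FakeClock()
monitor = HeartbeatMonitor(timeout_s=30, clock=clock)
clock.now = 10
print(check_site(monitor, confirm=lambda msg: True))   # ok
clock.now = 45
print(check_site(monitor, confirm=lambda msg: False))  # alert raised; awaiting operator
print(check_site(monitor, confirm=lambda msg: True))   # failover started
```

Keeping a person in the loop avoids failing over on a transient network partition, at the cost of some added recovery time.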
Failback (original site: Site A; recovery site: Site B; replication reversed). Configuring failback: once the original site is operational again, set up replication in reverse; reconfigure SRM so that the original site is now the recovery site; create a new recovery plan for failback. Executing failback: test and execute automated failback with Site Recovery Manager to restore operation at the original site (controlled and automated failback using SRM). Restoring protection: configure replication so that protected VMs are replicated to the recovery site; recreate and test recovery plans.
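The configuration steps amount to swapping the protected and recovery roles and reversing replication, once for failback and again to restore the original protection. The sketch models only that role swap under the assumption that a site pairing is fully described by those three fields; `SitePair` and `reverse` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class SitePair:
    """Hypothetical description of an SRM site pairing."""
    protected: str
    recovery: str
    replication: tuple[str, str]  # (source, target)


def reverse(pair: SitePair) -> SitePair:
    """Swap protected/recovery roles and reverse the replication direction."""
    source, target = pair.replication
    return SitePair(protected=pair.recovery, recovery=pair.protected,
                    replication=(target, source))


original = SitePair("Site A", "Site B", ("Site A", "Site B"))
failback_config = reverse(original)            # Site B runs production and replicates to Site A
restored_protection = reverse(failback_config)  # after failback, protection runs A -> B again
print(failback_config)
```

Applying the same operation twice returns to the starting configuration, which mirrors the slide's "configure failback" and "restore protection" phases.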
Failover / Failback Initiation
Simplified Compliance. Self-documenting recovery plans: centrally managed; always current. Easier testing: ensure recoverability with realistic testing. Auditable testing and failover: view and export recovery plans, tests, and execution.
Site Recovery Manager Licensing. License SRM for what you protect; vSphere licenses are required for both sites, and no extra licenses are required for failback. Example 1, licenses for single-direction protection: 2 vCenter Server instances; 6 processors of VMware vSphere (4 processors for Site A, 2 for Site B); 4 processors of Site Recovery Manager. Example 2, licenses for bi-directional protection: 2 vCenter Server instances; 6 processors of VMware vSphere (4 processors for Site A, 2 for Site B); 6 processors of Site Recovery Manager. [Diagram: three 2-processor VMware vSphere hosts]
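Both examples follow from simple per-processor arithmetic, assuming (as the examples suggest) one vCenter Server instance per site, vSphere licensed on every processor at both sites, and SRM licensed only on processors at sites whose VMs are protected. The hypothetical `license_counts` helper reproduces the two examples under those assumptions; it is not a statement of VMware's licensing terms.

```python
from __future__ import annotations


def license_counts(procs_per_site: dict[str, int], protected_sites: set[str]) -> dict[str, int]:
    """Count licenses under the assumptions stated above (hypothetical helper)."""
    return {
        "vcenter_server_instances": len(procs_per_site),    # one vCenter per site
        "vsphere_processors": sum(procs_per_site.values()),  # vSphere at both sites
        "srm_processors": sum(n for site, n in procs_per_site.items()
                              if site in protected_sites),   # SRM only where VMs are protected
    }


sites = {"Site A": 4, "Site B": 2}
print(license_counts(sites, protected_sites={"Site A"}))            # Example 1: single direction
print(license_counts(sites, protected_sites={"Site A", "Site B"}))  # Example 2: bi-directional
```

Failback does not change the counts because reversing direction reuses the same processors.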
Why VMware Software for Business Continuity. Expand protection: any workload in a virtual machine can be protected with minimal incremental effort and cost. Slash planned downtime: zero-downtime hardware maintenance; non-disruptive virtual machine disk migration. Minimize unplanned downtime: platform reliability built in; automatic restart after server or OS failure; manageable, automated disaster recovery.
Site Recovery Manager Customer References. "If your organization is already taking advantage of virtualization, then adding Site Recovery Manager to handle disaster recovery is a no-brainer." -- Jerry Wilkin, Senior Systems Administrator, Dayton Superior Corporation. Learn more at www.vmware.com/customers/stories
VMware BC/DR Service Offerings. VMware vCenter Site Recovery Manager Jumpstart: provides a proof-of-concept, on-site installation and configuration of SRM; 3 days on-site, 5 participants max. Plan and Design for VMware vCenter Site Recovery Manager: provides a comprehensive architectural design for SRM that addresses your requirements and accommodates VMware vSphere dependencies; offered in three tiers as a soft bundle.
Where Can I Learn More? vCenter Site Recovery Manager product page: www.vmware.com/products/srm (overview, datasheet, webinars, docs, community links; free 60-day evaluation: all you need to get started!). Business Continuity Solutions from VMware: www.vmware.com/solutions/continuity. VMbook on BC/DR: www.vmware.com/resources/techresources/1063. External resources: Administering VMware Site Recovery Manager, book by Mike Laverick: http://www.lulu.com/content/4343147
Questions?
What's New in Site Recovery Manager 4.0? Support for NFS-based storage replication solutions: in addition to existing support for iSCSI and Fibre Channel solutions, Site Recovery Manager now supports replication solutions that use NFS. Many-to-one failover with shared recovery sites: protect multiple sites using a single, shared recovery site; the simplified architecture requires only a single instance of vCenter Server at the shared recovery site to manage the recovery of all the protected sites. Compatibility with vSphere 4.0: easily protect virtual machines running on ESX 3.x* and/or 4.0 hosts; Site Recovery Manager 4.0 requires and is optimized for vCenter Server 4.0. * Check product documentation for details on which specific versions of ESX 3.x are supported by SRM.