Implementing a Holistic BC/DR Strategy with VMware
  VMware vForum, 2014
  © 2014 VMware Inc. All rights reserved.
What's on the agenda?
  Defining the problem
  Definitions
  VMware technologies that provide BC and DR
    vSphere HA and App HA
    vSphere FT
    vSphere Data Protection / Advanced
    vCenter Availability
    vSphere Replication
    vCenter Site Recovery Manager (SRM)
    vCenter Infrastructure Navigator (VIN)
  Find out more
IT Business Continuity
Is It a Real Problem?
What's the Difference?
  Disaster Avoidance
  Disaster Recovery
  Planned vs. Unplanned
Disaster Recovery vs. Business Continuity
  Example: Tuesday, August 23, 2011 at 1:51 PM EDT - Magnitude 5.8 earthquake near Mineral, Virginia
  Disaster recovery required? No
  Interruption to business continuance? YES!
Fault Tolerance vs. High Availability
  Fault tolerance
    Ability to recover from component loss
    Example: hard drive failure
  High availability
    Uptime percentage in one year    Downtime in one year
    99%                              3.65 days
    99.9%                            8.76 hours
    99.99%                           52 minutes
    99.999% ("five nines")           5 minutes
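
The downtime figures in the availability table follow from simple arithmetic. The sketch below reproduces them, assuming a 365-day year; the function name is illustrative and not part of any VMware tool.

```python
# Convert an availability percentage into the downtime it allows per year.
# Assumption: a 365-day year (8,760 hours), which matches the slide's figures.
HOURS_PER_YEAR = 365 * 24


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Return the hours of downtime per year allowed at the given uptime percentage."""
    if not 0 <= uptime_percent <= 100:
        raise ValueError("uptime_percent must be between 0 and 100")
    return HOURS_PER_YEAR * (100 - uptime_percent) / 100


if __name__ == "__main__":
    for pct in (99, 99.9, 99.99, 99.999):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% uptime allows {hours:.3f} hours ({hours * 60:.1f} minutes) of downtime per year")
```

Each additional "nine" cuts the allowed downtime by a factor of ten, which is why the step from 99.99% to 99.999% leaves only about five minutes per year.
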
RTO, RPO, and MTD
  Recovery Time Objective (RTO)
    How long it should take to recover
  Recovery Point Objective (RPO)
    Amount of data loss that can be incurred
  Maximum Tolerable Downtime (MTD)
    Downtime that can occur before significant loss is incurred
    Examples: financial, reputation
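
To make the three definitions concrete, here is a minimal sketch that checks one outage against an application's objectives. The `Objectives` class, the `assess` function, and the sample timestamps are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Objectives:
    """Hypothetical container for one application's recovery targets."""
    rto: timedelta  # how long recovery should take
    rpo: timedelta  # how much data loss, measured in time, can be incurred
    mtd: timedelta  # downtime that can occur before significant loss is incurred


def assess(obj: Objectives, last_recovery_point: datetime,
           failed_at: datetime, restored_at: datetime) -> dict:
    """Compare one outage against the objectives and report which were met."""
    data_loss = failed_at - last_recovery_point  # work done since the last good copy
    downtime = restored_at - failed_at           # how long the service was unavailable
    return {
        "rpo_met": data_loss <= obj.rpo,
        "rto_met": downtime <= obj.rto,
        "within_mtd": downtime <= obj.mtd,
    }


if __name__ == "__main__":
    targets = Objectives(rto=timedelta(hours=1), rpo=timedelta(minutes=15), mtd=timedelta(hours=4))
    result = assess(
        targets,
        last_recovery_point=datetime(2014, 1, 1, 12, 50),
        failed_at=datetime(2014, 1, 1, 13, 0),
        restored_at=datetime(2014, 1, 1, 15, 0),
    )
    print(result)
```

In the sample outage the last good copy is 10 minutes old, so the RPO holds; the 2-hour recovery misses the 1-hour RTO but stays inside the 4-hour MTD.
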
Making an Application Service Highly Available
  vSphere HA
  NEW: vSphere App HA
vSphere App HA (New)
  Protect off-the-shelf apps
  VMware vFabric tc Server
  Policy-based
vSphere App HA (New)
  [Diagram: Hyperic agents running in VMs; vFabric Hyperic virtual appliance; vSphere App HA virtual appliance; vCenter Server; vSphere hosts in a vSphere HA cluster]
vSphere App HA (New)
vSphere App HA: Key Notes (New)
  Available only in vSphere Enterprise Plus
  Based on VMware vCenter Hyperic
  Full vCenter Hyperic available only in VC Ops Suite Advanced and Enterprise
What's new in App HA 1.1 (New)
  Edit policy
    Create
    Duplicate
    View
    Delete
  Custom Service
    Add a new service
    Shell script
  Level 3 Support
    5 new languages
  5.1 support
    vSphere 5.1 U2
    ESX 5.1
vSphere HA: Keep In Mind
  RTO measured in minutes (not seconds)
  Requires shared storage
  Best practices
    Use admission control percentage policy
    Test post-failure performance with host maintenance mode
    Isolation response: leave powered on
    Network and storage redundancy
  Also see BCO5047
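
For the percentage-based admission control policy, a common starting point is to reserve the share of cluster capacity that the hosts you want to tolerate losing would take with them. The sketch below is only that rule of thumb: it assumes all hosts are equally sized, and the function name is illustrative rather than a vSphere API.

```python
def failover_capacity_percent(total_hosts: int, host_failures_to_tolerate: int = 1) -> int:
    """Percentage of cluster resources to reserve so the remaining hosts can
    restart the VMs of the failed ones. Assumes equally sized hosts."""
    if total_hosts < 2:
        raise ValueError("an HA cluster needs at least two hosts")
    if not 1 <= host_failures_to_tolerate < total_hosts:
        raise ValueError("host_failures_to_tolerate must be between 1 and total_hosts - 1")
    # Integer ceiling division: round up so the reservation is never too small.
    return (100 * host_failures_to_tolerate + total_hosts - 1) // total_hosts


if __name__ == "__main__":
    for hosts in (3, 4, 8, 10):
        print(f"{hosts} hosts, tolerate 1 failure: reserve {failover_capacity_percent(hosts)}%")
```

The percentage has to be revisited whenever hosts are added or removed, and clusters with unevenly sized hosts need a reservation based on the largest host rather than this simple ratio.
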
vSphere Fault Tolerance (FT)
  Zero recovery time, zero data loss
  Host hardware failure only
  Does not protect against OS and application failure
  Works fine with HA, App HA
  Why not FT?
    Resource requirements: does the workload really need it?
    VM has multiple CPUs: see BCO5065
    No VM snapshots: backups require an agent
Data Protection (Backup and Restore)
  Agents? No agents? Both!
    No agents for the majority of workloads: keep it simple
    Agents for certain apps
  vSphere Data Protection (VDP) Advanced
    Backup and recovery for VMware, from VMware
    Based on proven, mature EMC Avamar
    Agentless VM backup and restore
    Agents for granular tier-1 application protection
vSphere Data Protection (New)
VDP Advanced: Keep In Mind
  Engineered for SMB environments
  Uses VADP: VM snapshots, CBT
  Utilizes Windows VSS in VMware Tools
  Works fine with HA, not with FT
  RDM: virtual yes, physical no
  Is it DR? Maybe; depends on RTO, RPO
  Needs replication offsite, right? See BCO5041
VDP Advanced: Keep In Mind
  Best Practices
    Prepopulate DNS, always use FQDN
    Manage VM snapshots
    Avoid deploying to slow storage
    Do not power off; always shut down gracefully
    Do not schedule backups during the maintenance window
  Also see BCO4756 and BCO5041
vCenter Availability
  Run vCenter Server application in a VM
  Run vCenter Server database in a VM
  Run both in same VM?
  Protect with vSphere HA
    vCenter and DB VM restart priority set to High
    Enable guest OS and App monitoring
    App HA can protect SQL Server database
vCenter Availability
  Back up vCenter Server VM and database
    Image-level backup for vCenter Server VM
    App-level backup using agent for database backup
  Why not FT for vCenter Server?
    vCenter Server requires minimum of 2 vCPUs
    FT does not protect against application failure
  Replicate vCenter Server, database VMs?
vSphere Replication: DR
  Native tool built into the platform
  Per-VM hypervisor replication, managed in VC
  Selectable RPO from 15 min up to 24 hours
  Selectable destination datastore (disk-type agnostic)
Replication Across Sites
  [Diagram: two sites, each with a VR Appliance, a vCenter Server, and three ESXi hosts; every host is labeled VRA and NFC; VMDK1 on the source storage is replicated to (VMDK1) on the target storage]
Four Steps for Full Recovery
  1. Right-click, select Recover
  2. Select a target folder
  3. Select a target resource
  4. Click Finish
  Will validate your choices as you go
New Feature: Retain Historical Replicas
  [Diagram labels: vSphere, VR Agent]
  Retention of multiple points in time allows reversion to earlier known good states
  After recovery, use the snapshot manager to revert to earlier points
MPIT Presented as VM Snapshots after Failover
  Use the snapshot manager to revert to earlier points, an interface all administrators have been comfortable with for many years.
vSphere Replication Interoperability
  HA, vMotion, DRS
  Storage vMotion and Storage DRS
    Now supported!
  VDP
    Mostly no problem! If using VSS, ensure you are using 5.5!
  Fault tolerance
    Doesn't work with VR. FT conflicts at the vSCSI disk filter level.
vSphere Replication Best Practices
  RPO
    Only what is necessary, not just because you can!
  RTO
    Don't set one! No testing, no automation, manual process.
  VSS
    Only if necessary!
  What about bandwidth?
    Very hard to determine. Do a local loopback first.
  RDMs?
    Don't use them. If you must, use virtual compatible.
  Don't mix ABR and VR!
SRM
  What is it?
    A disaster recovery engine
    A tool that uses externally replicated data (VR or array-based) to speed the RTO of a BCP
    A product that allows DR to be tested, automated, planned, repeatable, and customizable
  What is it not?
    A replication engine
    A tool for systems that need near-instant RPO
    A disaster avoidance stretched cluster
Key Components of SRM
  vCenter Server
    One vCenter Server (Windows or VCVA) per site, same versions
  SRM Server
    One SRM Server per site, same versions
  Replication
  vSphere hosts, recommend same versions per site (pre-vSphere 5.x only if using array replication)
  vSphere Essentials Plus and higher editions supported
SRM Replication Options
  [Diagram: a multi-tier app (Web, App, DB) protected by storage-based replication of LUN 1 and LUN 2, and a second multi-tier app (Web, App, DB) protected by vSphere Replication]
  SRM can utilize BOTH array-based AND vSphere Replication
  SRM will see existing standalone vSphere Replication protected VMs
  SRM can install vSphere Replication from scratch if needed
Recovery Workflows
  Failover Automation
    User-defined recovery plan
    Minimize errors
  Non-disruptive Failover Testing
    Isolated test environment
    Increase confidence in DR process
  Planned Migration
    Zero data loss
    Operational migration
  Failback Automation
    Re-protect VMs, migrate back
SRM Interoperability
  Works with VR and ABR
  Backups, VADP or other, are fine
  HA is no problem at all
  vMotion and DRS are fine
  Storage vMotion and Storage DRS
    Sort of: replication dependent
  FT is yellow
    Array replicated only, and the FT status is not recovered
  Web vs. vSphere Client
SRM: A Few Best Practices
  Not exhaustive. How long is VMworld?
  Big ones:
    Storage layout
    Test network configuration
    Test often!
    Size vCenter correctly
  Biggest one:
    Do a Business Impact Analysis
      RPO, RTO, cost of downtime, interdependencies, criticality of applications, priorities, units of failover, overlooked externalities, executive buy-in, ...
Protection Groups (PGs)
  More PGs = more granular testing/failover
    DR testing is easier: fewer resource requirements
    Fail over only what is needed
    More configuration/complexity
  Fewer PGs = less complex
    Fewer LUNs, PGs, recovery plans
    Less flexibility
  Majority of outages are partial (not the entire data center): design accordingly
  Find a good balance between flexibility and simplicity
  [Diagram: a spectrum from "Fewer LUNs/PGs: less complexity, less flexibility" to "More LUNs/PGs: more complexity, more flexibility"; the right combination of complexity and flexibility varies by customer]
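
The flexibility trade-off can be illustrated with a small sketch. It assumes array-based replication, where every VM on a replicated LUN fails over together, so the storage layout sets the smallest unit of failover. The VM names, LUN names, and both functions are hypothetical.

```python
from collections import defaultdict


def failover_units(vm_to_lun: dict) -> dict:
    """Group VMs by the LUN they live on; each group fails over as one unit."""
    units = defaultdict(list)
    for vm, lun in vm_to_lun.items():
        units[lun].append(vm)
    return {lun: sorted(vms) for lun, vms in units.items()}


def dragged_along(vm_to_lun: dict, wanted: set) -> list:
    """VMs that would fail over with `wanted` only because they share a LUN."""
    affected_luns = {vm_to_lun[vm] for vm in wanted}
    return sorted(vm for vm, lun in vm_to_lun.items()
                  if lun in affected_luns and vm not in wanted)


if __name__ == "__main__":
    consolidated = {"web": "LUN1", "app": "LUN1", "db": "LUN1", "mail": "LUN1"}
    granular = {"web": "LUN1", "app": "LUN1", "db": "LUN1", "mail": "LUN2"}
    three_tier = {"web", "app", "db"}
    print(failover_units(consolidated), dragged_along(consolidated, three_tier))
    print(failover_units(granular), dragged_along(granular, three_tier))
```

With the consolidated layout, failing over the three-tier app also moves the mail VM; with the granular layout it does not, at the cost of one more LUN and protection group to manage.
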
Test Network
  Use VLAN or isolated network for test environment
  Default "Auto" setting does not allow VM communication between hosts
  Different vSwitch can be specified in SRM for test versus run
  Specified in Recovery Plan
vCenter Infrastructure Navigator
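VMware Multiple Levels of Protection
  [Diagram: Site A: SQL VM protected by vSphere HA/FT]
VMware Multiple Levels of Protection
  [Diagram: Site A: SQL VM protected by vSphere HA/FT and backed up by VDPA]
VMware Multiple Levels of Protection
  [Diagram: Site A: SQL VM protected by vSphere HA/FT and backed up by VDPA; VR/SRM replicates the SQL VM to Site B]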
Additional Resources
Find Out More
  Take an online hands-on lab
  Ask for a demo
  Install 60-day evaluation
Thank You