Virtual Infrastructure Implementation of High Availability and Business Continuation Solutions

Seva Semouchin
Technical Account Manager
VMware International Limited
www.decus.de

What this Presentation is About
  - HA: High Availability
  - DR: Disaster Recovery
  - VI: Virtual Infrastructure

Agenda
  - HA and DR in Adaptive Enterprise
      - ITIL and VIM
      - VI and High Availability
      - VI and Business Continuation
  - VIM Implementation of HA and DR
  - Implementation of High Availability Solutions
      - HA Classic (ESX 2.5.x / VC 1.x)
      - HA Advanced (ESX 3.0 / VC 2.0)
      - Aligning with VIM
  - Business Continuation Solutions
      - DR Best Practices
      - Aligning with VIM
  - Conclusion

HA and DR in Adaptive Enterprise
  - ITIL and VIM
      - Introduction to ITIL and VIM
      - Using ITIL and VIM to implement the Adaptive Enterprise
  - VMware VI technology basics
      - ESX Server
      - Virtual Center
  - VI and High Availability
      - HA implementation practices
      - VI added values
  - VI and Disaster Recovery
      - DR implementation practices
      - VI added values

IT Infrastructure Library - ITIL
  - Best practice framework
  - Process based
  - De facto standard worldwide
  - Service Delivery
      - IT Service Continuity Management -> Disaster Recovery
      - Service Availability Management -> High Availability

Virtual Infrastructure Methodology - VIM
  Virtual Infrastructure Methodology (VIM) is a four-phased methodology developed and employed by VMware Professional Services to consistently deliver comprehensive solutions to assess, plan, build and manage VMware virtual infrastructure.
  Phases: Assess -> Plan -> Build -> Manage

In Other Words - VIM
  - Is practiced by VMware PSO (the consulting branch of VMware) and by partners
  - Is deliverable based
  - Is a collection of best practices gathered by VMware professionals
  - Is ITIL aligned
  - Is under development

VIM Impacts
  - People
      - New responsibilities and procedures may require new skills, or staff role changes or additions
  - Process
      - New paradigms may require new procedures
      - Impact on existing procedures must be addressed
  - Technology
      - Includes not just new servers, but also impacts networking and storage
  - Business strategy
      - Financial objectives and requirements should be addressed

VIM Phases and Objectives
  - Assess
      - Identify opportunities for virtualization, and model scenarios
      - Identify goals, methods, impacts, and scope
  - Plan
      - Introduce VI concepts through prototyping and whiteboard sessions
      - Define architecture, implementation plan and validation test criteria
  - Build
      - Implement virtual infrastructure solution and train staff in-depth
      - Generate confidence and acceptance
  - Manage
      - Support ongoing maintenance and operations
      - Identify opportunities for next iteration

VMware ESX Server
  (diagram slide)

VMware Virtual Center
  (diagram labels: Virtual Center Agent, ESX Server Farm, Virtual Center server, Virtual Center Database with management and performance data)

VMotion
  (diagram slide)

High Availability
  - Works against:
      - Unplanned outages, a.k.a. failures
      - Planned outages, a.k.a. maintenance
  - Usual practices
      - Cold standby
      - Warm standby
      - Hot standby (cluster)
  - Based on Service Level Agreements (SLA)

VI Added Value to High Availability
  - Cold standby
      - Use the same standard server for all virtualized applications
  - Warm standby
      - Redeploy a broken VM from a template
      - Use a VM repository
  - Hot standby
      - Use VMotion to prepare maintenance
      - Cluster VMs
      - Cluster a VM and physical boxes
      - Cluster two ESX boxes
      - Use VMware HA (since ESX 3.0 / VC 2.0) instead of clusters

Disaster Recovery
  - Works against
      - Complete loss of, or heavy damage to, the whole facility
  - Usual practices
      - Own standby facility
      - Outsourced standby facility
      - Partial standby capacity
      - Data replication
      - Data backup
      - Continuous training
  - Based on Service Level Agreements

VI Added Value to Disaster Recovery
  - Standby facility
      - Old, non-standard equipment may be used
      - Easy to outsource
      - One standby facility for many production facilities
      - With partial capacity, easy to adapt to SLAs: the decision which VMs should run and which should not can be revised quickly
  - Replication and backup
      - Replicate VMs as data
      - Use the same processes to redeploy VMs from template or repository as for HA
      - Use VMs to recover physical boxes (P2V)
  - Disaster simulation
      - Implementation is easier and cheaper

Implementation of High Availability Solutions
  - HA Classic (ESX 2.5.x / VC 1.x)
      - Clustering virtual to virtual
      - Clustering virtual to physical
      - Clustering with VCS
  - HA Advanced (ESX 3.0 / VC 2.0)
      - VMware HA (previously DAS)
      - VMware HA vs. failover cluster (MSCS, for example)
      - VMware DRS
  - Aligning with VIM
      - HA in the Assessment phase
      - HA in the VIM Plan and Implementation phase
      - HA in the Management phase

VMware Classic Clustering
  (diagram slide)

Veritas VCS Solution
  (diagram labels: VCS VM Agent, VCS Software, Shared Storage)

VMware HA (previously DAS)
  (diagram: server farm of 2-way, 2-way, 4-way and 8-way hosts)
  - Solves the "all my eggs in one basket" (one ESX box) problem
  - Detects an ESX hardware failure
  - Automatically restarts virtual machines on the remaining boxes
  - Complementary to DRS: DRS places the VMs
  - Requires shared storage
  - Built-in alternative to clustering (for selected applications)
  - Available as VirtualCenter add-on in 2005

VMware DRS
  (diagram: server farm of 2-way, 2-way, 4-way and 8-way hosts)
  - Farm-level resource balancing
  - How it works
      - VMs are automatically VMotioned to boxes with more spare capacity
      - Leads to 60%-80% server utilization
  - Intelligent placement
  - Continuous optimization through VMotion
  - Available as VirtualCenter add-on in 2006

Implementing HA with VMware HA and DRS
  - Available with ESX Server 3.0 / Virtual Center 2.0
  - VMware HA and DRS are both plug-ins for Virtual Center
  - Shared storage is required
  - In big environments, use folders to separate HA groups
  - Better used together
      - First, VMware HA restarts the failed VMs
      - Then VMware DRS distributes the load over the surviving servers
  - Best practice is to implement ITIL Capacity Planning, so that the resources needed to restart the VMs are available
  - Provides services comparable with a failover cluster
      - With VMware DRS, even more value

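The capacity planning point above can be made concrete with a small check: would the VMs of any one failed host still fit on the remaining hosts? The sketch below is only an illustration of that question, not a description of how VMware HA decides placement. It looks at memory only, uses a simple greedy heuristic (largest VM first, onto the host with the most free memory), and all host names, VM names and sizes are hypothetical.

```python
from typing import Dict, Optional


def place_after_failure(
    host_mem_mb: Dict[str, int],
    vm_mem_mb: Dict[str, int],
    vm_host: Dict[str, str],
    failed_host: str,
) -> Optional[Dict[str, str]]:
    """Return {vm: new_host} for the VMs of failed_host, or None if they do not fit.

    Greedy heuristic on memory only; it can report "does not fit" in rare
    cases where a cleverer packing would succeed, so it errs on the safe side.
    """
    # Free memory on every surviving host.
    free = {h: cap for h, cap in host_mem_mb.items() if h != failed_host}
    for vm, host in vm_host.items():
        if host != failed_host:
            free[host] -= vm_mem_mb[vm]

    # VMs that have to be restarted elsewhere, largest first.
    displaced = sorted(
        (vm for vm, host in vm_host.items() if host == failed_host),
        key=lambda vm: vm_mem_mb[vm],
        reverse=True,
    )

    placement: Dict[str, str] = {}
    for vm in displaced:
        if not free:
            return None
        target = max(free, key=free.get)  # host with the most free memory
        if free[target] < vm_mem_mb[vm]:
            return None
        free[target] -= vm_mem_mb[vm]
        placement[vm] = target
    return placement


def survives_any_single_failure(
    host_mem_mb: Dict[str, int],
    vm_mem_mb: Dict[str, int],
    vm_host: Dict[str, str],
) -> bool:
    """True if the VMs of every single host could be restarted on the others."""
    return all(
        place_after_failure(host_mem_mb, vm_mem_mb, vm_host, h) is not None
        for h in host_mem_mb
    )
```

Running a check like this whenever VMs are added shows when the farm has lost its spare capacity for a restart, which is the situation capacity planning is meant to prevent.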
VMware HA vs. Failover Cluster

  Failover Cluster                                   VMware HA
  Failover group                                     Virtual machine
  Quorum                                             Virtual Center
  Cluster database                                   VC database
  Unintended failover needs an application restart   The same
  Intended failover needs a restart as well          No restart is necessary (VMotion)
  No load balancing                                  Load balancing with DRS

VC is Not the Single Point of Failure
  (diagram labels: VirtualCenter, heartbeat network)
  - Agents distributed on the ESX Servers maintain the heartbeat network
  - Automated install and configuration via VirtualCenter
  - Independent of VirtualCenter after initial configuration

Aligning VMware HA Solutions with VIM
  Phases: Assess -> Plan -> Build -> Manage
  - HA in the Assessment phase
  - HA in the VIM Plan and Implementation phase
  - HA in the Management phase

Disaster Recovery Solutions
  - DR best practices
      - Transactional vs. crash consistent data
      - Possible DR policies
  - VI advantages for DR solutions
      - Replication and redeployment
      - VMware DRS
  - Aligning with VIM
      - DR in the Assessment phase
      - DR in the VIM Plan and Implementation phase
      - DR in the Management phase

Transactional vs. Crash Consistent Data
  - Some data can survive the crash of the host computer: it is crash consistent data
      - Examples: OS disk, journal files
  - Other data could be damaged when the application is aborted: it is transactional data
      - Example: database tablespaces
  - We need a different approach for each kind of data
      - Crash consistent data can be replicated as is
      - Transactional data should be replicated in a consistent state
          - Quiesce the application (like Oracle "begin backup")
          - Clone the data at storage array level
          - Only then replicate
  - With a little bit of luck you can replicate transactional data as is and it will still be consistent.

VM Data Types
  (diagram labels: .vmdk files on a VMFS volume and raw LUNs on the Storage Area Network; .vmx file on the local file system on local storage)
  - VM configuration file (.vmx)
  - Virtual disk file (.vmdk)
  - Raw LUN, raw LUN linked to VMFS

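The order given above for transactional data (quiesce, clone at array level, only then replicate) can be sketched as follows. This is a minimal illustration of the sequencing only: the four callables are hypothetical placeholders for whatever the database and the storage array actually provide, and resuming the application right after the clone is an assumption of this sketch, not something the slide states.

```python
from contextlib import contextmanager
from typing import Callable, Iterator


@contextmanager
def quiesced(begin: Callable[[], None], end: Callable[[], None]) -> Iterator[None]:
    """Keep the application quiesced for the duration of the with-block."""
    begin()
    try:
        yield
    finally:
        # Always release the application, even if the clone step fails.
        end()


def replicate_transactional(
    begin_backup: Callable[[], None],     # hypothetical: put application in backup mode
    end_backup: Callable[[], None],       # hypothetical: return to normal operation
    clone_on_array: Callable[[], None],   # hypothetical: array-level clone of the LUNs
    replicate_clone: Callable[[], None],  # hypothetical: ship the clone to the failover site
) -> None:
    """Quiesce, clone, resume, and only then replicate the consistent clone."""
    with quiesced(begin_backup, end_backup):
        clone_on_array()
    replicate_clone()
```

Replicating the clone rather than the live LUN keeps the quiesce window short: the application is held only for the duration of the array-level clone.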
Replication Candidates
  - VMFS volumes (.vmdk files)
      - Array replication
      - Network replication
  - Raw devices
      - Array replication
  - .VMX files
      - Network replication

VMFS Replication
  - Do not place too many VMs on one VMFS volume
      - For example, 2 VMFS volumes with 10 VMs each per ESX server in production mode, and 3 such volumes per ESX server in DR mode (20 VMs per host in production, 30 in DR mode)
  - Place on different VMFS volumes
      - VMDK files with transactional data
          - Database files
      - VMDK files with crash consistent data
          - Operating system disks
          - Application executables
          - Redo logs and log files in general
  - For crash consistent data you may also separate frequently changed data from stable data, for example redo log disks from OS disks.

Raw Device Replication
  - Note that RDM files on replicated VMFS volumes point to the original raw devices, not to the replicated ones.
  - The RDM files therefore have to be recreated after the site failover.

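Because the replicated RDM files still point at the production raw devices, the recreation step needs a table that says which replica device stands in for which original. The sketch below only builds that work list; it does not call any ESX tool. The mapping-file paths and device names are hypothetical examples, and where the two input tables come from (array tooling, documentation, an inventory export) is left open.

```python
from typing import Dict, List, Tuple


def plan_rdm_recreation(
    rdm_files: Dict[str, str],
    lun_map: Dict[str, str],
) -> Tuple[List[Tuple[str, str]], List[str]]:
    """Work out which RDM mapping files to recreate, and against which device.

    rdm_files: RDM mapping file path -> original (production) raw device
    lun_map:   original raw device   -> replicated device on the failover site

    Returns (to_recreate, unmapped):
      to_recreate: (mapping file path, replica device) pairs, sorted by path
      unmapped:    mapping files whose original device has no known replica
    """
    to_recreate: List[Tuple[str, str]] = []
    unmapped: List[str] = []
    for path, original in sorted(rdm_files.items()):
        replica = lun_map.get(original)
        if replica is None:
            unmapped.append(path)
        else:
            to_recreate.append((path, replica))
    return to_recreate, unmapped
```

A non-empty `unmapped` list is worth treating as a planning error to fix before a disaster, since those VMs would come up on the failover site without their raw devices.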
.VMX (VM Configuration) Files Replication
  - Must be copied to the failover site only when changed
      - Changes aren't frequent
  - Copies used for DR can be edited by script
      - Use less virtual RAM
      - Different location of VMDK files on the failover site: VMFS volumes will possibly be mounted at other mount points

Use Highest VM Density on Failover Site
  - Production:Failover 1:2 or 2:3
  - Place fewer VMs on each VMFS volume to better distribute them over the surviving ESX servers
  - Use scripts to change the configuration (.vmx) files of VMs moved to the failover site

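The scripted edits mentioned above (less virtual RAM, different VMDK locations) could look roughly like this. It is a sketch under assumptions: it treats the .vmx file as simple `key = "value"` lines, and the keys and paths used in the usage note (`memsize`, `scsi0:0.fileName`, `/vmfs/prod_vol1/`, `/vmfs/dr_vol1/`) are illustrative and should be checked against the real files.

```python
import re
from typing import Dict

# Matches lines of the form: key = "value"  (comment lines starting with # do not match)
_VMX_LINE = re.compile(r'^\s*([^#=\s][^=]*?)\s*=\s*"(.*)"\s*$')


def rewrite_vmx(text: str, overrides: Dict[str, str], path_map: Dict[str, str]) -> str:
    """Return a DR copy of a .vmx file's text.

    overrides: key -> new value, e.g. a smaller memory size
    path_map:  old path prefix -> new path prefix, applied to all other values
    Lines that are not key = "value" pairs are passed through unchanged.
    """
    out = []
    for line in text.splitlines():
        match = _VMX_LINE.match(line)
        if not match:
            out.append(line)
            continue
        key, value = match.group(1), match.group(2)
        if key in overrides:
            value = overrides[key]
        else:
            for old_prefix, new_prefix in path_map.items():
                if value.startswith(old_prefix):
                    value = new_prefix + value[len(old_prefix):]
                    break
        out.append(f'{key} = "{value}"')
    return "\n".join(out) + "\n"
```

For example, `rewrite_vmx(text, {"memsize": "1024"}, {"/vmfs/prod_vol1/": "/vmfs/dr_vol1/"})` halves the memory of a VM configured with 2048 MB and repoints its disks to the failover mount point. Since the source files change rarely, the DR copies can be regenerated each time a changed .vmx file is copied over.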
Use Bidirectional Replication
  - Place some of the active VMs on the failover site. This allows them to stay alive during a disaster
  - In this case data must be replicated in both directions: from the production site to the failover site and vice versa

Use VMs instead of Cluster Groups
  - Reuse HA techniques for disaster recovery
  - Less administrative overhead: we need to bring online only one LUN with a VMFS volume for many VMs, not one LUN per cluster group
  - Same effect: failover in MSCS means restarting the application on another node
  - We don't need Windows Advanced Server licenses for each VM

Disaster Recovery Scenario
  1. Identify a disaster (a cigarette smoked near a smoke sensor is NOT a disaster)
  2. Bring the LUNs with the VMFS volumes holding OS VMDKs and other crash consistent data online (script)
  3. Make them visible to the ESX servers on the failover site (script)
  4. Mount the VMFS volumes (script)
  5. Recreate the RDM files for raw devices (script)
  6. Change the .vmx files if necessary (script)
  7. Start the VMs on the dedicated ESX servers. For VMs with transactional data, use a special startup procedure that initiates data recovery before the application is started (script)

Disaster Recovery Optimized for Critical VMs
  - Replicated data can be cloned on the failover site using TimeFinder
  - The cloned LUNs with VMFS volumes can be made available to the ESX servers and mounted there
  - In case of disaster we just need to start the VMs
  - Clones should be updated as frequently as necessary
  - Because more storage is necessary, this should be used for critical VMs only

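The scripted part of the disaster recovery scenario above is strictly ordered: each step depends on the one before it. A small driver can enforce that order and stop at the first failure instead of starting VMs on half-prepared storage. The sketch below is illustrative only: the step names are invented labels for the scripted steps on the slide, and the actual scripts behind them are assumed to exist as Python callables.

```python
from typing import Callable, Dict, List

# Order follows the scripted steps of the scenario; the names are hypothetical labels.
FAILOVER_ORDER: List[str] = [
    "bring_luns_online",
    "present_luns_to_esx",
    "mount_vmfs_volumes",
    "recreate_rdm_files",
    "adjust_vmx_files",
    "start_vms",
]


def run_failover(actions: Dict[str, Callable[[], None]]) -> List[str]:
    """Run every step in FAILOVER_ORDER and return the completed step names.

    Raises KeyError if a step has no script registered, and RuntimeError
    (naming the failed step) as soon as one step raises.
    """
    missing = [name for name in FAILOVER_ORDER if name not in actions]
    if missing:
        raise KeyError(f"no script registered for: {missing}")

    completed: List[str] = []
    for name in FAILOVER_ORDER:
        try:
            actions[name]()
        except Exception as exc:
            raise RuntimeError(
                f"failover stopped at '{name}'; completed steps: {completed}"
            ) from exc
        completed.append(name)
    return completed
```

The decision that a disaster has actually occurred stays outside the driver on purpose; it is a human decision, and the driver only runs once that decision has been made.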
Disaster Recovery P2V
  (diagram slide)

Aligning VMware DR Solutions with VIM
  Phases: Assess -> Plan -> Build -> Manage
  - DR in the Assessment phase
  - DR in the VIM Plan and Implementation phase
  - DR in the Management phase

Conclusion
  - Virtual Infrastructure increases cost efficiency for implementing DR and HA solutions

Reference Customer #1
  - A large insurance company in the USA is using a VI DR solution based on SRDF.
  - These applications were running on physical machines when the DR initiative came up, but were moved into virtual machines because replication would be easier.
  - This drove over 90 machines to be P2V'd into VMs.
  - 110 virtual machines
  - Applications:
      - SQL servers
      - Infrastructure servers
      - Custom insurance application servers
  - ESX Server is a 4-way IBM x366 with 16 GB RAM.
  - SRDF/A and SANCopy used in the replication strategy
  - Replication over a distance of 1500 km

Reference Customer #2
  - Guardian Insurance implemented virtual desktop infrastructure (VDI)
  - To make this solution disaster resistant, SRDF was introduced
  - 700 virtual machines
      - 350 of them used for VDI
  - Applications on VDI: just desktop OS
      - 95% Windows XP
      - 5% Windows 2000 Professional and Windows NT Workstation
  - 56 ESX Servers
      - 8-way IBM x445 with 32 GB RAM
  - SRDF/A and TimeFinder used in the replication strategy
  - Total amount of data being replicated: 7 TB
  - Replication over a distance of 300 km

Other References
  - Implemented
      - Volvo IT
      - Vector SGI (Dallas)
      - Infineon (Cary, USA)
      - Eastman
  - Under investigation
      - Montag and Caldwell (Investment)