Disaster Recovery Best Practices
WWT Educational Webcast
Ed Levens, World Wide Technology
David L. Jones, EMC
Questions are Encouraged You can ask questions during the presentation by using the link provided in the Webcast Viewer.
Your Success Drives Ours
Relentless Focus on People, Process & Partnerships
- Strong Partner Relationships
- Over 1,000 Talented Employees
- Proven Processes
- Nearly $3 Billion in Revenues
- Strong Credit Line: $350MM+
- Key Contract Vehicles: VHA, HPG, ITES-2H, GSA, SEWP
Our Focus
Technology: Solution
- Unified Communications: Integrated voice, video and data networks can lower costs and provide employees with productivity benefits.
- Security: Adaptive threat response that stops network threats before they stop your business.
- Mobility: Maintain your competitive advantage through the freedom and flexibility of wireless networks.
- Data Center: Intelligent storage architectures can help reduce expenses; increase agility for changing priorities; and improve remote file management and backup.
Disaster Recovery Best Practices
David L. Jones, EMC
Agenda
- Today's Reality
- IT Business Continuance and Disaster Recovery Considerations
- Technology Choices
- EMC RecoverPoint
- Questions?
Unfortunately, disasters do happen; we must be prepared. Are You Ready?
Unfortunately, disasters do happen
Of all the organizations surveyed:
- 55% had an incident that disabled their primary data center
- 60% of these had a regional backup site that was also disabled by the incident
When systems go down, the losses add up. Organizations want better protection!
Types of Disasters
Type of Disaster                    Example
Nature / Man-Made                   Katrina / 9/11
Sudden / Time to Prepare            Earthquake / Hurricane
Building / Local Area / Region      Fire / Power Outage / Flood
Most Frequent Impacts to IT Availability
Disasters represent a fraction of Environmental issues
[Chart: share of IT downtime by cause. Labels: Software Failure, Environment, Hardware, People, Planned Downtime; Server Application Software, Client Application Software, Network S/W]
Source: IEEE Computer
Dilbert Does Disaster Recovery
Definitions
- Business continuance / COOP describes the processes and procedures an organization puts in place to ensure that essential functions can continue during and after a disaster
- Disaster recovery is the process, policies and procedures related to preparing for recovery or continuation of technology infrastructure critical to an organization after a natural or human-induced disaster
- High availability is a system design protocol and associated implementation that ensures a certain absolute degree of operational continuity during a given measurement period
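The "degree of operational continuity during a given measurement period" in the high availability definition can be turned into a downtime budget. The sketch below is an illustration only: the function name and the 99.9% example target are assumptions, not figures from the presentation.

```python
def allowed_downtime_minutes(availability_pct: float, period_hours: float) -> float:
    """Maximum downtime (in minutes) that still meets an availability target.

    availability_pct: target availability as a percentage, e.g. 99.9 (hypothetical value)
    period_hours:     length of the measurement period in hours
    """
    if not 0.0 <= availability_pct <= 100.0:
        raise ValueError("availability must be a percentage between 0 and 100")
    return period_hours * 60.0 * (1.0 - availability_pct / 100.0)


# Example: downtime budget for a 99.9% target measured over a 365-day year
budget = allowed_downtime_minutes(99.9, 24 * 365)
```

Under these assumptions a 99.9% target over one year leaves a budget of roughly 526 minutes, a little under nine hours.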
Continuity of Operations Policy (COOP)
It is the policy of the United States to have in place a comprehensive and effective program to ensure continuity of essential Federal functions under all circumstances. As a baseline of preparedness for the full range of potential emergencies, all Federal agencies shall have in place a viable COOP capability which ensures the performance of their essential functions during any emergency or situation that may disrupt normal operations.
Business Continuance: EMC / WWT Approach
- Build on our understanding of our customers, their business / mission, and their critical processes and objectives
- Capitalize on our long pedigree in designing, building and managing business/mission-critical systems for the Data Center
[Diagram labels: Technology, Business Continuance]
IT Considerations
- Management buy-in and commitment is critical
- Know the regulations specific to your agency or organization
- Conduct a risk assessment and identify critical priorities
- Determine response for different disaster scenarios
- Establish clearly defined roles & responsibilities for personnel
- Establish effective communication channels
- Maintain necessary resources, tools, and supplies
- Testing! Testing! and more Testing!
- Disaster recovery must be included as part of every process
IT Considerations
- Disaster recovery must become part of the IT mindset, not an afterthought
- High availability and disaster recovery go hand in hand
- Define architectures that build disaster recovery in from the beginning
  - Application Development
  - Infrastructure Design
  - QA, QC, Test and Development
- Make use of industry-recognized processes and architectures
  - ITIL, MOF, MSA / WSSRA, etc.
- Recovery of applications without user interruption is nirvana but difficult to achieve across the entire enterprise
IT Considerations
- Recovery Point Objective (RPO): The last saved data that the restarted application will reflect following the recovery. Also, a measure of the amount of time for which work may be lost in the event of an unplanned outage at the primary site.
  - Periodic tape backup vs. continuous disk-to-disk replication
  - Synchronous vs. Asynchronous
- Recovery Time Objective (RTO): The time that will pass before an infrastructure is available. In order to reduce RTO, data must be online and available at another site.
- Distance: Data must be recovered on undamaged hardware outside the disaster zone. Required distance between primary and recovery sites should be based on likely regional threats.
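The RPO definition above can be made concrete with a small calculation. This is a minimal sketch, assuming that the worst-case data loss of a protection scheme equals the interval between its recoverable copies; the class name, the scheme names, and the intervals are hypothetical values chosen for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class ProtectionScheme:
    name: str
    copy_interval: timedelta  # time between recoverable copies (assumed value)


def data_loss_window(last_copy: datetime, failure: datetime) -> timedelta:
    """Work lost in one outage: everything written after the last good copy."""
    return failure - last_copy


def meets_rpo(scheme: ProtectionScheme, rpo: timedelta) -> bool:
    """Worst case, the failure hits just before the next copy would be taken."""
    return scheme.copy_interval <= rpo


# Hypothetical schemes: a nightly tape backup and a replica that lags by 30 seconds
nightly_tape = ProtectionScheme("nightly tape backup", timedelta(hours=24))
disk_replica = ProtectionScheme("continuous disk-to-disk replication", timedelta(seconds=30))

one_hour_rpo = timedelta(hours=1)
tape_ok = meets_rpo(nightly_tape, one_hour_rpo)
replica_ok = meets_rpo(disk_replica, one_hour_rpo)
```

With a one-hour RPO, the nightly backup in this sketch falls short while the replicated copy qualifies, which is the contrast the slide draws between periodic tape backup and continuous replication.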
Business Requirements Should Drive Technology Options
- Business Considerations: RTO, RPO, Isolation, Protection Gap
- Infrastructure Alternatives: Cold Site (RTO = Days), Warm Site, Hot Site, Active-Active (RTO = Zero)
Matching Data Protection to Business Requirements
Data Center Design and Architecture
- Data Center design should be a high priority to ensure all the aspects of power, cooling, access and security have been core to the design
- The distance between data centers will change the options that you have for the deployment of a disaster recovery strategy for all the services IT provides
- Cold Site, Hot Site, Bunkers, Fully Active / Active
- This is a business decision first
- Make effective use of and leverage your existing facilities
- Leveraging disaster recovery assets can provide maximum value BUT can also extend time to recovery or RTO
- This choice will impact the technology decisions and options that are available to you
Reference Architectures
Virtual and Physical Considerations
- Server, Storage and Network Virtualization can maximize resources and streamline operations and disaster recovery
- Server virtualization is mature and there are many choices
  - VMware
  - Microsoft Hyper-V
  - Citrix / Xen
  - Cisco California
- Storage virtualization is mature but not as widely deployed
  - EMC Invista
  - HDS Array based
  - NetApp V-Series
  - Other
- Network virtualization is a developing technology
  - Cisco Converged Networking NEXUS
  - Brocade / Foundry merger
  - Server Vendor Products
Virtual and Physical Considerations
- Disaster recovery considerations for virtualized environments
  - Physical to Virtual
  - Virtual to Physical
  - Physical to Physical
  - Virtual to Virtual
- Consolidated disaster recovery using virtualization technologies can maximize resources
  - DR in a box
  - Maximum utilization of disaster recovery resources
- Virtualization can present management challenges
  - Virtual to Physical Mappings
  - Management infrastructure must provide visibility: Server, Storage, Network
- No Silver Bullets; technology continues to develop
Understanding Data Consistency
- Applications and data are interrelated (Federated)
- All data movement must be stopped/started at the same point in time
- To restart applications you must have all the data, not parts of it
- Recovery requires dependent-write consistency across all volumes and systems
- Systems share information; how do you get a consistent view?
[Diagram labels: Order Entry, CRM, SCM, DB, DB, DB]
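Dependent-write consistency can be illustrated with a toy model. The sketch assumes every write across all volumes carries a global sequence number, and that a replica is restartable only up to the last write before the first gap. The Write type, the volume names, and the sequence numbers are hypothetical; real replication products track ordering in their own ways.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Write:
    seq: int      # global issue order across all volumes (assumed to exist)
    volume: str   # hypothetical volume name


def consistent_prefix(issued: List[Write], arrived: Set[int]) -> List[Write]:
    """Return the longest run of writes, in issue order, that all reached the replica.

    Any write that arrived after the first gap must be discarded, because it may
    depend on a write that is missing.
    """
    prefix = []
    for write in sorted(issued, key=lambda w: w.seq):
        if write.seq not in arrived:
            break
        prefix.append(write)
    return prefix


# Order entry logs a transaction, updates its data volume, then CRM records the order.
issued = [Write(1, "oe-log"), Write(2, "oe-data"), Write(3, "crm-data")]

# Write 2 never reached the remote site, so write 3 cannot be trusted either.
usable = consistent_prefix(issued, arrived={1, 3})
```

In this example only write 1 survives: the CRM update arrived, but keeping it would show an order that the order entry database does not contain.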
Infrastructure Services
- Without disaster-recovery-enabled infrastructure, most other disaster recovery efforts will fail
- Core services like Networks, DNS, Directory Services, etc. are required for all of the other processes that run in the Data Center
- VPN and remote access services can be your best ally in the event of disaster and must be core to your plans
- Management infrastructure will play a role in conducting root cause analysis ONLY if it is available
- In most cases infrastructure services are COTS based and have been designed to provide availability using a geographically distributed scale-out model
- Vendor selection and partnership is key in this area because most infrastructure solutions, both hardware and software based, must interoperate
Applications
- Applications are very rarely standalone
- Multi-tiered applications (WEB, App Server, Database) will almost always require all tiers to operate
- Most applications will not work if the required infrastructure is not also part of the plan
- Data consistency between the tiers makes recovery much easier and more timely
- Network based or Software based load balancing is the most common method for making WEB and Application tiers resilient
- Applications that require persistent data storage may have additional requirements
- There is no silver bullet for all applications
- Work with your COTS vendors and internal development groups to define standards
Applications: An example via email
- Email IS NOT a standalone application
- An enterprise class email implementation will usually consist of at least the following:
  - Main email data servers
  - SMTP (Inbound and outbound mail)
  - Integration point with a directory server
  - Blackberry, Blueberry, Strawberry, you get the point
  - WEB based email front end
  - Real Time Collaboration: SharePoint, DB system, IM, etc.
  - Multiple Infrastructure touch points: DNS, WINS, VPN, etc.
  - External Vendors: Cellular provider
- Providing a disaster resilient email solution requires all of these things to be coordinated
Databases
- Different types of databases require different kinds of disaster recovery solutions
  - Read only / Data warehousing
  - Transactional
- Most common types of disaster recovery solutions in the database space are
  - Oracle GRID/RAC based or scale out implementations (Clustering)
  - Storage replication with application tie in
  - Database-level replication
- Most disaster recovery solutions for databases require a tight integration with the application tier solution in order to ensure transaction level recovery
- Most transactional database solutions will require tight integration with your disaster recovery choices for storage and data replication
Storage / Data Protection
- Daily backup: Daily recovery points from tape or disk
- Snapshots: More frequent disk-based recovery points
- Any point in time: All recovery points
- Significant point in time: Database checkpoint, Pre-app patch, Post-app patch, Quarterly close, Any user-configurable event
[Timeline diagram: Yesterday, Midnight, Now (24 hours), comparing Daily backup, Snapshot, and Continuous Data Protection (any point in time, significant points in time)]
Storage / Data Protection
- Creating remote and local copies of your data is a must for disaster recovery
- The replication of storage data is a complex process that requires knowledge of what is being stored, detailed performance analysis and network impact analysis
  - Synchronous vs. Asynchronous: It's all about distance
  - Adaptive solutions can provide dynamic RPO
  - Application level consistency is paramount
- Many types of storage replication technologies exist
  - Array Based: Usually locks you into storage array choices
  - Host Based: Complex in larger enterprises
  - Appliance Based: Offers most flexibility
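Why synchronous replication is "all about distance" can be shown with rough arithmetic. The sketch assumes light travels through optical fiber at about 200 km per millisecond (roughly two thirds of its speed in vacuum) and that each synchronous write waits for one round trip to the remote site; switch, array, and protocol overheads are ignored, so real penalties will be higher. The function names and the latency budget are illustrative.

```python
FIBER_KM_PER_MS = 200.0  # approximate propagation speed in optical fiber (assumption)


def sync_write_penalty_ms(distance_km: float, round_trips: int = 1) -> float:
    """Propagation delay added to every synchronous write, in milliseconds."""
    return round_trips * 2.0 * distance_km / FIBER_KM_PER_MS


def max_sync_distance_km(latency_budget_ms: float, round_trips: int = 1) -> float:
    """Greatest site separation that keeps the added delay within the budget."""
    return latency_budget_ms * FIBER_KM_PER_MS / (2.0 * round_trips)


penalty_100km = sync_write_penalty_ms(100.0)   # added delay per write at 100 km
reach_5ms = max_sync_distance_km(5.0)          # separation allowed by a 5 ms budget
```

Under these assumptions every 100 km of separation adds about 1 ms to each synchronous write, which is why longer distances usually push a design toward asynchronous replication and a non-zero RPO.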
Storage / Data Protection
- A data replication solution that allows the flexibility of applying different RPO policies to both storage and in turn applications is key
  - Ability to prioritize RPO application by application
  - Create tiered model based on business requirements
- Data Backup is here to stay and having a robust backup AND restore environment is crucial
  - Tape
  - Backup to Disk (VTL & CDP)
  - Offsite storage of backup data
  - Data Security
- Data protection can reside on many tiers; consolidating its management is key
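A tiered RPO model can be sketched as a small lookup. Everything in this example is hypothetical: the tier names, the RPO values, the application-to-tier assignments, and the rule that maps an RPO to a protection method are placeholders that a real plan would replace with figures from its own business requirements.

```python
from datetime import timedelta
from typing import Dict, List, Tuple

# Hypothetical tiers and their recovery point objectives
TIER_RPO: Dict[str, timedelta] = {
    "tier1": timedelta(0),
    "tier2": timedelta(minutes=15),
    "tier3": timedelta(hours=24),
}

# Hypothetical assignment of applications to tiers
APP_TIER: Dict[str, str] = {
    "email": "tier2",
    "data-warehouse": "tier3",
    "order-entry": "tier1",
}


def protection_method(rpo: timedelta) -> str:
    """Pick a protection method for an RPO (illustrative thresholds only)."""
    if rpo == timedelta(0):
        return "synchronous replication"
    if rpo < timedelta(hours=24):
        return "asynchronous replication"
    return "periodic backup"


def recovery_plan(app_tier: Dict[str, str],
                  tier_rpo: Dict[str, timedelta]) -> List[Tuple[str, timedelta, str]]:
    """List applications with their RPO and method, strictest RPO first."""
    rows = [(app, tier_rpo[tier], protection_method(tier_rpo[tier]))
            for app, tier in app_tier.items()]
    return sorted(rows, key=lambda row: row[1])


plan = recovery_plan(APP_TIER, TIER_RPO)
```

Sorting by RPO puts the most demanding applications first, which is one simple way to prioritize RPO application by application.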
Vendor Choice is Critical
- Disaster recovery IS complex
- Disaster recovery spans internal IT organizations and specific technology disciplines
- Management buy-in is critical for success
- Disaster recovery involves many internal and external partners
- Partnering with vendors is key, as are the partnerships between your vendors!
Data Replication Pain Points in Heterogeneous Environments
- Application platform support
- Application response time
- Application-consistent recovery
- Corruption protection
- Existing infrastructure
- Communications cost
- Disaster-recovery testing
- Heterogeneous storage
[Diagram: Local site and Remote site, each running Oracle, Exchange and SQL over a SAN, with storage from IBM, HDS, EMC, HP and SUN]
RecoverPoint Concurrent Local and Remote (CLR) Data Protection
[Diagram labels: PRODUCTION SITE, DISASTER RECOVERY SITE, Cluster Active Node, Cluster Passive Node, RecoverPoint appliances, Standby Disaster Recovery Server, Tape Backup Manager, Tape Library, Replication Data Flow, SAN, SAN/WAN, RecoverPoint Replication Services, Storage Groups and Logs, Local Journal, Remote Journal, Replicated Storage Groups and Logs]
- Performance architecture
  - Out-of-band design leveraging intelligent host and fabric interfaces*
  - Supports CLARiiON write splitting on CX3 and CX4 arrays
  - Designed to work in enterprise-class environments
- Replication across heterogeneous storage*
  - Leverage existing storage investments
  - Co-exists with local CDP
  - True bi-directional, any-to-any replication
  - Replicate between arrays at same or different site*
- True CDP data protection for applications
  - All writes stored in Journal with application bookmarks for recovery
  - Supports Microsoft Volume Shadow Copy Service (VSS) and VDI APIs
- Concurrent local and remote data protection
  - Create local and remote copies of the same LUNs
  - Recover both locally or remotely to different point-in-time images
  - No impact to production or the other replica during recovery
- Unified management interface
  - Remotely configure, monitor, manage CDP/CRR
  - Programming CLI for intelligent scripting
* RecoverPoint/SE does not support intelligent fabric, and only supports a single CLARiiON array at each side/site
Journaling for Application-Aware Recovery
Journal Includes Data Plus Metadata
- Time/date: Identifies the time image was saved
- Bookmarks:
  - System-generated group bookmarks, e.g., Volume Shadow Copy Service (VSS) backup
  - User-generated bookmarks
  - Other EMC product bookmarks: EMC Replication Manager
  - System-event-generated bookmarks
    - Microsoft SQL Server: Microsoft Virtual Device Interface (VDI) operations
    - Microsoft Exchange: Microsoft VSS
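The idea of a journal that stores data plus metadata can be illustrated with a toy replay loop. This is a sketch of the general technique only, not RecoverPoint's journal format or API: the Entry type, its fields, and the image_at function are hypothetical, and timestamps are simple integers.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Entry:
    time: int                       # logical timestamp of the write
    block: int                      # block address that was written
    data: str                       # contents written to the block
    bookmark: Optional[str] = None  # optional label attached at this point


def image_at(journal: List[Entry],
             time: Optional[int] = None,
             bookmark: Optional[str] = None) -> Dict[int, str]:
    """Rebuild the volume image by replaying the journal up to a time or bookmark."""
    image: Dict[int, str] = {}
    for entry in journal:
        if time is not None and entry.time > time:
            break
        image[entry.block] = entry.data
        if bookmark is not None and entry.bookmark == bookmark:
            return image
    if bookmark is not None:
        raise KeyError(bookmark)  # requested bookmark is not in the journal
    return image


journal = [
    Entry(1, 0, "a"),
    Entry(2, 1, "b", bookmark="pre-patch"),  # user-generated bookmark (hypothetical)
    Entry(3, 0, "c"),
]

before_patch = image_at(journal, bookmark="pre-patch")
latest = image_at(journal)
```

Replaying to the bookmark yields the image as it stood before the later write to block 0, while replaying the whole journal yields the current image. A bookmark gives the operator a named, application-meaningful recovery point instead of a raw timestamp.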
Grouping for a Consistent View
- Allows application recovery to be tiered by service level
  - Multiple volumes per group
  - Mixed recovery point objectives within same infrastructure
- Provides independent replication controls
  - Recover by group, locally or remotely
  - Start/stop by group
- Enables grouping of optimization
  - Importance
  - Resource usage
  - Recovery point and recovery time objectives
[Diagram labels: OE, CRM, SCM, E-mail; Group 1, Group 2, Group 3; CDP, CRR]
Grouping for Federated Environments
- Each tier has different service level agreements
  - Consistency groups per tier
  - Operational recovery of tier
- Parallel consistency across tiers
  - Federated environments
  - Recover to a known point for all applications
  - Disaster recovery for tier or application
  - Spans operating systems, applications, storage, and servers
- Enables advanced functions
  - Full environment cloning
  - Application upgrade testing
  - Data mining
  - Consistent production rebuild
[Diagram: three tiers, each with its own consistency group: 1: Linux (Web OE), 2: Windows (CRM), 3: UNIX (SCM, Financials)]
RecoverPoint/Cluster Enabler (RecoverPoint/CE)
- Each named cluster group's associated devices reside in a single RecoverPoint consistency group of the same name
- Supports Microsoft Cluster Server on Windows Server 2003 and Microsoft Failover Cluster on Windows Server 2008 Enterprise and Datacenter Editions
[Diagram labels: RecoverPoint, WAN, RecoverPoint; File Share Witness with RecoverPoint/CE installed; Cluster nodes with RecoverPoint/CE installed; CG1: Devices for Cluster Group1; CG2: Devices for Cluster Group2]
VMware Infrastructure 3.5 Value and Innovations
[Diagram of the VMware Infrastructure 3.5 stack in three numbered layers. 1: Virtualization Platforms (ESX Server 3.5, ESX Server 3i 3.5, Virtual SMP, VMware Virtual Machine File System). 2: Virtual Infrastructure (Resource Management, Availability, Mobility, Security). 3: Management + Automation (Infrastructure Optimization, Business Continuity, Desktop Management, Software Lifecycle). Product labels: DPM, Distributed Resource Scheduler (DRS), High Availability, Consolidated Backup, VMotion, Storage VMotion, VirtualCenter, Update Manager, Converter, Site Recovery Manager, Lab Manager, VDI and ACE, Workstation. Callouts: Consolidate and contain servers; Optimize your infrastructure; Manage and secure desktops; Maximize continuity and uptime; Automate your virtual labs]
VMware Site Recovery Manager Integration
- Simplifies and automates disaster recovery workflows
  - Setup, testing, and failover
- Makes disaster recovery a property of the virtual machine (VMware Distributed Resource Scheduler and High Availability)
- Provides central management of recovery plans from VirtualCenter
- Turns manual recovery processes into automated recovery plans
- Four EMC products integrated with VMware Site Recovery Manager
  - SRDF family
  - MirrorView
  - Celerra Replicator
  - RecoverPoint
- The RecoverPoint storage adapter requires RecoverPoint V3.0 (or later)
- Makes disaster recovery rapid, reliable, manageable, affordable
[Diagram: PRODUCTION and RECOVERY sites, each showing four virtual machines (APP / OS)]
Questions are Encouraged You can ask questions during the presentation by using the link provided in the Webcast Viewer.
Thank You
Disaster Recovery Best Practices
Ed Levens, World Wide Technology
David L. Jones, EMC