Implementing Disaster Recovery? At What Cost?




Whitepaper by Viktor Babkov, Technical Director, Business Continuity. Copyright Business Continuity, May 2010.

In today's environment, minimizing IT downtime has become a basic imperative for any business, no matter how small or large. Over the last two decades, businesses have invested unprecedented amounts in technology that protects their data: undoubtedly the most valuable asset a business has. But data protection is only half the battle. How do you or your clients answer the following questions? (1) How much data would we lose if our server failed or our office was evacuated? (2) How quickly can we get back to business? (3) How would such an outage impact our business financially?

The answers to these questions are vital in addressing risk management for a company's shareholders and its management team. Today, the recoverability of a business carries greater significance than traditional backup strategies. We all know it costs to lose a server, a network or a data center: every minute of downtime or data loss carries a potentially massive cost for almost any business. Most CEOs and CIOs already understand the financial risk of downtime. They are perhaps less aware of the growing regulatory risk, as industry verticals become subject to increasingly stringent data protection and availability standards.

In addressing these risk areas, management must find the right balance between the real needs of the business and the financial outlay required for risk mitigation. Recognizing this need for balance, the technology market has risen to the opportunity by providing a myriad of options to choose from. Perhaps too many? Some market players tell us to use courier companies and ship tapes; others suggest replicating data or taking point-in-time snapshots; sometimes we are told to buy new hardware or implement new infrastructure platforms such as virtualization, subscription services, and so on.
With the variety of options and factors to consider, it sometimes feels as though the average CIO needs an extra degree to push a business's IT resilience strategy in the right direction.

This paper aims to assist decision makers in identifying the best strategic pathway for their business. The paper addresses the following issues:

1. Insurance of Data
2. Definition of IT Recovery: RPO and RTO Recoverability
3. Business Continuity Planning: Business Process and Reliance on IT
4. Disaster Recovery vs. Operational Recoverability
5. History of Backup / Evolution of Grid Computing
6. DR and Cloud Computing
7. Bandwidth and Other Best-Practice Technical Considerations
8. Deployment Impact and Ongoing Change Management
9. DR and Virtualization
10. Software vs. Hardware: The Benefit of Optimized Workloads
11. Solution for Purpose: CAPEX and OPEX

1. Insurance of Data; Definition of RPO and RTO Recoverability

Insurance: Companies involved in corporate insurance almost universally offer a product described as business continuity insurance. This description sounds comforting and all-encompassing, but is it too good to be true? Read the fine print and you often find that while policies of this type usually offer protection against the risk of damage to or loss of hardware, premises and so on, they do not insure against the risk of data loss or the cost of downtime. As insurance cannot effectively protect a business against these categories of damage, management has an obligation to put in place operational initiatives that mitigate the risk of data loss and downtime.

Historically, these initiatives have taken the Backup and Restore approach: companies would back up their data to some medium and, in the event of a hardware or software failure, would seek to restore the system to the backed-up state. This approach creates two issues: (1) data generated between the last backup and the system failure is lost; and (2) while a restoration is in progress, staff are unable to carry out their duties effectively, and new data is lost.
Defining modern concepts of IT Recovery: RPO and RTO. Today's CIO must define two objectives when addressing Business Continuity in IT: (1) the amount of data the business can afford to lose, commonly known as the Recovery Point Objective (RPO); and (2) the period for which the system can be down, known as the Recovery Time Objective (RTO).
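The two objectives can be illustrated with a minimal sketch (class names, figures and the two example profiles below are hypothetical, not from this paper): an incident stays within objectives only if the age of the last recovery point is within the RPO and the restore time is within the RTO.

```python
from dataclasses import dataclass

@dataclass
class RecoveryObjectives:
    rpo_minutes: float  # maximum tolerable data loss, measured in time
    rto_minutes: float  # maximum tolerable downtime

    def meets(self, last_backup_age_minutes: float, restore_minutes: float) -> bool:
        """An incident is within objectives if the data lost (the age of the
        last recovery point) is within the RPO and the time taken to restore
        service is within the RTO."""
        return (last_backup_age_minutes <= self.rpo_minutes
                and restore_minutes <= self.rto_minutes)

# Hypothetical profile 1 - nightly tape backup: up to 24 h of data loss, ~8 h restore.
nightly_tape = RecoveryObjectives(rpo_minutes=24 * 60, rto_minutes=8 * 60)

# Hypothetical profile 2 - continuous replication: near-zero loss, minutes to fail over.
replication = RecoveryObjectives(rpo_minutes=5, rto_minutes=30)

# A server fails 10 hours after the last backup; the restore takes 6 hours.
print(nightly_tape.meets(600, 360))   # True  - within nightly-tape objectives
print(replication.meets(600, 360))    # False - far outside replication objectives
```

The contrast between the two profiles is the trade-off this paper keeps returning to: the tighter the objectives, the costlier the technology required to meet them.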

To determine the objectives for a particular business, management must review: (1) the business processes that depend on IT infrastructure, and assess the cost of interrupting those processes as a result of an infrastructure failure; and (2) the technology available to mitigate against infrastructure failure, and assess the up-front and ongoing (maintenance, subscription, etc.) costs of that technology.

2. Defining the Cost of Downtime: Business Process and Reliance on IT

Setting the RTO/RPO requires a robust analysis of business fundamentals. Such an analysis cannot be undertaken without extensive consultation with department management, who will have a much clearer appreciation of the interaction between IT and business functions. A classic example is internet banking: if we lose this function, payroll may not complete. If this occurs, staff may not be able to meet their direct debits, leaving the business facing potential reimbursement claims. A more extreme recent example is a major airline's loss of its reservation system in late 2009: a six-hour outage resulted in a week of havoc across airports throughout New Zealand and Australia, as well as significant cost exposure for the airline (missed flights, claims for local accommodation, etc.).

The outcome of this type of consultation is usually documented in a Business Impact Analysis report. The consultation process and the delivery of the report allow a business to clearly define its RPO and RTO objectives and to determine appropriate strategies to deliver on them. From a technical perspective, defining such strategies means isolating the dependence of business-critical applications on specific front-end servers, networks and database servers ("Infrastructure Components"). The RPO/RTO of each business function then defines the technical solution for each related Infrastructure Component. The more available a solution (i.e. the less downtime), the higher its price.
Less business-critical functions (where, for example, we cannot afford to lose data but can afford to allow 24 hours for restoration) can therefore be provisioned and priced accordingly; if we cannot afford any downtime for an Infrastructure Component, another (more costly) solution will be required.
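As a hedged illustration of this tiering, the sketch below maps each Infrastructure Component's RPO/RTO (as might come out of a Business Impact Analysis) to a protection tier. The tier names, thresholds and component names are invented for the example, not taken from this paper.

```python
# Hypothetical sketch: choosing a protection tier per Infrastructure
# Component from its RPO/RTO, both expressed in hours. Thresholds and
# tier names are illustrative assumptions.

def protection_tier(rpo_hours: float, rto_hours: float) -> str:
    if rpo_hours == 0 and rto_hours < 1:
        # Zero data loss, near-zero downtime: the most expensive option.
        return "synchronous replication + automated failover"
    if rpo_hours <= 1 and rto_hours <= 4:
        return "asynchronous replication + standby workload"
    if rto_hours <= 24:
        # We cannot afford to lose much data but can wait a day to restore.
        return "nightly backup + documented restore procedure"
    return "archive-grade backup"

# Example Business Impact Analysis output: component -> (RPO h, RTO h)
components = {
    "payments-db":   (0, 0.5),
    "file-server":   (1, 4),
    "intranet-wiki": (24, 24),
}

for name, (rpo, rto) in components.items():
    print(f"{name}: {protection_tier(rpo, rto)}")
```

The point of the sketch is that the price gradient follows the availability gradient: only the components whose outage cost justifies it should be placed in the top tier.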

3. Disaster vs. Operational Recoverability

A disaster is not limited to hardware or software damage from sources external to the business: a disaster may also mean the accidental deletion of a file or an email, an administrator error, an individual server failure, and so on. The business must be prepared to recover easily from operational error, or have the capability to roll back easily. This capability carries significant weight in maintaining up-time, yet although some technologies feature it, it is often overlooked in Disaster Recovery budgets.

4. History of Backup / Evolution of Grid Computing

The original medium for data storage goes back some 200 years, to punch-card technology. In the 1960s the technology was replaced by magnetic tape. Since one roll of magnetic tape could store as much data as 10,000 punch cards, it achieved instant success and remained the most popular way of storing computer data until the mid-1980s. Driven by the birth of the TCP/IP protocol, and very soon afterwards by the explosion of the public internet, data networks grew exponentially, allowing data backup over networks to be considered a possibility. Data replication was born. Looking at the extent of distributed networks today and the available bandwidth, replication of data becomes common sense. The last part of the equation is to leverage replication technology into a powerful business continuity solution.

Further, replicating to a central location means benefiting from real-time backup (already off-site), without data loss and likely without duplication of data at our backup site. When selecting the right replication technology, we must consider the added overhead on production performance: the I/O penalty is a mandatory consideration when evaluating a business continuity solution. Ideally, replication technology should simply monitor the server at run-time rather than periodically penalizing the disk with I/O overhead to compute the difference from the last snapshot.
Additionally, we must not be limited by distance, as is often the case with hardware-based solutions, and we must consume as little bandwidth as possible. The end result must be to free our production servers from the performance stress of backup overhead, while removing the need for off-site couriering and vaulting, eliminating the shrinking backup window, addressing open-file concerns, and so on.

5. DR and Cloud Computing

Centralized backup, of course, is only half the picture. A solution must allow our business to resume quickly in the event of disaster. This means that our data must not only be protected; we must also be able to handle the workloads of all of our servers. These workloads must be spun up on demand, ideally on dissimilar hardware and/or auto-provisioned virtual servers. Cloud models and virtualization are a great fit for providing multi-subscribed resources on demand. Many businesses that have deployed disaster recovery have done so in a one-for-one (hardware) configuration, at significant cost. In the past this approach was usually taken because older

technology often had significant hardware dependencies: this is no longer the case. The increasing importance of green IT and power economy means that disaster recovery tools must be clearly delineated as redundancy tools: DR should not consume processing resources when it is not in use. From a best-practice perspective, we suggest testing a business's DR capability at least once per quarter; testing requires the processing capacity to be available only for the duration of the testing phase (3 days per month is normally adequate). In our view, in order to drive down costs and improve power economies, businesses will increasingly need to do away with run-time DR and instead move to cloud-based DR services that allow for instant resource availability. The technology must be free of hardware and/or virtualization platform dependency and allow for fast restoration of service onto any hardware or virtual machine.

6. Bandwidth and Other Best-Practice Technical Considerations

Bandwidth: Businesses considering replication as a DR solution must ensure that they have adequate bandwidth (in our experience, clients have often assessed technologies that can replicate without considering bandwidth requirements, especially data volumes and the associated costs). Deciding on a solution needs to involve running throughput analyses of the proposed solution. Such an analysis gives a clear picture of current bandwidth utilization and the additional load that would be placed on the network by implementing a replication solution. The bandwidth efficiency, compression and throttling capabilities of different solutions will need to be carefully assessed to ensure production traffic is not impacted.

Platform agnostic: The solution that is ultimately implemented should not intrude on the existing environment: for example, hardware or infrastructure replacements should not be required to facilitate a better DR strategy.
This means that the solution should ideally be software-based and hardware/virtualization platform agnostic. This is a common oversight when selecting a hardware-based solution, which locks in investment without allowing the flexibility to cater for future change.

Failover and Failback capability: Failover is the term for the process of moving our workload to the DR site; this function is almost universally addressed by DR vendors. Most vendors, however, fail to address the process of Failback, or restoring back to production. The solution adopted must cater for a restore to new and potentially dissimilar hardware in production and, just as importantly, provide an on-line restoration capability so that users remain productive. It is also important to consider the impact on bandwidth during failback, so that only changed data is sent where possible. The DR solution must also have roll-back capability to allow for operational recovery.
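The throughput analysis recommended above can be approximated with a back-of-envelope calculation: will one day's changed data fit through the spare capacity of the WAN link? The link speed, change rate, compression ratio and headroom figures below are illustrative assumptions only, not recommendations from this paper.

```python
# Back-of-envelope sketch of a replication throughput analysis.
# All input figures are hypothetical examples.

def replication_hours(daily_change_gb: float,
                      link_mbps: float,
                      spare_fraction: float = 0.3,
                      compression_ratio: float = 2.0) -> float:
    """Hours needed to replicate one day's changed data, using only the
    spare fraction of the link so production traffic is not impacted,
    after compression."""
    usable_mbps = link_mbps * spare_fraction
    # GB -> megabits (decimal units), reduced by the compression ratio.
    megabits_to_send = daily_change_gb * 8 * 1000 / compression_ratio
    seconds = megabits_to_send / usable_mbps
    return seconds / 3600

# 20 GB/day of changes over a 10 Mbps link with 30% headroom and 2:1 compression:
hours = replication_hours(daily_change_gb=20, link_mbps=10)
print(f"{hours:.1f} h/day")  # must stay comfortably under 24 h, or the link is inadequate
```

If the result approaches 24 hours, replication can never catch up: either the link must be upgraded, compression and throttling improved, or the change rate reduced, which is exactly why the paper insists on running this analysis before committing to a solution.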

7. Deployment Impact and Ongoing Change Management

Far too many solutions offered on the market require some form of infrastructure change: you must buy a SAN or change your storage system, you must virtualize your entire network, you must reconfigure your servers, and so on. This type of change increases risk and cost. The DR solution selected must not be intrusive on deployment or require reconfiguration of infrastructure. After all, we are trying to make our network more resilient, not to reconfigure our infrastructure as a whole and introduce new risk.

A business must also consider the long-term scalability and flexibility of its DR solution. Just as a business's operations are dynamic, IT infrastructure changes continuously. If a business implements a DR solution that requires a lock-step approach to application or service-pack updates in DR, it will increase its ongoing operational overhead, as well as the probability of the solution not functioning properly when a disaster occurs. DR technology should cater for the protection of system state, or a hardware-independent workload, in order to allow for true recovery based on the need at the time: recovery of the original server, introduction of new hardware, or use of hardware made available on demand in the Cloud.

8. DR and Virtualization

One of the biggest misconceptions in the market is that virtualization is itself a solution for Disaster Recovery (DR) or High Availability (HA). In reality, this is not the case. Virtualization certainly simplifies disaster recovery, as it removes the dependency on hardware; for replication, however, it relies on underlying hardware or third-party software. High Availability technologies offered by virtualization should also be questioned, as they still rely on a single point of failure: the underlying storage hardware or SAN. Ideally, we must implement a technology that removes hardware dependency without the need to implement new infrastructure.
On the other hand, virtualization brings many other advantages when it comes to recovery of workloads on demand. One of the biggest is consolidation. Since our resource requirements at DR are significantly lower than in production, the dynamic workload consolidation offered by virtualization allows us to consolidate protected workloads and thus gain strong run-time economics.

9. Software, not Hardware: Optimized Workloads

Unless we have a requirement for synchronous replication (i.e. writing data to DR before writing to production), we do not need a SAN to deploy DR. A SAN costs significantly more as a DR solution and typically requires significantly more bandwidth to replicate data. There are distance limitations and hardware vendor dependencies. SAN replication is typically difficult to test, and can only replicate the data stored on the SAN itself: where servers are physical, the system state of the machine, and applications stored locally on the server's hard disk, are not catered for. Further, most SAN-based solutions require failover of an entire LUN rather than just the one machine that has failed. This proves challenging and cumbersome: it is an all-or-nothing situation.

Software solutions offer significantly more flexibility in monitoring, testing and granularity for Disaster Recovery. Software solutions also typically cost a lot less to deploy and maintain.

10. Solution for Purpose: CAPEX and OPEX

Finally, when considering and evaluating a solution, we should check-list the following key elements:

CAPEX:
- Deployment expenses and training
- Change of infrastructure requirement (investment in new hardware)
- Bandwidth upgrade costs (if any)
- Cost of equipment at DR
- Cost of run-time licensing at DR

OPEX:
- Cloud DR / multi-subscribed resources-on-demand service
- Software licensing at DR
- Bandwidth cost and performance impact
- Ongoing change management
- Electricity / air-conditioning cost
- Ongoing engineering maintenance of the DR solution, including patch updates, monitoring and regular testing
- Software and hardware maintenance

Other key elements:
- Hardware and virtualization platform agnostic
- Platform-independent recoverability
- RPO/RTO capability that addresses needs according to requirements
- Enterprise monitoring and reporting capability
- Simple DR activation process (either one-button or automatic), with remote accessibility for users (network considerations / terminal server access considerations / Domain Controllers)
- Local support, and industry certification and recognition (the solution must be proven and reliable)