1 Disaster Recovery as a Cloud Service: Economic Benefits and Deployment Challenges
Tim Wood, Emmanuel Cecchet, KK Ramakrishnan*, Prashant Shenoy, Kobus van der Merwe*, and Arun Venkataramani
UMass Amherst and AT&T*
2 Gulf Oil Spill
- Disasters happen
- Disasters are expensive
3 Data Center Disasters
- Disasters cause expensive application downtime
  - Truck crash shuts down Amazon EC2 site (May 2010)
  - Lightning strikes EC2 data center (May 2009)
  - Comcast down: hunter shoots cable (2008)
  - Squirrels bring down NASDAQ exchange (1987 and 1994)
- Need plans and systems in place to recover from disasters
4 Disaster Recovery
- Use DR services to prevent lengthy service disruptions
  - Long-distance data backups + failover mechanism
  - Periodically replicate state
  - Switch to backup site after disaster
- Can the cloud reduce the cost of DR and improve the level of service?
[Figure: an enterprise DC sends backups to a private backup site or to a public DR cloud]
5 DR Metrics
- DR goal: minimize data loss, downtime, and cost
- Recovery Point Objective (RPO): amount of tolerable data loss
- Recovery Time Objective (RTO): acceptable system downtime
- We focus on RPO and RTO > 0
[Figure: timeline (axis: Time) marking RPO and RTO against the recovery steps Detect, Provision, Restore, Connect]
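
The two metrics on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the authors' model: it assumes that downtime is the sum of the four timeline steps (detect, provision, restore, connect) and that, with periodic backups, the worst-case data loss equals the backup interval. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTimeline:
    """Durations (seconds) of the recovery steps named on the slide's timeline."""
    detect_s: float
    provision_s: float
    restore_s: float
    connect_s: float

    def downtime_s(self) -> float:
        # Assumption: the steps run one after another, so downtime is their sum.
        return self.detect_s + self.provision_s + self.restore_s + self.connect_s


def worst_case_data_loss_s(backup_interval_s: float) -> float:
    # Assumption: a disaster just before the next backup loses one full interval
    # of updates (time to transfer the backup itself is ignored).
    return backup_interval_s


def meets_objectives(timeline: RecoveryTimeline, backup_interval_s: float,
                     rpo_s: float, rto_s: float) -> bool:
    """True if the plan satisfies both the RPO and the RTO."""
    return (worst_case_data_loss_s(backup_interval_s) <= rpo_s
            and timeline.downtime_s() <= rto_s)


if __name__ == "__main__":
    # Hypothetical numbers, for illustration only.
    plan = RecoveryTimeline(detect_s=60, provision_s=300, restore_s=600, connect_s=120)
    print("downtime (s):", plan.downtime_s())
    print("meets 4h RPO / 30min RTO:",
          meets_objectives(plan, backup_interval_s=3600, rpo_s=4 * 3600, rto_s=1800))
```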
6 Why DR Fits in the Cloud
- Customer: pay-as-you-go and elasticity
  - Normal case is cheap (need few resources to make backups)
  - Lower cost for a given RPO
  - Can rapidly scale up resources after disaster is detected
  - Cloud's virtualized infrastructure reduces RTO
  - Can allow for business continuity
- Provider: high degree of multiplexing
  - Customers will not all fail at once
  - Can offer extra services like disaster detection
- Is the cloud an economical platform for DR today? What additional features are needed?
7 Warm Backup Site DR on Demand
- Cheaply synchronize state during normal operation
- Obtain additional DR resources on demand after failure
- Short delay to provision and initialize applications
[Figure: normal mode vs. post-disaster. Labels: Enterprise DC, App 1, App 2, App 100, Backup State, DR Server, DR Cloud]
8 Cost Analysis Scenario
- Compare the cost of DR in a colocation center to the cloud
  - Colo case pays for servers and space at all times
  - Cloud DR only pays for resources as they are used
- Case 1: RUBiS, an eBay-like multi-tier web application
  - 3 web front ends
  - 1 database server
  - Only database state is replicated
[Figure: deployment diagram. Labels: Enterprise DC, Colocation Center, DR Cloud, DR Server, Web servers 3x, Database]
9 Cost Analysis: Colocation vs Cloud
- Normal case: resources needed to replicate DB state
- Post-disaster: resources needed to run all application components

| | Normal case | Post-disaster |
|---|---|---|
| Servers (colo) | 4 servers | 4 servers |
| Servers (cloud) | 1 VM | 5 VMs |
| Network | 5 GB/day | 180 GB/day |
| Cost (colocation) | $28.04/day | $66.01/day |
| Cost (cloud) | $3.80/day | $52.03/day |

- 99% uptime cost (3 days of disaster per year)
  - Colo: $10,373 per year
  - Cloud: $1,562 per year
[Figure: RUBiS with web servers 3x and a database]
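
The yearly totals can be approximately reproduced from the per-day costs on this slide. The sketch below assumes a 365-day year and reads "99% uptime" as 1% of the year (3.65 days) spent in post-disaster mode, where the slide rounds to 3 days. Under that assumption the results land within a dollar or two of the slide's figures. The function name is hypothetical.

```python
def yearly_dr_cost(normal_per_day: float, disaster_per_day: float,
                   uptime: float = 0.99, days_per_year: int = 365) -> float:
    """Yearly DR cost: normal-mode days at one rate, disaster days at another."""
    normal_days = uptime * days_per_year
    disaster_days = days_per_year - normal_days
    return normal_days * normal_per_day + disaster_days * disaster_per_day


# Per-day costs taken from the slide.
colo = yearly_dr_cost(normal_per_day=28.04, disaster_per_day=66.01)
cloud = yearly_dr_cost(normal_per_day=3.80, disaster_per_day=52.03)

print(f"colo:  ${colo:,.2f} per year")
print(f"cloud: ${cloud:,.2f} per year")
```

The gap comes almost entirely from the normal-mode rate, since the post-disaster rate applies to only a few days per year.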
10 RPO vs Cost Tradeoff
- Case 2: data warehouse
  - Post-disaster twice as expensive with cloud
    - Cloud charges premium for high-powered VM instance
  - Cloud still cheaper overall due to lower normal-case costs
- Cloud allows tradeoff between RPO and cost
  - Only pay for DR server during periodic backups in cloud
- Colo center pays server and space costs regardless of RPO!
[Figure: yearly 99% uptime cost, cloud vs. colo, for RPO = 24hr, 12hr, 4hr, 2hr, and ~0 (continuous replication)]
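
The shape of this tradeoff can be sketched as follows. This is an illustrative model only, not the one behind the slide's chart: it assumes the cloud DR server is billed only while a backup is running, that each backup takes a fixed number of server hours, and that an RPO of 0 means the server runs continuously. All prices and durations below are hypothetical placeholders, not figures from the talk.

```python
def cloud_normal_cost_per_day(rpo_hours: float, hours_per_backup: float,
                              vm_rate_per_hour: float, storage_per_day: float) -> float:
    """Normal-mode cloud cost per day when the DR server runs only during backups."""
    if rpo_hours <= 0:
        vm_hours = 24.0  # continuous replication: server is always on
    else:
        backups_per_day = 24.0 / rpo_hours
        # The server cannot be billed for more than 24 hours in a day.
        vm_hours = min(24.0, backups_per_day * hours_per_backup)
    return vm_hours * vm_rate_per_hour + storage_per_day


if __name__ == "__main__":
    # Hypothetical parameters, chosen only to show the trend.
    for rpo in (24, 12, 4, 2, 0):
        cost = cloud_normal_cost_per_day(rpo, hours_per_backup=0.5,
                                         vm_rate_per_hour=1.0, storage_per_day=2.0)
        print(f"RPO {rpo:>2} h -> cloud normal-mode cost {cost:.2f} per day")
```

A colocation deployment would appear in this model as a constant, since its servers and space are paid for regardless of how often backups run.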
11 Cost Analysis Summary
- Benefits of cloud computing depend on:
  - Type of resources required to run application
  - Variation between normal-mode and post-disaster costs
  - RPO and RTO requirements
  - Likelihood of disaster
- Cloud has greatest benefit when post-disaster cost is much higher than normal-mode cost
12 Provider Challenges
- Revenue maximization
  - Mainly makes income from storage in normal case
  - But must pay for servers and keep them available
  - Can use pricing mechanism such as spot instances
    - Rent resources but be able to quickly reclaim for DR
    - Rent priority resources at higher cost that are guaranteed to be available
- Correlated failures
  - Large disasters could affect many customers simultaneously
  - Cloud provider must
    - Use a risk model to decide how many resources to own for DR
    - Spread out customers to minimize impact of correlated failures
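
As one deliberately simplistic example of such a risk model, the sketch below assumes each customer fails independently with the same probability in a given period, and finds the smallest number of simultaneous failovers the provider must be able to host so that the chance of running short stays below a chosen threshold. The independence assumption is exactly what large, correlated disasters violate, so this gives an optimistic lower bound. The function names and example parameters are hypothetical.

```python
from math import comb


def prob_more_than(k: int, n: int, p: float) -> float:
    """P(X > k) for X ~ Binomial(n, p): more than k of n customers fail at once."""
    at_most_k = sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))
    return 1.0 - at_most_k


def customers_to_provision_for(n: int, p: float, max_shortfall_prob: float) -> int:
    """Smallest k such that P(more than k simultaneous failures) <= threshold."""
    for k in range(n + 1):
        if prob_more_than(k, n, p) <= max_shortfall_prob:
            return k
    return n  # fall back to full capacity if rounding error prevents a match


if __name__ == "__main__":
    # Hypothetical inputs: 100 customers, 1% failure chance each, 0.1% shortfall risk.
    k = customers_to_provision_for(n=100, p=0.01, max_shortfall_prob=0.001)
    print("provision capacity for", k, "simultaneous customer failovers")
```

Capacity freed up by this kind of multiplexing is what the provider could rent out under a reclaimable pricing scheme such as the spot instances mentioned on the slide.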
13 More DR Challenges
- Planning
  - Use models to help understand tradeoff between cost and RPO/RTO for a given application and workload
- Efficient state replication
  - Minimize the bandwidth and cloud server costs in the normal case
- Post-disaster failover
  - Enable business continuity by minimizing recovery time
  - Automated/virtualized cloud infrastructure can lower RTO
14 Summary
- Cloud-based disaster recovery
  - Can substantially reduce cost for customer
    - Particularly when server cost varies before/after disaster
  - Provides flexible tradeoff between cost and RPO
  - Can lower recovery time, enable business continuity
  - Provider must handle correlated failures
- Open challenges
  - How many resources must provider reserve for DR?
  - How to seamlessly transfer network connections?
  - How to fail back to primary site after disaster passes?
15 Cost Details
16 Enabling Business Continuity
- Business continuity allows applications to keep working after a disaster
  - Crucial for critical business/government services
- Virtualized cloud infrastructure can lower RTO
  - Automates VM creation and cloning
  - Cloud can also help with disaster detection
- Many remaining challenges
  - How to ensure application is revived in a consistent/correct state?
  - How to redirect traffic to failover site?
17 DR Requirements
- Recovery Point Objective (RPO): amount of tolerable data loss
- Recovery Time Objective (RTO): acceptable system downtime
- Performance: impact on normal operation and after recovery
- Consistency: correctness of application data and outputs
- Geographic separation: DR site should not be affected by same disaster
18 What is the cloud good for?
- Cloud platforms are best for users who have variable needs over time
  - Customers only pay for what they use
  - Providers get economy of scale and can multiplex resources for many customers
- Applications well matched for the cloud:
  - Web sites with growing or variable demand
  - Infrequent compute-intensive jobs (monthly payroll)
  - and... Disaster recovery!