
Systems Engineering at MITRE: Cloud Computing Series

Leveraging Public Clouds to Ensure Data Availability

Toby Cabot and Lawrence Pizette

The MITRE Corporation manages federally funded research and development centers (FFRDCs), partnering with government sponsors to support their crucial operational missions. FFRDCs work in the public interest and operate as strategic partners with their sponsoring government agencies to ensure the highest levels of objectivity and technical excellence.

January 2013

Table of Contents

1.0 Problem
2.0 Analysis
3.0 Cloud Computing Considerations
4.0 Prototype
5.0 Conclusion
Acronyms
References

THE BIG PICTURE: Replicated databases in cloud environments are a cost-effective alternative to explore for ensuring the availability of data.

1.0 Problem

In the event of a disaster such as a hurricane, earthquake, or attack by an adversary, a data center hosting large amounts of U.S. Government data could become unavailable within seconds. This fragility requires that the U.S. Government establish strategies for safeguarding and restoring access to data commensurate with operational needs. Because it is not possible to move many gigabytes (GB), terabytes (TB), or petabytes (PB) of information across a wide area network in the initial seconds of a disaster, data must be pre-positioned at alternate locations.

Sidebar: "White House [former] chief information officer Vivek Kundra announced [in February 2010] an initiative to consolidate hundreds of redundant federal government databases," writes Byron Acohido. He adds that Kundra "also called for stepping up the federal government's reliance on cloud-based systems to deliver public services." These databases are widespread and permeate Government information technology. Forrester Research writes that databases are a critical asset to any enterprise, and it estimates that the database market will reach $32 billion per year by 2013.

2.0 Analysis

System architects can consider several options for backing up data. First, replication can be employed by pre-positioning geographically separate but identical databases: as transactions are committed to the primary database, they are copied to the backup database. Second, very large data sets can be copied onto physical media, shipped to backup locations, and imported into standby databases. This process is not real time; it captures a snapshot of the data as of when the media was copied. It can, however, be faster than using a network to transfer very large, multi-TB datasets.1 Third, databases can be copied to backup media and stored off site. When needed, the off-site copies can be used to recreate databases in a geographically separate environment. This process can be less costly because the data is simply stored offline unless needed, but it has the drawbacks of being a snapshot in time and of not being quickly available for mission-critical systems.

If a conventional, dedicated data center approach is used, all of these options may be problematic. The first two options require continual operation of two data centers, with support staff, duplicate infrastructure, and software licensing costs; the first option also requires continual network connectivity. The third option requires the availability of a receiving data center with adequate spare capacity whenever it may be needed.

Many public cloud providers offer redundancy solutions to address these challenges. For example, commercial companies such as Microsoft, Rackspace, and Amazon Web Services (AWS) offer network-based replication of their public cloud database products.2,3,4 And as an example of shipment-based backup, Amazon provides a service, AWS Import/Export, that allows flash drives and hard drives to be shipped to Amazon for importing to its Simple Storage Service (S3).5
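To make the network constraint concrete, the transfer-time arithmetic behind reference 1 can be reproduced in a few lines. This is a minimal sketch: the link rates come from the reference, and protocol overhead is ignored.

```python
# Back-of-the-envelope WAN transfer times for a 1 TB database,
# reproducing the arithmetic in reference 1 (overhead ignored).

def transfer_hours(data_bytes: float, link_bits_per_sec: float) -> float:
    """Hours to move data_bytes over a link of the given raw bit rate."""
    return (data_bytes * 8) / link_bits_per_sec / 3600

ONE_TB = 10**12          # 1 TB of raw data, in bytes
T1 = 1.544e6             # T1 line: 1.544 million bits per second
OC48 = 2.488e9           # OC48 fiber: 2,488 million bits per second

print(f"T1:   {transfer_hours(ONE_TB, T1):,.0f} hours")   # ~1,439 hours
print(f"OC48: {transfer_hours(ONE_TB, OC48):.2f} hours")  # ~0.89 hours
```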

This paper addresses potential uses of cloud computing approaches to facilitate data access in the event of a failure in a primary location. We conclude that replicated databases in cloud environments are a cost-effective alternative for ensuring the availability of data.

There is nothing magic about cloud computing. It can provide a significant reduction in infrastructure cost by sharing costs among multiple users and leveraging the economies of scale of very large operations. This can reduce the cost of many operations, including database replication, whether the replica is populated with imported data (e.g., by shipping disk drives) or by on-demand or real-time backups.

This paper focuses on a public cloud provider example to illustrate the costs of cloud-based database replication. As the government develops alternatives that provide analogous services and realize a degree of similar cost savings, these will offer more private means to the same end. Every situation requires its own analysis based on the considerations in Section 3.

Sidebar: The typical government Web-based system is a three-tier architecture: presentation, business logic, and database. It often uses a browser and Web server to provide a graphical user interface. When a user enters data or clicks in the browser, a request is sent to the Web server and, for dynamic content, to the business logic tier. The business logic tier relies upon the database to read and write persistent data, and one browser action from the user can result in many reads and writes to the database. For example, an inventory system may receive a request to provide a product to a user; the business logic tier may need to validate that the inventory is available, verify the shipping address, and decrement the requested amount from inventory. This persistent data is maintained in a relational database management system that may be a candidate for the cloud.

3.0 Cloud Computing Considerations

We recommend the following analyses:

Systems Engineering and Technical Analysis: Perform an analysis to determine whether cloud deployment is appropriate. Cloud computing has significant benefits that can be leveraged by many projects, but policy, security, or technical obstacles can make cloud computing (shared or public) difficult or suboptimal for some projects. If the cloud represents a viable solution, ensure that the provider's storage capacity limits exceed the planned database size. Run a pilot, if necessary.

Analyze and Perform a Return on Investment Calculation: Analyze the system to determine how tolerant it can be to unavailability or loss of data, and whether it could benefit from a cloud computing approach. Some systems can tolerate outages of a day or two while a new hosting capability is provisioned, some can tolerate hours of downtime, while other systems cannot be offline for more than a few seconds without severe consequence. Another consideration is tolerance to lost transactions before and during failover: some systems can tolerate the loss of transactions processed shortly before the failover, while others cannot. The implications of being offline will determine the operational benefit of investing in different cloud strategies. The total costs should also be analyzed; these include software license and support agreements that can be affected by cloud deployments.
Replicate Database: If warranted by the return on investment, systems engineering, and technical analysis, pre-position a replicated copy of the data in a secondary cloud environment. For example, most enterprise-scale relational database management systems (RDBMSs) offer the capability to copy data from a primary database to an always-running secondary database (i.e., a hot standby). The primary and secondary databases can be hosted at geographically separate locations, and can be configured as cloud-to-cloud replication or as replication between a private data center and a cloud service.
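As an illustration of the hot-standby pattern, the sketch below configures classic MySQL binary-log replication of the kind used in the prototype (Section 4.0), driven here through the PyMySQL client library. The host names, credentials, and the 'repl' account are placeholders, and the primary is assumed to already run with binary logging and a unique server-id, with a consistent snapshot of its data loaded onto the replica (e.g., via mysqldump).

```python
# Minimal sketch: point a MySQL hot standby at a primary and start
# replication. Hosts and credentials are hypothetical placeholders.
import pymysql

PRIMARY = "primary.example-aws-host"        # hypothetical EC2 instance
REPLICA = "replica.example-rackspace-host"  # hypothetical Rackspace instance

# 1. On the primary: create the account the standby replicates as, and
#    record the binary-log coordinates the standby should start from.
src = pymysql.connect(host=PRIMARY, user="admin", password="...")
with src.cursor() as cur:
    cur.execute("CREATE USER 'repl'@'%' IDENTIFIED BY 'repl_password'")
    cur.execute("GRANT REPLICATION SLAVE ON *.* TO 'repl'@'%'")
    cur.execute("SHOW MASTER STATUS")       # requires binary logging on
    log_file, log_pos = cur.fetchone()[:2]
src.close()

# 2. On the standby: point it at the primary and start applying changes.
dst = pymysql.connect(host=REPLICA, user="admin", password="...")
with dst.cursor() as cur:
    cur.execute(
        f"CHANGE MASTER TO MASTER_HOST='{PRIMARY}', "
        f"MASTER_USER='repl', MASTER_PASSWORD='repl_password', "
        f"MASTER_LOG_FILE='{log_file}', MASTER_LOG_POS={int(log_pos)}"
    )
    cur.execute("START SLAVE")  # MySQL 5.x syntax, as in the 2012 era
dst.close()
```

In practice the standby's health (e.g., Seconds_Behind_Master from SHOW SLAVE STATUS) should be monitored, since replication lag determines how many recent transactions a failover could lose; this connects directly to the lost-transaction tolerance analysis recommended above.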

4.0 Prototype

[Figure 1. Prototype High-Level View]

To demonstrate the capabilities and determine the cost-effectiveness of cloud computing for backup, an exemplar MySQL RDBMS database hosted in the AWS Elastic Compute Cloud (EC2) environment was prototyped and replicated to Rackspace. We performed one test at small scale and one at medium scale.

The medium-scale test first wrote 1 billion database records (500 million name and 500 million address records) for a sample size of 37.5 GB of raw data.6 Subsequently, the test updated 50 million records of each type, then deleted 50 million records of each type; update and delete activity drives additional communication between the databases. Vendor monitoring tools were used to measure the amount of traffic in and out of the environments, and that data helped calculate costs for the cloud environments. The prototype took approximately 60 hours to run at medium scale. The outbound network traffic from AWS to Rackspace was 210.6 GB, and the inbound traffic from Rackspace to AWS (e.g., for polling of replicated transactions) was 5.6 GB.

The small-scale test was similar to the medium-scale test, but an order of magnitude smaller: 100 million records were written, and 10 million database records were modified and deleted.

The gathered prototype data helped extrapolate the cost of large databases; for this analysis, a large database is one order of magnitude larger than the medium-scale prototype (i.e., 375 GB of raw data). Because TB-scale and PB-scale databases can have additional processing needs and may not be appropriate for a single-instance environment, the analysis was not scaled to that degree; such scales would require a parallel-processing, horizontally scalable database management system (e.g., a sharded RDBMS, an RDBMS cluster, a "not only SQL" [NoSQL] database, or a map/reduce cluster). This prototype addressed the database only; there was no attempt to model failover at the application tiers above the database, although this can be a significant systems engineering problem.

While each organization must perform its own cost-of-ownership analysis based upon its specific processing needs and costs, the hosting and input/output (I/O) costs of running a database in the cloud can be low. Assuming the equivalent volume of data from the prototype spans one month, the monthly costs are shown in Table 1. While the services leveraged are well-known cloud offerings, there are other services from these and other providers that could increase or decrease the costs.
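Because the recurring charges are simple linear rates, the instance-hour and data-transfer rows of Table 1 can be recomputed directly. The sketch below does so for the large-database column, using the April 2012 prices cited in the references; the EBS storage and I/O row depends on measured request counts and allocation assumptions, so it is taken from the table rather than recomputed here.

```python
# Recomputing the recurring instance and transfer costs behind Table 1's
# "large database" column, at the April 2012 rates cited in the references.
# Traffic volumes are the prototype's medium-scale measurements scaled 10x.

HOURS_PER_MONTH = 24 * 30  # the paper assumes a 30-day month

def instance_cost(hourly_rate: float) -> float:
    """On-demand instance running 24 hours/day for 30 days."""
    return hourly_rate * HOURS_PER_MONTH

def ec2_egress_cost(gb: float) -> float:
    """Data transfer out of EC2: first GB free, then $0.12/GB up to 10 TB."""
    return max(gb - 1, 0) * 0.12

ec2 = instance_cost(0.64)          # Extra-Large EC2 Linux, US East: $460.80
rack = instance_cost(0.96)         # Rackspace 15,872 MB instance:   $691.20
out_aws = ec2_egress_cost(2_106)   # replication traffic out of AWS: $252.60
out_rack = 140 * 0.18              # polling traffic out of Rackspace: $25.20

print(f"instances: ${ec2 + rack:,.2f}, transfer: ${out_aws + out_rack:,.2f}")
```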

| Cost element 7 | Small database (prototype) | Medium database (prototype) | Large database (calculated) |
| --- | --- | --- | --- |
| Raw data size | 3.75 GB | 37.5 GB | 375 GB |
| AWS Elastic Block Store: 10 GB @ $0.10/GB-month (assume that 10 GB is allocated for each 3.75 GB of data, due to overhead of indexes, etc.) and $0.10 per million I/O requests 8 | $6.27 ($1.00 for 10 GB of EBS plus $0.67 for 6.67 million I/O requests) | $62.70 ($10.00 for 100 GB of EBS plus $6.90 for 69 million I/O requests) | $627.00 ($100.00 for 1,000 GB of EBS plus $69.00 for 690 million I/O requests) |
| Large EC2 On-Demand Linux instance running 24 hrs./day for 30 days @ $0.32/hr. (U.S. East Region) 9 | $230.40 | $230.40 | N/A |
| Extra-Large EC2 On-Demand Linux instance running 24 hrs./day for 30 days @ $0.64/hr. (U.S. East Region) 10 | N/A | N/A | $460.80 |
| Rackspace Linux instance running 24 hrs./day for 30 days @ $0.24/hr. (4,096 MB RAM, 160 GB disk) 11 | $172.80 | $172.80 | N/A |
| Rackspace Linux instance running 24 hrs./day for 30 days @ $0.96/hr. (15,872 MB RAM, 620 GB disk) 12 | N/A | N/A | $691.20 |
| Data transfer out of EC2 (1st GB free, then $0.12/GB up to 10 TB) | 21.5 GB costs $2.46 | 210.6 GB costs $25.15 | 2,106 GB costs $252.60 |
| Data transfer out of Rackspace ($0.18/GB) | 0.6 GB costs $0.11 | 5.6 GB costs $1.00 | 140 GB costs $25.20 |
| Total | $412.71 | $498.95 | $2,125.80 |

Table 1. Cost Analysis

5.0 Conclusion

Given the criticality of many government services, adequate attention must be paid to disaster recovery. This becomes even more crucial as database consolidation potentially increases the impact of the loss of a single data center. Replicated databases in cloud environments are a cost-effective alternative to explore for ensuring the availability of data. And given that the capabilities of some commercial Infrastructure as a Service offerings are designed to meet many U.S. Government needs for Federal Information Security Management Act moderate systems, Federal Information Technology leadership has more options for providing cost-effective availability to its customers.

Acronyms

| Acronym | Definition |
| --- | --- |
| AWS | Amazon Web Services |
| COTS | Commercial Off-the-Shelf |
| DB | Database |
| EBS | Elastic Block Store |
| EC2 | Elastic Compute Cloud |
| GB | Gigabyte |
| I/O | Input/Output |
| NoSQL | Not only SQL |
| PB | Petabyte |
| RDBMS | Relational Database Management Systems |
| ROI | Return on Investment |
| TB | Terabyte |

References

1. A 1 TB database would take over 1,000 hours to move over a network at the T1 connection speed of 1.544 million bits per second, and almost 1 hour at the much faster fiber-optic OC48 rate of 2,488 million bits per second.
2. Amazon. (2012, July 26). Amazon FAQs, Amazon Web Services. http://aws.amazon.com/rds/faqs/#86.
3. Windows Azure. (2012, July 26). Windows Azure SQL Database Overview, Windows Azure. http://msdn.microsoft.com/en-us/library/windowsazure/ee336241.aspx.
4. Google. (2012, July 26). Structuring Data for Strong Consistency, Google Developers. http://code.google.com/appengine/docs/python/datastore/hr/overview.html.
5. Amazon. (2012, July 26). AWS Import/Export, Amazon Web Services. http://aws.amazon.com/importexport/.
6. The amount of disk storage required was significantly larger than 37.5 GB, due to database storage efficiency, log files, indexes, etc.
7. Prices as of April 26, 2012.
8. Amazon. (2012, July 26). Amazon EC2 Pricing, Amazon Web Services. http://aws.amazon.com/ec2/pricing/.
9. Ibid.
10. Ibid.
11. Rackspace. (2012, July 26). How We Price Cloud Servers, Rackspace Cloud Servers. http://www.rackspace.com/cloud/cloud_hosting_products/servers/pricing/.
12. Ibid.

MITRE | www.mitre.org

© 2013 The MITRE Corporation. All Rights Reserved. Approved for Public Release; Distribution Unlimited. Case Number: 12-0230. Document Number: MTR120400.
