Data center outages: impact, causes, costs, and how to mitigate




Data centers sometimes fail. You can build in safeguards, fail-safe mechanisms, and redundancy through backup systems, but like all engineered systems, data centers can, and sometimes do, fail. Table 1 lists some of the notable data center outages of 2011 and 2012 and shows how even the biggest brands, with access to the best technology and resources, can suffer from them.

TABLE 1: Notable data center outages in 2011 and 2012

| Who | How long | What happened | Impact |
|---|---|---|---|
| Huffington Post, Buzzfeed, Gawker and several others | Few days | Water flooded data centers in New York after Hurricane Sandy | Several websites and other services down |
| Twitter | Few hours | Both primary and backup systems failed | A well-publicized campaign to encourage athletes and visitors to the Olympics to tweet was affected |
| Salesforce | 7 hours | Power failure in data center | CRM services to customers affected |
| Bank of America | 6 days | Online banking down across U.S. | 29 million users affected |
| Amazon Web Services | 4 days | Amazon EC2 (Elastic Compute Cloud) services went down | |
| Intuit | 2-4 days | Customers lost access to applications such as TurboTax Online, QuickBooks Online, Quicken and QuickBase | Several thousand customers |
| Google | 2 days | Gmail affected | 120,000 users affected |
| Blackberry | 24 hours plus | Unavailable worldwide | Millions of users affected |
| Yahoo | 24 hours plus | Yahoo Mail outage | |
| Microsoft | 24-72 hours | Windows Live, Hotmail inboxes disappear | |
| Verizon | 24 hours plus | Series of data outages | Several US states unable to get LTE service |
| Netflix | 4-8 hours | Netflix streaming service affected | 20 million users affected |

Source: See Ref 1, Ref 2

So how can businesses ensure that disruptions due to data center glitches are minimized?

First, some perspective. Using an outsourced data center is, in almost all cases, far more reliable and cost-effective for a company than building one in-house. That is because a third-party data center can share the very high cost of the technology, infrastructure, and personnel that go into building it among multiple customers. In fact, the economies of scale are so compelling that while data centers are growing in size, they are declining in number (see Ref 3), which means that more companies are outsourcing more of their IT infrastructure to third-party data centers.

Second, it helps to know what makes up a data center in order to understand what is involved in keeping it robust.

What is inside a data center?

A data center is a configuration of server rooms, cooling units, storage, batteries, and generators. At the core of a data center are racks and racks of servers. Servers need power, lots of it: a typical large data center occupies 50,000 square feet of space and consumes 5 MW of power. Consuming so much power generates massive amounts of heat. This heat is carried away by cooling units that force cool air from the floor, through the racks, and into ducts above.

Data centers collect and store vast amounts of data. This data needs to be stored safely, often for several years (as in the case of financial information). Storage hardware is therefore kept in secure locations, for example in underground mines.

Since data centers run on power and utility power can fail, every data center has batteries for backup: thousands of them, stacked up and constantly being charged. In the event of a power failure, these battery banks take over. But batteries can supply power for a few minutes at most. To ride out longer power failures and blackouts, most data centers keep banks of diesel generators on standby. And since these massive diesel generators need fuel, data centers need to store thousands of liters of diesel.

Causes and cost of data center outages

Information on data centers is hard to come by. Because data centers are critical pieces of IT infrastructure and store sensitive customer data, data center managers are fiercely protective of their privacy. Probably the first and only major surveys of data center outages and their associated costs are two studies by the Michigan-based Ponemon Institute, sponsored by Emerson Network Power. Both studies are limited to U.S. data centers but can be considered representative of the industry.

Outage causes

The first study, National Survey on Data Center Outages, published in September 2010, surveyed 453 individuals responsible for data center operations in the U.S. Of these, 95% said they had experienced an unplanned data center outage in the previous two years. Respondents averaged 2.48 complete shutdowns each, with an average downtime of 107 minutes.

Apart from complete shutdowns, respondents reported far more frequent partial rack- or row-based outages: over a two-year period, an average of 6.8 row-based outages with an average downtime of 152 minutes, and an average of 11.2 rack-based outages with an average duration of 153 minutes.

The most frequently cited root causes of data center outages were UPS battery failure (65%), UPS capacity exceeded (53%), human error (51%), and UPS equipment failure (49%). The most common responses to unplanned outages were to repair, replace, or purchase additional IT or infrastructure equipment, followed by contacting the equipment vendor for support.

Data center outages: the Indian context

In the 2011 Data Center Risk Index published by hurleypalmerflatt, an engineering consultancy, and Cushman & Wakefield, a real estate consultancy, India ranked last among the 20 countries assessed for the risk associated with running a data center. The U.S., Canada, and Germany were at the top of the rankings. On the face of it, this is a dismal ranking for a country that is at the center of the global outsourcing revolution.

On closer look, though, things are not as bad as they seem. To begin with, the Data Center Risk Index is a weighted average of 11 macro and local factors covering a wide range of attributes, from the cost of energy to political instability to inflation to availability of water. Depending on their priorities and approaches to risk, individual customers will arrive at significantly different assessments of risk.

This was best highlighted during the world's largest power blackout, when an estimated 600 million people in the northern half of India lost power for two days in July 2012. In spite of the massive disruption across several areas of the economy, from public transport to industry to hospitals, there were no reports of major disruptions in data centers anywhere in India (see Ref 4). One ostensible reason is that the bulk of the data centers are located in Mumbai and the south of India, while the blackout was in the northern half of the country. But the real reason is that India has a chronic power problem, and data centers are geared to work through intermittent, low, and no power from public utilities. Most third-party data centers have power backup for days on end; it is just another risk to be managed.

TABLE 2: Data center resilience tier levels

Tier 1: Basic (99.671% availability)
- Susceptible to disruptions from both planned and unplanned activity
- Single path for power and cooling distribution, no redundant components (N)
- May or may not have a raised floor, UPS, or generator
- Takes 3 months to implement
- Annual downtime of 28.8 hours
- Must be shut down completely to perform preventive maintenance

Tier 2: Redundant Components (99.741% availability)
- Less susceptible to disruptions from both planned and unplanned activity
- Single path for power and cooling distribution, includes redundant components (N+1)
- Includes raised floor, UPS, and generator
- Takes 3 to 6 months to implement
- Annual downtime of 22.0 hours
- Maintenance of power path and other parts of the infrastructure requires a processing shutdown

Tier 3: Concurrently Maintainable (99.982% availability)
- Enables planned activity without disrupting computer hardware operation, but unplanned events will still cause disruption
- Multiple power and cooling distribution paths, but with only one active path; includes redundant components (N+1)
- Includes raised floor and sufficient capacity and distribution to carry load on one path while performing maintenance on the other
- Takes 15 to 20 months to implement
- Annual downtime of 1.6 hours

Tier 4: Fault Tolerant (99.995% availability)
- Planned activity does not disrupt critical load, and the data center can sustain at least one worst-case unplanned event with no critical load impact
- Multiple active power and cooling distribution paths; includes redundant components (2(N+1), i.e., 2 UPS each with N+1 redundancy)
- Takes 15 to 20 months to implement
- Annual downtime of 0.4 hours
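
The availability percentages and annual downtime figures in Table 2 are two views of the same quantity. The short Python sketch below shows the conversion, assuming a non-leap year of 8,760 hours; the function and variable names are illustrative only and do not come from TIA-942. Run as written, it gives values that round to the table's downtime figures for Tiers 1, 3, and 4, and roughly 22.7 hours for Tier 2, slightly above the 22.0 hours the table lists.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years are ignored (assumption)

# Availability percentages as listed in Table 2.
TIER_AVAILABILITY = {
    "Tier 1": 99.671,
    "Tier 2": 99.741,
    "Tier 3": 99.982,
    "Tier 4": 99.995,
}


def annual_downtime_hours(availability_percent: float) -> float:
    """Return the hours per year a site is unavailable at this availability."""
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return unavailable_fraction * HOURS_PER_YEAR


for tier, availability in TIER_AVAILABILITY.items():
    hours = annual_downtime_hours(availability)
    print(f"{tier}: {availability}% available, about {hours:.1f} hours down per year")
```

The same function can be run in reverse when comparing vendors: a quoted downtime budget in hours per year implies an availability percentage that can be checked against the tier being claimed.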

Outage costs

The second Ponemon Institute study, Calculating the Cost of Data Center Outages, published in February 2011, surveyed 41 independent data centers in the U.S. that had experienced at least one complete or partial unplanned shutdown in the previous 12 months. The survey revealed that data center outages have significant financial consequences, ranging from a minimum cost of $38,969 to a maximum of $1,017,746 per organization. The average cost of a data center outage was $505,502 per incident (US$1 = INR 55).

How to evaluate data center reliability

Historically, data centers were designed in the absence of established standards. This made it very difficult for network managers to choose technologies to build and benchmark data centers. In 2005, the Telecommunications Industry Association (TIA) published TIA-942, the first standard to specifically address data center infrastructure. TIA-942 covers site space and layout, cabling infrastructure, tiered reliability, and environmental considerations. Of these, the tiered reliability provisions are directly useful to organizations looking to evaluate data center resilience across vendors.

The TIA standard, based on a system pioneered by the New York-based Uptime Institute in the mid-nineties, prescribes architectural, security, electrical, mechanical, and telecommunications recommendations. There are four tiers of availability, from Tier 1 to Tier 4, with Tier 4 being the most resilient. See Table 2 for a description of the tiers; redundancy is expressed in terms of N, where N represents the capacity needed to meet the system's requirements with nothing to spare.

Going up the levels has a significant cost impact: construction costs for Tier 3, for instance, are double those for Tier 1. So organizations need to determine carefully the appropriate tier level for each of their needs. eBay, for example, started out with all its applications in a Tier 4 data center until it analyzed its needs more closely and determined that 80% of its equipment could be moved out without loss of reliability. Search, for instance, could be in a Tier 2 center, whereas databases and network backbones needed to be in a Tier 4 center. eBay says it cut its data center Capex and Opex by half by matching applications to data center tier level (see Ref 5).

How to mitigate data center outages

Experts recommend the following to minimize data center outages and mitigate damage:

- Invest in better equipment. It is tempting to save money by buying cheap, but the cost of hardware failure is very high.
- Provide redundancy. Relying on any single machine or a single component in the core architecture is disastrous.
- When it comes to crucial data, never assume that someone else is automatically protecting you. Have backups.
- Have your data available on multiple servers in multiple data centers. Consider placing them in different geographical regions and spreading them across different service providers.
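
To see why spreading data across multiple data centers helps, here is a rough Python sketch of the standard "at least one site is up" calculation. It rests on a strong assumption that is not from the source: that sites fail independently of one another. A regional event such as the Hurricane Sandy flooding in Table 1 breaks that assumption for sites in the same area, which is one reason to separate them geographically. The function name and the choice of the Tier 2 availability figure as input are illustrative.

```python
HOURS_PER_YEAR = 24 * 365  # 8,760 hours; leap years are ignored (assumption)


def combined_availability(site_availabilities):
    """Probability that at least one site is up.

    Assumes site failures are statistically independent, which is an
    idealization: shared power grids, regions, or providers correlate failures.
    """
    probability_all_down = 1.0
    for availability in site_availabilities:
        probability_all_down *= 1.0 - availability
    return 1.0 - probability_all_down


tier2 = 0.99741  # Tier 2 availability from Table 2, as a fraction

for site_count in (1, 2):
    availability = combined_availability([tier2] * site_count)
    downtime_minutes = (1.0 - availability) * HOURS_PER_YEAR * 60
    print(f"{site_count} site(s): about {downtime_minutes:.1f} minutes down per year")
```

Under these assumptions, a second independent Tier 2 site cuts expected unavailability from many hours a year to a few minutes. Real-world figures will be worse to the extent that failures are correlated or that failover between sites is itself imperfect.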

Conclusion

Data center outages are real, and they can cause significant loss of revenue. The frequency and duration of data center outages vary with the size of the data center. Outages become less frequent and shorter in duration as data centers increase in size; the smaller the data center, the longer and more common the outages. IT equipment failure is the most expensive root cause and human error is the least expensive. But the benefits of outsourcing IT infrastructure to a third-party data center far outweigh the risks. As with all engineered systems, the risk is quantifiable and manageable.

References:

Ref 1: Major data center outages in 2011: http://www.evolven.com/blog/2011-devastating-outages-majorbrands.html
Ref 2: Salesforce outage: http://www.informationweek.com/cloud-computing/software/salesforce-outage-followsdata-center-po/240003577
Ref 3: U.S. Datacenters Growing in Size But Declining in Numbers, IDC press release, 9 Oct 2012
Ref 4: India's Blackout, DataCenter Dynamics, Penny Jones, 31 July 2012, http://www.datacenterdynamics.com/blogs/penny-jones/india%e2%80%99s-blackout
Ref 5: Matching applications to data center tier level: http://blog.uptimeinstitute.com/2011/07/matchingapplications-to-data-center-tier-level/

www.netmagicsolutions.com
1800 103 3130
http://blog.netmagicsolutions.com
http://twitter.com/netmagic
http://linkedin.com/company/netmagic

The content you have downloaded has been produced with thoughtful, original research efforts by Netmagic. Please do not duplicate or misuse it. You may quote portions of our research in your own material provided you include a proper attribution to this original source. You are free to share this content on the web with friends and colleagues. 2013. All rights reserved.