Why RAID is Dead for Big Data Storage. The business case for why IT Executives are making a strategic shift from RAID to Information Dispersal




Executive Summary

Data is exploding, growing 10X every five years. In 2008, IDC projected that over 800 exabytes (an exabyte is one million terabytes) of digital content existed in the world, and that by 2020 that number would grow to over 35,000 exabytes. What's fueling the growth? Unstructured digital content: over 90% of all new data created in the next five years will be unstructured, namely video, audio and image objects. Storing, archiving and backing up large numbers of digital content objects is quickly creating demand for multi-petabyte (a petabyte is one thousand terabytes) storage systems. Current data storage systems based on RAID arrays were not designed to scale to this type of data growth. As a result, the cost of RAID-based storage systems increases as the total amount of data grows, while data protection degrades, resulting in permanent digital asset loss. With the capacity of today's storage devices, RAID-based systems cannot protect data from loss; most IT organizations using RAID for big data storage incur additional costs to copy their data two or three times to protect it from inevitable data loss. Information Dispersal, a new approach to the challenges brought on by big data, is cost-effective at the petabyte level and beyond for digital content storage. Further, it provides extraordinary data protection, meaning digital assets are preserved essentially forever. Executives who make a strategic shift from RAID to Information Dispersal can realize cost savings in the millions of dollars for enterprises with at least one petabyte of digital content.

Introduction

Big data is a term that is generally overhyped and largely misunderstood in today's IT marketplace. IDC has put forth the following definition for big data technologies: "Big data technologies describe a new generation of technologies and architectures designed to economically extract value from large volumes of a wide variety of data, by enabling high-velocity capture, discovery and/or analysis." Unfortunately, this definition does not apply to traditional storage technologies based on RAID (redundant array of independent drives), despite the marketing efforts of some large, established storage providers. RAID and replication inherently add 300% to 400% of overhead in raw storage requirements, and that overhead will only balloon as storage scales into the petabyte range and beyond. Faced with the explosive growth of data, challenging economic conditions and the need to derive value from existing data repositories, executives must examine all aspects of their IT infrastructure to optimize for efficiency and cost-effectiveness. Traditional storage based on RAID fails to meet this objective.

Why RAID Fails at Scale

RAID schemes are based on parity, and at root, if more than two drives fail simultaneously, data is not recoverable. The statistical likelihood of multiple drive failures has not been an issue in the past. However, as drive capacities grow beyond the terabyte range and storage systems grow to hundreds of terabytes and petabytes, multiple drive failures are now a reality. Further, drives aren't perfect: typical SATA drives have a published unrecoverable bit error rate of 1 in 10^14, meaning that on average, once every 100,000,000,000,000 bits read, there will be a bit that is unrecoverable. Doesn't seem significant?
In today's big data storage systems, it is. The scenario of having one drive fail and then encountering a bit error while rebuilding from the remaining RAID set is highly probable in the real world. To put this into perspective: when reading 10 terabytes, the probability of hitting an unreadable bit is likely (about 55%), and when reading 100 terabytes, it is nearly certain (99.97%). i
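The endnote's formula can be checked directly; this sketch (using log1p/expm1 for numerical stability with such a tiny error rate) reproduces the probabilities quoted above:

```python
import math

# Probability of at least one unrecoverable bit error while reading
# N terabytes, given a published error rate of 1 bit in 10^14
# (the endnote's formula: 1 - (1 - 10^-14)^bits_read).
def p_unreadable(terabytes, bit_error_rate=1e-14):
    bits_read = terabytes * 8e12  # 1 TB = 8 * 10^12 bits
    # 1 - (1 - p)^n, computed stably for tiny p and huge n
    return -math.expm1(bits_read * math.log1p(-bit_error_rate))

print(f"10 TB:  {p_unreadable(10):.2%}")   # 10 TB:  55.07%
print(f"100 TB: {p_unreadable(100):.2%}")  # 100 TB: 99.97%
```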

Meet Replication, RAID's Expensive Friend

RAID advocates tout its data protection capabilities based on models using vendor-specified Mean Time To Failure (MTTF) values. In reality, drive failures within a batch of disks are strongly correlated over time, meaning that if one disk in a batch has failed, there is a significant probability of a second disk failing. ii As a result, IT organizations address the big data protection shortcomings of RAID by using replication, a technique of making additional copies of data to avoid unrecoverable errors and lost data. However, those copies add cost: typically 133% or more additional raw storage is needed for each additional copy, after including the overhead of a typical RAID 6 configuration. iii Organizations also use replication to cope with failure scenarios such as a site failure, power outages, bandwidth unavailability, and so forth; seamless access to big data is key to keeping businesses running and driving competitive advantage. As storage grows from the terabyte to the petabyte range, the number of copies required to keep data protection constant increases. This means the storage system gets more expensive as the amount of data increases (see Figure 1).

Figure 1: Raw storage multiplier required to maintain constant data protection as usable storage grows.

IT executives should realize their storage approach has failed once they are replicating data three times; at that point, the replication band-aid is clearly no longer solving the underlying problem of using RAID for data protection. Organizations may be able to power through making three copies using RAID 5 for systems under 32 terabytes; however, once they have to make four copies at around 64 terabytes, it starts to become unmanageable. RAID 6 becomes insurmountable around two petabytes, since three copies are neither manageable by an organization's IT staff nor affordable at such a large scale. (For the assumptions behind Figures 1, 2 and 3, please refer to endnote iv.)

Key Considerations for RAID Successors

In looking for a better approach to making big data storage systems reliable, there are key considerations that must be addressed:

Large-scale data protection. It is much easier to design a highly reliable small storage system than a highly reliable large one. The problem is that, given the overall growth rate of data, today's smaller systems are likely to grow into large systems in very short order. IT organizations must look for a solution that is as reliable storing terabytes as it will be in the petabyte and exabyte range; otherwise, the storage system will eventually hit a failure wall.

Handles multiple simultaneous failures. RAID 6, based on parity, cannot recover from more than two simultaneous failures, or from two failures plus a bit error encountered before one of the failed drives is rebuilt. IT organizations must look for a solution that can be configured to handle multiple simultaneous failures matching realistic outage scenarios in big data environments, provide seamless availability, and automatically recover itself.

Large-scale cost-effectiveness. Keeping reliability constant, RAID gets more expensive as the amount of data increases.
IT organizations must find a solution whose raw storage requirements do not multiply as it grows, to avoid a storage problem that no amount of money can solve. The right solution must also allow the number of nines of reliability to be set, rather than forcing the organization to live with the limits of what is affordable.

Information Dispersal: A Better Approach

Information Dispersal helps IT organizations address big data storage challenges and significantly reduce storage costs, power consumption and storage footprint, as well as streamline IT management processes.

Information Dispersal basics. Information Dispersal Algorithms (IDAs) separate data into unrecognizable slices of information, which are then distributed, or dispersed, to storage nodes in disparate locations. These locations can be in the same city, the same region, the same country or around the world. No individual slice contains enough information to reconstruct the original data. In the IDA process, the slices are stored with extra bits of data, so that the system needs only a pre-defined subset of the slices from the dispersed storage nodes to retrieve all of the data. Because the data is dispersed across devices, it is resilient against natural disasters and technological failures such as drive failures, system crashes and network failures. Because only a subset of slices is needed to reconstitute the original data, multiple simultaneous failures can occur across a string of hosting devices, servers or networks, and the data can still be accessed in real time.

Realizable Cost Savings

Suppose an organization needs one petabyte of usable storage, with a requirement for six nines of reliability (99.9999%). Here's how a system built using Information Dispersal stacks up against one built with RAID and replication.
To meet the reliability target, the Dispersal system would slice the data into 16 slices and store them with a few extra bits of information such that only 10 slices are needed to perfectly recreate the data. This means the system could tolerate six simultaneous outages or failures and still provide seamless access to the data. Raw storage would be 1.6 times (16/10) the usable storage, totaling 1.6 petabytes. To meet the reliability target for one petabyte with RAID, the data would be stored using RAID 6 and replicated two, three or even four times, possibly using a combination of disk and tape. Raw storage would increase by a factor of 1.33 for the RAID 6 configuration, and then be replicated twice for a raw storage multiplier of 2.67, totaling 2.67 petabytes.
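Both raw-storage multipliers fall straight out of the configuration parameters; here is a minimal sketch (the function and parameter names are mine, not Cleversafe's):

```python
# Raw storage and fault tolerance of a k-of-n dispersal configuration,
# e.g. the text's 16/10 setup: 16 slices written, any 10 recover the data.
def dispersal(n_slices, threshold):
    return {
        "raw_multiplier": n_slices / threshold,      # raw TB per usable TB
        "tolerated_failures": n_slices - threshold,  # simultaneous losses survived
    }

print(dispersal(16, 10))  # {'raw_multiplier': 1.6, 'tolerated_failures': 6}

# RAID 6 (6+2) stores 8 drives per 6 drives of data (1.33x); two full
# copies yield the 2.67x multiplier of the one-petabyte scenario.
print(round(8 / 6 * 2, 2))  # 2.67
```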

Comparing these two solutions side by side for one petabyte of usable storage, Information Dispersal requires 60% of the raw storage of RAID 6 plus replication on disk, which translates to 60% of the cost. v When comparing raw storage requirements (see Figure 2), it is apparent that both RAID 5 and RAID 6 require more raw storage per terabyte as the amount of data increases. The beauty of Information Dispersal is that as storage grows, the cost per unit of storage does not increase while meeting the same reliability target.

Figure 2: RAID & Replication versus Dispersal, showing raw storage expansion required for six nines of reliability (99.9999%) for RAID 5 (5+1), RAID 6 (6+2) and Dispersal (16/10), from 1 to 524,288 usable terabytes.

To translate this into real-world costs, here is an example of storing one petabyte with six nines of reliability (99.9999%). This assumes the cost per terabyte is a commodity, the same for either a RAID 6 and replication solution or a Dispersal solution, and set at $1,000. vi

1 Petabyte Scenario       RAID 6 & Replication       Dispersal
Usable Capacity           1 petabyte                 1 petabyte
Raw Storage Multiplier    2.67 (replicated 2 times)  1.6
Cost / Terabyte           $1,000                     $1,000
Total Cost                $2,670,000                 $1,600,000
Cost Savings                                         $1,070,000

It quickly becomes apparent that an organization can save millions of dollars in the petabyte range; in this scenario, Dispersal is 40% less expensive. Now another real-world scenario: suppose the storage requirement is four petabytes, again with six nines of reliability (99.9999%) and a cost per terabyte of $1,000.

4 Petabyte Scenario       RAID 6 & Replication       Dispersal
Usable Capacity           4 petabytes                4 petabytes
Raw Storage Multiplier    4 (replicated 3 times)     1.6
Cost / Terabyte           $1,000                     $1,000
Total Cost                $16,000,000                $6,400,000
Cost Savings                                         $9,600,000

In this example, RAID 6 and replication are significantly more expensive from a capital expenditure perspective, and further, more costly to manage since there are three copies of the data. Here, Dispersal is 60% less expensive. Requiring 40% to 60% less raw storage yields not only hardware cost efficiencies but additional savings as well: using Information Dispersal, an organization also saves on the power, space and personnel costs of managing a big data storage system.
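As a sanity check on the scenarios above, the savings follow directly from the raw-storage multipliers (a sketch; the $1,000-per-raw-terabyte price is the paper's stated assumption):

```python
# Cost comparison at the paper's assumed commodity price of $1,000
# per raw terabyte; savings scale with the raw-storage multiplier.
COST_PER_TB = 1_000

def raw_cost(usable_tb, multiplier):
    return usable_tb * multiplier * COST_PER_TB

# 1 PB (1,000 TB): RAID 6 replicated twice (2.67x) vs dispersal (1.6x)
raid, disp = raw_cost(1_000, 2.67), raw_cost(1_000, 1.6)
print(f"Savings: ${raid - disp:,.0f}")                     # Savings: $1,070,000
print(f"Dispersal is {1 - 1.6 / 2.67:.0%} cheaper")        # 40% at 2.67x
print(f"and {1 - 1.6 / 4:.0%} cheaper at 4x replication")  # 60% at 4x
```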

Attainable Data Protection

As systems swell to petabytes, executives begin to fear permanent data loss, or significant impact to their ability to analyze the big data under their watch. Just how likely is data loss using RAID versus Information Dispersal? Looking at the number of years without data loss, at a 99.999% confidence level, RAID schemes on their own (that is, without replication) clearly degrade as the storage amount increases, to the point where data loss is imminent (see Figure 3).

Figure 3: Years without data loss (99.999% confidence) for RAID 5 (5+1) and RAID 6 (6+2), from 32 to 524,288 terabytes stored.

RAID 5 clearly wasn't designed for protecting terabytes of data, as data loss becomes near certain in no time at all. RAID 6 can prevent data loss for several years at lower storage amounts; however, at just 512 terabytes there is a high probability of data loss within a year. Information Dispersal doesn't even appear on the chart, because even for a storage amount as large as 524,288 terabytes, the expected time without data loss extends beyond anyone's lifetime (theoretically over 79 million years). Clearly, Information Dispersal offers superior data protection and availability as systems scale to larger capacities, and does not incur the expense of replication to increase data protection for big data environments.

Avoiding a Resume-Generating Event

Big data is here to stay. If an organization relies on RAID and replication, its storage systems will hit a wall: they will become cost prohibitive (as illustrated in Figure 1), or data protection will have to be sacrificed to store all of the data, inevitably leading to data loss (as illustrated in Figure 3). At that point, many IT executives face the resume-generating event of having to admit that the data they are responsible for is not available for analysis or processing, and that the enterprise is no longer competitive. Some IT executives may be in denial: "My storage is only 40 terabytes, and I have it under control, thank you." Consider a storage system growing at the average rate of 10 times every five years, per IDC's estimate: a system of 40 terabytes today will be 400 terabytes in five years, and four petabytes within just 10 years (see Figure 4). A system in the terabyte range that currently uses RAID will most likely start to fail within the executive's tenure.

Figure 4: Projected growth of a 40-terabyte system at 10X every five years.

Savvy IT executives will make the shift to Information Dispersal sooner, realizing that it is easier to migrate data to the new technology while the system is still small. Further, they will realize significant cost savings within their tenure.
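The 40-terabyte projection works out as follows (a quick sketch of the 10X-every-five-years growth rate cited from IDC):

```python
# Project storage growth at 10x every five years (IDC's cited rate).
def projected_tb(current_tb, years, growth_per_5yr=10):
    return current_tb * growth_per_5yr ** (years / 5)

print(projected_tb(40, 5))   # 400.0  terabytes in five years
print(projected_tb(40, 10))  # 4000.0 terabytes (4 PB) in ten years
```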

Evaluating Cleversafe

Cleversafe's Dispersed Storage Platform offers significant advantages over traditional storage systems that rely on RAID and replication, including:

Big data protection. Designed specifically to address big data protection needs, Cleversafe's Dispersed Storage Platform can meet storage needs at the petabyte level and beyond today.

Ability to handle multiple simultaneous failures. Interested in planning for more simultaneous failures than RAID's limit of two? No problem: Information Dispersal can handle as many simultaneous failures as the IT organization chooses to configure for. One data center has a power outage, a second location has bandwidth connectivity issues, a drive hits a bit error, and another location is down for routine maintenance? With Information Dispersal, big data is still seamlessly available.

Large-scale cost-effectiveness. Information Dispersal is extremely cost-effective because it does not rely on copies of data to increase data protection, so the ratio of raw storage to usable storage remains close to constant as the data grows. Finally, a solution that doesn't get more expensive per terabyte as the storage system grows.

Summary

Forward-thinking IT executives will make a strategic shift to Information Dispersal to realize greater data protection as well as substantial cost savings for digital content storage. In evaluating options, those organizations will do well to look at Cleversafe's Dispersed Storage Platform.

Cleversafe Inc., 222 South Riverside Plaza, Suite 1700, Chicago, Illinois 60606. www.cleversafe.com. Main: 312.423.6640. Fax: 312.575.9641. © 2011 Cleversafe, Inc. All rights reserved.

End Notes

i. Calculated as the chance of encountering at least one bit error in a series of bits. For 10 TB: 1 - (1 - 10^-14)^(10 × 8×10^12). For 100 TB: 1 - (1 - 10^-14)^(100 × 8×10^12).

ii. Based on the research report "Disk failures in the real world: What does an MTTF of 1,000,000 hours mean to you?", Bianca Schroeder and Garth A. Gibson, Computer Science Department, Carnegie Mellon University, USENIX, 2007.

iii. For RAID 6 (6+2), the assumed array configuration is 6 data drives and 2 parity drives, resulting in 33% overhead.

iv. Figures 1, 2 and 3 rely on the same underlying model and assumptions. Disk drives are 1 terabyte. The annual failure rate (AFR) of each drive is set to 5%, based on Google's research.* For RAID 5, an array is configured with 5 data drives and 1 parity drive; for RAID 6, with 6 data drives and 2 parity drives. A drive's reliability is (1 - AFR); the reliability of a single array is (drive reliability)^(number of drives); the reliability of multiple arrays is (array reliability)^(number of arrays). Disk failure rebuild rate: 25 MB/s. Disk service time (hours to replace with a new drive): 12 hours. The calculations do not factor in bit errors, which would further increase the storage cost of RAID 5 and 6 due to the additional replication required. Dispersal's cost would not increase, because in the event of a bit error, Dispersal has many more permutations than RAID 5 or 6 from which to reconstruct the missing bit, as well as a lower probability of six simultaneous failures.

* Google's research on data from a 100,000-disk-drive system showed that disk manufacturers' published annual failure rates understate the rates experienced in the field; their finding was an AFR of 5%. "Failure Trends in a Large Disk Drive Population", Eduardo Pinheiro, Wolf-Dietrich Weber and Luiz André Barroso, Google Inc., USENIX, 2007.
v. Since disk drives are commodities, this model assumes the same cost per gigabyte of raw storage for a RAID 6 system and a Dispersed Storage system.

vi. This is a hypothetical price per terabyte. Based on current research, it would be highly competitive in the market. Readers may adjust the math with a different price as desired.
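The building blocks of endnote iv's reliability model can be sketched directly. Note that, as transcribed, these formulas give the probability of a failure-free year; the full model presumably layers the rebuild rate and parity tolerance (also listed in the endnote) on top of this:

```python
# Endnote iv's building blocks: per-drive annual survival (1 - AFR),
# raised to the drives per array, then to the number of arrays.
AFR = 0.05  # 5% annual failure rate, per Google's field data

def array_reliability(drives_per_array, afr=AFR):
    return (1 - afr) ** drives_per_array

def system_reliability(drives_per_array, num_arrays, afr=AFR):
    return array_reliability(drives_per_array, afr) ** num_arrays

# One RAID 6 (6+2) array of eight drives:
print(f"{array_reliability(8):.2%}")  # 66.34% chance of zero drive failures in a year
```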