
TECHNOLOGY ASSESSMENT

Cold Storage Is Hot Again: Finding the Frost Point

John Rydning
Dan Iacono
David Reinsel

IDC OPINION

Data is quickly becoming the new currency in an era driven by social media, mobile applications, and devices. The potential value that can be derived from data using Big Data and analytics has only begun to be tapped: IDC forecast in the Digital Universe 2012 report that only 0.5% of potential Big Data is being analyzed, leaving substantial unrealized value to extract. Storage infrastructures require ever more capacity to store the increasing volume of data and to retain that data for longer periods than ever before.

Cloud storage and Web-scale service providers have popularized the concept of a single tier of storage for all data, built on commodity hardware with data replicated several times for protection. IDC is now observing a shift in cloud and Web-scale architectures toward a multitier storage strategy in order to sustain growth while retaining existing data. This cold storage technology assessment defines critical characteristics and market trends:

- Cold storage. Cold storage is an operational mode, or method of operation, of a data storage device or system for inactive data in which an explicit trade-off is made: data retrieval response times beyond what is normally acceptable to online or production applications are accepted in exchange for significant capital and operational savings.

- Emergence of the cloud drive. Proactively, hard disk drive (HDD) manufacturers have developed a "cloud drive" that offers a limited set of enterprise drive features at a price point below traditional capacity-optimized enterprise drives. Over time, an assortment of cloud drives with various capabilities is likely to emerge to address product and reliability requirements specific to cloud storage datacenter customers, including a range of data retrieval response times.

- Use case driven. Not all solutions can serve every purpose. To maximize cost savings, solution providers will have to make design decisions that may not be acceptable to every organization. IDC believes solution providers should focus on delivering maximum value to specific use cases rather than building a general-purpose solution that offers only marginal cost-per-gigabyte savings versus traditional online storage. The key to the cold storage market is achieving dramatic cost savings, which will motivate changes in application behavior; savings that are merely marginal relative to the cost of change will inhibit cold storage adoption.

Filing Information: May 2013, IDC #241005, Volume: 1
Storage Solutions: Storage and the Cloud: Technology Assessment

IN THIS STUDY

This IDC study examines the landscape of cold storage features, characteristics, and market trends driving the development of cold storage solutions.

SITUATION OVERVIEW

Data is quickly becoming the new currency in an era driven by social media, mobile applications, and devices, buoyed by mega hyperscale datacenters. The potential value that can be derived from data using Big Data and analytics has only begun to be tapped: IDC forecast in the Digital Universe 2012 report that only 0.5% of potential Big Data is being analyzed, leaving substantial unrealized value to extract. Storage infrastructures require ever more capacity to store the increasing volume of data and to retain that data for longer periods than ever before. This trend can be observed in the demand for capacity-optimized storage versus performance-optimized and I/O-intensive storage in disk storage systems shipped by storage system OEMs, as shown in Figure 1.

FIGURE 1

Worldwide Capacity-Optimized, Performance-Optimized, and I/O-Intensive Storage Systems Shipments, 2012-2016

Source: IDC, 2013

Cloud storage and Web-scale service providers have popularized the concept of a single tier of storage for all data, built on low-cost commodity hardware with data replicated several times for protection. IDC is now observing a shift in cloud and Web-scale architectures toward a multitier storage strategy in order to sustain growth while retaining existing data. At the Open Compute Summit IV in January 2013, Facebook's VP of infrastructure Jay Parikh revealed that 82% of Facebook's read network traffic for photos was servicing just 8% of total photos, as shown in Figure 2.

FIGURE 2

Facebook Photo Access Patterns

Note: Data is from the Open Compute Summit IV, January 2013, Santa Clara, California.

Source: Facebook, 2013

Facebook's analysis of the data showed a fairly substantial drop-off in demand for a photo as it ages. Facebook is therefore paying a substantial premium to keep the 92% of photos that are infrequently accessed on the same storage tier as the 8% that are frequently accessed. Facebook's solution (discussed in more detail in the Open Compute Cold Storage Standard section) was to create a cold storage solution named "Open Vault."

In the cloud storage service market, Amazon was the first company to offer a cold storage service, Glacier, at $0.01 per gigabyte per month, roughly an order of magnitude less expensive than Amazon's existing S3 service. The monthly price per gigabyte garnered much media attention, and so did the service attributes: with Glacier, access to data is not immediate (retrieval can take hours), and pricing varies with how much data is retrieved each month beyond the free cap.
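The tiering economics behind this shift are easy to sketch. The toy Python model below is not from the study; the corpus size and per-gigabyte prices are assumptions chosen purely to illustrate the premium paid when the infrequently accessed 92% of data shares a tier with the hot 8%.

```python
# Illustrative two-tier cost model for the access split described above.
# All prices and sizes are hypothetical placeholders, not IDC or vendor figures.

def monthly_cost(total_gb: float, hot_fraction: float,
                 hot_price: float, cold_price: float) -> tuple[float, float]:
    """Return (single_tier_cost, two_tier_cost) in dollars per month."""
    single_tier = total_gb * hot_price                    # everything on the hot tier
    hot_gb = total_gb * hot_fraction
    cold_gb = total_gb - hot_gb
    two_tier = hot_gb * hot_price + cold_gb * cold_price  # split by access pattern
    return single_tier, two_tier

if __name__ == "__main__":
    # 1PB of photos, 8% frequently accessed; assumed $0.10/GB/month hot, $0.01 cold
    single, tiered = monthly_cost(1_000_000, 0.08, 0.10, 0.01)
    print(f"single tier: ${single:,.0f}/month, two tier: ${tiered:,.0f}/month")
    # -> single tier: $100,000/month, two tier: $17,200/month
```

Even with these invented prices, the point stands: most of the single-tier bill is spent serving data that is almost never read.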

Cold Storage Defined

In today's data environment, there are two types of data:

- Active. Data that is frequently accessed (though not necessarily modified)

- Inactive. Data that is infrequently or never accessed after being written

If everything were equal, why wouldn't active and inactive data be stored on a single platform and tier? Several factors can influence a data storage decision; IDC has identified the top pain points:

- Identification. IT organizations have difficulty discovering which data is active and which is inactive.

- Placement. Once the data type is identified, the data must be moved or transitioned to the appropriate solution to meet the cost objective.

- Change in data type over time. The transition from inactive to active has the most substantial effect on organizations because promotion to active is usually unpredicted and delivers the slowest response time to a production application.

In the data storage market, response time is critical since most applications are generally programmed to expect response times of less than 50ms from online storage (under normal operations). When examining storage response time at a high level, the overriding key performance indicator is cost. For example, storing data on tape offsite is less expensive than always-on spinning disk; however, the response/retrieval time for tape in this solution can be up to 24 hours. Essentially, there is a response time chasm between online applications at milliseconds and offsite tape at up to 24 hours.

Purpose-built backup appliances (PBBAs) provide lower storage capacity cost with acceptable application response time through the ability to transition data from inactive to active quickly (i.e., applications don't have to be rewritten to use a PBBA). With the PBBA, the response time chasm narrows to a range of seconds to 24 hours, but at the end of the day, PBBAs are still always-on spinning disk.

To dramatically reduce data storage cost, trade-offs have to be made between service and risk, usually affecting response time. Cold storage is an operational mode, or method of operation, of a data storage device or system for inactive data in which an explicit trade-off is made: response times beyond what would normally be acceptable to online or production applications are accepted in exchange for significant capital and operational savings. Deployments could place data on underlying media, such as tape or disk, that is not spinning or powered yet remains readily available for access. Defining an arbitrary response time threshold for cold storage is difficult because what is acceptable varies by organization, application, or even transaction.

Measuring Cold Storage

IDC would like to introduce a framework for evaluating and measuring cold storage that can be adapted for each use case: the dollar per gigabyte per response time ($/GB/RT) objective. Dollar per gigabyte ($/GB) is a useful measure of storage capacity cost; however, it does not incorporate performance (i.e., response time) into the metric. When we reference $/GB, we are referring to the "all in" $/GB, which includes all capex and opex costs, not just acquisition cost. A toy calculation of the metric follows.
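As a rough illustration of how the metric might be applied (every capex, opex, capacity, and response time figure below is an invented placeholder, not IDC data), each candidate solution can be reduced to an all-in $/GB figure paired with its response time:

```python
# Toy $/GB/RT comparison; all numbers are assumed for illustration only.
# "All in" $/GB = (acquisition + operating costs over the service life) / capacity.

def dollars_per_gb(capex: float, opex: float, capacity_gb: float) -> float:
    """All-in cost per gigabyte over the solution's service life."""
    return (capex + opex) / capacity_gb

# Hypothetical solutions: (capex $, opex $, capacity GB, response time)
solutions = {
    "online disk array":           (500_000, 300_000, 1_000_000, "10 ms"),
    "cold storage (spun-down)":    (200_000,  50_000, 1_000_000, "30 s"),
    "offsite tape":                (100_000,  40_000, 1_000_000, "24 h"),
}

for name, (capex, opex, cap_gb, rt) in solutions.items():
    print(f"{name}: ${dollars_per_gb(capex, opex, cap_gb):.2f}/GB "
          f"at {rt} response time")
```

The output pairs each cost with the response time bought at that cost, which is the essence of the $/GB/RT framing: capacity cost is meaningless for cold storage until it is qualified by how long retrieval takes.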
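Upper-layer software could encode Table 1's state machine directly. The sketch below is a minimal illustration using the table's typical thresholds (which, per the table note, system OEMs can modify); it reports the deepest power state a drive could have reached after a given idle period and the wakeup latency an application would then see.

```python
from dataclasses import dataclass

@dataclass
class PowerState:
    name: str
    watts: float            # typical draw (upper bound from Table 1)
    enter_after_min: float  # typical inactivity time before entering this state
    recovery: str           # latency to service I/O from this state

# Typical values from Table 1; real drives let system OEMs tune the timers.
STATES = [
    PowerState("active",        13.0,  0.0, "<12ms"),
    PowerState("active idle",   6.75,  2.0, "100-700ms"),
    PowerState("low-rpm idle",  6.1,   4.0, "4-6s"),
    PowerState("standby",       0.75, 10.0, "<10s"),
    PowerState("sleep",         0.0,  15.0, "10-12s"),
]

def state_after_idle(idle_minutes: float) -> PowerState:
    """Deepest state whose inactivity threshold the drive has passed."""
    current = STATES[0]
    for s in STATES:
        if idle_minutes >= s.enter_after_min:
            current = s
    return current

s = state_after_idle(12.0)
print(f"after 12 idle minutes: {s.name}, ~{s.watts}W, wakeup {s.recovery}")
# -> after 12 idle minutes: standby, ~0.75W, wakeup <10s
```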

TABLE 1

Five Power Phases of Capacity-Optimized Drives

Active, random R/W
  Condition: heads tracking; spindle rotating; electronics active
  Typical power consumption: 12-13W
  Typical inactivity time before entering this state: none
  Recovery latency to retrieve data: <12ms

Active idle
  Condition: heads off/unloaded; spindle rotating; electronics active
  Typical power consumption: 4-6.75W
  Typical inactivity time before entering this state: 2 minutes
  Recovery latency to retrieve data: 100-700ms typical

Low-rpm idle
  Condition: heads off/unloaded; spindle rotating at reduced rpm; electronics active
  Typical power consumption: 2.4-6.1W
  Typical inactivity time before entering this state: 4 minutes
  Recovery latency to retrieve data: 4-6s typical

Standby
  Condition: heads off/unloaded; spindle not rotating; electronics: only a few key circuits active (to accept host commands)
  Typical power consumption: 0.7-0.75W
  Typical inactivity time before entering this state: 10 minutes
  Recovery latency to retrieve data: <10s

Sleep
  Condition: heads off/unloaded; spindle not rotating; electronics: only a few key circuits active (to accept host commands)
  Typical power consumption: 0W
  Typical inactivity time before entering this state: 15 minutes
  Recovery latency to retrieve data: 10-12s typical

Note: The inactivity time is typical for most HDD devices; time to each power state is variable and modifiable by system OEMs.

Source: IDC, 2013

Note that standard desktop-class 3.5in. HDDs have fewer power management modes in the drive's firmware than traditional "nearline," "business critical," or "cloud" HDDs.

The Troubles of Massive Array of Idle Disks and Green Storage

Massive array of idle disks (MAID) was a technology in which individual drive shelves and disks could be spun down to save power. MAID was popularized in the mid-2000s by COPAN Systems, which went bankrupt in 2009; its technology assets were acquired by SGI in 2010. The MAID technology deployed at that time by companies such as COPAN was interesting; however, it faced the following challenges:

- Lack of applications accepting long latency. It can take anywhere from 10 to 30 seconds to spin a disk drive up from stopped to fully operational. Applications would have to be designed to account for this I/O latency, which was not broadly adopted.

- Disk drive failures with spin up/down. Spinning a disk up from power-off mode was prone to disk failures, causing data to be inaccessible or forcing frequent rebuilds.

- Lack of power management granularity. Most MAID systems could only spin whole disk shelves down or up at a time. A whole-shelf spin-up could cause unwanted power spikes within the rack if not done properly, and because of the random access (and placement) of data, it was unlikely a whole shelf could be spun down for any given period of time to provide substantial savings. This was particularly true for traditional storage arrays that attempted to bolt MAID or spin-down features onto existing arrays.

- Green storage market. The application and operational complexities of delivering green storage were never offset by substantial savings. For example, a 25% power savings didn't justify rearchitecting applications to tolerate substantial storage latency.

Open Compute Cold Storage Standard

The Open Compute Project (OCP) was started in 2011 by Facebook to drive hardware efficiency and collaboration in the scalable computing market. At the Open Compute Summit IV in January 2013, Facebook introduced a cold storage hardware design for its requirements:

- Data stored on disk and almost never read again
- Bulk-load fast archive
- Sequential writes
- Random reads
- Dedicated platform created for cold storage

Hardware design highlights:

- Incorporates the Open Vault (Knox) storage platform
- 4TB SATA shingled magnetic recording (SMR) disk drives
- Maximum power budget of 1.8-2.6kW/rack
- Single-drive spin-up per Open Vault chassis at any given time
- Nonredundant power supplies
- Designed for a 744-rack datacenter layout

For more information on the Open Compute Project, go to www.opencompute.org/projects/storage/

Facebook has a specific definition and use case for cold storage that may not apply to other environments. The company is willing to trade off redundancy and latency to achieve maximum power and equipment savings for its use case.

Nexsan AutoMAID

The Nexsan E-Series storage array has a five-level, policy-based power management implementation called AutoMAID. The AutoMAID power management levels, exercised in the policy sketch after this list, are:

- Level 1. Heads parked but spindle speed unchanged
- Level 2. Heads unloaded and drive speed slowed to 4,000rpm
- Level 3. Sleep mode entered and spinning stopped
- Level 4. Drive powered off (including electronics)
- Level 5. Drive enclosure powered off (including drives)
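A minimal sketch of such a policy engine follows, assuming hypothetical idle-time thresholds (AutoMAID's actual thresholds are administrator-configured policy settings, described in the next paragraph); it applies the level semantics listed above at per-RAID-set granularity.

```python
import time

# Hypothetical idle-time thresholds (minutes) mapping to AutoMAID levels 1-5;
# real thresholds are policy settings an administrator would choose.
AUTOMAID_POLICY = [(5, 1), (15, 2), (30, 3), (60, 4), (240, 5)]

class RaidSet:
    """Tracks last I/O per RAID set, since AutoMAID applies policy at that level."""
    def __init__(self, name: str):
        self.name = name
        self.last_io = time.monotonic()
        self.level = 0  # 0 = fully active

    def on_io(self) -> None:
        # Any I/O makes the set active again; the idle timer restarts from zero.
        self.last_io = time.monotonic()
        self.level = 0

    def apply_policy(self) -> int:
        idle_min = (time.monotonic() - self.last_io) / 60
        for threshold, level in AUTOMAID_POLICY:
            if idle_min >= threshold:
                self.level = level
        return self.level

rs = RaidSet("rs0")
print(rs.apply_policy())  # -> 0 just after I/O; steps deeper as idle time accrues
```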

The granularity of the power management policies is the RAID set, and all of the policies are based on elapsed idle time. When an I/O request comes into the array and the data becomes active, the disks stay active until the power management policy parameters are met again. Imation acquired Nexsan in early January 2013.

EMC Data Domain Extended Retention

Data Domain (DD) is known for its success in the purpose-built backup appliance market as a disk target for backup applications. DD Extended Retention (formerly known as Data Domain Archiver) is a two-tier product: the first tier is exactly the same as a typical DD device, while a second tier further optimizes infrequently accessed data. Data can be moved from the primary tier to the secondary tier by user-defined policies. DD systems are well known to use system memory extensively for performance and for storing the metadata of the DD file system. A key optimization in the second tier is that resident data does not have its metadata held in memory, trading off access latency on older, less frequently accessed data.

FUTURE OUTLOOK

The Emergence of HDDs Designed for Cloud Providers

Enterprise server and storage OEMs have traditionally used "enterprise grade" drives for performance or capacity rather than HDDs designed primarily for PCs. The main differences between an enterprise HDD and a PC HDD include (but are not limited to) the testing requirements, internal components (such as a fixed versus rotating-shaft spindle motor), the use of rotational vibration sensors, and the warranty, which usually dictates the use case as well as the price. Cloud providers at scale are always searching for new and innovative ways to lower storage acquisition costs and find efficiencies. As stated in previous documents, a 1% improvement in cost efficiency for hyperscale cloud providers can translate into millions of dollars in savings. When speaking with hyperscale providers, it's the same story over and over again: "it's all about the math."

The Math

The cost premium for a capacity enterprise HDD can be 50-100%+ on a price-per-gigabyte basis compared with a traditional desktop or mobile-class HDD. When speaking with large and hyperscale cloud providers, the results were mixed; however, for their use cases the failure rate with desktop drives was less than 3% above that of traditional enterprise HDDs. To be clear, we are referencing cold or backup storage use cases, not primary "bet your business" transactional workloads. The simple worst-case math is to subtract the added failure rate (3%) from the enterprise drive premium (50%), leaving a potential savings of 47% in HDD media alone.

The HDD Warranty

Why did we subtract the increased expected failure rate (3%) from the premium (50%)? Wouldn't the failures be covered under warranty? Until recently, HDD manufacturer warranties did not cover desktop drives used in enterprise applications, or desktop drives installed in a chassis in any orientation other than flat. High-capacity storage chassis tend to plug in and mount their drives from the bottom (i.e., the drive stands up like a domino). Therefore, if an enterprise or a cloud provider used desktop HDDs in its datacenters, it essentially voided the warranty, and the increased failure rate had to be absorbed by the company, not the HDD manufacturer.
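The worst-case arithmetic above, including why the extra failures are charged to the buyer rather than to a warranty, is easy to make concrete. In the sketch below, the 50% premium and the 3% added failure rate come from the text, while the $0.04/GB base price is an assumed placeholder; it also shows how the same numbers read differently depending on the baseline chosen.

```python
# Back-of-the-envelope version of "the math" above.
# Base $/GB is assumed; premium and failure delta are taken from the text.

base = 0.04                    # desktop-class $/GB (assumed placeholder)
enterprise = base * 1.50       # 50% price premium for enterprise capacity HDD
desktop_all_in = base * 1.03   # buyer absorbs ~3% extra failed drives out of pocket

# IDC's framing: savings expressed in points of the desktop base price
print(f"savings vs base price: {(enterprise - desktop_all_in) / base:.0%}")  # -> 47%

# The same numbers expressed as a reduction in total media spend
print(f"reduction in spend:    {1 - desktop_all_in / enterprise:.0%}")       # -> 31%
```

Either way the numbers are cut, the desktop-class drive wins decisively for cold use cases, which is exactly the pressure that produced the cloud drive discussed next.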

Enterprise system OEMs and DIY hyperscalers will try to use the lowest-cost capacity-optimized HDD possible, as long as the application workload stays within the maximum terabyte-per-year limitation established by the HDD vendor, as well as other storage device performance metrics. Exceeding the maximum terabyte-per-year read/write limitation for a given storage device can void the manufacturer's warranty.

Compromises and Cloud Drive

Proactively, HDD manufacturers have developed a "cloud drive" that offers cost benefits over an enterprise-class drive while retaining some enterprise-class features. The cloud drive is a new class of HDD that sits between enterprise capacity and desktop HDDs, geared specifically for the cloud storage market and warranted by the HDD manufacturer. Over time, an assortment of cloud drives with various capabilities is likely to emerge to address product and reliability requirements specific to cloud storage datacenter customers, including a range of data retrieval response times.

What prevents an enterprise or a cloud provider from overworking an HDD with an application, exposing the HDD manufacturer to excessive warranty liability? IDC has observed HDD manufacturers adding a "workload maximum" to their warranties: in essence, any drive can be used, but if you exceed the workload rating, the warranty is void (a sketch of such a check appears below). One challenge with system-identified HDD failures in the field (and hence with failure rates) is the relatively high percentage (up to 50%) of returned "failed" drives that are subsequently categorized as "no trouble found" (NTF). In many cases, NTF drives are retested, resold, and put back into service with no further problems. IDC is also observing an emerging trend of HDD distributors adding warranties on top of the HDD manufacturer's, for example extending the warranty from two to five years as long as the workload maximum isn't exceeded.
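A workload-maximum clause reduces to a tally of host terabytes transferred per warranty year against the drive's rating. In the sketch below, the 180TB/year rating is a hypothetical example, not a figure from this study:

```python
from dataclasses import dataclass

@dataclass
class DriveWorkload:
    """Tracks host TB transferred per warranty year against a rated maximum.

    The 180 TB/year rating is a hypothetical example of the "workload
    maximum" clauses described above, not a figure from this study.
    """
    rated_tb_per_year: float = 180.0
    tb_this_year: float = 0.0

    def record_io(self, gigabytes: float) -> None:
        # Reads and writes both count toward the terabyte-per-year limit.
        self.tb_this_year += gigabytes / 1000.0

    def warranty_intact(self) -> bool:
        return self.tb_this_year <= self.rated_tb_per_year

drive = DriveWorkload()
drive.record_io(200_000)          # 200TB of host I/O this warranty year
print(drive.warranty_intact())    # -> False: workload rating exceeded
```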

Future HDD Media Improvements: Shingled Media and Helium Drives

HDD vendors are being presented with two storage opportunities for database, storage, and backup applications that tolerate inherently longer latency than active storage (typical active applications include query and search). Industry participants use different terminology but define the two opportunities similarly:

Active archive or online cold:

- Mainly random read/write access
- Mainly 7,200rpm, but can be slower depending on the data transfer rate expectations for a given application
- Data retrieval in <1 second when the HDD is idle
- One HDD cycled at a time to reduce power consumption (per 20-drive enclosure)
- Helium-filled HDDs to fit more disks per drive (increasing drive capacity and reducing $/GB) while using less power to spin the disks (reducing active watts/HDD, watts/GB, and power and cooling requirements)

Deep archive or tape replacement:

- Mainly sequential read/write access
- Less than 7,200rpm, sufficient for the expected data transfer rates
- Data retrieval in <30 to 60 seconds
- Helium-filled HDDs to fit more disks per drive (reducing $/GB) while using less power to spin the disks (reducing active watts/HDD, watts/GB, and power and cooling requirements)
- Shingled magnetic recording (SMR) to enable higher storage capacity per disk (reducing $/GB); SMR is well suited to long sequential writes where data is unlikely to be modified after being written

The New DevSleep Command

The Serial ATA International Organization (SATA-IO) is developing a new command called DevSleep, aimed primarily at mobile client devices. The command allows the host to power down the PHY and link-related circuitry to save even more power; DevSleep is essentially a middle ground between the power states found in desktop-class 3.5in. HDDs and mobile-class 2.5in. HDDs. The host sends the DevSleep signal to the storage device, which puts the device into a very low-power standby mode (the target is <5mW); when the signal is withdrawn, the device is expected to return quickly to a more active power state. Thus far, HDD vendors are not seeing plans from storage subsystem vendors to take advantage of DevSleep with desktop-class or mobile-class HDDs. Nevertheless, the addition of the DevSleep command set to the SATA protocol could open the door to new power management modes for relatively lower-cost desktop-class 3.5in. HDDs compared with the more sophisticated power management available on traditional "nearline," "business critical," or "cloud" capacity-optimized HDDs.

Data Protection and Efficiency

Long-term storage of infrequently accessed data is only as good as the integrity of the data at retrieval, and there are several concerns (a scrubbing sketch follows this list):

- Unrecoverable data errors, or bit rot. Drive media is extremely durable, reaching 12 to 13 "9s"; however, as capacity per drive increases and the overall capacity retained in an environment grows, the probability of an unrecoverable media error becomes a reality. If the data is infrequently accessed, proactively detecting unrecoverable media errors becomes a challenge that must be solved. IDC believes solutions will have to periodically perform a complete check of the bits stored on all media.

- Rebuild from a drive failure. During a failure, traditional RAID solutions rebuild every bit on a drive, including "0s" not associated with stored logical data. As disk drive capacities increase while spindle speeds hold steady (or even slow), rebuilding a whole disk drive can stretch to many hours or even days. Not only are performance and data protection degraded during this time; the power budget is impacted as well. IDC believes cold storage solutions will have to move to a logical, data-only rebuild to minimize the rebuild impact.

- Protection efficiency. There are two primary methods for protecting data. Multiple copies suit data that must be rebuilt quickly and locally, datacenters with a tight power budget, and data that is already compressed, deduplicated, or encrypted at the application layer. Erasure coding excels in three-datacenter (or more) replication, where the goal is to maximize useful bits written to disk and expedient rebuild is not necessary.
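The periodic complete bit check called for above is essentially a scrub loop. The sketch below is a minimal illustration (the 64MB block size and the SHA-256 choice are assumptions, not from the report): it re-reads stored blocks, compares them against checksums recorded at write time, and flags mismatches for repair from a replica or from erasure-coded peers.

```python
import hashlib
from pathlib import Path

BLOCK_SIZE = 64 * 1024 * 1024  # 64MB scrub unit (assumed)

def checksum(block: bytes) -> str:
    return hashlib.sha256(block).hexdigest()

def scrub(path: Path, expected: list[str]) -> list[int]:
    """Re-read every block of a stored object and return the indices whose
    current checksum no longer matches the one recorded at write time."""
    damaged = []
    with path.open("rb") as f:
        for i, digest in enumerate(expected):
            block = f.read(BLOCK_SIZE)
            if checksum(block) != digest:
                damaged.append(i)  # repair from replica or erasure-coded peers
    return damaged
```

For cold storage, the scheduling of such scrubs is the hard part: each pass costs power and spin-ups, so it must be paced against exactly the savings the system exists to deliver.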

Data Access

IDC forecasts that object-based storage solutions revenue will grow at a 30-50% CAGR through 2016 and that object-based storage accessed through a RESTful API will be the future of cold storage; today, however, the majority of applications (legacy and new) are not ready to take advantage of this paradigm shift. IDC believes that, in the interim, cold storage solutions will need to provide multiple data access options such as native block, NFS, CIFS, and LTFS. IDC forecasts that in 2016 approximately 25% of capacity shipped and 30% of revenue will be NAS rather than block protocols (excluding mainframe). NAS-based systems are therefore an important and growing segment, yet solution providers should not exclude the larger block storage market opportunity.

Data Retention

Each organization will have a different data retention strategy depending on a variety of factors including, but not limited to, compliance, regulation, customer experience, and business value:

- Policy based. Organizations want to set up business rules and have data move automatically and transparently between defined storage tiers, without storage administrator intervention (see the sketch after this list). Key policy criteria will pertain to service-level achievement, such as retrieval/response time, data protection level, and expiration.

- Metadata management. Two methodologies exist: deep, extensible metadata management integrated into the cold storage solution, or streamlined metadata management supplemented with an external solution that provides the extensibility. Why does this matter? Metadata inquiry and search may reduce requests for data in cold storage. For example, IDC foresees extensive metadata mining for Big Data operations as a precursor to retrieving cold storage data, in order to manage resources more effectively.
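A policy engine of the kind described in the first bullet could be as simple as the sketch below; the tier names, age thresholds, and service-level targets are hypothetical examples, not IDC recommendations.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Hypothetical policy terms; field names are illustrative, not a vendor API.
@dataclass
class RetentionPolicy:
    tier: str
    move_after: timedelta       # age at which data transitions to this tier
    response_objective: str     # service-level target once resident here
    protection: str             # data protection level required
    expire_after: timedelta | None = None  # None = retain indefinitely

POLICIES = [
    RetentionPolicy("online",  timedelta(0),        "<50ms", "RAID"),
    RetentionPolicy("cold",    timedelta(days=90),  "<60s",  "erasure coded"),
    RetentionPolicy("archive", timedelta(days=730), "<24h",  "erasure coded",
                    expire_after=timedelta(days=3650)),
]

def tier_for(last_access: datetime) -> RetentionPolicy:
    """Pick the deepest tier whose age threshold the data has crossed."""
    age = datetime.now(timezone.utc) - last_access
    placement = POLICIES[0]
    for policy in POLICIES:
        if age >= policy.move_after:
            placement = policy
    return placement

print(tier_for(datetime.now(timezone.utc) - timedelta(days=400)).tier)  # -> cold
```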

Flash as Cold Storage

IDC has been tracking an undercurrent in the storage market: some hyperscale companies have been investigating flash media for cold storage. The rationale for the media makes sense:

- Data probably written once, or very little
- Fast response time from sleep or power-off
- Granular power management (i.e., power-on per single chip)

Even with the lowest-grade flash media today, however, the price for capacity doesn't make sense, and the supply chain could not meet the volumes required. It was an interesting twist for flash media, which has otherwise been relegated to the performance-only market.

Emerging Vendors

As new storage markets emerge, develop, and find use cases, start-up companies begin to provide solutions, and the cold storage market is no different. Start-ups such as SageCloud, Inktank with Ceph, and cloud storage companies building Amazon AWS Glacier competitors will enter the market. IDC believes first-generation products will arrive in late 2013 and early 2014.

ESSENTIAL GUIDANCE

As the perceived value of data increases, more data will be kept for longer periods of time, placing a tremendous burden on IT infrastructures and budgets. The willingness to sacrifice online storage response times, accepting a range of seconds, minutes, or a few hours (still substantially less than offsite tape retrieval) in order to dramatically reduce storage capacity cost, has created a new and emerging cold storage market. As a back-of-the-napkin exercise, if IDC applies Facebook's cold storage candidate statistic to IDC's worldwide capacity-optimized forecast, the total addressable market for cold storage could be as high as $25 billion by 2016. That said, IDC does not yet publish a formal forecast for the nascent cold storage market; we believe that by 2014-2015 the market will be of sufficient size to forecast.

IDC believes the nascent cold storage market is still developing and taking shape. The key driver for accelerating its growth is the availability of solutions with dramatically lower capacity cost per gigabyte than traditional online solutions. IDC outlines characteristics beyond cost that will be important in cold storage solutions:

- Open standard access. Organizations will invest significant time and money to develop applications that take advantage of cold storage solutions. To protect that investment, they do not want to develop against vendor-controlled or proprietary standards.

- Sustained per-drive power-off. Simply providing the ability to spin down a drive or a shelf of drives when idle will not be enough. Cold storage solutions will have to intelligently manage incoming and outgoing traffic to minimize drive spin-ups and sustain periods of disk drive shutdown to maximize operational savings.

- Unrecoverable bit error protection (or bit rot protection). Unrecoverable bit errors on the underlying media are a condition cold storage solution providers cannot ignore. Providers will have to put mechanisms in place to proactively search for bit rot and recover, with minimal overall impact on performance, power, and cost.

- Compliance. Regulatory issues will be a key driver for the cold storage market. Providing write once, read many (WORM) capability will therefore be a standard requirement in most enterprise implementations, for all or a portion of their cold storage usage.

- Private availability. Service and cloud providers as well as hyperscale Web properties have been the primary adopters of cold storage technology. For various reasons, such as security policies, compliance, and scale, traditional enterprises will not be able to adopt an external cold storage strategy. IDC therefore believes there will be significant demand among enterprises to incorporate cold storage solutions into their existing private cloud offerings.

- Flexible. Hyperscale Web properties and service providers may limit the general-purpose nature of a cold storage device to achieve maximum efficiency at scale. The private cloud opportunity, by contrast, will require the ability to customize cold storage for the various applications and use cases within an environment; granular policy-based management will therefore be a key feature in the private cloud market.

- Replication. Single-site solutions will be sufficient for initial deployments; however, as a solution's capacity grows, organizations will want to replicate the data for site resiliency. Preferences between synchronous and asynchronous replication will vary, so it will be important to offer both methods at a very granular, policy-based level.

- Tape and cloud integration. The focus of this document is disk-based cold storage offerings; however, IDC also believes cold storage will in the future be an integrated strategy incorporating tape and cloud. For extremely long-term storage with no expectation of access, and for compliance reasons, organizations still have use cases for tape.

- Use case driven. Not all solutions can serve every purpose. To maximize cost savings, solution providers will have to make design decisions that may not be acceptable to every organization. IDC believes solution providers should focus on delivering maximum value to specific use cases rather than building a general-purpose solution that offers only marginal cost-per-gigabyte savings versus traditional online storage. The key to the cold storage market is achieving dramatic cost savings, which will motivate changes in application behavior; savings that are merely marginal relative to the cost of change will inhibit cold storage adoption.

LEARN MORE

Related Research

- Primary Storage Data Efficiency with Solid State Storage (IDC #239312, February 2013)
- Worldwide Enterprise Storage for Public and Private Cloud 2012-2016 Forecast: Enabling Public Cloud Service Providers and Private Clouds (IDC #238595, December 2012)
- IDC's Worldwide Storage and the Cloud Taxonomy, 2012 (IDC #238396, December 2012)

- Worldwide Enterprise Storage Systems 2012-2016 Forecast Update (IDC #237886, November 2012)
- Enterprise Cloud Public and Private End-User Adoption Signals Continued Shifts in IT Spending (IDC #237171, October 2012)
- The Economic Benefit of Storage Efficiency Technologies (IDC #236221, August 2012)
- Worldwide File-Based Storage 2012-2016 Forecast: Solutions for Content Delivery, Virtualization, Archiving, and Big Data Continue to Expand (IDC #235910, July 2012)
- Adoption and Benefits of Storage and Data Efficiency Technologies in the U.S. Storage Market (IDC #236208, July 2012)

Synopsis

This IDC study examines the landscape of cold storage features, characteristics, and market trends driving the development of cold storage solutions. According to Dan Iacono, research director, Storage, at IDC, "As the perceived value of data increases, more data will be kept for longer periods of time, placing tremendous burden on IT infrastructures and budgets. The willingness to sacrifice online storage response times to a range of seconds, minutes, or a few hours to dramatically reduce storage capacity cost has created a new and emerging cold storage market."

The Storage Solutions: Storage and the Cloud program provides insights into the driving forces behind cloud storage deployments and architecture and into spending on storage in the IT infrastructure. The definitions provided in this study represent the scope of IDC's storage systems and cloud research.

Copyright Notice

This IDC research document was published as part of an IDC continuous intelligence service, providing written research, analyst interactions, telebriefings, and conferences. Visit www.idc.com to learn more about IDC subscription and consulting services. To view a list of IDC offices worldwide, visit www.idc.com/offices. Please contact the IDC Hotline at 800.343.4952, ext. 7988 (or +1.508.988.7988) or sales@idc.com for information on applying the price of this document toward the purchase of an IDC service or for information on additional copies or Web rights. Copyright 2013 IDC. Reproduction is forbidden unless authorized. All rights reserved.