White Paper
Dedupe 2.0: What HP Has In Store[Once]
By Jason Buffington, Senior Analyst
June 2012
This ESG White Paper was commissioned by HP and is distributed under license from ESG.
Contents
Introduction
Deduplication Today
  Dedupe 1.0: Optimized Storage is Good
  Dedupe 1.5: Smarter Backup Servers Are Better
  Dedupe 2.0: Client-side Deduplication Is Best
  Deduplication Optimizations and Markers
HP StoreOnce Backup
  HP StoreOnce Catalyst
  HP StoreOnce B6200
  HP Data Protector 7
  HP StoreOnce and Symantec OST
The Bigger Truth

All trademark names are property of their respective companies. Information contained in this publication has been obtained from sources The Enterprise Strategy Group (ESG) considers to be reliable but is not warranted by ESG. This publication may contain opinions of ESG, which are subject to change from time to time. This publication is copyrighted by The Enterprise Strategy Group, Inc. Any reproduction or redistribution of this publication, in whole or in part, whether in hard-copy format, electronically, or otherwise to persons not authorized to receive it, without the express consent of The Enterprise Strategy Group, Inc., is in violation of U.S. copyright law and will be subject to an action for civil damages and, if applicable, criminal prosecution. Should you have any questions, please contact ESG Client Relations.
Introduction
In ESG's most recent IT spending research for 2012, "Improve Data Backup and Recovery" tied as the number one priority, as seen in Figure 1, with business continuity and disaster recovery (BC/DR) also landing in the top ten.

Figure 1. Most Important IT Priorities for 2012
Which of the following would you consider to be your organization's most important IT priorities over the next months? (Percent of respondents, N=614, ten responses accepted)
  Improve data backup and recovery: 30%
  Increased use of server virtualization: 30%
  Major application deployments or upgrades: 29%
  Manage data growth: 27%
  Information security initiatives: 27%
  Business continuity/disaster recovery programs: 25%
  Data center consolidation: 24%
  Desktop virtualization: 23%
  Mobile workforce enablement: 22%
  Deploying a "private cloud" infrastructure: 22%
Source: Enterprise Strategy Group.

Data protection is usually in the top ten, but why is it the top IT initiative in 2012? One only has to look at the others on the list to understand more:

Server virtualization: always near the top over the past few years. As virtualization becomes more commoditized, VM sprawl consumes more and more production storage, which in turn exacerbates legacy approaches to disk-based protection. In addition, not all backup technologies adequately protect virtual machines; doing so often requires integration with backup APIs such as VMware's vStorage APIs for Data Protection (VADP) or Microsoft's Volume Shadow Copy Service (VSS), which not every backup solution has chosen to implement or support to the same degree.

Private cloud infrastructure: every challenge in protecting virtualized environments is even more daunting in private cloud architectures, where self-service portals and elastic load monitoring can dynamically create new virtualized resources without any IT interaction, much less awareness of, or automation for, backup processes.
Data growth: along with VM sprawl, every other workload also continues to grow. Databases grow into big data, mailboxes continue to expand (with retention policies limiting the use of offline folders), and unstructured data grows exorbitantly, often with huge amounts of undiscovered redundancy. And, of course, as production data grows, so do disk-based backup stores (often at 3-5 times or more the production size).
Deduplication Today
With so many top IT challenges related to or caused by storage demands, whether in production data or in the backup system that protects it, deduplication has never been more of a priority or a better understood necessity, as seen in Figure 2.

Figure 2. Is Your Organization Currently Using any Data Deduplication Solutions?
Is your organization currently using any data deduplication solutions? (Percent of respondents, N=323)
  Yes: 37%
  No, but we plan to within 12 months: 23%
  No, but we plan to within 24 months: 16%
  No, and we have no plans to: 20%
  Don't know: 4%
Source: Enterprise Strategy Group.

Deduplication technologies provide organizations with substantial data storage efficiency advantages when deployed as part of their data protection infrastructures. While deduplication does not stop the growth of protected data, it does provide an effective strategy for controlling that growth (at least within one's backups). Deduplication technologies and methodologies continue to evolve as vendors develop better ways of ensuring that data is transported and stored efficiently. The algorithms that analyze stored data are constantly improving in efficiency, with even a fraction of a percent improvement saving gigabytes of storage. In addition to the algorithms improving, the location where deduplication occurs is also evolving: from simply being an enhancement within secondary storage arrays to working across the network, within the data path, and across multiple devices.

Dedupe 1.0: Optimized Storage is Good
The first iteration of deduplication involved a single storage device. In the majority of "Dedupe 1.0" scenarios, the backup server treated the deduplicating appliance like it would any other storage device and wrote all of its data to it.
Even when the deduplicated storage already held a significant number of the items that the backup server was transmitting, the backup server was unaware; those redundant elements were simply discarded upon receipt by the deduplication appliance. From a data protection perspective, while organizations did reduce their costs because backup data was stored more efficiently, the time required to perform a backup wasn't improved because the same amount of backup data still had to be written to the Dedupe 1.0 backup device. And, of course, if the backup server needed to replicate the data anywhere else, the data would be fully read and transmitted from one backup server to the next and then written in full to another (perhaps deduplication-capable) storage device at the alternate location.
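The target-side behavior described here amounts to chunking the incoming stream, hashing each chunk, and keeping only chunks not already on disk. A minimal sketch follows; the fixed-size chunks and SHA-256 hashes are illustrative assumptions (real appliances typically use variable-size chunking and tuned index structures), not HP's actual algorithm.

```python
import hashlib

CHUNK_SIZE = 4096  # fixed-size chunks for illustration only


class DedupeStore:
    """Target-side (Dedupe 1.0) store: the sender ships everything;
    redundant chunks are discarded on receipt."""

    def __init__(self):
        self.chunks = {}  # hash -> chunk bytes, stored once

    def ingest(self, stream: bytes) -> list:
        """Receive a full backup stream; keep only unique chunks.
        Returns the 'recipe' (ordered chunk hashes) for later restore."""
        recipe = []
        for i in range(0, len(stream), CHUNK_SIZE):
            chunk = stream[i:i + CHUNK_SIZE]
            digest = hashlib.sha256(chunk).hexdigest()
            self.chunks.setdefault(digest, chunk)  # duplicates are dropped here
            recipe.append(digest)
        return recipe

    def restore(self, recipe: list) -> bytes:
        """Rehydrate a backup from its recipe."""
        return b"".join(self.chunks[h] for h in recipe)


store = DedupeStore()
data = b"A" * 8192 + b"B" * 4096      # two identical chunks plus one unique
recipe = store.ingest(data)
assert store.restore(recipe) == data   # lossless restore
assert len(store.chunks) == 2          # only 2 unique chunks stored for 3 sent
```

Note that the full stream still crosses the wire before duplicates are discarded, which is exactly the Dedupe 1.0 limitation described above.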
Dedupe 1.5: Smarter Backup Servers Are Better
Dedupe 1.5 improves upon Dedupe 1.0 by enabling the backup server to send only the data that isn't already present on the storage device. In effect, the backup server becomes smarter because it is aware of what is already in the deduplicated storage device. Dedupe 1.5 can also improve backup speed because the backup server may no longer be the bottleneck: less data is actually sent to the backup storage device. Of note, dedicated backup appliances that utilize deduplication are often thought of as Dedupe 1.5, since the backup software and deduplication storage live within the same device. While certainly better, Dedupe 1.5 does not typically solve the challenge of replicating data between sites without rehydrating; the backup server will read and send the whole data set from the deduplicated storage to another offsite backup server and its deduplicated storage, regardless of what the secondary deduplication device might already have.

Dedupe 2.0: Client-side Deduplication Is Best
Dedupe 2.0 leverages intelligence and awareness at the source, the backup server, and the storage device. In these scenarios, the awareness of what data is already in the deduplicated storage, and the discernment over whether to send new data, is performed within the production server instead of the backup server or the deduplicated storage. In doing so, network savings begin at the production server, and backups are often significantly faster since only changed data is transmitted from the production server to the storage solution.

Deduplication Optimizations and Markers
In addition to optimizing the algorithms and adjusting where in the backup flow deduplication discernment occurs, the most modern deduplication technologies are beginning to offer other evolutionary capabilities as well.
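Before looking at those capabilities, the client-side discernment that defines Dedupe 2.0 can be sketched as a simple hash-exchange protocol: the source hashes its chunks, asks the target which hashes are new, and ships only those chunks. The chunking scheme and the `missing`/`put` interface below are illustrative assumptions, not HP's actual Catalyst interface.

```python
import hashlib

CHUNK = 4096  # fixed-size chunks for illustration only


def chunk_hashes(data: bytes) -> list:
    """Hash each fixed-size chunk of the data at the source."""
    return [hashlib.sha256(data[i:i + CHUNK]).hexdigest()
            for i in range(0, len(data), CHUNK)]


class Target:
    """Deduplicating backup target that can report unknown hashes."""

    def __init__(self):
        self.known = {}  # hash -> chunk

    def missing(self, hashes):
        """Tell the client which chunk hashes it has never seen."""
        return {h for h in hashes if h not in self.known}

    def put(self, h, chunk):
        self.known[h] = chunk


def client_backup(data: bytes, target: Target) -> int:
    """Dedupe 2.0 flow: hash at the source, send only chunks the
    target lacks. Returns the number of data bytes transmitted."""
    hashes = chunk_hashes(data)
    need = target.missing(hashes)  # small hash-list exchange, not the data
    sent = 0
    for i, h in enumerate(hashes):
        if h in need:
            chunk = data[i * CHUNK:(i + 1) * CHUNK]
            target.put(h, chunk)
            sent += len(chunk)
            need.discard(h)  # ship each unique chunk only once
    return sent


t = Target()
first = client_backup(b"X" * 8192, t)   # two identical chunks: one travels
second = client_backup(b"X" * 8192, t)  # unchanged data: nothing travels
assert first == 4096 and second == 0
```

The payoff shown in the last two lines is the Dedupe 2.0 claim in miniature: after the first backup, redundant data never leaves the production server at all.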
Don't Rehydrate Across Devices or Sites
An effective deduplication solution minimizes the rehydration of data between devices. Consider an example of 500GB of data: after deduplication (3:1), the backup storage device may be storing less than 200GB (including multiple iterations of small changes). In legacy Dedupe 1.0 solutions, making a replica of that data at another location involved rehydrating the 500GB and transmitting all of it across the network to the target, even if it then ended up being deduplicated back down to under 200GB. Dedupe 1.5 solutions would often send only the deduplicated 200GB, since the data movement logic in the backup server is deduplication aware. That's better than 1.0. But the best solutions understand that if 1GB changed on the production server since the last backup, then only that 1GB of new data traverses the backup server to the deduplicated storage, and then only that 1GB is replicated to the secondary site and its deduplicated storage.

Rapid Restores
A key to successfully deploying dedupe is ensuring that restore operations are just as fast when performed from deduplicated backup storage as they are from storage that does not leverage deduplication. Any deduplication solution that substantially increases the time required to perform a recovery operation due to rehydration delays will adversely impact data recovery service level agreements (SLAs). Fast (deduplicated) backup is nice, but doesn't mean as much without equally fast (deduplicated) restores.
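The replication arithmetic in the rehydration example above can be made concrete. The 3:1 ratio and 1GB daily change are the paper's illustrative figures; the comparison below simply tallies the bytes each generation pushes across the WAN.

```python
# WAN traffic to replicate a 500GB protected data set offsite,
# using the paper's illustrative figures (3:1 dedupe, 1GB daily change).
full_gb = 500
ratio = 3
stored_gb = full_gb / ratio   # ~167GB on the appliance ("less than 200GB")
delta_gb = 1                  # data changed since the last backup

dedupe_1_0 = full_gb          # rehydrate and resend everything: 500GB
dedupe_1_5 = stored_gb        # send the deduplicated store: ~167GB
dedupe_2_0_next = delta_gb    # after the first replica, only the 1GB change moves

assert dedupe_1_0 > dedupe_1_5 > dedupe_2_0_next
print(f"1.0: {dedupe_1_0}GB, 1.5: {dedupe_1_5:.0f}GB, "
      f"2.0 (subsequent run): {dedupe_2_0_next}GB")
```

Each generation thus cuts WAN traffic by roughly an order of magnitude over the previous one for this workload, which is why where the discernment happens matters as much as the deduplication ratio itself.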
HP StoreOnce Backup
HP StoreOnce Backup technology is a "Dedupe 2.0"-ready technology. StoreOnce deduplication is available in HP's B6200 Backup System appliance and in HP Data Protector backup/recovery software. By using a consistent deduplication algorithm across the product suite (see Figure 3), HP StoreOnce minimizes the problems that arise when different products use incompatible deduplication algorithms, which normally lead to comparatively slower recovery versus backup performance and decreased efficiency when scaling a deduplication solution to meet expanding capacity requirements.

Figure 3. HP StoreOnce Using Common Deduplication Technology from Source, to Backup Server, to Storage
Source: HP.

HP StoreOnce Catalyst
StoreOnce Catalyst (new in 2012) is a software accelerator that enables deduplication anywhere, rather than just at those places in the network where specific vendor technologies allow it. StoreOnce Catalyst leverages a common deduplication algorithm across the enterprise and allows deduplication at the:
  Production source or client.
  Backup or media server.
  Target appliance.
By enabling source-side deduplication (Dedupe 2.0), HP has the advantage of performing deduplication at the data creation point. Source-side deduplication eliminates the need for dedicated deduplication hardware at remote and branch office sites. StoreOnce Catalyst allows customers to align backup with data protection needs, such as minimizing bandwidth utilization when moving data between sites or data centers. As a key Dedupe 2.0 design goal, StoreOnce Catalyst provides a single technology that can be used in multiple locations on the network and that does not require rehydration when data is transferred between source server, backup device, and target appliance. As a software component, HP StoreOnce Catalyst supports HP Data Protector and Symantec NetBackup via OST integration.
Symantec Backup Exec support will be added in August, along with support for other software solutions that utilize the HP StoreOnce Catalyst software development kit (SDK).

HP StoreOnce B6200
The HP StoreOnce B6200 Backup System, the largest in the StoreOnce family, is an enterprise deduplication appliance that uses a scale-out architecture. The B6200 can grow from a single 48TB two-node couplet up to a capacity of 768TB. It also offers a pay-as-you-grow model that allows nodes to be added to the configuration without incurring downtime.
Figure 4. The HP StoreOnce Family
Source: HP.

Also notable are the HP StoreOnce B6200's high availability features, which mitigate the impact of node failures. Its Autonomic Restart feature configures backup jobs to restart after node failover without requiring the intervention of the data protection administrator. The StoreOnce B6200 is designed to automatically detect and remediate certain failures, hiding this complexity from data protection administrators and users.

Performance
According to HP, the HP StoreOnce B6200 Backup System is able to restore data at nearly the same rate at which it performs native backups. Table 1 shows HP's reported performance numbers for backups with the B6200.

Table 1. HP StoreOnce B6200 Performance Numbers, as Reported by HP
  Native VTL performance (max. configuration): 40 TB/hour
  With source-side deduplication via StoreOnce Catalyst (max. configuration): 100 TB/hour
  Native VTL performance (single couplet): 10 TB/hour
  With source-side deduplication via StoreOnce Catalyst (single couplet): 25 TB/hour
Source: HP.

According to HP, this places the B6200 in a strong position against its closest (unnamed) competitor:
  Native VTL performance (max. configuration): 2.7 times its competition
  With source-side deduplication (max. configuration): 3.2 times its competition
  Native VTL performance (single couplet): 1.2 times its competition
  With source-side deduplication (single couplet): 1.7 times its competition

According to HP, the performance gains shown in Table 1 are largely due to StoreOnce Catalyst, which enables deduplication tasks to be distributed across the data protection infrastructure. The ability to perform deduplication at the source, at the backup server, and at the appliance gives organizations new options to help them minimize the growth of data under protection. In providing a solution that works at the source, backup server, and appliance levels, HP offers a comprehensive Dedupe 2.0 solution.
Backing up the data is one thing, but as the amount of protected data grows, the ability to restore it fast enough to meet SLAs becomes more and more challenging. Notably, HP reports that its StoreOnce B6200 can both ingest and restore data (in VTL mode) at up to 40 TB/hour. This is especially impressive since many deduplication technologies suffer an I/O penalty that causes them to restore at much slower rates than their published ingest rate. But at the end of the day, the most important performance number is price-performance ($/TB/hour), where HP claims it offers 75% better price-performance than its primary competition.

HP Data Protector 7
Also new to the HP StoreOnce family is the addition of HP Data Protector (now shipping version 7) to the portfolio. HP Data Protector 7 (HP DP7) is designed to take advantage of the deduplication capabilities of the StoreOnce family and its Catalyst accelerator to enable client-side, media-centric, or target-based deduplication. In addition, by utilizing snapshots from HP storage arrays, HP DP7 can perform instant recoveries of complex workloads or large datasets. HP DP7 also takes advantage of another HP acquisition: the Autonomy cloud. Currently boasting 50PB, the HP Autonomy cloud is a superset of what was once Iron Mountain Digital's online storage and recovery repository and is now an ideal location to replicate HP DP7 data for offsite preservation. HP DP7 also offers expanded hypervisor support (VMware, Microsoft Hyper-V, and Citrix) as well as enterprise application protection for Microsoft, Oracle, and SAP server platforms, including granular data restore capabilities for Microsoft Exchange, SharePoint, and VMware. As a last note, HP DP7 also supports HP Autonomy's IDOL 10 framework to deliver meaning-based data protection, resulting in data protection, retention, and indexing based on the types and context of data in an environment.
HP StoreOnce and Symantec OST
The benefits of HP StoreOnce are not limited to Data Protector. Via a standard Symantec OST plug-in, a NetBackup media server is able to utilize HP StoreOnce deduplicated storage, enabling not only smarter transit from the media server to the deduplicated storage, but also NetBackup-managed replication between HP StoreOnce arrays without rehydration.

Figure 5. HP StoreOnce Using Common Deduplication Technology from Source, to Backup Server, to Storage
Source: HP.

While this design does not use StoreOnce Catalyst on the production server, the architecture still qualifies as a Dedupe 2.0 solution: Symantec NetBackup 7.5 has its own client-side deduplication and a new feature called Accelerator for optimized discernment at the production node, and it also utilizes the deduplication capabilities within the StoreOnce appliances.
The Bigger Truth
Between increased use of server virtualization, the dynamic proliferation of private cloud, the ever-growing unstructured data pool, and the advent of big data, organizational data is going to continue to grow. Effective deduplication reduces the impact of that growth, lessening the strain of constant storage growth on organizations. All organizations need an effective deduplication strategy; ignoring deduplication will lead to the data protection infrastructure being swamped with data. Choosing a deduplication strategy involves assessing a range of factors, such as data type and location. By supporting deduplication at the data source, the backup server, and the appliance, HP StoreOnce gives data protection solution architects a high degree of flexibility in developing a deduplication solution tailored to their needs. By delivering a single solution through HP Data Protector paired with StoreOnce appliances, plus a joint solution that offers StoreOnce capabilities through Symantec OST-capable software, HP has taken aim at a unified and broadly applicable data protection solution that challenges the presumptions of deduplicated storage. With the addition of cloud capabilities and the understanding of data through Autonomy's assets, the data protection game just got more interesting. Deduplication is not new, and HP was certainly not the first to market it. Instead, by watching how deduplication was introduced and listening to the evolving demands of customers who struggle with storage and backup issues, HP built StoreOnce as a next-generation, or Dedupe 2.0, architecture that is available now. With its formidable enterprise experience, server and storage product lines, and broad partner ecosystem, HP intends to catch up with, and in fact surpass, the status quo to bring better deduplication at a lower cost.
If you haven't already, 2012 is the year to wholly adopt deduplication within your data protection strategy. With HP's newest offerings, the short list of enterprise-suitable deduplication providers for data protection just got bigger, adding a name that many view as synonymous with enterprise servers and their storage.
20 Asylum Street, Milford, MA
AA4-1782ENW