89 Fifth Avenue, 7th Floor, New York, NY 10003 | www.theedison.com | 212.367.7400

White Paper

HP 3PAR Thin Deduplication: A Competitive Comparison





Printed in the United States of America. Copyright 2014 Edison Group, Inc., New York. Edison Group offers no warranty, either expressed or implied, on the information contained herein and shall be held harmless for errors resulting from its use. All products are trademarks of their respective owners.

First Publication: June 2014

Produced by: Chris M. Evans, Senior Analyst; Manny Frishberg, Editor; Barry Cohen, Editor-in-Chief

Table of Contents

Executive Summary
Introduction
    Objective
    Audience
    Contents of this Report
Space Optimization in Primary Storage
    Data Deduplication
        Technical Features
        Managing Resiliency
        Making the Cost of Flash Acceptable
        Anticipated Space Savings
HP 3PAR Thin Deduplication: Deep Dive
    Background
    Hardware Acceleration
    Thin Deduplication Implementation
    Express Indexing
    Thin Clones
    Space Savings and Write Efficiency
Competitive Analysis
    SolidFire Storage System
    Pure Storage FlashArray
    EMC XtremIO
Conclusions and Recommendations
    Interpreting Savings

Executive Summary

As data growth continues at exponential rates, IT departments are being asked to deliver storage at ever-increasing levels of efficiency: the classic "do more with less" dilemma. At the same time, traditional storage arrays are failing to keep up with I/O density requirements, and customers are transitioning to all-flash systems, which have a much higher raw $/GB price point. Space reduction technologies such as thin provisioning, compression and data deduplication form a key strategy in all-flash systems, helping businesses meet their storage needs while driving high levels of efficiency.

HP 3PAR StoreServ's thin deduplication feature continues the story of delivering value to customers by optimizing the way their shared storage systems store data. Thin deduplication further leverages HP 3PAR's custom application-specific integrated circuit (ASIC) to minimize the impact of performing deduplication inline as data is written to the array. Strong data integrity is maintained through an additional integrity check on every deduplicated write, performed at line speed by the ASIC.

HP 3PAR StoreServ thin deduplication is the latest in a line of thin technologies, including thin provisioning, thin persistence and thin reclaim, that deliver value and cost savings to the customer. Each of these technologies is fully built into the 3PAR StoreServ architecture.

In this study, HP 3PAR StoreServ was compared with competing all-flash offerings from SolidFire, Pure Storage and EMC. All of the solutions offer inline (real-time) deduplication, although FlashArray from Pure Storage also performs some post-processing of data. Both SolidFire and Pure Storage integrate compression into their space-saving technologies (and into their savings figures). Only HP 3PAR and Pure Storage offer additional data integrity checking through hash verification.

From thin deduplication alone (not including Zero Page Detect), HP 3PAR StoreServ achieves savings of up to 10:1, depending on the data type in use. This matches or exceeds the figures claimed by the three competing platforms, two of which also include compression and pattern detection in their calculated figures.

In summary, thin deduplication, added to the existing set of thin technologies, extends HP 3PAR StoreServ's leadership in offering customers highly efficient, highly scalable primary storage for every enterprise requirement.

Edison: HP 3PAR StoreServ Thin Deduplication A Competitive Comparison Page 1

Introduction

Objective

This report looks at the implementation of data deduplication on the HP 3PAR StoreServ storage platform and compares its features and functionality with equivalent products in the marketplace today. The constant drive to do more with less makes every space reduction technology a valuable tool for increasing the efficiency of primary storage arrays. As we will discuss, the ubiquity of flash means primary deduplication is ready for production implementation.

Audience

Decision makers in organizations looking to deliver highly efficient deployments of centralized storage will find that this report provides an understanding of the technical issues in deploying deduplication and the resulting benefits it can deliver.

Contents of this Report

Executive Summary
    A summary of the background and conclusions derived from Edison's research and analysis.
Space Optimization in Primary Storage
    A primer on the evolution of shared storage and the space-saving techniques that help to manage exponential growth.
HP 3PAR Thin Deduplication: Deep Dive
    An in-depth discussion of the features and functionality of HP 3PAR StoreServ thin deduplication.
Competitive Analysis
    An examination of how deduplication is implemented in competing storage platforms, with comparison to HP 3PAR StoreServ.
Conclusions and Recommendations
    A summary of the findings from the research.

Space Optimization in Primary Storage

The exponential rate of data growth has been a significant challenge for many organizations since the introduction of shared storage over 20 years ago. Demand for storage is insatiable, with growth estimates varying from 50 to 100 percent per annum. To help manage growth, storage vendors have implemented software features that optimize the use of physical storage capacity. These include:

Thin Provisioning: a space reduction technique that stores only host-written data on disk. Space is saved by storing only the data actually written to each volume, rather than reserving the whole capacity of the volume up front, as thick-provisioned implementations do. Thin provisioning can save anywhere from 35 to 75 percent of physical disk capacity, depending on the data profile; however, ongoing housekeeping is required to keep efficiency at optimum levels. HP 3PAR StoreServ systems see an average saving of 65 percent, based on field data.

Zero Page Reclaim: a space reduction technique that identifies pages of empty or zeroed data and removes them from physical disk, retaining metadata to indicate that the logical page in the volume is empty. Most solutions implement zero page reclaim (ZPR) as a post-process, because identifying empty pages in real time impacts I/O performance. The HP 3PAR StoreServ platform, however, is unique in using a dedicated ASIC that identifies and eliminates zero pages in real time (known as Inline Zero Detect), reducing disk I/O and saving disk capacity.

Data Compression: a space reduction technique that identifies repeated patterns or redundancy in data and removes it, leaving in place metadata that allows the original information to be recreated. Although compression can deliver significant savings, its processor overhead means many vendors have chosen not to implement it.

Space Efficient Snapshots and Clones: although not directly a space reduction technique, snapshots and clones of primary data can be taken space-efficiently, using metadata to track the differences between the primary volume and its snapshots. On some architectures snapshots carry performance implications, and some also require space to be reserved for a snapshot pool; no such restrictions exist within the HP 3PAR StoreServ platform.
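To make thin provisioning and inline zero detection concrete, here is a minimal Python sketch, assuming a volume built from fixed 16KiB pages (the 3PAR page size described later in this report). The class ThinVolume and its methods are hypothetical: on 3PAR the zero check is performed by the ASIC, and a real array tracks allocation in persistent metadata rather than a Python dict.

```python
PAGE_SIZE = 16 * 1024  # assumed page size (16KiB, as on 3PAR)
ZERO_PAGE = bytes(PAGE_SIZE)


class ThinVolume:
    """Hypothetical thin volume: backing space is consumed only by non-zero pages."""

    def __init__(self, logical_pages: int) -> None:
        self.logical_pages = logical_pages   # advertised size, in pages
        self.backing: dict[int, bytes] = {}  # logical page -> stored page

    def _check(self, page: int) -> None:
        if not 0 <= page < self.logical_pages:
            raise IndexError("page outside volume")

    def write(self, page: int, data: bytes) -> None:
        self._check(page)
        if len(data) != PAGE_SIZE:
            raise ValueError("this sketch only accepts whole-page writes")
        if data == ZERO_PAGE:
            # Inline zero detect: nothing is written, and any previously
            # allocated backing page is released.
            self.backing.pop(page, None)
        else:
            # Thin provisioning: space is allocated on first write only.
            self.backing[page] = data

    def read(self, page: int) -> bytes:
        self._check(page)
        # Unallocated pages read back as zeros.
        return self.backing.get(page, ZERO_PAGE)

    def allocated_bytes(self) -> int:
        return len(self.backing) * PAGE_SIZE
```

A volume created with ThinVolume(logical_pages=65536) advertises 1GiB but consumes no backing space until a non-zero page is written, and writing zeros over an allocated page gives that space back.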

Data Deduplication

Deduplication is a space reduction technique that identifies redundant or duplicate data in physical storage and removes the redundant copies, retaining a single copy of the data on disk. Metadata (in the form of lookup tables in memory) maps logical volumes to the single-instance copies of data. Significant savings in physical disk capacity can be achieved where systems contain large amounts of similar or repeated data, such as virtual server and virtual desktop environments. To date, deduplication has been widely used in disk backup systems, where savings of 90 to 95 percent (a 10:1 to 20:1 reduction in physical capacity) have been realized.

Technical Features

Some of the technical features of data deduplication include:

Inline/Post Processing: data deduplication can be performed either as data is being committed to disk, known as inline, or after the data is on disk, known as post-processing. Inline processing requires fast, efficient algorithms to minimize any impact on performance, with the added benefit that space savings are realized immediately. Post-processing removes any direct performance impact; however, physical disk usage fluctuates as data is written to disk and deduplication runs as a background task.

Fixed/Variable Block Size: deduplication techniques identify potentially duplicate data using either fixed or variable block sizes. Variable block algorithms typically produce higher deduplication ratios than fixed-block solutions but require more processing overhead. Among fixed-block solutions, smaller block sizes tend to yield higher deduplication ratios, but cost more in processor overhead and system memory because of the additional metadata lookups.

Data Hashing: hashing is the process of generating a short checksum value from a block of data. The hash value of each block is used as a fingerprint to reference that data in metadata tables and to compare new data against existing data for deduplication.
Hashing techniques vary in their reliability, with some algorithms generating the same hash value for different data, a situation known as a hash collision. There is a balance to be struck between the complexity of the hash algorithm and its impact on performance, so some implementations use lightweight hashing and validate all data before confirming a duplicate.

Data Profile: deduplicating data results in a more randomized pattern of access for a single volume, as the physical locations of blocks of data are no longer determined by the logical volume layout. Random data access is more difficult for HDD-based storage arrays to manage, because random I/O incurs significant latency from mechanical disk head movement. Flash storage has no such issues, making it highly suited to managing deduplicated data.

Managing Resiliency

In highly deduplicated systems, a single block of data may be a component of tens or hundreds of logical volumes. As a result, the impact of losing a block of data is much higher than in non-deduplicated environments. Data loss could occur through logical corruption (due to a software bug) or through hardware failure (such as two disks failing in a RAID group using single parity). Some deduplication implementations are enabled by default and cannot be disabled by the administrator, which may be undesirable for certain data types.

Making the Cost of Flash Acceptable

All-flash arrays are a recent entrant into the shared storage marketplace. These appliances use flash exclusively as the permanent storage medium. Flash is much more expensive per GB than traditional hard drives, and as a result, vendors of these products have looked for ways to make the cost of all-flash arrays more acceptable against the historical $/GB measurement. One solution has been to quote array capacities after space reduction savings have been applied. The result is a much more palatable cost that is more in line with traditional disk-based arrays. However, basing purchasing decisions on anticipated space savings can be risky unless the data profile is well known or has been validated first.

Anticipated Space Savings

The aim of deduplication is to save physical disk space. Savings vary with the type of data being optimized, with highly redundant data such as virtual server and VDI (Virtual Desktop Infrastructure) deployments seeing the best results.
Structured data, encrypted data and media content do not usually realize much in the way of savings, because such data has usually already been optimized by the application (or, in the case of encryption, deliberately randomized). Savings may also change over time as information is created and destroyed through its normal lifecycle. The savings made from deduplication should therefore be seen more as an additional benefit than as a core capacity measurement.
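How savings depend on data profile and block size can be explored with a short Python sketch that splits data into fixed-size blocks, fingerprints each block and reports the ratio of logical blocks to unique blocks. The function name dedup_ratio and the use of SHA-256 as the fingerprint are assumptions for illustration; the sketch ignores metadata overhead and, unlike the verified approaches discussed above, trusts every hash match.

```python
import hashlib


def dedup_ratio(data: bytes, block_size: int) -> float:
    """Return logical blocks : unique blocks for fixed-size deduplication of `data`."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    # Split into fixed-size blocks; the final block may be shorter.
    blocks = [data[i:i + block_size] for i in range(0, len(data), block_size)]
    if not blocks:
        return 1.0
    # Fingerprint each block; equal fingerprints are treated as duplicates.
    unique = {hashlib.sha256(block).digest() for block in blocks}
    return len(blocks) / len(unique)


if __name__ == "__main__":
    # Ten copies of one 16KiB "template" page plus two distinct pages.
    sample = b"A" * (16 * 1024) * 10 + b"B" * (16 * 1024) + b"C" * (16 * 1024)
    for size in (4 * 1024, 16 * 1024):
        print(f"{size // 1024}KiB blocks -> {dedup_ratio(sample, size):.1f}:1")
```

Run as a script, this prints 16.0:1 for 4KiB blocks and 4.0:1 for 16KiB blocks. The sample is deliberately repetitive, so the gap between block sizes is exaggerated; how much smaller blocks help on real data depends entirely on its profile.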

HP 3PAR Thin Deduplication: Deep Dive

Background

The HP 3PAR StoreServ architecture is based on a cache-coherent active-mesh cluster composed of multiple controller nodes and disk shelves. All controllers participate in data access in an active-active configuration, ensuring that the resources of every node are used to service I/O requests. The HP 3PAR OS uses a three-level mapping methodology, similar to those used in enterprise operating systems, to store and track physical and virtual resources. With the introduction of flash technology, the HP 3PAR StoreServ architecture is ideally placed to exploit faster storage media through features that include the existing range of thin technologies.

Physical space on backend storage is divided into 1GB units known as chunklets. Chunklets are then combined to create logical disks (LDs), applying data protection (RAID) and data placement rules to each LD. Virtual volumes (VVs), which are presented to hosts as logical unit numbers (LUNs), are then created from logical disks with a page granularity of 16KiB. Data resilience is achieved by distributing data across multiple nodes, disk shelves and disks.

Hardware Acceleration

One of the key differentiators of the 3PAR StoreServ platform is its use of a custom hardware controller, or ASIC. The ASIC, now in its fourth generation, provides line-speed zero page detection for each 16KiB block of data written to the array. It is a core technology in delivering the existing 3PAR StoreServ thin technologies, including thin provisioning, thin persistence, thin conversion and thin copy reclamation.

Thin Deduplication Implementation

Thin deduplication is a new feature initially implemented on HP 3PAR StoreServ 7450 Storage Systems deployed with the generation four ASIC or later. The feature is provided as a no-cost option within the base HP 3PAR OS suite, giving customers immediate cost savings at no additional charge.
Thin deduplication is available for both virtual volumes and snapshots. It is an inline deduplication process that takes advantage of the generation four ASIC to perform hash calculations on each 16KiB block of data as it is written to the system. When data is received, the hash calculation is offloaded to the ASIC and performed at wire speed. The array then uses a feature called Express Indexing to check whether the new data already exists in the system. If a hash match is found, the ASIC performs a bit-by-bit comparison of the new data with the copy on the backend flash to ensure that no hash collision has occurred. Because this function is offloaded to the ASIC and performed at line speed, the CPU overhead is negligible.

Express Indexing

The HP 3PAR operating system uses a process called Express Indexing to detect duplicate page data. The process takes advantage of the innovative and robust tri-level indexing system used within the OS to store and manage traditional (non-deduplicated) volumes. When data is received by the array, Express Indexing calculates a hash value for each 16KiB block of data. The hash value is then used to check whether the new data block already exists on the system by walking the metadata tables. If the block of data is located, it is read from the backend and compared at the bit level (using XOR) in the ASIC. The XOR of two equal pages results in a page of zeros, which the ASIC's built-in zero detection engine identifies inline. A successful comparison results in a dedupe hit: the virtual volume's LBA pointers are updated to reference the located data, and the incoming data is discarded. In the unlikely event that a hash collision is detected, the data is stored on disk directly associated with the virtual volume and is not treated as deduplicated. If the new data was not located at lookup, a new data block is allocated and the data is written to backend storage. With this technique, the HP 3PAR StoreServ solution makes efficient use of existing memory structures to track unique and deduplicated data and map it to virtual volumes.
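The lookup-then-verify flow described above can be modeled in Python under stated assumptions: SHA-256 stands in for the ASIC's unspecified hash, a dict stands in for the tri-level metadata tables, and the XOR comparison runs in software. DedupStore and its methods are hypothetical names; the optional fingerprint parameter exists only so a deliberately weak hash can be substituted to exercise the collision path.

```python
import hashlib
from typing import Callable, Optional

PAGE_SIZE = 16 * 1024  # 16KiB pages, as on HP 3PAR StoreServ


class DedupStore:
    """Hypothetical model of hash lookup followed by bit-level verification."""

    def __init__(self, fingerprint: Optional[Callable[[bytes], bytes]] = None) -> None:
        # SHA-256 is an assumed stand-in for the ASIC's hash function.
        self.fingerprint = fingerprint or (lambda page: hashlib.sha256(page).digest())
        self.pages: list[bytes] = []                      # backend pages by index
        self.index: dict[bytes, int] = {}                 # fingerprint -> backend page
        self.volume_map: dict[tuple[str, int], int] = {}  # (volume, LBA page) -> backend page

    @staticmethod
    def _identical(a: bytes, b: bytes) -> bool:
        # Equal pages XOR to all zeros, which a zero check then recognizes.
        return len(a) == len(b) and (int.from_bytes(a, "big") ^ int.from_bytes(b, "big")) == 0

    def _store(self, page: bytes) -> int:
        self.pages.append(page)
        return len(self.pages) - 1

    def write(self, volume: str, lba_page: int, page: bytes) -> str:
        if len(page) != PAGE_SIZE:
            raise ValueError("expected exactly one 16KiB page")
        fp = self.fingerprint(page)
        candidate = self.index.get(fp)
        if candidate is None:
            # Miss: allocate a new backend page and index it.
            slot = self._store(page)
            self.index[fp] = slot
            result = "miss"
        elif self._identical(page, self.pages[candidate]):
            # Dedup hit: point the LBA at the existing page; discard incoming data.
            slot = candidate
            result = "hit"
        else:
            # Hash collision: keep the data private to this volume, not deduplicated.
            slot = self._store(page)
            result = "collision"
        self.volume_map[(volume, lba_page)] = slot
        return result
```

With the default hash, writing the same page to two volumes returns "miss" and then "hit", and both LBAs map to one backend page. Passing fingerprint=lambda p: p[:1] makes distinct pages collide, so the verification step returns "collision" and stores the second page separately.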
Because of the way 3PAR memory structures are designed, there is no need to keep reference counts for shared data: any unreferenced data is eventually cleaned up by an online garbage collection process using a mark-and-sweep algorithm.

Thin Clones

The abstraction of logical and physical volume content through deduplication makes it possible to implement features such as thin clones. A thin clone is a replica of a volume that is created by copying only the metadata that associates a virtual volume with the physical data on disk. At creation, a thin clone points to the same blocks of data as the cloned volume; as the volumes are updated and their content changes, new writes map to different deduplicated blocks (or create new blocks), so no direct overwrite occurs. Thin clones stay thin as long as updated data continues to map to existing deduplicated data on the array. Thin clones allow HP 3PAR StoreServ to implement highly efficient, instant volume copies for hypervisor cloning functions such as VAAI on VMware vSphere and ODX on Microsoft Hyper-V.

Space Savings and Write Efficiency

HP 3PAR Thin Deduplication has been shown to deliver savings of up to 10:1, depending on the source data. This matches the levels of savings claimed by other all-flash storage vendors. HP has also researched the difference between the default 16KiB block size of the 3PAR StoreServ platform and the smaller 4KiB size used by other platforms. Tests showed a modest improvement in savings of less than 15 percent. As a result, HP chose to remain with the existing 16KiB block size, as it makes optimum use of processor and memory resources. HP also examined telemetry data from tens of thousands of existing customer systems, which showed the sweet spot for deduplication to be between 8KiB and 16KiB block sizes. Smaller block sizes yielded some modest improvement in savings but introduced higher system load.

HP 3PAR StoreServ's write striping distributes write I/O evenly across SSDs, reducing the risk of catastrophic device failure. HP provides a 5-year unconditional warranty on cMLC drives in StoreServ systems. Inline Zero Detect removes zeroed data from the I/O pipeline so that it is never written to backend storage, further reducing wear on SSD devices.
Finally, features such as Adaptive Write and Adaptive Sparing provide additional SSD management, extending usable SSD capacity by a further 20 percent. All of the features described are fully integrated with the new thin deduplication technology.
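The thin-clone and garbage-collection behavior described under Thin Clones can be illustrated with a small Python model. Cloning copies only the page-to-block pointers, writes redirect pointers instead of overwriting shared blocks, and a mark-and-sweep pass frees blocks that no volume references, with no per-block reference counts. CloneableStore and its methods are hypothetical, and deduplication lookups on write are deliberately omitted to keep the example short.

```python
class CloneableStore:
    """Hypothetical block store: volumes map logical pages to shared backend blocks."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}            # backend block id -> data
        self.volumes: dict[str, dict[int, int]] = {}  # volume -> {logical page: block id}
        self._next_id = 0

    def write(self, volume: str, page: int, data: bytes) -> None:
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = data
        # Redirect the pointer; any previously shared block is left untouched.
        self.volumes.setdefault(volume, {})[page] = block_id

    def thin_clone(self, source: str, clone: str) -> None:
        # Copy only the metadata (page -> block pointers); no data is copied.
        self.volumes[clone] = dict(self.volumes[source])

    def delete_volume(self, volume: str) -> None:
        del self.volumes[volume]

    def collect_garbage(self) -> int:
        # Mark: every block reachable from any volume map is live.
        live = {bid for mapping in self.volumes.values() for bid in mapping.values()}
        # Sweep: free everything else and report how many blocks were freed.
        dead = [bid for bid in self.blocks if bid not in live]
        for bid in dead:
            del self.blocks[bid]
        return len(dead)
```

Cloning a two-page "gold" volume adds no blocks; updating one page of the clone adds one block; and only after the gold volume is deleted does the sweep find its old, now-unreferenced page and free it.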

Competitive Analysis

Data deduplication has not been widely adopted in the primary storage marketplace; however, all-flash array vendors have used the technology as part of new architecture designs. The notable early adopter is NetApp, which introduced deduplication into Data ONTAP as early as 2007. Unfortunately, this implementation was based on post-processing, and aggregate sizes were consequently limited because of the performance impact of the post-processing task. In the all-flash startup market, deduplication has become a table-stakes feature, with vendors looking to emphasize the effective cost per GB of their products after space-saving techniques have been applied. This has caused problems for Violin Memory, which has no native space reduction technologies in its products.

Three vendors offering deduplication have been chosen for comparison with the HP 3PAR StoreServ technology: SolidFire's Storage System, Pure Storage FlashArray and EMC XtremIO. All of these systems originated as new technology from startups and therefore have deduplication built into their architectures.

SolidFire Storage System

SolidFire's Storage System has been available since 2012, evolving through three generations of hardware and six generations of the platform's Element operating system. The SolidFire architecture is a scale-out, shared-nothing, loosely coupled node design that uses a back-end 10GbE network for inter-node communication. Systems can expand and shrink by adding and removing nodes. Data protection is implemented through simple mirroring of data between nodes. SolidFire uses a content-based data placement algorithm to distribute data evenly across the nodes in a cluster.

Space reduction is achieved through a combination of data deduplication and compression. As data is received by the system, it is divided into 4KiB blocks, which are compressed before being hashed. The content is then routed to the node responsible for managing that range of hash values. If the new data is found to be a duplicate, a reference to the existing content is stored against the volume and the node discards the new copy; if the data is unique, it is written to SSD. Hash matches are not verified against the stored data before the reference is recorded.
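The write path just described (compress each 4KiB block, hash the compressed result, route it to the node that owns that hash range) can be approximated in Python. zlib and SHA-256 are stand-ins chosen here because the report does not name SolidFire's algorithms; Node, owner and ingest are hypothetical names; and, as the report describes, a hash match is trusted without a byte comparison.

```python
import hashlib
import zlib

BLOCK_SIZE = 4 * 1024  # SolidFire deduplicates on 4KiB blocks


class Node:
    """Hypothetical storage node holding compressed blocks keyed by fingerprint."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.blocks: dict[bytes, bytes] = {}

    def put(self, fp: bytes, compressed: bytes) -> bool:
        """Store a block unless already present; return True for a duplicate."""
        if fp in self.blocks:
            return True  # duplicate: discard, trusting the hash match
        self.blocks[fp] = compressed
        return False


def owner(fp: bytes, nodes: list[Node]) -> Node:
    # Split the 64-bit fingerprint prefix space into equal contiguous ranges,
    # one per node, so placement depends only on content.
    prefix = int.from_bytes(fp[:8], "big")
    return nodes[prefix * len(nodes) >> 64]


def ingest(data: bytes, nodes: list[Node]) -> list[bytes]:
    """Compress, hash and route each 4KiB block; return the volume's block references."""
    refs = []
    for i in range(0, len(data), BLOCK_SIZE):
        compressed = zlib.compress(data[i:i + BLOCK_SIZE])
        fp = hashlib.sha256(compressed).digest()
        owner(fp, nodes).put(fp, compressed)
        refs.append(fp)
    return refs
```

Because placement is a pure function of content, identical blocks written by any volume land on the same node, which is what makes deduplication global across the cluster; adding or removing a node would change the range boundaries and require data to move.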

Compressing data as it is written to the system results in blocks of variable length, which are then written in a tightly packed arrangement on backend storage. This means that as data expires from the system, housekeeping is required to reclaim usable space and restack content on physical media. SolidFire delivers inline deduplication based on a 4KiB block size, and it is always enabled. The company claims efficiency savings of between 4:1 and 10:1 from compression and deduplication combined, although no breakdown between the two methods is given.

Pure Storage FlashArray

Pure Storage released its first FlashArray product in May 2012. The system is built on a scale-up architecture consisting of dual active-active redundant controllers and shelves of solid-state disk (SSD). FlashArray uses five different techniques for data reduction[1], known collectively as FlashReduce:

Pattern Removal: identifies repeated patterns in data, including zeroed data.

Inline Compression: a lightweight implementation of the LZO (Lempel-Ziv-Oberhumer) algorithm, applied as a first compression pass inline before data is committed to disk.

Adaptive Inline Deduplication: deduplication performed inline using a variable-size block algorithm, with blocks from 4KiB to 32KiB in 512-byte increments (the 4KiB minimum is based on the SSD page write size).

Deep Reduction: a patent-pending form of the Huffman encoding algorithm, run as a post-processing task to achieve more aggressive space savings.

Copy Reduction: all snapshots and clones in a FlashArray system are deduplication-aware. This feature is also implemented in the HP 3PAR StoreServ platform.

Deduplication is always enabled within FlashArray systems; however, the architecture allows the deduplication process to be curtailed during periods of heavy system load. In this scenario, hash lookups may be abandoned and potentially duplicate data written to disk. FlashArray therefore uses the Deep Reduction feature to identify missed deduplication opportunities and to apply compression more aggressively than could be achieved inline.

[1] http://www.purestorage.com/blog/pure-storage-flash-bits-adaptive-data-reduction/
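The interaction between load-curtailed inline deduplication and a later background pass can be modeled in a few lines of Python. This is a sketch under simplifying assumptions: fixed-size blocks rather than FlashArray's variable 4KiB to 32KiB blocks, SHA-256 as the fingerprint, and a post-process pass that only folds missed duplicates (the Huffman-based compression in Deep Reduction is not modeled). AdaptiveDedup and its methods are hypothetical.

```python
import hashlib


class AdaptiveDedup:
    """Hypothetical model: inline dedup that may be skipped under load, plus a post-process pass."""

    def __init__(self) -> None:
        self.blocks: dict[int, bytes] = {}  # physical block id -> data
        self.index: dict[bytes, int] = {}   # fingerprint -> canonical block id
        self.refs: list[int] = []           # logical block n -> physical block id
        self._next_id = 0

    def write(self, block: bytes, system_busy: bool = False) -> None:
        fp = hashlib.sha256(block).digest()
        if not system_busy and fp in self.index:
            self.refs.append(self.index[fp])  # inline dedup hit
            return
        # Under heavy load the lookup is abandoned, so a duplicate may be stored.
        block_id = self._next_id
        self._next_id += 1
        self.blocks[block_id] = block
        self.index.setdefault(fp, block_id)  # first copy stays canonical
        self.refs.append(block_id)

    def background_reduce(self) -> int:
        """Post-process pass: remap missed duplicates to canonical copies; return blocks freed."""
        remap = {}
        for block_id, data in self.blocks.items():
            canonical = self.index[hashlib.sha256(data).digest()]
            if canonical != block_id:
                remap[block_id] = canonical
        self.refs = [remap.get(r, r) for r in self.refs]
        for block_id in remap:
            del self.blocks[block_id]
        return len(remap)
```

Writing the same block three times with the middle write flagged as busy leaves two physical copies; the background pass then frees one, so physical usage temporarily exceeds the steady-state figure, which is the trade-off of deferring work under load.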

FlashArray deduplication cannot be disabled on a per-volume basis; all volumes have deduplication applied to them. Pure Storage quotes its space savings using a real-time ticker on its website, which shows savings based on information from customer arrays. At the time of writing, this showed an overall reduction rate of 5.72:1, with 2.13:1 achieved from deduplication and 2.68:1 from compression.

EMC XtremIO

EMC acquired the Israeli startup XtremIO in 2012, with the first GA products shipping at the end of 2013. The all-flash XtremIO platform is based on a scale-out node architecture of paired controllers called X-Bricks, each of which combines a controller pair with a fixed amount of flash (25 drives). Multiple X-Bricks are connected through an RDMA mesh. The XtremIO design uses a content-based data placement architecture in which data is stored in 4KiB blocks placed according to the hash value generated for each write. This results in an even distribution of data across all nodes in a system, with each node managing a part of the hash value address space. The distribution mechanism makes system expansion a non-trivial exercise, and at the time of writing XtremIO systems cannot be expanded.

The XtremIO operating system (XIOS) runs a number of processes (called modules) that manage data flow in the XtremIO system. As write I/Os are received, the Routing module splits the data into 4KiB chunks and calculates the hash value of each chunk. The Control module maintains a hash table and checks whether the hash value represents data already stored by the system. If the data is unique, the hash value is recorded and the data is passed to a Data module to store on SSD. If the data is a duplicate, the Data module simply increments a reference count and discards the data. The XtremIO system is therefore heavily dependent on maintaining accurate reference counts for each 4KiB block of stored data. XtremIO uses fixed 4KiB blocks, with no verification of the hash match before committing to disk.
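A single-process Python sketch of the module flow described above follows, assuming SHA-256 as the content fingerprint; XBrickModel is a hypothetical name, and the partitioning of the hash space across nodes is omitted. It shows why accurate reference counts matter: a block is freed only when its count returns to zero, and, as the report notes, a fingerprint match is accepted without a byte-level comparison.

```python
import hashlib

BLOCK_SIZE = 4 * 1024  # XtremIO stores data in fixed 4KiB blocks


class XBrickModel:
    """Hypothetical single-node model of the Routing/Control/Data module flow."""

    def __init__(self) -> None:
        self.hash_table: dict[bytes, int] = {}          # Control: fingerprint -> slot
        self.ssd: dict[int, bytes] = {}                 # Data: slot -> stored block
        self.refcount: dict[int, int] = {}              # Data: slot -> reference count
        self.volume: dict[tuple[str, int], bytes] = {}  # (LUN, block offset) -> fingerprint
        self._next_slot = 0

    def write(self, lun: str, offset: int, block: bytes) -> None:
        if len(block) != BLOCK_SIZE:
            raise ValueError("expected exactly one 4KiB block")
        fp = hashlib.sha256(block).digest()  # Routing: fingerprint the chunk
        slot = self.hash_table.get(fp)       # Control: is this content already stored?
        if slot is None:
            # Unique data: the Data module stores it on "SSD".
            slot = self._next_slot
            self._next_slot += 1
            self.hash_table[fp] = slot
            self.ssd[slot] = block
            self.refcount[slot] = 0
        # The fingerprint alone decides the match; no byte comparison is made.
        self.refcount[slot] += 1
        previous = self.volume.get((lun, offset))
        self.volume[(lun, offset)] = fp
        if previous is not None:
            self._release(previous)

    def _release(self, fp: bytes) -> None:
        slot = self.hash_table[fp]
        self.refcount[slot] -= 1
        if self.refcount[slot] == 0:
            # Last reference gone: the block and its hash entry can be freed.
            del self.hash_table[fp], self.ssd[slot], self.refcount[slot]
```

If a count were ever decremented in error, the block would be freed while still referenced, which is why this design depends so heavily on the accuracy of its counters, in contrast to the mark-and-sweep approach described for 3PAR.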
Deduplication is global across the entire XtremIO cluster because of the content-based data placement. However, data is not made redundant across nodes using replication or a standard scheme such as RAID. Instead, XtremIO uses a RAID-6 style protection mechanism called XDP, which writes data redundantly within each X-Brick with a capacity overhead of around 8 percent. Loss of an X-Brick therefore makes data inaccessible. The current design of XDP offers no flexibility in data protection, and deduplication cannot be turned off for more sensitive data. EMC's documentation assumes a 5:1 deduplication ratio when quoting usable capacity.
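A rough Python illustration of how a protection overhead and a quoted deduplication ratio combine into a usable-capacity figure follows. The 23+2 stripe layout is an assumption, chosen because it reproduces the roughly 8 percent overhead across 25 drives. The 10 TB raw figure is hypothetical, and spare space, metadata and other reserves are ignored.

```python
from fractions import Fraction


def parity_overhead(data_cols: int, parity_cols: int) -> Fraction:
    """Share of raw capacity consumed by parity in a data+parity stripe."""
    return Fraction(parity_cols, data_cols + parity_cols)


def effective_capacity(raw: Fraction, overhead: Fraction, dedupe_ratio: Fraction) -> Fraction:
    """Logical capacity after protection overhead and deduplication.

    Deliberately ignores spare space, metadata and other reserves,
    which vendors rarely itemise.
    """
    return raw * (1 - overhead) * dedupe_ratio


# Assumption: a 23+2 stripe across the 25 drives of one X-Brick.
xdp = parity_overhead(23, 2)
print(xdp, float(xdp))  # 2/25 0.08

# Hypothetical 10 TB of raw flash at EMC's quoted 5:1 deduplication ratio.
print(effective_capacity(Fraction(10), xdp, Fraction(5)))  # 46
```

The deduplication ratio multiplies everything else, so a usable-capacity figure quoted this way is only as reliable as the ratio assumed for the customer's data.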

Conclusions and Recommendations

Data deduplication is a technology that can offer significant space and cost savings in primary storage. Because deduplicated data is accessed in a highly random pattern, the technology has seen little traction in traditional arrays; instead it has become a key feature of all-flash solutions, which cope well with a random I/O profile. The underlying design and architecture of the HP 3PAR StoreServ platform make it well suited to the requirements of deduplication on flash storage.

HP 3PAR StoreServ Thin Deduplication continues the evolution of the platform's space-saving features, adding to the savings customers already achieve through thin provisioning, thin reclaim, thin conversion and thin persistence. Thin Deduplication leverages the 3PAR StoreServ custom ASIC to perform hashing and data integrity checking at line speed; the ASIC continues to be a key differentiator in the primary array marketplace. In comparison with other platforms, HP 3PAR StoreServ implements Thin Deduplication with little or no performance overhead and lets the customer choose, on a volume-by-volume basis, which data should be considered for deduplication. In keeping with the 3PAR StoreServ ethos, space-saving settings can be changed dynamically without requiring work by the customer or restricting the array design or layout.

Interpreting Savings

Vendor explanations of space savings are often murky. Some vendors exclude their RAID overhead; others include all space-saving techniques (including thin provisioning) without providing a breakdown of the savings or how they are achieved. There is also typically no discussion of how much backend capacity is consumed by metadata. In the product comparisons, EMC XtremIO quotes a savings ratio of 5:1 (without any detail on how this is achieved), Pure Storage quotes 5.72:1 and SolidFire quotes values from 4:1 to 10:1.
Note that the Pure Storage and SolidFire figures also include savings from compression, which carries considerable processor overhead and is not currently an HP 3PAR StoreServ feature.
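Because independent reduction stages multiply, a combined figure cannot be compared directly with a deduplication-only figure. The short Python sketch below uses the Pure Storage ticker numbers quoted earlier to show the arithmetic. The function names are illustrative only.

```python
import math


def combined_ratio(*ratios: float) -> float:
    """Independent reduction stages multiply: 2:1 then 3:1 gives 6:1 overall."""
    return math.prod(ratios)


def percent_saved(ratio: float) -> float:
    """Share of logical capacity that an N:1 reduction ratio avoids storing."""
    return 100 * (1 - 1 / ratio)


# Pure Storage ticker figures: deduplication 2.13:1, compression 2.68:1.
print(round(combined_ratio(2.13, 2.68), 2))  # 5.71, consistent with 5.72 after rounding
# Dividing the headline ratio by the compression ratio isolates deduplication.
print(round(5.72 / 2.68, 2))  # 2.13

for r in (4, 5, 5.72, 10):
    print(f"{r}:1 ratio -> {percent_saved(r):.1f}% saved")
```

The loop prints 75.0, 80.0, 82.5 and 90.0 percent, so doubling a ratio from 5:1 to 10:1 saves a further 10 percent of logical capacity, not twice as much.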

HP 3PAR StoreServ systems achieve deduplication ratios of up to 10:1, excluding savings from other Thin Technologies. Savings from Inline Zero Detect, for example, are not included but can be significant, making overall savings much greater. Deduplication ratios alone are not a true indication of the benefit of deduplication technology. HP 3PAR StoreServ integrates deduplication with existing thin technologies and features such as Thin Clones to deliver a comprehensive, integrated space-saving solution. With the release of Thin Deduplication, HP 3PAR StoreServ continues to maintain its leadership in delivering highly efficient primary storage solutions to customers.

4AA5-3223ENW