UNDERSTANDING DATA DEDUPLICATION. Tom Sas Hewlett-Packard

Similar documents
UNDERSTANDING DATA DEDUPLICATION. Jiří Král, ředitel pro technický rozvoj STORYFLEX a.s.

UNDERSTANDING DATA DEDUPLICATION. Thomas Rivera SEPATON

ADVANCED DEDUPLICATION CONCEPTS. Larry Freeman, NetApp Inc Tom Pearce, Four-Colour IT Solutions

The Business Value of Data Deduplication DDSR SIG

Restoration Technologies. Mike Fishman / EMC Corp.

Introduction to Data Protection: Backup to Tape, Disk and Beyond. Michael Fishman, EMC Corporation

Introduction to Data Protection: Backup to Tape, Disk and Beyond. Michael Fishman, EMC Corporation

Trends in Data Protection and Restoration Technologies. Mike Fishman, EMC 2 Corporation (Author and Presenter)

Introduction to Data Protection: Backup to Tape, Disk and Beyond

Thomas Rivera / Hitachi Data Systems

Deduplication Demystified: How to determine the right approach for your business

Backup to Tape, Disk and Beyond. Jason Iehl, NetApp

3Gen Data Deduplication Technical

Efficient Backup with Data Deduplication Which Strategy is Right for You?

WAN Optimization and Thin Client: Complementary or Competitive Application Delivery Methods? Josh Tseng, Riverbed

Trends in Application Recovery. Andreas Schwegmann, HP

How To Migrate To A Network (Wan) From A Server To A Server (Wlan)

Deduplication has been around for several

EMC Disk Library with EMC Data Domain Deployment Scenario

DXi Accent Technical Background

Cost Effective Backup with Deduplication. Copyright 2009 EMC Corporation. All rights reserved.

Understanding Enterprise NAS

Tiered Data Protection Strategy Data Deduplication. Thomas Störr Sales Director Central Europe November 8, 2007

WAN Optimization and Cloud Computing. Josh Tseng, Riverbed

EMC PERSPECTIVE. An EMC Perspective on Data De-Duplication for Backup

Cloud-integrated Storage What & Why

Fundamental Approaches to WAN Optimization. Josh Tseng, Riverbed

IBM TSM DISASTER RECOVERY BEST PRACTICES WITH EMC DATA DOMAIN DEDUPLICATION STORAGE

Using HP StoreOnce Backup Systems for NDMP backups with Symantec NetBackup

EMC DATA PROTECTION. Backup ed Archivio su cui fare affidamento

WHITE PAPER Data Deduplication for Backup: Accelerating Efficiency and Driving Down IT Costs

Server and Storage Virtualization with IP Storage. David Dale, NetApp

WHITE PAPER. DATA DEDUPLICATION BACKGROUND: A Technical White Paper

Data Deduplication Background: A Technical White Paper

How To Protect Data On Network Attached Storage (Nas) From Disaster

VDI Optimization Real World Learnings. Russ Fellows, Evaluator Group

Protect Microsoft Exchange databases, achieve long-term data retention

Next Generation Backup Solutions

Creating a Catalog for ILM Services. Bob Mister Rogers, Application Matrix Paul Field, Independent Consultant Terry Yoshii, Intel

Protecting Information in a Smarter Data Center with the Performance of Flash

DPAD Introduction. EMC Data Protection and Availability Division. Copyright 2011 EMC Corporation. All rights reserved.

Get Success in Passing Your Certification Exam at first attempt!

Riverbed Whitewater/Amazon Glacier ROI for Backup and Archiving

Scale and Availability Considerations for Cluster File Systems. David Noy, Symantec Corporation

Cloud File Services: October 1, 2014

Long term retention and archiving the challenges and the solution

Data Center Convergence. Ahmad Zamer, Brocade

Optimizing Backup and Data Protection in Virtualized Environments. January 2009

Overcoming Backup & Recovery Challenges in Enterprise VMware Environments

STORAGE. Buying Guide: TARGET DATA DEDUPLICATION BACKUP SYSTEMS. inside

Cloud-integrated Enterprise Storage. Cloud-integrated Storage What & Why. Marc Farley

Turnkey Deduplication Solution for the Enterprise

CISCO WIDE AREA APPLICATION SERVICES (WAAS) OPTIMIZATIONS FOR EMC AVAMAR

EMC DATA DOMAIN OPERATING SYSTEM

EMC AVAMAR. a reason for Cloud. Deduplication backup software Replication for Disaster Recovery

HP LeftHand SAN Solutions

EMC DATA DOMAIN OVERVIEW. Copyright 2011 EMC Corporation. All rights reserved.

Energy Efficient Storage - Multi- Tier Strategies For Retaining Data

Using HP StoreOnce Backup systems for Oracle database backups

HP StoreOnce: reinventing data deduplication

Protect Data... in the Cloud

Accelerating Applications and File Systems with Solid State Storage. Jacob Farmer, Cambridge Computer

IBM Tivoli Storage Manager Version Introduction to Data Protection Solutions IBM

Take Advantage of Data De-duplication for VMware Backup

High Performance Computing OpenStack Options. September 22, 2015

Cloud OS Vision. Modern platform for the world s apps

EMC DATA DOMAIN PRODUCT OvERvIEW

Redefining Backup for VMware Environment. Copyright 2009 EMC Corporation. All rights reserved.

09'Linux Plumbers Conference

Quantum DXi6500 Family of Network-Attached Disk Backup Appliances with Deduplication

EMC DATA DOMAIN OPERATING SYSTEM

Real-time Compression: Achieving storage efficiency throughout the data lifecycle

DEDUPLICATION NOW AND WHERE IT S HEADING. Lauren Whitehouse Senior Analyst, Enterprise Strategy Group

EMC BACKUP-AS-A-SERVICE

Symantec NetBackup Appliances

Big Data Storage Options for Hadoop Sam Fineberg, HP Storage

How it can benefit your enterprise. Dejan Kocic Hitachi Data Systems (HDS)

White. Paper. Addressing NAS Backup and Recovery Challenges. February 2012

Transcription:

UNDERSTANDING DATA DEDUPLICATION Tom Sas Hewlett-Packard

SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individual members may use this material in presentations and literature under the following conditions: Any slide or slides used must be reproduced in their entirety without modification The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations. This presentation is a project of the SNIA Education Committee. Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney. The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK. 2

About the SNIA DPCO This tutorial has been developed, reviewed and approved by members of the Data Protection and Capacity Optimization (DPCO) committee which any SNIA member can join for free The mission of the DPCO is to foster the growth and success of the market for data protection and capacity optimization technologies 2010 goals include educating the vendor and user communities, market outreach, and advocacy and support of any technical work associated with data protection and capacity optimization 3

Abstract Data deduplication is a capacity optimization technology that is being used to dramatically improve storage efficiency. This technical session will: Review various data deduplication methodologies Identify the factors that influence space savings Provide scenarios where data deduplication is used 4

Data Deduplication Benefits Data Deduplication can help organizations: Satisfy ROI/TCO requirements Manage data growth Increase efficiency of storage and backup Reduce overall cost of storage Reduce network bandwidth Reduce operational costs including: Infrastructure costs required space, power and cooling Movement toward a greener data center Reduce administrative costs 5

Space Reduction Terminology Data Deduplication is the replacement of multiple copies of data at variable levels of granularity with references to a shared copy in order to save storage space and/or bandwidth Single Instance Storage is a form of data deduplication that operates at a granularity of an entire file or data object Subfile Data Deduplication is a form of data deduplication that operates at a finer granularity than an entire file or data object Compression is the encoding of data to reduce its storage requirement - deduplicated data can also be compressed 6

Space Reduction Ratio & Percent Bytes In Capacity Optimization Bytes Out Ratio = Bytes In Bytes Out % = Bytes In Bytes Out Bytes In 1 1 - % 1 % = 1 ( ) Ratio Ratio = ( ) 7

Space Reduction Ratio & Percent Space Reduction Ratio (In:Out) Space Reduction Percent 2:1 1/2 = 50% 5:1 4/5 = 80% 10:1 9/10 = 90% 20:1 19/20 = 95% Ratios can meaningfully be compared only under the same set of assumptions Relatively low space reduction ratios still provide significant space savings 100:1 99/100 = 99% 500:1 499/500 = 99.8% 8

Capacity Savings over Time Deduplication Ratio Typically Depends on Use Case and Time Primary storage has less duplicate data Periodic archives have moderate duplicate data Repeated backups have significant duplicate data Logical Capacity Dedupe Primary Dedupe Archive Time Dedupe Backup

Data Deduplication How it Works Evaluate Data Identify Redundancy Create or Update Reference Information Store and/or Transmit Unique Data Once Read and/or Reproduce Data Data Deletion/Space Reclamation 10

Data Deduplication Simplified = new unique data = repeat data 11

Data Deduplication Simplified = new unique data = repeat data = pointer to unique data segment 12

Data Deduplication Simplified Dump #1 Dump #2 Dump #3 Dump #4 = new unique data = repeat data = pointer to unique data segment 13

Reading Deduplicated Data Read Request Read Fulfilled Application/CIFS/NFS/VTL Interface Deduplication Engine Reconstitution/ Verification Metadata References Deduplicated Data 14

Design Approach Application-specific protocol CIFS, NFS, FC, iscsi, VTL WAN Deduplicated Replication Agent or Component Gateway Virtual Appliance Appliance Storage System Grid Storage Multiple deployment examples are illustrated Specific deployments selected based on customer situation 15

Source or Target Source Deduplication Identifies duplicate data at the client Transfers unique segments to a central repository Separate client and server components Reduces network traffic during backup and replication Target Deduplication Identifies duplicate data where the data is being stored Stores unique segments Standalone system Reduces network traffic during subsequent replication 16

Inline or Post-Process Inline Deduplication Data deduplication performed before writing the deduplicated data Post-Process Deduplication Data deduplication performed after the data to be deduplicated has been initially stored Considerations A product may implement both methods A product may provide methods to control when particular data is deduplicated May impact replication, usable capacity, scalability, etc. 17

Fixed or Variable Size Segment Fixed Length Segment Deduplication Evaluation of data includes a fixed reference window used to look at segments of data during deduplication process Provides fixed granularity, e.g. 4KB, or 8KB, or 128KB Variable length Segment Deduplication Evaluation of data uses a variable length window to find duplicate data in stream or volume of data processed Provides variable granularity, e.g. ranges 4KB-12KB, 32KB- 96KB Method Chosen May Affect Deduplication Results Effects observed will vary by method Segmentation may not apply to all deduplication 18

Data Deduplication Scope Multiple Repositories Per Storage Controller Single Repository Per Storage Controller Single Repository Shared by Multiple Storage Controllers Application Servers Application Servers Application Servers Dedupe Dedupe Repository Repository Dedupe Repository Dedupe Repository System capacity varies independently from the scope 19

Applications for Deduplication Primary Storage Lower physical capacity required for storage of active data Data Protection Backup to disk efficiently with longer retention and recoverability Replication for offsite data movement, disaster recovery and business continuity Archive Repository Long-term retention and preservation 20

Backup: What to Consider Capacity Optimization Data type Operational Policies Design and Implementation Performance Optimization(bandwidth and latency) Technology Resiliency Clustering Grid Replication 21

Backup: Factors Impacting Space Savings Factors associated with higher data deduplication ratios Data created by users Low change rates Reference data and inactive data Use of full backups Longer retention of deduplicated data Wider scope of data deduplication Continuous business process improvement Smaller segment size Variable-length segment size Format awareness Temporal data deduplication Factors associated with lower data deduplication ratios Data captured from mother nature High change rates Active data, encrypted data, compressed data Use of incremental backups Shorter retention of deduplicated data Narrower scope of data deduplication Business as usual operational procedures Larger segment size Fixed-length segment size No format awareness Spatial data deduplication www.snia.org/forums/dmf/knowledge 22

Backup: Capacity Savings Over Time Deduplication Ratio Typically Improves Over Time Logical Storage savings Dedupe storage Time

Backup: Deduplication for Data Movement Disaster Recovery Replicate all Data after Deduplication for Bandwidth Efficiency Meet Offsite Requirements without Physical Transport Bandwidth Optimization Increasing WAN Efficiency Transfer more information per pipe Support Remote Office Protection Enable Backup Centralization Consolidate Physical Tape Creation Check out SNIA Tutorial: Deduplication s Role in Disaster Recovery 24

Backup: Removable Media Creation Create Long-term Removable Media Storage for Compliance and Archive Backup Server Different Data Path Approaches (#1) Path through backup server # 1 (#2) Path direct from deduplication system to removable media storage Deduplication System # 2 Removable Media 25

Deduplication within Secondary Storage Use Cases 26

Deduplication for Backup and Recovery Deduplicating Agents Backup Clients Backup Clients Primary systems Primary systems Primary systems B2D B2D Deduplicating Backup Server B2D Backup Server Appliance Storage System Deduplicating VTL or NAS

Backup Remote Office Source Deduplication Large Remote Site Agents Source Deduplication Small Remote Site Agents Only Primary systems WAN Primary systems Appliance Optimized Replication Encryption Remote Recovery Site Data Center Agents Agents Primary systems Primary systems Appliance Appliance Tape Vault 28

Deduplicated Archive Repositories Policy-based software moves inactive data on primary storage to deduplicated storage archive Application Servers Policy-based software Local archive may be replicated offsite and copied to tape for long term retention Optimized Replication Remote Site Primary Storage Deduplicated Local Archive Deduplicated Offsite Archive Tape Vault 29

Deduplication of Primary Storage Use Cases 30

Primary Storage with Deduplication LAN Application Server SAN Primary Storage Bring the benefits of deduplication to primary storage Supports networked storage (SAN & LAN) Storage efficiency supports replicating more data Space/performance tradeoff 31

Virtual Machine Storage Balance the tradeoff between savings and performance impact VM Clone Copies APP APP DATA DATA OS OS APP DATA OS APP DATA OS APP DATA OS Examples of Active Data Unstructured data Structured data Virtual Machines APP DATA OS VMs After Deduplication Duplicate Data Is Eliminated 32

Q&A / Feedback Please send any questions or comments on this presentation to SNIA: trackdatamgmt@snia.org Many thanks to the following individuals for their contributions to this tutorial. - SNIA Education Committee Matthew Brisse Don Deel Mike Dutch Larry Freeman David Hill Judy Leach Gene Nagle Richard Reitmeyer Thomas Rivera Tom Sas Gideon Senderov It s easy to get involved with the DPCO! - Find a passion - Join a committee - Gain knowledge & influence - Make a difference www.snia.org/dpco 33