DEDUPLICATION NOW AND WHERE IT'S HEADING. Lauren Whitehouse, Senior Analyst, Enterprise Strategy Group




Need Dedupe?

Before/After Dedupe

Deduplication in the Backup Process
(Diagram: production data, deduplication, backup disk.)

Dedupe Evolution
- File-level deduplication or single-instance storage: eliminates redundancy between files
- Block-level deduplication technology: eliminates redundancy within and between files
- WAN optimization: optimizes network bandwidth; aids with data transport across sites
- VTL with deduplication: tape-centric disk-to-disk
- Deduplication appliances: change the economics of disk-to-disk backup
- Backup with deduplication: deduplication becomes a more pervasive feature in backup software
- Symantec OST interface: solves catalog tracking of deduped copies
- Multinode/grid solutions: multi-node configurations now introduce the ability to deliver HA, load balancing, performance increase and global deduplication
- Dedupe on tape: ability to create tapes for optimized long-term retention that contain deduplicated data
- What's next?

Data Growth Out of Control?

Managing the Data Deluge
Survey question: At approximately what rate do you believe your total volume of data is growing annually? (Percent of respondents; 100 or fewer servers, N=247; more than 100 servers, N=246)
(Bar chart; response bands: 1% to 10%, 11% to 20%, 21% to 30%, 31% to 40%, and more than 40% annually.)
- 62% with <100 servers have <20% growth/year
- 63% with >100 servers have >20% growth/year

Storage Spending Priorities
Survey question: In which data storage areas will your organization make the most significant investments over the next 12-18 months? (Percent of respondents, five responses accepted, N=289)
- Backup and recovery solutions: 36%
- Data replication solution for off-site disaster: 24%
- Purchase new SAN storage systems: 23%
- Improved storage management software tools: 21%
- Storage virtualization: 21%
- Data reduction technologies: 18%
- Purchase more power-efficient storage hardware: 18%
- Tiered storage: 17%
- Use cloud storage services as way to source: 17%
- Tape replacement: 15%
- Purchase new NAS storage systems: 15%
- Advanced file storage / file system technology: 14%
- Storage encryption solution: 12%
- Converged data and storage networking: 9%
- Unified storage systems: 9%
- Increase use of flash-based SSDs: 8%

Why Do We Need Dedupe? Data Growth

Deduplication Creates Efficiencies in D2D Backup
Financial benefits
- Reduce disk costs; delay capital expenditures
- Lower bandwidth costs
- Reduce power & cooling costs
- Tape replacement savings
Operational benefits
- Reduce operational overhead in backup
- Reduce time and resource needs for recovery
Business benefits
- Increase retention periods
- Improve recovery objectives
- Improve backup consolidation from ROBOs
- Improve DR

Best Dedupe Fit?
- Traditional file-level backup
- ROBO use cases
- Virtualized environments

...and Worst Fit?
- Pre-compressed or encrypted data
- File types that don't have versions (multimedia)

What Impacts Reduction Ratios?
- Backup strategy (full vs. incremental or differential)
- Change rate between backups
- Retention
- When data is encrypted or compressed
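The factors above can be combined into a rough estimate. The sketch below is a simplified model that is not taken from the slides: it assumes only full backups are retained, that the first full is stored whole, that each later full adds just its changed fraction, and that compression is ignored. The function and parameter names are illustrative.

```python
def estimated_dedupe_ratio(full_size_tb, change_rate, retained_fulls):
    """Rough protected:stored ratio for a series of retained full backups.

    Assumption (not from the slides): the first full is stored whole and each
    later full adds only `change_rate` of its size as new data.
    """
    protected_tb = full_size_tb * retained_fulls
    stored_tb = full_size_tb * (1 + change_rate * (retained_fulls - 1))
    return protected_tb / stored_tb


# Compare retention depth and change rate for a hypothetical 5 TB full backup.
for fulls, rate in [(6, 0.02), (12, 0.02), (6, 0.10)]:
    ratio = estimated_dedupe_ratio(5.0, rate, fulls)
    print(f"{fulls} fulls at {rate:.0%} change: about {ratio:.1f}:1")
```

Under this model, longer retention raises the ratio and a higher change rate lowers it, which matches the direction of the factors listed on the slide.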

Typical Dedupe Ratios
Survey question: On average, what degree of capacity reduction has your organization experienced by using data deduplication technology? (Percent of respondents, N=140)
- Less than 10x reduction: 29%
- 10x to 20x reduction: 56%
- More than 20x reduction: 11%
- Don't know: 5%

Capacity Savings
Scenario: weekly full backup over 8 weeks, 6-week retention, 20:1 deduplication ratio.
(Chart: protected capacity (TB) and stored capacity (TB) by retention period in weeks; vertical axis up to 40 TB.)
Data labels, weeks 1 to 8: 1.25, 1.67, 1.88, 1.67, 1.79, 1.76, 1.84, 2.00
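As a quick aid for reading ratios like the 20:1 figure above, the following sketch converts a deduplication ratio into stored capacity and percentage saved. It is plain arithmetic rather than anything specific to the slide's chart, and the function names are illustrative.

```python
def stored_capacity_tb(protected_tb, dedupe_ratio):
    """Capacity actually consumed on disk for a given protected capacity."""
    return protected_tb / dedupe_ratio


def percent_saved(dedupe_ratio):
    """Share of protected capacity that deduplication avoids storing."""
    return (1 - 1 / dedupe_ratio) * 100


# Hypothetical example: 40 TB of protected data at a 20:1 ratio.
print(stored_capacity_tb(40, 20), "TB stored")
for ratio in (10, 20):
    print(f"{ratio}:1 saves about {percent_saved(ratio):.0f}% of capacity")
```

Note that doubling the ratio from 10:1 to 20:1 only moves the saving from roughly 90% to roughly 95%, so very high ratios bring diminishing returns in capacity.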

Which Dedupe Approach Is Best? Backup Software VTL Gateway Appliance NAS Dedupe Device

Identifying Duplicates
Hash algorithms
- More popular approach
- Fixed block size
- Variable block size
- Sliding window block size
- Hash collisions (false positives) a remote risk
- Central index of IDs
Delta differences
- Faster
- No false positives
- Global deduplication across different backup streams is a limitation
Hybrid approach
- Combines delta differencing & hash calculation
- Less CPU- and memory-intensive
- Index is smaller
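The hash-based approach can be sketched in a few lines. This is a minimal illustration, assuming fixed-size blocks, SHA-256 fingerprints and an in-memory dictionary standing in for the central index; the class and method names are hypothetical and do not describe any vendor's product.

```python
import hashlib


class DedupeStore:
    """Toy block store: each unique block is kept once, keyed by its hash."""

    def __init__(self, block_size=4096):
        self.block_size = block_size
        self.index = {}  # fingerprint -> block bytes (the "central index of IDs")

    def write(self, data):
        """Store data and return its recipe: the ordered list of fingerprints."""
        recipe = []
        for offset in range(0, len(data), self.block_size):
            block = data[offset:offset + self.block_size]
            fingerprint = hashlib.sha256(block).hexdigest()
            # Only blocks with an unseen fingerprint consume new space.
            self.index.setdefault(fingerprint, block)
            recipe.append(fingerprint)
        return recipe

    def read(self, recipe):
        """Re-assemble (rehydrate) the original data from its recipe."""
        return b"".join(self.index[fp] for fp in recipe)


store = DedupeStore(block_size=4)
recipe = store.write(b"AAAABBBBAAAA")
print(len(recipe), "blocks referenced,", len(store.index), "blocks stored")
```

Because two different blocks could in principle share a fingerprint, a hash collision would make the store return the wrong data; with a strong hash this is the remote risk the slide mentions.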

Data Deduplication: Where?
(Diagram: backup source, backup initiator and backup target, with VMs on an ESX server and a remote or branch office connected over a WAN.)

Data Deduplication: When?
(Diagram: backup source, backup initiator and backup target, with VMs on an ESX server and a remote or branch office connected over a WAN.)
- Inline deduplication: before data is written to disk
- Post-process deduplication: after data is written to disk

Inline vs. Post-Process
Inline
- Requires less I/O
- Replication can begin immediately
- Re-assembly of data for recovery could impact performance
- Examples: EMC Data Domain, IBM ProtecTIER, NEC HydraStor, Symantec NBU 5000 Series; typically all software approaches
Post-process
- Requires more I/O
- Requires disk landing zone (staging area)
- Dedupe & replication processes overlap
- Most recent full kept in native format
- Examples: Exagrid, FalconStor, GreenBytes, HP VLS, Quantum DXi, Sepaton DeltaStor

Single- vs. Multi-Node Solutions
Single-node dedupe
- Performance and capacity are limited to an upper threshold
- Forklift upgrade
- Add more islands of dedupe
- Over-purchase to accommodate future growth
- Examples: EMC Data Domain, Fujitsu CS, GreenBytes, Quantum
Multi-node dedupe
- Manages multiple deduplication systems as one
- More linear throughput & capacity scaling
- Load balancing
- Examples: IBM ProtecTIER, EMC Avamar, Exagrid EX Series, FalconStor FDS, HP VLS, NEC HydraStor, Sepaton DeltaStor, Symantec NetBackup 5000 Series

Local vs. Global Dedupe
Local
- Single domain: backup data passes through an individual system and is compared with data passing through the same system
- Examples: EMC Data Domain, Fujitsu, GreenBytes, Quantum
Global
- Deduplication across domains means backup data is compared with data within its system as well as other systems in the domain
- Can result in higher dedupe ratios
- Examples: Exagrid, FalconStor, HP VLS, IBM ProtecTIER, NEC, Sepaton, Symantec NBU 5000 Series; typically most backup software solutions
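The difference between local and global deduplication can be shown with two backup streams that share some blocks. This is a toy sketch with made-up block contents; each set of fingerprints stands in for one system's index.

```python
import hashlib


def fingerprints(blocks):
    """Return the set of SHA-256 fingerprints for a list of blocks."""
    return {hashlib.sha256(block).hexdigest() for block in blocks}


# Hypothetical streams landing on two different systems; "beta" and "gamma" overlap.
stream_1 = [b"alpha", b"beta", b"gamma"]
stream_2 = [b"beta", b"gamma", b"delta"]

# Local: each system compares data only against its own index.
local_blocks_stored = len(fingerprints(stream_1)) + len(fingerprints(stream_2))

# Global: both streams are compared against one shared index.
global_blocks_stored = len(fingerprints(stream_1) | fingerprints(stream_2))

print("local:", local_blocks_stored, "blocks; global:", global_blocks_stored, "blocks")
```

The shared blocks are stored twice in the local case and once in the global case, which is why global deduplication can result in higher ratios.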

Dedupe Approaches
Software-based
- Content-aware; dedupe can be policy-based
- Can be more cost-effective
- Flexibility in disk selection
- End-to-end bandwidth efficiency; remote site backup
- Global dedupe
- Simplified management: single console, policy engine
- Can extend to tape
- Examples: Arkeia, Asigra, Atempo, CA, Cofio, CommVault, Druva, EMC Avamar, I365, IBM, PHD Virtual, Quest, Symantec NBU & BE, Veeam
Hardware-based
- Multiple backup vendor environments
- No impact on application performance
- Optimized replication
- Scalability of some solutions may cause disruptive upgrades or dedupe islands
- Examples: EMC, Exagrid, FalconStor, Fujitsu, GreenBytes, HP, IBM, NEC, Quantum, Sepaton, Symantec

High-Value Feature
Target system integration with backup catalogs and lifecycle policies
- Symantec OpenStorage (OST)
- EMC NetWorker

What's New in Dedupe?
- New dedupe techniques. Example: Arkeia Progressive
- Dedupe on tape. Example: CommVault
- Target solutions moving processes upstream. Example: Data Domain Boost
- Modular dedupe. Example: HP StoreOnce
- Dedupe in hardware/software from same vendor. Example: Symantec
- Ongoing improvements in capacity and performance

Disruptive Trends

Purchase Considerations
Survey question: Which of the following considerations would you say are most important in your organization's evaluation and selection of data deduplication technology? (Percent of respondents, N=145, five responses accepted)
- Cost of solution: 64%
- Ease of implementation/use: 46%
- Impact on backup/recovery performance: 35%
- Integration with existing backup processes: 33%
- Scalability of solution: 31%
- Vendor service and support: 24%
- Ability to deduplicate across systems/data sets as: 23%
- Ability to replicate deduplicated data off-site: 21%
- Existing relationship with vendor: 17%
- Where deduplication occurs: 17%
- Granularity of deduplication: 14%
- Deduplication ratio: 12%
- Experience of vendor in backup implementation: 10%
- When deduplication occurs: 9%

Before Seeking Out Solutions
Understand your needs
- Capacity and throughput requirements/planning
- Full backup size; incremental backup size
- Number of full/incremental backups per week
- Change rate of data
- Projected growth rate
- Retention policies
- Full backup window
- Offsite copy window
- Performance requirements
- Requirements for offsite copies
- Budget
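Two of the planning inputs above translate directly into throughput targets. The sketch below is back-of-the-envelope arithmetic with illustrative names and example figures; it assumes decimal terabytes (1 TB = 10^12 bytes) and ignores protocol overhead.

```python
def required_ingest_tb_per_hour(full_backup_tb, backup_window_hours):
    """Sustained ingest rate needed to finish a full backup inside its window."""
    return full_backup_tb / backup_window_hours


def required_wan_mbps(unique_data_tb, offsite_window_hours):
    """Link speed needed to replicate deduplicated (unique) data off-site in time."""
    bits = unique_data_tb * 1e12 * 8  # assumption: decimal TB
    seconds = offsite_window_hours * 3600
    return bits / seconds / 1e6


# Hypothetical site: 10 TB full backup, 8-hour windows, 0.2 TB of unique data to copy.
print(f"ingest: {required_ingest_tb_per_hour(10, 8):.2f} TB/hour")
print(f"WAN: {required_wan_mbps(0.2, 8):.1f} Mbit/s")
```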

How is Dedupe Evolving?
- Mix of hardware & software approaches
- Scale requirements: performance, capacity
- Focus on recovery considerations: speed of rehydration and restore
- Reliability: criticality of the index; how is it protected?
- New architectures
- New packaging
- New dedupe techniques

THANKS! laurenw@esg-global.com Twitter: lauwhitehouse Blog: www.dataprotectionperspectives.com

APPENDIX

Fixed- vs. Variable-Length Blocks
Fixed-length blocks
- Initial examination: Block A, Block B, Block C, Block D, Block E
- Subsequent examination (after a change in the file): Block A, Block B, Block F, Block G, Block H
- Downstream blocks F, G and H change, so no duplication is detected after the change
Variable-length blocks
- Initial examination: Block A, Block B, Block C, Block D
- Subsequent examination (after a change in the file): Block A, Block E, Block C, Block D
- Downstream blocks C and D are unchanged, so duplication is detected
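The effect described above can be reproduced with a small experiment. This is a minimal sketch, not any vendor's algorithm: the variable-length chunker places a boundary wherever the sum of the last 16 bytes has its low six bits clear, a simple stand-in for the rolling hashes used in practice. The data, window, mask and block size are arbitrary assumptions chosen for illustration.

```python
import random


def fixed_chunks(data, size=64):
    """Cut data into blocks at fixed offsets."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def variable_chunks(data, window=16, mask=0x3F):
    """Cut data wherever the content of the trailing window matches a pattern."""
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        # The boundary depends only on the bytes in the window, not on the offset.
        if sum(data[end - window:end]) & mask == 0:
            chunks.append(data[start:end])
            start = end
    if start < len(data):
        chunks.append(data[start:])
    return chunks


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(8192))
edited = original[:1000] + b"X" + original[1000:]  # one byte inserted mid-file

shared_fixed = set(fixed_chunks(original)) & set(fixed_chunks(edited))
shared_variable = set(variable_chunks(original)) & set(variable_chunks(edited))

print("fixed-length blocks still matching:", len(shared_fixed))
print("variable-length blocks still matching:", len(shared_variable))
```

With fixed-length blocks, only the blocks before the insertion still match because everything downstream shifts by one byte. With content-defined boundaries, the chunker re-synchronizes shortly after the change, so downstream chunks are detected as duplicates again.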

Time to DR
(Diagram: timelines comparing the backup job and replication time for post-process dedupe and for inline dedupe.)