Backup Software Data Deduplication: What you need to know
Presented by W. Curtis Preston, Executive Editor & Independent Backup Expert
When I was in the IT Department
When I started as the backup guy at a $35B company in 1993:
- Tape drive: QIC-80 (80 MB capacity)
- Tape drive: Exabyte 8200 (2.5 GB & 256 KB/s)
- Biggest server: 4 GB ('93), 100 GB ('96)
- Entire data center: 200 GB ('93), 400 GB ('96)
- My TiVo now has 5 times the storage my data center did!
Consulting in backup & recovery since '96:
- Author of O'Reilly's Backup & Recovery and Using SANs and NAS
- Webmaster of BackupCentral.com
- Founder/CEO of Truth in IT
- Follow me on Twitter: @wcpreston
Agenda
- Understanding deduplication
- Advantages and disadvantages of software dedupe
- Dedupe isn't the answer to everything
Understanding deduplication BUT FIRST I HAVE TO TAKE YOU BACK IN TIME...
A 1993 Commercial This commercial ran the year I entered the industry. (1993)
History of My World, Part I
When I joined the industry (1993):
- Disks were 4 MB/s, tapes were 256 KB/s
- Networks were 10 Mb shared
Seventeen years later (2010):
- Disks are 70 MB/s, tapes are 160 MB/s
- Networks are 10 Gb switched
Changes in 17 years:
- 17x increase in disk speed (luckily, RAID has created virtual disks that are far faster)
- 625x increase in tape speed!
- 1000x+ increase in network speed
[Pictured: QIC-80 (60 KB/s), Exabyte 8200 (256 KB/s), DECstation 5000]
More History
Plan A: Stage to disk, spool to tape
- Pioneered by IBM in the '90s, widely adopted in the late '00s
- Large, very fast virtual disk as a caching mechanism for tape
- Only need enough disk to hold one night's backups
- Helps backups; does not help restores
Plan B: Backup to disk, leave on disk
- AKA the early VTL craze
- Helps backups and restores
- Disk was still far too expensive to make this feasible for most people
Plan C: Dedupe
It's perfect for traditional backup:
- Fulls back up the same data every day/week/month
- Incrementals back up the entire file when only one byte changes
- Both back up a file 100 times if it's in 100 locations
- Databases are often backed up full every day
- Tons of duplicate blocks! Average actual reduction of 10:1 and higher
It's not perfect for everything:
- Pre-compressed or encrypted data
- File types that don't have versions (multimedia)
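The arithmetic behind ratios like 10:1 is easy to sketch. Here is a minimal back-of-the-envelope model; the dataset size, change rate, and retention period are illustrative assumptions, not figures from the talk:

```python
# Illustrative only: why weekly fulls + daily incrementals dedupe so well.
# Assumes a 1 TB dataset, 1% daily change rate, 8 weeks of retention.

def dedupe_ratio(dataset_tb=1.0, daily_change=0.01, weeks=8):
    fulls = weeks * dataset_tb                              # one full per week
    incrementals = weeks * 6 * dataset_tb * daily_change    # six incrementals/week
    logical = fulls + incrementals          # what the backup software writes
    # Unique data actually stored: one baseline plus each day's changed blocks
    unique = dataset_tb + weeks * 7 * dataset_tb * daily_change
    return logical / unique

print(f"{dedupe_ratio():.1f}:1")
```

With these assumptions the ratio comes out around 5:1; a lower change rate or longer retention pushes it past the 10:1 cited above.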
Naysayers
- Eliminate all but one copy? No, just eliminate duplicates per location
- What about hash collisions? More on this later, but this is nothing but FUD
- Doesn't this have immutability concerns? Everything that changes the format of the data has immutability concerns (e.g. sector-based storage, tar, etc.); the job of backup/archive applications is to verify the same data comes out as went in
- What about the dedupe tax? Let's talk about this one a bit more
Is There a Plan D?
- Some pundits/analysts think target dedupe is a band-aid that will eventually be done away with via software dedupe, delta backups, etc.
- Maybe this will happen in a 3-5 year time span, maybe it won't
- For some environments, it's already happened
- One of the main purposes of this session is to examine software dedupe and see where it is appropriate
Plan D?
Appliance & software deduplication
Appliance dedupe:
- Standalone appliances that sit behind the backup server and dedupe native backups
- Still appliances: appliances that run backup software in the appliance
- Still appliances: appliances that support OST & dedupe before the appliance
Software dedupe:
- Software running on the backup server or client that dedupes data to generic storage
- At least three different flavors
- Generally seen as a competitor to hardware dedupe
Death of Disk Drives & the Dedupe Tax
Two sure things in backups:
- If doing D2D2T or large restores, single-stream restore speed is very important
- The dedupe tax can significantly impair your ability to create a fast, single stream of data
- It is caused by the reality that deduping data fragments each data set over time
- Each vendor must be doing something to address this issue, or their restore speeds will be very, very slow
Dedupe Tax & Software Dedupe
- Some post-process target dedupe vendors circumvent the dedupe tax for recent restores (caching recent data in native format, forward referencing)
- Software dedupe vendors, being inline vendors, are not able to apply these techniques
Where Is the Data Deduped?
Target dedupe:
- Data is sent unmodified across the LAN & deduped at the target
- No LAN/WAN benefits until you replicate target to target
- Usually cannot compress or encrypt before sending to the target
Source dedupe:
- Redundant data is identified at the backup client
- Only new, unique data is sent across the LAN/WAN
- LAN/WAN benefits; can back up remote/mobile data
- Allows for compression and encryption at the source
Source dedupe vs. delta backups
- Delta backups store only new, unique segments between backups of a given client; they do not compare Elvis's backups to Apollo's backups
- Source dedupe compares all segments to each other, regardless of source
- Delta comparison is considered dedupe in the appliance world, but not in the software world
Hash-based Dedupe
All software dedupe is hash-based:
- Slice all data into segments or chunks
- Run each chunk through a hashing algorithm (e.g. SHA-1)
- Check the hash value against all other hash values
- A chunk with an identical hash value is discarded
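The four steps above can be sketched in a few lines. This is a toy fixed-size-chunk version; real products typically use variable-size chunking and a persistent on-disk index, and the names and sizes here are illustrative:

```python
import hashlib

def dedupe(stream: bytes, chunk_size: int = 8192):
    """Minimal hash-based dedupe sketch: slice, hash, keep only unique chunks."""
    index = {}      # hash -> chunk contents (the dedupe "store")
    recipe = []     # ordered list of hashes needed to reassemble the stream
    for i in range(0, len(stream), chunk_size):
        chunk = stream[i:i + chunk_size]
        h = hashlib.sha1(chunk).hexdigest()
        if h not in index:      # new, unique chunk: store it
            index[h] = chunk
        recipe.append(h)        # duplicate chunks become a mere reference
    return index, recipe

data = b"A" * 8192 * 3 + b"B" * 8192    # three identical chunks + one unique
store, recipe = dedupe(data)
print(len(recipe), "chunks referenced,", len(store), "unique chunks stored")
```

Restoring is the reverse: walk the recipe and concatenate the stored chunk for each hash, which is also where the fragmentation behind the "dedupe tax" comes from.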
Hash Collision FUD
- There is a finite number of hashes (2^160 for SHA-1)
- It is possible that two chunks with different contents could have the same hash
- It is also possible that Catherine Zeta-Jones will dump Michael Douglas and ask me out
- The odds of these two events are roughly the same
Real Odds of a Hash Collision

Hash    | Hash size (bits) | Hashes needed for 1:10^15 odds | Exabytes needed | Hashes needed for 1:10^5 odds | Yottabytes needed
SHA-1   | 160 | 5.4E+16 | 432                     | 5.4E+21 | 43
Tiger   | 192 | 3.5E+21 | 28,345,529              | 3.5E+26 | 2,834,560
SHA-256 | 256 | 1.5E+31 | 121,743,120,636,759,000 | 1.5E+36 | 12,174,342,499,597,800

- 1:10^15: odds of a single disk writing incorrect data and not knowing it (AKA Undetectable Bit Error Rate, or UBER). With SHA-1, we have to write 432 EB to reach those odds.
- 1:10^5: best odds of a double-disk RAID5 failure. With SHA-1, we have to write 43 YB to reach those odds.
- Original formula here: http://en.wikipedia.org/wiki/birthday_attack
- Modified by Joseph Chapa, PhD, using a Maclaurin series expansion to mitigate Excel's lack of precision; available at backupcentral.com/hash-odds.xls
- EB/YB needed assumes an 8K chunk size; a larger chunk increases the EB/YB needed
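The table can be sanity-checked with the standard birthday-bound approximation p ≈ n²/(2·2^b), solved for n. This sketch uses the simple approximation rather than the spreadsheet's series expansion, so results land close to, but not exactly on, the table's figures:

```python
import math

def hashes_for_odds(bits, p):
    # Birthday-bound approximation: p ~= n^2 / (2 * 2^bits), solved for n
    return math.sqrt(2 * p * 2**bits)

def exabytes_needed(bits, p, chunk_bytes=8192):
    # Total data written at a given chunk size, expressed in exabytes (1e18 B)
    return hashes_for_odds(bits, p) * chunk_bytes / 1e18

# SHA-1 (160 bits) at 1:10^15 odds (single-drive UBER):
n = hashes_for_odds(160, 1e-15)
print(f"{n:.1e} hashes, ~{exabytes_needed(160, 1e-15):.0f} EB at 8 KB chunks")
```

This yields roughly 5.4E+16 hashes and a few hundred exabytes, in line with the SHA-1 row above.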
ADVANTAGES AND DISADVANTAGES
Advantages of Source Dedupe
- Reduces bandwidth usage from the very beginning
- Allows backup of relatively large datasets over relatively small WAN connections
- Easy backup of mobile data
- All dedupe is global
- Allows for database/application agents
- Can be used to back up virtual servers
Advantages of Software Target Dedupe
- Can use any storage as a target, including repurposed storage
- Always global (to capacity)
- Integrated with backup software (including replication)
- Hybrid dedupe allows compression & encryption at the client
Disadvantages of Software Dedupe
- Generally not as scalable as appliance dedupe, in either capacity or performance
- Largest-capacity source dedupe system is 52 TB (native); appliance systems offer PBs
- Fastest restore is 360 MB/s, vs. 1000s of MB/s with appliance dedupe
- Some of the other stats are not impressive either
- May need to upgrade or replace your backup software to get dedupe functionality
Name That Dedupe
- Integrated target dedupe: Symantec NetBackup, TSM, ARCserve, CommVault Simpana
- Integrated source dedupe: Asigra, Symantec NetBackup & Backup Exec, TSM
- Standalone source dedupe: EMC Avamar, i365 EVault, Symantec NetBackup
Dedupe isn't the answer to everything
- It's still a bulk copy of the data for backup and restore
- Creates backup window issues
- Creates RTO/RPO challenges
- At some point, many companies' requirements will drive them to CDP and near-CDP
Continuous Data Protection
- "Replication with a back button"
- Continuously copy block changes to the backup system
- Replicated copy always ready for instant restore
- System can roll back in time to recover from logical corruption
Near-CDP
- Snapshots & replication
- Also allows for continuous, zero-impact backups
- Can recover to the latest snapshot
- Some object to the term "near-CDP," but there is no replacement term
THANK YOU!