Data Reduction: Deduplication and Compression. Danny Harnik IBM Haifa Research Labs



2 Motivation
- Reducing the amount of data is a desirable goal.
- Data reduction: an attempt to compress the huge amounts of data at hand.
- Is it possible? Information-theoretically? Technically?
- Is it financially worth it? Storage is becoming cheaper all the time, and data reduction requires resources and time.

3 Compression and Deduplication
- Compression: what is the most succinct representation of this file?
- Deduplication: hasn't this file appeared before?
- Different workloads give different results: some favor compression, some favor dedup, and sometimes the combination is best.

4 Compression

5 Compression
- Zip runs an algorithm called DEFLATE.
- DEFLATE is a combination of two techniques: Lempel-Ziv [1977] and Huffman code [1952].
- Will show these two techniques, plus arithmetic encoding.

6 LZ77 Compression
- Go over a stream. At each point, search for the longest string that is identical to one that has already appeared in the past.
- If none has appeared, write the string as is.
- If one has appeared, save a pointer to the start of the earlier string (how many bytes back) and the length of the current string.
- Many variations exist. How far back to search? Typically 32KB.
- LZ78 holds a dictionary table instead.
- A good approximation of the entropy for some sources.
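The search-and-point step can be sketched in Python as follows. This is a minimal illustration, not DEFLATE itself: the token format, the function names, the brute-force search, and the `min_len` threshold of 3 are assumptions made for the example, while the 32KB window follows the slide.

```python
def lz77_encode(data: bytes, window: int = 32 * 1024, min_len: int = 3):
    """Return tokens: ('lit', byte) or ('ref', distance_back, length)."""
    tokens = []
    i = 0
    while i < len(data):
        best_len, best_dist = 0, 0
        # Brute-force search of the window for the longest earlier match.
        for j in range(max(0, i - window), i):
            length = 0
            while i + length < len(data) and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_len, best_dist = length, i - j
        if best_len >= min_len:
            # Seen before: emit a pointer (how many bytes back) and a length.
            tokens.append(("ref", best_dist, best_len))
            i += best_len
        else:
            # Not seen before: emit the byte itself.
            tokens.append(("lit", data[i]))
            i += 1
    return tokens


def lz77_decode(tokens) -> bytes:
    out = bytearray()
    for tok in tokens:
        if tok[0] == "lit":
            out.append(tok[1])
        else:
            _, dist, length = tok
            # Copy byte by byte so a match may overlap the bytes being written.
            for _ in range(length):
                out.append(out[-dist])
    return bytes(out)


sample = b"abcabcabcabc"
tokens = lz77_encode(sample)
print(tokens)
print(lz77_decode(tokens) == sample)
```

Real implementations replace the quadratic search with hash chains or similar structures, and then entropy-code the tokens, which is where the Huffman stage of DEFLATE comes in.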

7 Huffman Code
- An information-theoretic approach to compression.
- The n characters (or bytes) of a typical text are not uniformly distributed.
- Use the skewed distribution to achieve a shorter representation: the most frequent character gets the shortest representation.
- E.g., in a typical English text, use the shortest encoding for "e" and the longest for "q".
- Huffman code: a method of representing a text using nearly its Shannon entropy worth of bits.
- Optimal when considering just single characters.

8 Huffman Code
Figure: Huffman tree for the text "this is an example of a huffman tree".
Example taken from:
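A short Python sketch of how such a code can be built for the example text. The function name and the way the tree is represented (dictionaries of partial codes merged bottom-up) are choices made for this illustration; only the greedy "merge the two least frequent subtrees" rule is the standard Huffman construction.

```python
import heapq
from collections import Counter


def huffman_code(text: str) -> dict:
    """Map each symbol of `text` to a bit string using Huffman's algorithm."""
    freq = Counter(text)
    # Heap entries: (weight, unique tie-breaker, {symbol: code so far}).
    heap = [(w, i, {sym: ""}) for i, (sym, w) in enumerate(sorted(freq.items()))]
    heapq.heapify(heap)
    if len(heap) == 1:
        # Degenerate case: a single distinct symbol still needs one bit.
        return {sym: "0" for sym in heap[0][2]}
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent subtrees, prefixing their codes.
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in left.items()}
        merged.update({s: "1" + c for s, c in right.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]


text = "this is an example of a huffman tree"
codes = huffman_code(text)
encoded_bits = sum(len(codes[ch]) for ch in text)
print(encoded_bits, "bits, versus", 8 * len(text), "bits at one byte per character")
```

For this text the encoding takes 135 bits instead of 288. The exact bit patterns depend on how ties are broken, but the total length is the same for every valid Huffman code.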

9 Deduplication

10 Deduplication
- Similar to Lempel-Ziv 78, but at a whole different scale.
- The basic block is typically ~4KB, 8KB, 16KB, or a full file, rather than a byte or a string of bytes.
- An ongoing process: need to address a file after it is saved and closed.
- Two main approaches:
  - Inline dedup: process data as it arrives.
  - Offline dedup: a background process. First save the data, then dedup in spare time.

11 How to dedupe?
- Fingerprint each block using a hash function. Common hashes used: SHA-1, SHA-256, others.
- Store an index of all the hashes already in the system.
- For each new block, compute its hash and look it up in the index table.
  - If the hash is new, add it to the index.
  - If the hash is known, store the block as a pointer to the existing data.
- If the hash is known, do you want to look at the actual data?
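The lookup loop can be sketched in Python with an in-memory dictionary standing in for the index table. The class name `DedupStore`, the fixed 4KB block size, and the `verify` flag (one possible answer to the question of looking at the actual data on a hash hit) are assumptions of this illustration.

```python
import hashlib


class DedupStore:
    """Toy fixed-block dedup store: fingerprint index held in a dict."""

    def __init__(self, block_size: int = 4096, verify: bool = False):
        self.block_size = block_size
        self.verify = verify
        self.index = {}  # fingerprint -> stored block

    def write(self, data: bytes) -> list:
        """Store `data`; return its recipe (the list of block fingerprints)."""
        recipe = []
        for off in range(0, len(data), self.block_size):
            block = data[off:off + self.block_size]
            fp = hashlib.sha256(block).hexdigest()
            if fp not in self.index:
                self.index[fp] = block  # new block: store it and index it
            elif self.verify and self.index[fp] != block:
                # Known hash but different bytes: a hash collision.
                raise ValueError("hash collision on " + fp)
            recipe.append(fp)  # known or new, the file keeps only a pointer
        return recipe

    def read(self, recipe: list) -> bytes:
        return b"".join(self.index[fp] for fp in recipe)

    def stored_bytes(self) -> int:
        return sum(len(block) for block in self.index.values())


store = DedupStore(block_size=4)
recipe = store.write(b"AAAABBBBAAAA")
print(len(recipe), "blocks written,", len(store.index), "blocks stored")
```

With `verify=True` every hash hit costs a read and a byte comparison, which is the price of not trusting the fingerprint alone.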

12 Client-side deduplication
- A method to save bandwidth as well as storage.
- Also known as source-based dedupe or WAN deduplication.
- The client computes the hash and sends it to the server.
- If the hash is new, the server asks the client for the data (the client uploads it).
- Otherwise (dedupe), the upload is skipped and the server adds a new pointer to the existing data.
Figure: the client sends the hash of "Let it be.mp3" (2fd4e1) to the server, which looks it up in its index.
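The exchange can be sketched in Python with both parties in one process. `DedupServer`, `offer`, `upload`, and `client_put` are hypothetical names for this illustration, and SHA-256 at whole-file granularity is an assumption; a real service would run this over a network protocol.

```python
import hashlib


class DedupServer:
    def __init__(self):
        self.blobs = {}     # hash -> file contents
        self.pointers = {}  # (user, filename) -> hash

    def offer(self, user: str, filename: str, digest: str) -> bool:
        """Client announces a hash. Return True if the data must be uploaded."""
        if digest in self.blobs:
            # Dedupe: skip the upload, just add a pointer to existing data.
            self.pointers[(user, filename)] = digest
            return False
        return True

    def upload(self, user: str, filename: str, data: bytes) -> None:
        # The server recomputes the hash rather than trusting the client.
        digest = hashlib.sha256(data).hexdigest()
        self.blobs[digest] = data
        self.pointers[(user, filename)] = digest


def client_put(server: DedupServer, user: str, filename: str, data: bytes) -> int:
    """Store a file; return the number of payload bytes actually sent."""
    digest = hashlib.sha256(data).hexdigest()
    if server.offer(user, filename, digest):
        server.upload(user, filename, data)
        return len(data)
    return 0


server = DedupServer()
song = b"let it be" * 1000
print(client_put(server, "alice", "Let it be.mp3", song))  # first copy is uploaded
print(client_put(server, "bob", "Let it be.mp3", song))    # second copy is deduped
```

The second call sends only the hash, which is where the bandwidth saving comes from.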

13 Choice of hash function
- In most deduplication systems, fingerprinting is done using a cryptographic hash, usually SHA-1, which has an output of 160 bits.
- Probability p of a collision, where n is the number of blocks and b is the number of bits in the hash:
  $p \le \frac{n(n-1)}{2} \cdot 2^{-b}$
- The above is true for any random hash function.
- However, a malicious adversary may choose blocks specifically to create a collision. This is why a cryptographic hash is used.
- A cryptographic hash is typically more expensive to compute than a non-cryptographic, random-like hash function.
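To get a feel for the numbers, the sketch below evaluates the birthday-style bound p <= n(n-1)/2 * 2^(-b), which is how the formula on this slide is read here. The function name and the example system size (2^40 blocks, i.e. 4 PiB of unique data at 4KB per block) are assumptions chosen for illustration.

```python
import math
from fractions import Fraction


def collision_bound(n: int, b: int) -> Fraction:
    """Upper bound on the chance that any two of n blocks share a b-bit hash."""
    return Fraction(n * (n - 1), 2) / Fraction(2 ** b)


n_blocks = 2 ** 40  # assumed system size: 2^40 blocks of 4KB each
bound = collision_bound(n_blocks, 160)  # 160 bits: SHA-1 output size
print("log2 of the bound:", math.log2(bound))
```

For this size the bound is just below 2^-81, so accidental collisions are not the concern; the adversarial case described on the slide is.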

14 Issues
- Smaller blocks = better dedup.
- But smaller blocks = more work: more fingerprints, more searches, more metadata.
- Bottom line: the choice of block size depends on the workload, e.g. a file system with a 1KB page size.
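A rough Python calculation makes the metadata cost concrete. It counts only the raw fingerprints, assuming 20 bytes each (the size of a SHA-1 digest) and 1 TiB of unique data; pointers, reference counts, and index structure overhead are ignored, so real numbers would be higher.

```python
def fingerprint_count(total_bytes: int, block_size: int) -> int:
    """Number of blocks (and therefore fingerprints), rounding up."""
    return -(-total_bytes // block_size)


def index_bytes(total_bytes: int, block_size: int, fp_bytes: int = 20) -> int:
    """Bytes needed for the fingerprints alone."""
    return fingerprint_count(total_bytes, block_size) * fp_bytes


TIB = 2 ** 40
for block_size in (1024, 4096, 8192, 16384):
    gib = index_bytes(TIB, block_size) / 2 ** 30
    print(f"{block_size:>6} B blocks: {fingerprint_count(TIB, block_size):>13,} "
          f"fingerprints, {gib:.2f} GiB of hashes")
```

Under these assumptions, going from 4KB to 1KB blocks grows the fingerprint data from 5 GiB to 20 GiB per TiB of unique data, and every one of those entries must also be searched.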

15 Alignment issues
- What if we insert 1 byte into an existing file? The data is almost identical, but every fixed-size block after the insertion point shifts, so dedup will fail miserably.
- Solution: variable block size.
- Rabin-Karp fingerprinting: compute a rolling hash and cut a block boundary whenever the hash equals 0 mod p.
- Average block size = p.
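The effect can be reproduced with a small Python experiment. The chunker below uses a simple polynomial rolling hash over a 16-byte window as a stand-in for Rabin fingerprinting; the constants `BASE` and `MOD`, the window size, p = 64, and the helper names are all assumptions of this sketch.

```python
import hashlib
import random

BASE = 257
MOD = (1 << 31) - 1


def cdc_chunks(data: bytes, p: int = 64, window: int = 16) -> list:
    """Content-defined chunking: cut where the rolling hash is 0 mod p."""
    chunks, start, h = [], 0, 0
    top = pow(BASE, window - 1, MOD)  # weight of the byte leaving the window
    for i, byte in enumerate(data):
        if i >= window:
            h = (h - data[i - window] * top) % MOD  # drop the oldest byte
        h = (h * BASE + byte) % MOD                 # shift in the new byte
        if i + 1 >= window and h % p == 0:
            chunks.append(data[start:i + 1])
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def fixed_chunks(data: bytes, size: int = 64) -> list:
    return [data[i:i + size] for i in range(0, len(data), size)]


def new_chunk_count(old: list, new: list) -> int:
    """How many chunks of `new` have a fingerprint not present in `old`."""
    known = {hashlib.sha256(c).digest() for c in old}
    return sum(1 for c in new if hashlib.sha256(c).digest() not in known)


rng = random.Random(0)
original = bytes(rng.getrandbits(8) for _ in range(20000))
edited = original[:100] + b"!" + original[100:]  # insert a single byte

print("fixed blocks to store again:",
      new_chunk_count(fixed_chunks(original), fixed_chunks(edited)))
print("variable blocks to store again:",
      new_chunk_count(cdc_chunks(original), cdc_chunks(edited)))
```

Because each cut depends only on the last 16 bytes, boundaries after the insertion fall on the same content as before, and only the chunks touching the edit need to be stored again. Production chunkers usually also enforce minimum and maximum chunk sizes, which this sketch omits.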

16 Existing data reduction solutions (A sample of solutions for storage systems)

17 Deduplication: some systems and applications
- Content Addressable Storage (CAS), mainly for archiving: Venti (Lucent), Centera (EMC), JumboStore (HP), Hydrastor (NEC)
- Backup, Virtual Tape Library (VTL): Diligent (IBM), DataDomain (EMC), D2D (HP)
- Backup with client-side dedup:
  - Cloud backup services: Mozy (EMC), Dropbox
  - Avamar (EMC), Ocarina (Dell), Netbackup (Symantec), Tivoli Storage Manager (IBM)
- Primary (mainly file systems), useful for VM images:
  - NetApp Filer: 2-to-1 ratio guarantee on some VMware usage
  - ZFS (Sun open source file system)
  - Dell (planned for next year)

18 Compression in storage systems
- Real-time (inline): RTC (IBM), ZFS (Oracle), Nimble Storage
- Offline / mix: EMC Data Compression; Dell (planned for next year: dedupe inline, compression offline); NetApp (writes online, updates offline)
- Backup

19 Dedup vs. Compression vs. both
Figure: Compression and Deduplication for Various Data Types. Y-axis: data reduction ratio (compressed size / original size). X-axis: data type (VM Images, Medical Images, Website Archive, Project Repository, DB2 TPC, Laptop1 (29.9GB)). Series: Compress (Gzip), DedupV (4K, var), DedupV+Compress, DedupF (4K, fix), DedupF+Compress, Compress+DedupV, Compress+DedupF.
Data taken from C. Constantinescu, J. Glider, D. Chambliss: Mixing Deduplication and Compression on Active Data Sets. DCC 2011

20 Summary
- Data reduction is a useful concept, but not for all cases.
- Compression and deduplication are two similar concepts at the two ends of the same scale.
- The large scale in dedupe creates new challenges.
- Different challenges and use cases: no one solution fits all.


More information

Understanding EMC Avamar with EMC Data Protection Advisor

Understanding EMC Avamar with EMC Data Protection Advisor Understanding EMC Avamar with EMC Data Protection Advisor Applied Technology Abstract EMC Data Protection Advisor provides a comprehensive set of features that reduce the complexity of managing data protection

More information

vsphere Data Protection 6.0 VDP 6.0

vsphere Data Protection 6.0 VDP 6.0 vsphere Data Protection 6.0 VDP 6.0 How to backup VMware environments? Daniel Olkowski EMC Data Protection and Availability Division Europe EAST 1 Goal of the meeting Where to use new vsphere Data Protection

More information

Compression techniques

Compression techniques Compression techniques David Bařina February 22, 2013 David Bařina Compression techniques February 22, 2013 1 / 37 Contents 1 Terminology 2 Simple techniques 3 Entropy coding 4 Dictionary methods 5 Conclusion

More information

Choosing an Enterprise-Class Deduplication Technology

Choosing an Enterprise-Class Deduplication Technology WHITE PAPER Choosing an Enterprise-Class Deduplication Technology 10 Key Questions to Ask Your Deduplication Vendor 400 Nickerson Road, Marlborough, MA 01752 P: 866.Sepaton or 508.490.7900 F: 508.490.7908

More information

Data Deduplication in Tivoli Storage Manager. Andrzej Bugowski 19-05-2011 Spała

Data Deduplication in Tivoli Storage Manager. Andrzej Bugowski 19-05-2011 Spała Data Deduplication in Tivoli Storage Manager Andrzej Bugowski 19-05-2011 Spała Agenda Tivoli Storage, IBM Software Group Deduplication concepts Data deduplication in TSM 6.1 Planning for data deduplication

More information

LDA, the new family of Lortu Data Appliances

LDA, the new family of Lortu Data Appliances LDA, the new family of Lortu Data Appliances Based on Lortu Byte-Level Deduplication Technology February, 2011 Copyright Lortu Software, S.L. 2011 1 Index Executive Summary 3 Lortu deduplication technology

More information

WHITE PAPER. Effectiveness of Variable-block vs Fixedblock Deduplication on Data Reduction: A Technical Analysis

WHITE PAPER. Effectiveness of Variable-block vs Fixedblock Deduplication on Data Reduction: A Technical Analysis WHITE PAPER Effectiveness of Variable-block vs Fixedblock Deduplication on Data Reduction: A Technical Analysis CONTENTS Executive Summary... 3 Fixed vs. Variable-block Deduplication... 3 Test Configuration...

More information

METHODOLOGY FOR OPTIMIZING STORAGE ON CLOUD USING AUTHORIZED DE-DUPLICATION A Review

METHODOLOGY FOR OPTIMIZING STORAGE ON CLOUD USING AUTHORIZED DE-DUPLICATION A Review METHODOLOGY FOR OPTIMIZING STORAGE ON CLOUD USING AUTHORIZED DE-DUPLICATION A Review 1 Ruchi Agrawal, 2 Prof.D.R. Naidu 1 M.Tech Student, CSE Department, Shri Ramdeobaba College of Engineering and Management,

More information

On the Use of Compression Algorithms for Network Traffic Classification

On the Use of Compression Algorithms for Network Traffic Classification On the Use of for Network Traffic Classification Christian CALLEGARI Department of Information Ingeneering University of Pisa 23 September 2008 COST-TMA Meeting Samos, Greece Outline Outline 1 Introduction

More information

We look beyond IT. Cloud Offerings

We look beyond IT. Cloud Offerings Cloud Offerings cstor Cloud Offerings As today s fast-moving businesses deal with increasing demands for IT services and decreasing IT budgets, the onset of cloud-ready solutions has provided a forward-thinking

More information

Trends in Enterprise Backup Deduplication

Trends in Enterprise Backup Deduplication Trends in Enterprise Backup Deduplication Shankar Balasubramanian Architect, EMC 1 Outline Protection Storage Deduplication Basics CPU-centric Deduplication: SISL (Stream-Informed Segment Layout) Data

More information

Clash of the Titans. I/O System Performance. mag. Sergej Rožman; Abakus plus d.o.o. http://www.abakus.si/

Clash of the Titans. I/O System Performance. mag. Sergej Rožman; Abakus plus d.o.o. http://www.abakus.si/ Clash of the Titans I/O System Performance mag. Sergej Rožman; Abakus plus d.o.o. The latest version of this document is available at: http://www.abakus.si/ Clash of the Titans I/O System Performance Abakus

More information

Платформа NetBackup 7.6. What's new in NetBackup 7.6? 1

Платформа NetBackup 7.6. What's new in NetBackup 7.6? 1 Платформа NetBackup 7.6 What's new in NetBackup 7.6? 1 Building the NetBackup Platform 3 Key Investment Areas 1. Optimize for Source Workloads Physical Virtual Arrays Big Data Accelerator V-Ray Replication

More information

Two-Level Metadata Management for Data Deduplication System

Two-Level Metadata Management for Data Deduplication System Two-Level Metadata Management for Data Deduplication System Jin San Kong 1, Min Ja Kim 2, Wan Yeon Lee 3.,Young Woong Ko 1 1 Dept. of Computer Engineering, Hallym University Chuncheon, Korea { kongjs,

More information

EMC DATA PROTECTION. Backup ed Archivio su cui fare affidamento

EMC DATA PROTECTION. Backup ed Archivio su cui fare affidamento EMC DATA PROTECTION Backup ed Archivio su cui fare affidamento 1 Challenges with Traditional Tape Tightening backup windows Lengthy restores Reliability, security and management issues Inability to meet

More information

EMC AVAMAR. Deduplication backup software and system. Copyright 2012 EMC Corporation. All rights reserved.

EMC AVAMAR. Deduplication backup software and system. Copyright 2012 EMC Corporation. All rights reserved. EMC AVAMAR Deduplication backup software and system 1 IT Pressures 2009 2020 0.8 zettabytes 35.2 zettabytes DATA DELUGE BUDGET DILEMMA Transformation INFRASTRUCTURE SHIFT COMPLIANCE and DISCOVERY 2 EMC

More information

The do s and don ts. E-Guide

The do s and don ts. E-Guide E-Guide The do s and don ts of data deduplication Data Deduplication continues to gain momentum as one of the most popular backup trends hitting the storage market today. This E-Guide will highlight the

More information

Data Domain & Deduplication Basics 101

Data Domain & Deduplication Basics 101 Data Domain & Deduplication Basics 101 Data Domain & Avamar Solutions vs. Traditional Tape Solutions A webcast presented by IT Convergence, June 19 th, 2014 How to Ask Questions Feel free to ask questions

More information

EMC DATA DOMAIN OPERATING SYSTEM

EMC DATA DOMAIN OPERATING SYSTEM EMC DATA DOMAIN OPERATING SYSTEM Powering EMC Protection Storage ESSENTIALS High-Speed, Scalable Deduplication Up to 58.7 TB/hr performance Reduces requirements for backup storage by 10 to 30x and archive

More information

EMC DATA DOMAIN OPERATING SYSTEM

EMC DATA DOMAIN OPERATING SYSTEM ESSENTIALS HIGH-SPEED, SCALABLE DEDUPLICATION Up to 58.7 TB/hr performance Reduces protection storage requirements by 10 to 30x CPU-centric scalability DATA INVULNERABILITY ARCHITECTURE Inline write/read

More information