E-Guide

An in-depth look at data deduplication methods

This E-Guide discusses the various approaches to data deduplication. You'll learn the pros and cons of each, and benefit from independent expert insight that will help you get the most out of the approach you take with data deduplication technology.

Sponsored By:

Table of Contents

The pros and cons of file-level vs. block-level data deduplication technology

Data deduplication methods: Block-level versus byte-level dedupe

Resources from FalconStor Software

The pros and cons of file-level vs. block-level data deduplication technology

By Lauren Whitehouse

Data deduplication has dramatically improved the value proposition of disk-based data protection, as well as WAN-based remote- and branch-office backup consolidation and disaster recovery (DR) strategies. It identifies duplicate data, removing redundancies and reducing the overall capacity of data transferred and stored.

Some deduplication approaches operate at the file level, while others go deeper to examine data at a sub-file, or block, level. Determining uniqueness at either the file or block level offers benefits, though results will vary. The differences lie in the amount of reduction each produces and the time each approach takes to determine what's unique.

File-level deduplication

Also commonly referred to as single-instance storage (SIS), file-level data deduplication compares a file to be backed up or archived with those already stored by checking its attributes against an index. If the file is unique, it is stored and the index is updated; if not, only a pointer to the existing file is stored. The result is that only one instance of the file is saved, and subsequent copies are replaced with a "stub" that points to the original file.

Block-level deduplication

Block-level data deduplication operates on the sub-file level. As its name implies, the file is typically broken down into segments -- chunks or blocks -- that are examined for redundancy against previously stored information. The most popular approach for determining duplicates is to assign an identifier to a chunk of data, using a hash algorithm, for example, that generates a unique ID or "fingerprint" for that block. The unique ID is then compared with a central index. If the ID already exists, the data segment has been processed and stored before, so only a pointer to the previously stored data needs to be saved. If the ID is new, the block is unique; the unique ID is added to the index and the unique chunk is stored.

The size of the chunk to be examined varies from vendor to vendor. Some use fixed block sizes, while others use variable block sizes (and, to make it even more confusing, a few allow end users to vary the size of the fixed block). Fixed blocks might be 8 KB or 64 KB; the smaller the chunk, the more likely it is to be identified as redundant, which in turn means greater reductions because even less data is stored. The drawback of fixed blocks is that if a file is modified and the deduplication product uses the same fixed block boundaries as the last inspection, it might not detect redundant segments: the blocks following a change or insertion shift downstream, offsetting the rest of the comparisons. Variable-sized blocks help increase the odds that a common segment will be detected even after a file is modified.
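To make the mechanics above concrete before looking at variable-sized blocks more closely, here is a minimal sketch in Python. It is not drawn from any particular product: it assumes fixed 8 KB blocks, SHA-256 fingerprints, and an in-memory dictionary standing in for the central index, with the per-file list of fingerprints playing the role of the pointers to previously stored blocks.

```python
import hashlib

BLOCK_SIZE = 8 * 1024  # fixed 8 KB blocks, matching the example size above

chunk_store = {}  # fingerprint -> chunk bytes (stands in for the central index and data store)


def dedupe(data: bytes) -> list:
    """Split data into fixed-size blocks, storing only blocks not seen before."""
    recipe = []
    for offset in range(0, len(data), BLOCK_SIZE):
        block = data[offset:offset + BLOCK_SIZE]
        fingerprint = hashlib.sha256(block).hexdigest()
        if fingerprint not in chunk_store:      # new fingerprint: store the unique chunk
            chunk_store[fingerprint] = block
        recipe.append(fingerprint)              # duplicate or not, keep only the pointer
    return recipe


def restore(recipe: list) -> bytes:
    """Reassemble the original data from the stored unique chunks."""
    return b"".join(chunk_store[fp] for fp in recipe)
```

Backing up two copies of the same file through dedupe() stores each unique block once; the second copy costs only its list of fingerprints.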

The variable-length approach finds natural patterns or break points that might occur in a file and then segments the data accordingly. Even if blocks shift when a file is changed, this approach is more likely to find repeated segments. The tradeoff? A variable-length approach may require a vendor to track and compare more than just one unique ID for a segment, which could affect index size and computational time. (A simplified sketch of this content-defined approach appears after the comparison below.)

The differences between file- and block-level deduplication go beyond just how they operate. There are advantages and disadvantages to each approach.

File-level approaches can be less efficient than block-based deduplication:

- A change within the file causes the whole file to be saved again. A file such as a PowerPoint presentation can have something as simple as its title page changed to reflect a new presenter or date, and the entire file will be saved a second time. Block-based deduplication would save only the changed blocks between one version of the file and the next.
- Reduction ratios may be in the range of 5:1 or less, whereas block-based deduplication has been shown to reduce capacity in the 20:1 to 50:1 range for stored data.

File-level approaches can be more efficient than block-based data deduplication:

- Indexes for file-level deduplication are significantly smaller, which takes less computational time when duplicates are being determined. Backup performance is therefore less affected by the deduplication process.
- File-level processes require less processing power due to the smaller index and reduced number of comparisons, so the impact on the systems performing the inspection is lower.
- The impact on recovery time is lower. Block-based deduplication requires "reassembly" of the chunks based on the master index that maps the unique segments and the pointers to them. Since file-based approaches store unique files and pointers to existing unique files, there is less to reassemble.
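The variable-block idea can be illustrated with content-defined chunking. The sketch below is a deliberately simplified, hypothetical example (real products use more robust rolling hashes, such as Rabin fingerprints): a running hash over the most recent bytes declares a boundary whenever its low bits are zero, so break points follow the data itself and tend to survive insertions earlier in the file.

```python
MIN_CHUNK = 2 * 1024     # never cut before 2 KB
MAX_CHUNK = 64 * 1024    # always cut by 64 KB
MASK = (1 << 13) - 1     # boundary when the low 13 bits are zero (~8 KB average chunk)


def chunk_boundaries(data: bytes):
    """Yield (start, end) offsets of content-defined (variable-size) chunks."""
    start, rolling = 0, 0
    for i, byte in enumerate(data):
        # Crude rolling hash: contributions older than ~32 bytes shift out of the 32-bit window.
        rolling = ((rolling << 1) + byte) & 0xFFFFFFFF
        length = i - start + 1
        if (length >= MIN_CHUNK and (rolling & MASK) == 0) or length >= MAX_CHUNK:
            yield start, i + 1
            start, rolling = i + 1, 0
    if start < len(data):
        yield start, len(data)   # trailing partial chunk
```

Each variable-size chunk would then be fingerprinted and indexed exactly as in the fixed-block sketch above; because the boundaries are content-defined, an insertion near the top of a file changes only the chunks it touches.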

"We selected FalconStor because we were confident they could offer a highly scalable VTL solution that provides data de-duplication, offsite replication, and tape integration, with zero impact to our backup performance."
-- Henry Denis, IT Director, Epiq Systems

FalconStor Virtual Tape Library (VTL) provides disk-based data protection and de-duplication to vastly improve the reliability, speed, and predictability of backups. To learn more about our industry-leading VTL solutions with de-duplication, contact FalconStor at 866-NOW-FALC (866-669-3252) or visit www.falconstor.com.

Data deduplication methods: Block-level versus byte-level dedupe

By Lauren Whitehouse

Data deduplication identifies duplicate data, removing redundancies and reducing the overall capacity of data transferred and stored. In my last article, I reviewed the differences between file-level and block-level data deduplication. In this article, I'll assess byte-level versus block-level deduplication. Byte-level deduplication provides a more granular inspection of data than block-level approaches, ensuring more accuracy, but it often requires more knowledge of the backup stream to do its job.

Block-level approaches

Block-level data deduplication segments data streams into blocks, inspecting each block to determine if it has been encountered before (typically by generating a digital signature or unique identifier via a hash algorithm for each block). If the block is unique, it is written to disk and its unique identifier is stored in an index; otherwise, only a pointer to the original, unique block is stored. By replacing repeated blocks with much smaller pointers rather than storing the block again, disk storage space is saved.

The criticisms of block-based approaches are 1) the use of a hash algorithm to calculate the unique ID brings the risk of generating a false positive; and 2) storing unique IDs in an index can slow the inspection process as the index grows larger and requires disk I/O (unless the index size is kept in check and data comparison occurs in memory). Hash collisions could produce a false positive when a hash-based algorithm is used to determine duplicates. Hash algorithms, such as MD5 and SHA-1, generate a unique number for the chunk of data being examined. While hash collisions and the resulting data corruption are possible, the chances that a collision will occur are slim.

Byte-level data deduplication

Analyzing data streams at the byte level is another approach to deduplication. By performing a byte-by-byte comparison of new data streams against previously stored ones, a higher level of accuracy can be delivered. Deduplication products that use this method work from the same assumption: the incoming backup data stream has probably been seen before, so it is reviewed to see if it matches similar data received in the past.

Products leveraging a byte-level approach are typically "content aware," which means the vendor has done some reverse engineering of the backup application's data stream to understand how to retrieve information such as the file name, file type, date/time stamp, and so on. This method reduces the amount of computation required to determine unique versus duplicate data. The caveat? This approach typically occurs post-process -- performed on backup data once the backup has completed. Backup jobs therefore complete at full disk performance, but require a reserve of disk cache to perform the deduplication process. It's also likely that the deduplication process is limited to a backup stream from a single backup set and not applied "globally" across backup sets.
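As a simplified illustration of the byte-level idea (not a description of any vendor's engine), the sketch below layers a byte-for-byte comparison on top of a hash lookup: the fingerprint only nominates candidates, and data is declared a duplicate only when the stored bytes match exactly, so a hash collision cannot silently discard unique data. Content awareness, i.e., parsing the backup application's stream for file names and timestamps, is omitted here.

```python
import hashlib

store = {}  # fingerprint -> list of distinct chunks that happen to share that fingerprint


def store_chunk(chunk: bytes):
    """Return a (fingerprint, slot) reference; store the chunk only if it is truly new."""
    fingerprint = hashlib.sha256(chunk).hexdigest()
    candidates = store.setdefault(fingerprint, [])
    for slot, existing in enumerate(candidates):
        if existing == chunk:            # byte-for-byte comparison confirms a real duplicate
            return fingerprint, slot
    candidates.append(chunk)             # new data (or a hash collision): keep the bytes
    return fingerprint, len(candidates) - 1
```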

Once the post-process deduplication pass is complete, the solution reclaims disk space by deleting the duplicate data. Before space reclamation is performed, an integrity check can be run to ensure that the deduplicated data matches the original data objects (a minimal sketch of this verify-before-reclaim step follows this section). The last full backup can also be maintained so recovery is not dependent on reconstituting deduplicated data, enabling rapid recovery.

Which approach is best?

Both block- and byte-level methods deliver the benefit of optimizing storage capacity. When, where, and how the processes work should be reviewed against your backup environment and its specific requirements before selecting one approach over another. Your vetting process should also include references from organizations with similar characteristics and requirements.
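To illustrate the verify-before-reclaim step described above, the following hypothetical sketch reuses the dedupe() and restore() helpers from the first example: the staged backup copy is released only after the deduplicated data reassembles byte for byte.

```python
def verify_then_reclaim(staged_backup: bytes) -> bool:
    """Deduplicate post-process, verify the result, and only then allow space reclamation."""
    recipe = dedupe(staged_backup)              # post-process pass over the staged data
    if restore(recipe) != staged_backup:        # integrity check against the original
        return False                            # keep the staged copy; do not reclaim space
    # The staging area (and any duplicate copies) can now be safely deleted.
    return True
```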

Resources from FalconStor Software

Book Chapter: SAN For Dummies, Chapter 13: Using Data De-duplication to Lighten the Load

White Paper: Demystifying Data De-Duplication: Choosing the Best Solution

Webcast: Enhancing Disk-to-Disk Backup with Data Deduplication

About FalconStor Software

FalconStor Software, Inc. (NASDAQ: FALC) is the premier provider of TOTALLY Open data protection solutions. We deliver proven, comprehensive data protection solutions that facilitate the continuous availability of business-critical data with speed, integrity, and simplicity. Our technology-independent solutions, built upon the award-winning IPStor virtualization platform, include the industry's leading Virtual Tape Library (VTL) with deduplication, Continuous Data Protector (CDP), File-interface Deduplication System, and Network Storage Server (NSS), each enabled with WAN-optimized replication for disaster recovery and remote office protection. Our products are available from major OEMs and solution providers and are deployed by thousands of customers worldwide, from small businesses to Fortune 1000 enterprises. For more information, visit www.falconstor.com.