Data Deduplication HTBackup




HTBackup and its deduplication technology is touted as one of the best ways to manage today's explosive data growth. If you're new to the technology, these key facts will help you get up to speed.

Deduplication is used for a variety of purposes
Deduplication appears in any number of different products. Compression utilities such as WinZip perform deduplication, but so do many WAN optimization solutions. HTBackup, HTBase's backup and recovery product, also offers deduplication.

Higher ratios produce diminishing returns
The effectiveness of data deduplication is measured as a ratio. Although higher ratios do convey a higher degree of deduplication, they can be misleading. It is impossible to deduplicate a file in a way that shrinks it by 100%, so ever-higher ratios yield diminishing returns. To see why, consider what happens when you deduplicate 1 TB of data. A 20:1 ratio reduces the data from 1 TB to 51.2 GB, while a 25:1 ratio reduces it to 40.96 GB. Going from 20:1 to 25:1 saves only about 10 GB more, roughly an extra 1% of the original data.

Deduplication can be CPU intensive
Many deduplication algorithms work by hashing chunks of data and then comparing the hashes to find duplicates. This hashing process is CPU intensive. That isn't usually a big deal if the deduplication process is offloaded to an appliance or occurs on a backup target, but when source deduplication takes place on a production server, it can affect the server's performance.
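The hash-and-compare approach can be sketched in a few lines. This is a minimal illustration, not HTBackup's actual implementation; the 4 KB chunk size, SHA-256, and the in-memory store are assumptions made for the example:

```python
import hashlib

def deduplicate(stream: bytes, chunk_size: int = 4096):
    """Split data into fixed-size chunks and store each unique chunk once."""
    store = {}    # digest -> chunk: the unique-chunk store
    recipe = []   # ordered digests needed to rebuild the original stream
    for i in range(0, len(stream), chunk_size):
        chunk = stream[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)   # keep only the first copy seen
        recipe.append(digest)
    return store, recipe

def rebuild(store, recipe):
    return b"".join(store[h] for h in recipe)

data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096   # highly repetitive input
store, recipe = deduplicate(data)
assert rebuild(store, recipe) == data
# 16 KB of input is represented by 2 unique 4 KB chunks plus a recipe
```

The hashing cost that the text describes is exactly the `sha256` call in the loop: it runs over every byte of the input, which is why source-side deduplication competes with the production workload for CPU.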

A more practical use of drives One of the benefits of performing deduplication across virtual machines on a host server is that it reduces the amount of physical disk space the virtual machines consume. For some organizations, this can make solid-state storage practical for virtualization hosts. Solid-state drives have a much smaller capacity than traditional hard drives, but they deliver better performance because they have no moving parts.

Technological Classification
The practical benefits of this technology depend upon various factors:
1. Point of Application - Source vs. Target
2. Time of Application - Inline vs. Post-Process
3. Granularity - File vs. Sub-File level
4. Algorithm - Fixed size blocks vs. Variable length data segments

Target vs. Source
Target-based deduplication acts on the target data storage media. In this case the client is unmodified and not aware of any deduplication. The deduplication engine can be embedded in the hardware array, which can then be used as a NAS/SAN device with deduplication capabilities. Alternatively, it can be offered as an independent software or hardware appliance that acts as an intermediary between the backup server and the storage arrays. In both cases it improves only storage utilization.

Source-based deduplication, on the contrary, acts on the data at the source, before it is moved. A deduplication-aware backup agent is installed on the client, and only unique data is backed up. The result is improved bandwidth and storage utilization, at the cost of additional computational load on the backup client.

Fixed vs. Variable
The fixed-length block approach, as the name suggests, divides files into fixed-size blocks and uses a simple checksum (MD5, SHA, etc.) to find duplicates. Although it is possible to look for repeated blocks this way, the approach is of limited effectiveness. The reason is that the primary opportunity for data reduction lies in finding duplicate blocks in two transmitted datasets that are made up mostly, but not completely, of the same data segments. For example, similar data blocks may be present at different offsets in two different datasets; in other words, the block boundaries of similar data may differ. This is very common when some bytes are inserted into a file: when the changed file is processed again and divided into fixed-length blocks, every block appears to have changed.
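The boundary-shift problem is easy to demonstrate. In this sketch (the 4 KB block size and the sample data are illustrative assumptions), inserting a single byte at the front of a stream shifts every fixed-length block boundary, so no block hashes match afterwards:

```python
import hashlib

def block_hashes(data: bytes, size: int = 4096):
    """Hash the stream as consecutive fixed-size blocks."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]

original = bytes(range(256)) * 64      # 16 KB of sample data
modified = b"X" + original             # a single byte inserted at the front

shared = set(block_hashes(original)) & set(block_hashes(modified))
# Every block boundary shifted by one byte, so no hashes are shared,
# even though the two streams differ by only a single inserted byte.
```

A fixed-block engine would therefore re-store almost the entire modified stream, which is exactly the limitation the text describes.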

Therefore, two datasets that differ only slightly are likely to share very few identical fixed-length blocks. Variable-Length Data Segment technology instead divides the data stream into variable-length segments using a methodology that can find the same block boundaries in different locations and contexts. This allows the boundaries to "float" within the data stream, so that changes in one part of the dataset have little or no impact on the boundaries in other locations.

ROI Benefits
Every organization generates data, and the extent of the savings depends upon, but is not directly proportional to, the number of applications or end users generating it. Overall, the deduplication savings depend upon the following parameters:
1. Number of applications or end users generating data
2. Total data
3. Daily change in data
4. Type of data (e-mails, documents, media, etc.)
5. Backup policy (weekly full with daily incrementals, or daily full)
6. Retention period (90 days, 1 year, etc.)
7. The deduplication technology in place
The actual benefits of deduplication are realized once the same dataset is processed multiple times over a span of weekly or daily backups. This is especially true for variable-length data segment technology, which handles arbitrary byte insertions far better.
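The "floating boundary" idea described above can be sketched as content-defined chunking: instead of cutting every N bytes, a cut is declared wherever a checksum of the last few bytes matches a pattern, so boundaries depend on content rather than position. The additive checksum, window size, and mask here are toy assumptions standing in for the rolling-fingerprint schemes real variable-length engines use:

```python
import random
from collections import deque

def cdc_chunks(data: bytes, window: int = 16, mask: int = 0x0F):
    """Cut wherever the sum of the last `window` bytes has its low bits set."""
    chunks, start = [], 0
    buf, rolling = deque(maxlen=window), 0
    for i, b in enumerate(data):
        if len(buf) == window:
            rolling -= buf[0]          # slide the window forward
        buf.append(b)
        rolling += b
        if len(buf) == window and (rolling & mask) == mask:
            chunks.append(data[start:i + 1])   # content-defined cut point
            start = i + 1
    if start < len(data):
        chunks.append(data[start:])
    return chunks

random.seed(1)
original = bytes(random.randrange(256) for _ in range(20000))
modified = b"XYZ" + original          # three bytes inserted at the front

a, b = cdc_chunks(original), cdc_chunks(modified)
shared = set(a) & set(b)
# Boundaries resynchronize just past the insertion point, so all but the
# first chunk of the original stream reappear unchanged in the modified one.
```

Because a cut depends only on the bytes inside the window, every cut point in the original also occurs (shifted by three bytes) in the modified stream, which is why the two chunk lists diverge only near the insertion.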

Contacts
Bruno Andrade, Product Director, HTBase Canada, e-mail: bdeandrade@htbase.com
Samuel Ayres, Sales Director, HTBase Latin America, e-mail: sayres@htbase.com
HTBase Canada, 140 Yonge St, Toronto, ON, Canada, contact@htbase.com