DeltaStor Data Deduplication: A Technical Review




White Paper

DeltaStor software is a next-generation data deduplication application for the SEPATON S2100-ES2 virtual tape library that enables enterprises to store more data online at a cost that is comparable to physical tape systems. DeltaStor leverages SEPATON's patented ContentAware architecture to enable the industry's fastest backup and restore performance and highest deduplication ratios. By changing the economics of data storage, DeltaStor software allows enterprises to reduce risk, handle exponential data growth, cut backup times, and significantly lengthen online retention periods.

Table of Contents

Overview
Next-Generation Technology
Benefits of SEPATON Deduplication Software
Technology Fundamentals
Designed to Meet Enterprise Storage Needs

DeltaStor Software

Overview

Exponential data growth and the increasing need for data availability are creating a variety of challenges for enterprise IT departments. These challenges include backing up and restoring rapidly growing data volumes, maintaining compliance with stringent regulatory requirements, meeting increasingly aggressive recovery point objectives, and staying within tight budgets.

DeltaStor software is a next-generation data deduplication application for the SEPATON S2100-ES2 virtual tape library that enables enterprises to store more data online at a cost that is comparable to physical tape systems. By changing the economics of data storage, DeltaStor software allows enterprises to reduce risk, handle exponential data growth, cut backup times, and significantly lengthen online retention periods for faster restores.

DeltaStor software draws on SEPATON's unique ContentAware architecture, which contains built-in intelligence about file content and the backup data relationships of leading backup applications, to deliver unparalleled speed, simplicity, scalability, and data integrity. This white paper discusses the technology fundamentals of the product and how it can be used to save money, save time, and increase the efficiency of backup and restore operations.

Next-Generation Technology

DeltaStor's fundamental design leapfrogs the capabilities of existing data compression software to provide deduplication that is many times more efficient. Its unique approach to deduplication enables it to identify duplicate data at the byte level for optimal capacity reduction.

Common deduplication technologies, such as hash-based technologies, cannot handle the volume or complexity of an enterprise backup efficiently. Without a view into the content of backups, these technologies rely on educated guesses to identify duplicate data. They also impede the backup process itself, potentially jeopardizing backup windows. For enterprises, restoring data with this technology can be time consuming because deduplicated data has to be reconstituted from pointers and fragments.

In contrast, DeltaStor software is designed to address the needs of enterprise backup environments with a deduplication solution that ensures highly efficient backup, restore, and deduplication. DeltaStor uses SEPATON's patented ContentAware technology to gather and analyze metadata from the backup stream. This enables it to find duplicate data at the byte level for maximum capacity reduction. It also performs deduplication without impeding the SEPATON VTL's industry-leading backup speeds. As a result, DeltaStor ensures that even the largest enterprise backup volumes are backed up at wire speed. The technology is also designed to ensure instantaneous restores by eliminating the need to reconstitute the most recent backup.

Benefits of SEPATON Deduplication Software

DeltaStor software allows enterprise storage managers to take advantage of the speed, flexibility, and efficiency of disk storage at a cost that is comparable to physical tape. In addition, by storing more data on disk in less physical space than tape, DeltaStor software significantly reduces power, cooling, security, and other operating and infrastructure costs.

Instant data restores. Data is online and instantly accessible through the random access power of disk.

Dramatically faster backups. DeltaStor software performs data deduplication outside of the primary data path, enabling the S2100-ES2 to deliver the industry's fastest backups, as much as 30 times faster than tape.

Scalability to handle exponential data growth. The S2100-ES2 has a powerful grid architecture that can handle any size backup set. In addition, the S2100-ES2 allows simple, non-disruptive scaling of capacity and performance to let you purchase what you need, when you need it. Scale the physical capacity of a single appliance from 10 TB to more than 1.6 PB, or as much as twice that capacity with the built-in hardware compression enabled. With DeltaStor enabled, you can store tens of petabytes of data on a single S2100-ES2 VTL.

Reduce time-consuming tape management tasks. Keeping more data on disk reduces the labor needed to handle tapes, address tape failure issues, and manage capacity provisioning.

Eliminate physical threats to data. Unlike physical tapes that can be lost, stolen, or damaged, data on disk is maintained in a secure, highly available environment.

Simplify data management. Adding DeltaStor software is as easy as checking a box in the S2100-ES2 management console. As deduplication reduces data volume, capacity is automatically made available and managed through its built-in functions.

Advanced customization. DeltaStor enables you to select the data you want to deduplicate by application, backup server, or backup software. You can also tune deduplication performance by data type for optimal efficiency.

Retain more data on disk to meet compliance/recovery time requirements. DeltaStor lets you store as much as fifty times more data online without adding new capacity.

Technology Fundamentals

The SEPATON ContentAware architecture was designed from the ground up to serve as a comprehensive data protection platform. With this architecture, the SEPATON S2100-ES2 virtual tape library appliance contains powerful software, such as the Dynamic Disk File System (DDFS) and the SEPATON I/O Subsystem (SiS), that works with the DeltaStor software to build an intelligent grid-based data protection platform.

At the heart of DeltaStor software is SEPATON's patented ContentAware technology. During backup sessions, as data is stored on the disk arrays, this technology captures and analyzes metadata related to individual backup data sets as well as information about each object (Excel spreadsheet, Word document, database file, etc.). For example, when the metadata indicates that a relationship exists between objects in two backup sessions, the software performs a more careful comparison of the objects at the byte level.

Deduplication Process Overview

The DeltaStor deduplication process is conducted concurrently with the backup process. The software manages the scheduling and execution of jobs using all computing power in the appliance in a way that load balances the work. This capability lets the solution scale to meet deduplication processing requirements. Additionally, customers can optionally add DeltaStor nodes that provide additional computing power to accelerate deduplication activities.

The five phases of the deduplication process (see Figure 1) are as follows: object-level comparison, byte-level comparison, pointer assignment, integrity check (optional), and space reclamation.

1. Object-Level Comparison

This phase narrows the search for duplicates using knowledge of relationships between objects. By doing so, it enables closer examination with less processing required. It also ensures a higher level of accuracy because, unlike hash-based technologies, DeltaStor identifies duplicate data based on actual metadata gathered from the backup stream. Hash-based technologies break data into chunks according to an algorithm designed to guess the location of duplicate data. As a result, data offsets in subsequent backups have to be adjusted for, and a significant percentage of duplicate data is not identified.

DeltaStor software compares the incoming backup data to the previous backup. It identifies relationships between objects that indicate the existence of duplicate data. For example, if the filename \root\documents\abc.txt exists twice in backups from the same client, the DeltaStor software determines that duplicate data is likely and chooses the more efficient of the two levels of deduplication. If the data is a modified version of existing data, the specific changes will be identified in the next phase (byte-level comparison). If it is an identical duplicate of existing data, then the duplication will be verified in the next phase. Additional operations, including identifying copies of the same object kept in different places (i.e., different clients, directories, etc.), are also performed. The software creates a list of redundant data, which is sent to the next phase for further analysis.

2. Byte-Level Comparison

In this phase, the software performs a byte-level analysis of the objects identified as containing duplicate data in the object-level comparison phase. If the list created in the object-level comparison phase identifies the need for data discrimination, then the software uses a technique called delta differencing to determine which data in the backup set is unique and which data is redundant.

The DeltaStor software efficiently maps changed data to byte-level granularity and is insensitive to offset or positional changes within the data objects. This makes it capable of locating redundant data even in related objects that have undergone significant structural changes.

If object-level comparison determines that data in the backup set is identical to data in the previous backup at the metadata level, then a data comparison is performed at the byte level in the byte-level comparison phase. In this step, the software identifies non-identical files in which the data has changed even though the metadata stayed the same. Unlike hash-based technologies that only compare hashes in incoming data streams to hashes in an index table, DeltaStor compares actual bytes in the data stream to identify duplicates accurately and efficiently.

Figure 1: The DeltaStor deduplication process uses built-in intelligence about backup data to identify duplicate data with maximum speed and efficiency while maintaining complete data integrity.
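The first two phases can be illustrated with a small sketch. This is not SEPATON's implementation: the catalog layout, the function names, and the use of Python's standard difflib.SequenceMatcher as a stand-in for delta differencing are all assumptions made for illustration. It shows metadata (client and path) narrowing the search to related objects, followed by a byte-level comparison that still finds the shared run of bytes after it has shifted to a new offset.

```python
from difflib import SequenceMatcher

# Hypothetical catalogs: (client, path) -> object bytes. Layout is an assumption.
previous_backup = {
    ("client1", r"\root\documents\abc.txt"): b"quarterly report: revenue up",
}
incoming_backup = {
    ("client1", r"\root\documents\abc.txt"): b"DRAFT quarterly report: revenue up 4%",
    ("client1", r"\root\documents\new.txt"): b"brand new object",
}


def object_level_candidates(previous, incoming):
    """Phase 1 stand-in: keep only objects whose metadata matches a prior object."""
    return [key for key in incoming if key in previous]


def byte_level_delta(old, new):
    """Phase 2 stand-in: describe `new` as copies from `old` plus unique bytes.

    Returns a list of ("copy", old_offset, length) and ("insert", data) operations.
    """
    ops = []
    matcher = SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2 - i1))      # redundant bytes, any offset
        elif tag in ("replace", "insert"):
            ops.append(("insert", new[j1:j2]))     # unique bytes in the new object
        # "delete" needs no operation: those old bytes are simply not copied
    return ops


def apply_delta(old, ops):
    """Rebuild the new object from the old object and the delta operations."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, start, length = op
            out += old[start:start + length]
        else:
            out += op[1]
    return bytes(out)


candidates = object_level_candidates(previous_backup, incoming_backup)
key = candidates[0]
old, new = previous_backup[key], incoming_backup[key]
ops = byte_level_delta(old, new)
rebuilt = apply_delta(old, ops)
unique_bytes = sum(len(op[1]) for op in ops if op[0] == "insert")
print(f"{key[1]}: {unique_bytes} unique bytes out of {len(new)}")
```

Only the object with a metadata match is compared byte by byte; the object with no counterpart in the previous backup is treated as new data without any comparison. SequenceMatcher is quadratic in the worst case, so it serves here only to make the idea concrete on small inputs.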

3. Pointers Assigned to Duplicates

In this phase, new data is stored and duplicate data that was identified in the previous phases is replaced with pointers to the stored data. Data is stored in a temporary holding cartridge that appears to backup software applications as identical to a native cartridge. However, the volume of data actually stored on this new cartridge is much smaller. The data appears to the backup software as if it is laid out sequentially and not deduplicated. The SEPATON software follows pointers embedded in the file system to read the single-instance copy of the redundant data along with the unique data stored separately. The result is a deduplicated view of a backup set. Because pointers are continuously updated to indicate data in the most recent backup, DeltaStor does not slow performance over time as a hash-based technology does.

4. Integrity Check (Optional)

Before any redundant data is removed, the software can be set to perform an additional data integrity analysis. In this phase, the software validates the structure and entirety of the data content of the holding cartridge (which represents the DeltaStor deduplicated data) by comparing it to the original representation.

5. Space Reclamation

In this phase, the software removes the redundant data from the file system, freeing the previously occupied disk capacity for other uses. The holding cartridge exchanges its place (assuming the barcode, slot position, properties, etc.) with the original, non-deduplicated version. The software then frees the duplicate data and returns the space to the free space pool. Once there, any other ongoing data process that requires storage can reuse the space previously occupied by redundant data.
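Phases 3 through 5 can be sketched with a toy extent store. Everything here is hypothetical (the ExtentStore class, its method names, and the reference-counting scheme are assumptions, not the product's file system), and the sketch assumes the older cartridge is the one rewritten with pointers to the newer backup. It shows a holding cartridge built from pointers, an integrity check against the original representation, and reclamation of the extents that only the original cartridge used.

```python
from dataclasses import dataclass, field


@dataclass
class ExtentStore:
    """Hypothetical disk pool: extents of bytes plus a pointer count for each."""
    extents: dict = field(default_factory=dict)   # extent id -> bytes
    refs: dict = field(default_factory=dict)      # extent id -> number of pointers
    next_id: int = 0

    def write(self, data: bytes) -> int:
        """Store new data and return the id of the extent that holds it."""
        eid = self.next_id
        self.next_id += 1
        self.extents[eid] = data
        self.refs[eid] = 1
        return eid

    def point_to(self, eid: int) -> int:
        """Add one more pointer to an extent that is already stored."""
        self.refs[eid] += 1
        return eid

    def release(self, eid: int) -> int:
        """Drop one pointer; free the extent when none remain. Returns bytes freed."""
        self.refs[eid] -= 1
        if self.refs[eid] == 0:
            del self.refs[eid]
            return len(self.extents.pop(eid))
        return 0

    def used(self) -> int:
        return sum(len(data) for data in self.extents.values())


def read(store, cartridge):
    """Follow a cartridge's pointers and return the data as one sequential stream."""
    return b"".join(store.extents[eid] for eid in cartridge)


store = ExtentStore()
# A cartridge is modeled as an ordered list of extent ids.
newest = [store.write(b"AAAA"), store.write(b"CC"), store.write(b"BBBB")]
original = [store.write(b"AAAA"), store.write(b"BBBB"), store.write(b"DD")]
used_before = store.used()

# Phase 3: the holding cartridge points at single-instance copies of duplicates
# and at the unique data ("DD") that only the original cartridge contains.
holding = [store.point_to(newest[0]), store.point_to(newest[2]),
           store.point_to(original[2])]

# Phase 4: compare the deduplicated view with the original representation.
integrity_ok = read(store, holding) == read(store, original)

# Phase 5: the holding cartridge takes the original's place and the
# original's now-redundant extents return to the free pool.
freed = sum(store.release(eid) for eid in original) if integrity_ok else 0
print(f"used before: {used_before} bytes, freed: {freed}, used after: {store.used()}")
```

Running the integrity check before anything is released mirrors the ordering in the text: redundant data is removed only after the holding cartridge has been shown to reproduce the original.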

Forward Referencing for Instantaneous Restores

A key design feature of the DeltaStor deduplication technology ensures that the most recent backup is always maintained in a form that enables the VTL to restore it immediately. This feature, called forward referencing, uses the most recent full backup as a reference copy and replaces older duplicate data with pointers forward to it. As each new backup is completed, it replaces the previous backup as the reference copy, and all pointers are adjusted forward to it. As a result, newer, more frequently restored data requires little or no reconstitution.

Figure: Forward Referencing
1. First full backup is stored.
2. Second full backup is stored. Duplicate data is identified.
3. Duplicate data in the first backup is replaced with pointers forward to the most recent backup, and the duplicate space is reclaimed.
4. Duplicate data in the second backup is replaced with pointers forward to the most recent backup. Pointers in the first backup now point to the third, most recent backup.
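The forward-referencing steps above can be sketched as follows. The block-list representation, the tuple format for pointers, and the function names are assumptions chosen for illustration; the product's on-disk structures are not described in this paper. Each backup is a list of entries that are either stored data or a pointer forward to a block in a newer backup.

```python
def resolve(chain, entry):
    """Return the bytes an entry stands for, following pointers forward."""
    while entry[0] == "ptr":
        _, backup_idx, block_idx = entry
        entry = chain[backup_idx][block_idx]
    return entry[1]


def add_backup(chain, blocks):
    """Store a new backup whole, then point older duplicates forward to it."""
    new_idx = len(chain)
    chain.append([("data", block) for block in blocks])
    position = {}
    for i, block in enumerate(blocks):
        position.setdefault(block, i)        # first occurrence of each block
    for backup in chain[:new_idx]:           # every older backup, oldest first
        for i, entry in enumerate(backup):
            content = resolve(chain, entry)
            if content in position:
                # Duplicate of data in the newest backup: re-point forward.
                backup[i] = ("ptr", new_idx, position[content])


def restore(chain, idx):
    """Reassemble one backup as the list of blocks originally written."""
    return [resolve(chain, entry) for entry in chain[idx]]


chain = []                                   # oldest backup first
add_backup(chain, [b"A", b"B", b"C"])        # first full backup
add_backup(chain, [b"A", b"B2", b"C"])       # second: one block changed
add_backup(chain, [b"A", b"B2", b"C2"])      # third: another block changed

stored_blocks = sum(1 for backup in chain for entry in backup if entry[0] == "data")
print(f"{stored_blocks} blocks stored for {sum(len(b) for b in chain)} blocks backed up")
```

In this sketch the newest backup never contains a pointer, so restoring it is a straight sequential read, while older backups may need pointers followed. That is the trade-off the text describes: reconstitution cost falls on older, less frequently restored data.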

Designed to Meet Enterprise Storage Needs

SEPATON's DeltaStor software is leading the evolution to tapeless data centers by delivering a simple, cost-effective, highly secure way to protect data. DeltaStor software leapfrogs other data reduction technologies to provide a solution that was designed specifically to meet enterprise storage needs. As a result, SEPATON S2100 VTLs with DeltaStor deliver the following benefits:

Meet backup windows with the industry's fastest backups.

Meet restore SLAs and maintain end-user productivity with the industry's fastest restores.

Save capital expense and data center footprint by storing more data in a single system with enterprise scalability.

Control data growth with the industry's most efficient deduplication.

Copyright 2008. SEPATON, Inc. All rights reserved. SEPATON, S2100 and SRE are registered trademarks and ContentAware, DeltaStor, and Site 2 are trademarks of SEPATON, Inc. Other product and company names mentioned herein are or may be trademarks and/or registered trademarks of their respective companies.

deltastor_wp_0308v4