Evaluation Guide. Software vs. Appliance Deduplication




Table of Contents

- Introduction
- Data Deduplication Overview
- Backup Requirements
- Backup Application Client Side Deduplication
- Backup Application Deduplication in the Media Server
- Purpose-Built Target Side Deduplication Appliances
- Summary
- About ExaGrid

Introduction

Businesses are facing increasing pressure to fix their backups, as detailed in many sources including the Gartner report "Best Practices for Addressing the Broken State of Backup." Gartner found that for many organizations, backup has become an increasingly daunting and brittle task fraught with significant challenges.

The pressure of data growth has increased sharply as businesses need to store both onsite and offsite copies of their data. This can mean storing 40 to 100 times the volume of the primary dataset, because weeks of retention are kept onsite and weeks, months and in some cases years of retention are kept offsite. Longer data retention is driven by business needs, legal discovery requirements, Service Level Agreements (SLAs) and other business or legal reasons. Backup software is just one part of an emerging equation in which near real-time business continuity and disaster recovery are becoming business imperatives.

Busy IT staff looking to replace tape to get relief from backup headaches often find it hard to understand the strengths and weaknesses of deduplication in the backup software versus a disk backup appliance. This guide is intended to help sort through this confusion. It presents a general overview of data deduplication and of the different disk-based backup approaches, including:

- Backup application deduplication in the media server writing to standard disk
- Backup application deduplication on server agents (client) writing to standard disk
- Purpose-built target side appliances with deduplication

Each of these potential solutions is presented in this document, including the pros and cons of each approach.

Data Deduplication Overview

One of the few remaining arguments for tape for backups is that tape libraries will technically never "run out of retention capacity". As soon as a tape cartridge fills up, it can be replaced with another cartridge and the full cartridges can be stored. When writing to disk, storing the same amount of data that is stored on tape would require a massive amount of disk, resulting in high cost. However, if you could use a fraction of the space required to store the data on disk and bring the cost of disk storage close to the cost of tape, then disk would clearly be the better alternative.

Figure 1 - Data Deduplication Taxonomy

From week to week, only about 2% of the bytes change. With tape backup, however, the 98% of the data that is unchanged is backed up repeatedly, so identical data is saved dozens and even hundreds of times. With disk, deduplication software can store only the 2% of the data that changes from week to week.

The net result of using disk storage and data deduplication together is that you need only 1/20th to 1/50th of the storage you would need on tape. Since tape costs about 1/20th the price of disk per TB of usable capacity, data deduplication effectively neutralizes the price gap between tape and disk by using far less disk space than is required to store the same data on tape.
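The cost argument above reduces to a single division. The sketch below is an illustration only: the function name and the normalized prices are assumptions, while the 20x price gap and the 20:1 to 50:1 reduction figures are the ones quoted in the text.

```python
def disk_cost_relative_to_tape(price_gap: float, reduction_ratio: float) -> float:
    """Cost of deduplicated disk as a multiple of tape for the same backups.

    price_gap: how many times more disk costs than tape per usable TB.
    reduction_ratio: N for an N:1 deduplication ratio.
    """
    # Disk costs price_gap times more per TB but holds 1/reduction_ratio of the data.
    return price_gap / reduction_ratio


for ratio in (20, 50):
    cost = disk_cost_relative_to_tape(20, ratio)
    print(f"{ratio}:1 -> {cost:.2f}x the cost of tape")
```

Under these figures a 20:1 ratio brings disk to parity with tape, and any higher ratio makes disk the cheaper medium per retained backup.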

There are many methods of data deduplication, including:

- Fixed data block (64KB to 128KB) - used in backup software applications
- Changed storage blocks - used in primary storage snapshots
- Byte level - used in target side appliances
- Data block with variable content splitting - used in target side appliances
- Zone-byte level - used in target side appliances

All of these methods reduce redundant data in backups. For example, if a full backup of 50TB of data is completed every Friday night and 10 weeks are kept onsite, it would take 500TB of disk space to store the backups. However, most of each full backup is unchanged from week to week. Only the data that has been changed, edited or created that week needs to be stored. On average, only about 2% of the data changes from week to week. In this example, 2% is about 1TB per week.

Figure 2 - Deduplication Reduces Storage over Time
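The 50TB example can be worked through in a few lines. This is a minimal sketch, not a sizing tool: the function name is hypothetical, and the model counts only week-to-week change. It ignores compression and any duplicates inside a single backup, so it is a rough estimate and not a prediction of what a given product will achieve.

```python
def disk_needed_tb(full_tb: float, weeks: int, weekly_change: float):
    """Return (raw_tb, deduplicated_tb) for `weeks` retained weekly full backups."""
    raw = full_tb * weeks
    # The first full is stored whole; each later week adds only the changed data.
    deduplicated = full_tb + full_tb * weekly_change * (weeks - 1)
    return raw, deduplicated


raw, deduped = disk_needed_tb(full_tb=50, weeks=10, weekly_change=0.02)
print(f"without deduplication: {raw:.0f} TB")
print(f"with deduplication:    {deduped:.0f} TB")
```

With the figures from the text, the ten retained fulls occupy 500TB raw, while the simplified model stores one 50TB full plus nine weekly increments of about 1TB each.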

If you were to take out all of the redundant data, over time the storage required can be reduced by as much as 50:1, depending on the deduplication method used.

Factors Impacting Deduplication Results

In general, the higher the deduplication ratio, the better. A higher deduplication ratio uses less disk space over time and needs far less WAN bandwidth to replicate data to the offsite disaster recovery site.

Deduplication Approach

The deduplication approach selected affects the amount of storage savings that will result.

- 64KB to 128KB fixed block will average about 7:1
- Byte, segment-block and zone will average from about 20:1 to 50:1 reduction in data storage

Data Mix Affects Results

The deduplication ratio can range from 10:1 to as much as 50:1, depending on the mix of data types being backed up. Databases can get very high deduplication ratios of over 100:1. Unstructured file data will see an average ratio of 7-10:1. Deduplicating compressed or encrypted files does not yield a high ratio or significant space savings.

Retention Period

The longer the retention period, the higher the deduplication ratio will be.

Getting the Best Results

The best deduplication ratios will be achieved in environments that are:

- Using byte, data block or zone-level deduplication
- Backing up no compressed or encrypted data
- Retaining data for longer-term periods, on the order of 18 weeks

The worst deduplication ratios will be achieved in environments that are:

- Using 64KB or 128KB fixed block deduplication
- Backing up a large amount of compressed or encrypted data
- Retaining data for shorter-term periods, on the order of 4 weeks or less

The net is that not all deduplication approaches achieve the same results. Deduplication ratios are clearly affected by data types and retention periods. All of these factors need to be taken into consideration when choosing the proper disk backup approach.
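The effect of the retention period can be illustrated with a simplified model. The sketch below assumes weekly full backups and the 2% weekly change rate mentioned earlier; the function name is hypothetical, and the model leaves out compression and data-type effects, so the absolute numbers are illustrative and only the trend matters.

```python
def dedup_ratio(weeks: int, weekly_change: float = 0.02) -> float:
    """Ratio of data backed up to data stored after `weeks` weekly fulls."""
    backed_up = weeks                          # in units of one full backup
    stored = 1 + weekly_change * (weeks - 1)   # first full plus weekly changes
    return backed_up / stored


for weeks in (4, 18, 52):
    print(f"{weeks:>2} weeks retained -> {dedup_ratio(weeks):.1f}:1")
```

Each additional retained week adds a whole full backup to the numerator but only the changed data to the denominator, which is why the ratio keeps rising as retention grows.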

Backup Requirements

The chart below shows the top backup requirements of most IT shops, arranged in priority order. Each of the approaches, including staying with tape, is shown in its own column. As you can see, not all approaches can meet all requirements. The key is to list your requirements and match them against each of the solutions to see which solutions best meet your requirements. The following sections show the strengths and limitations of each of the solutions.

Backup Application Client Side Deduplication

Some backup applications offer a form of data deduplication in the application server agents or clients. The intent is to be able to eliminate tape using standard disk along with the backup application. The deduplication occurs at the backup agent/client on each application server.

Data deduplication is a very compute-intensive process. Resource utilization will increase significantly if deduplication is run on the application server (client side), and backups will slow down dramatically. To minimize this impact, client side deduplication software uses a less efficient form of deduplication. Typically it uses 64KB or 128KB fixed blocks and achieves a data reduction ratio of about 6-7:1. By comparison, target-side appliances that use byte, zone-byte or segment-block deduplication with variable length content splitting average from about 20:1 to as much as 50:1, or a minimum of approximately three times that of the software approach.

Figure 3 - Client Side Deduplication

Running a compute-intensive deduplication process on your application servers creates other performance and availability challenges. Furthermore, databases and email, which are 80% of the Monday through Thursday backups, are still sent as full backups. This means that only 20% of the nightly data is actually deduplicated by client side deduplication during the week. The true impact is on the Friday night full backup, where 80% of the data is unstructured file data.

In addition, the software approach to deduplication can only process data that comes from its own proprietary agents. It cannot deduplicate data from other sources, including other backup applications, utilities or database dumps.

Strengths

- Great fit for deduplicating data from small remote sites, then replicating it back to a corporate datacenter for backup.
- Can shorten the backup window, but only on the Friday full backup. During the week, backups are still full backups for database and email.

Weaknesses

- Requires new agents on servers; added risk and cost of changing agents.
- Deduplication ratio is only 6-7:1 and the disk space required increases quickly.
- Bandwidth usage to a second site is high as the deduplication ratio is only 6-7:1. By comparison, target-side appliances that use byte, zone-byte or segment-block deduplication with variable length content splitting average from about 20:1 to 50:1, or at a minimum three times that of software deduplication.
- Cannot deduplicate data from:
  - Veeam, Quest vRanger
  - Lightspeed, SQL Safe, Redgate
  - Direct SQL dumps, direct Oracle RMAN dumps
  - Bridgehead for Meditech data
  - Direct UNIX TAR files
  - Other traditional backup applications

Summary

- Very good for replicating remote site data back to a corporate datacenter
- Very few businesses actually use this approach due to its risk to application servers and its weaknesses
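One reason fixed blocks tend to find fewer duplicates than variable length content splitting is that a small insertion shifts every later block boundary. The sketch below is a toy illustration of that effect and not any vendor's algorithm: the function names, the 64-byte block size and the SHA-1 window test are all assumptions chosen to keep the example small. Production systems typically use much larger blocks and a rolling hash such as a Rabin fingerprint.

```python
import hashlib
import random


def fixed_chunks(data: bytes, size: int = 64):
    """Split data at fixed offsets (toy stand-in for 64KB/128KB blocks)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def content_chunks(data: bytes, window: int = 16, mask: int = 0x3F):
    """Split wherever a hash of the last `window` bytes matches a bit pattern."""
    chunks, start = [], 0
    for i in range(window, len(data) + 1):
        digest = hashlib.sha1(data[i - window:i]).digest()
        h = int.from_bytes(digest[:4], "big")
        # Cut on a pattern match, but never emit a chunk shorter than the window.
        if h & mask == 0 and i - start >= window:
            chunks.append(data[start:i])
            start = i
    if start < len(data):
        chunks.append(data[start:])
    return chunks


def shared_fraction(old_chunks, new_chunks) -> float:
    """Fraction of new chunks whose fingerprint is already stored."""
    stored = {hashlib.sha256(c).digest() for c in old_chunks}
    hits = sum(1 for c in new_chunks if hashlib.sha256(c).digest() in stored)
    return hits / len(new_chunks)


rng = random.Random(0)
original = bytes(rng.randrange(256) for _ in range(20000))
edited = b"X" + original  # a single byte inserted at the front

fixed_share = shared_fraction(fixed_chunks(original), fixed_chunks(edited))
content_share = shared_fraction(content_chunks(original), content_chunks(edited))
print(f"fixed blocks reused:     {fixed_share:.0%}")
print(f"content-defined reused:  {content_share:.0%}")
```

In this toy case the fixed-block split finds nothing to reuse after the one-byte insertion, because every block has shifted, while the content-defined split realigns after the first chunk and reuses most of the rest.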

Backup Application Deduplication in the Media Server

Some backup applications have a data deduplication feature that can be deployed as an agent in the media server. The intent is to be able to eliminate tape using standard disk in conjunction with the backup application.

As is the case with client side deduplication, data deduplication is compute-intensive, so if it is run in the media server, backups can slow down dramatically and backup windows expand. To avoid this hit to overall backup performance, backup software uses a form of deduplication that results in a lower reduction ratio. Using the least possible processor and memory resources for the deduplication process avoids starving the media server tasks of resources, but at the cost of less effective deduplication.

Figure 4 - Running Deduplication on Media Server

Typically this approach uses 64KB or 128KB fixed blocks and yields a data reduction ratio of about 6-7:1. By comparison, target-side appliances that use byte, zone-byte or segment-block deduplication with variable length content splitting average from about 20:1 to as much as 50:1, or a minimum of approximately three times that of the software approach.

In addition, software deduplication can only process data that comes from its own proprietary agents. It cannot deduplicate data from other sources, including other backup applications, utilities or database dumps.

Some vendors bundle the media server software on a storage server that includes a CPU, memory and disk, and position the solution as an appliance. This does not change the deduplication ratio or the solution's lack of support for heterogeneous environments.

Strengths

- Relatively simple to manage through the backup application
- Good for environments that have less than 3TB of data to back up, use a single backup application and do not plan to replicate to a second site for disaster recovery

Weaknesses

- Disk usage is high as the deduplication ratio is only 6-7:1. Over time the disk space required grows sharply, and as data grows, software deduplication will use disk at about 3 times the rate of a target-side appliance.
- Bandwidth needed to send backups to a second site is high as the deduplication ratio is only 6-7:1. By comparison, target-side appliances that use byte, zone-byte or segment-block deduplication with variable length content splitting average from about 20:1 to as much as 50:1, or a minimum of approximately three times that of the software approach.
- Cannot deduplicate data from:
  - Veeam, Quest vRanger
  - Lightspeed, SQL Safe, Redgate
  - Direct SQL dumps, direct Oracle RMAN dumps
  - Bridgehead for Meditech data
  - Direct UNIX TAR files
  - Other traditional backup applications

Summary

Deduplication in the backup software is good for short-term retention and low amounts of data in environments that are not heterogeneous and where offsite disaster recovery data is not required.
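The bandwidth point can be made concrete with a rough estimate of how long one replication run takes at different deduplication ratios. The sketch below is illustrative only: the function name, the 50TB backup size and the 1 Gbit/s link are assumptions, and it supposes the link is fully available and that the ratio applies uniformly to the data being sent.

```python
def replication_hours(backup_tb: float, reduction_ratio: float, link_gbit_s: float) -> float:
    """Hours needed to send one deduplicated backup over a WAN link."""
    bytes_to_send = backup_tb * 1e12 / reduction_ratio   # decimal terabytes
    bytes_per_second = link_gbit_s * 1e9 / 8
    return bytes_to_send / bytes_per_second / 3600


for ratio in (7, 20, 50):
    hours = replication_hours(backup_tb=50, reduction_ratio=ratio, link_gbit_s=1)
    print(f"{ratio}:1 -> {hours:.1f} hours")
```

Because transfer time scales inversely with the ratio, moving from roughly 7:1 to 20:1 cuts the replication time to about a third on the same link.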

Purpose-Built Target Side Deduplication Appliances

Target-side deduplication appliances are built specifically to replace the tape library in the backup process onsite and, optionally, offsite. Because they are dedicated appliances, the hardware and the deduplication methods used can be optimized for that single purpose. Future disk space requirements to deal with data growth are drastically reduced because deduplication ratios from 20:1 to as much as 50:1 can be achieved. Only the data that changes, about 2% of the backup size, is replicated offsite, which requires far less bandwidth. In addition, target-side appliances can process data from a variety of utilities and backup applications.

Figure 5 - Target Side Deduplication Appliance

Strengths

- No change to your backup environment. Use all backup applications, utilities and dumps you are currently using.
- Can take in data from:
  - Traditional backup applications
  - Veeam, Quest vRanger
  - Lightspeed, Redgate, SQLSafe
  - SQL dumps, Oracle RMAN dumps
  - Direct UNIX TAR files
  - Many other backup applications and utilities
- 20:1 to as much as 50:1 deduplication ratios use less disk space and far less bandwidth for replication.
- Special features for:
  - Tracking data to offsite disaster recovery
  - Improving disaster recovery RPO (recovery point objective) and RTO (recovery time objective)
  - Purging data as the retention policy calls for aging out data

Weaknesses

- Backup window improves over using a tape library, but not by as much as client side deduplication for the Friday night full backup

Summary

When evaluating different approaches to deploying data deduplication, take the time to ask the right questions and understand the strengths and weaknesses of each alternative.

About ExaGrid

ExaGrid offers the only disk-based backup appliance with data deduplication purpose-built for backup that leverages a unique architecture optimized for performance, scalability and price. The combination of post-process deduplication, most recent backup cache, and GRID scalability enables IT departments to achieve the shortest backup window and the fastest, most reliable restores, tape copy, and disaster recovery without performance degradation or forklift upgrades as data grows. With offices and distribution worldwide, ExaGrid has thousands of systems installed at over 1,000 customers and hundreds of published customer success stories and video testimonials.

ExaGrid Systems, Inc.
2000 West Park Drive
Westborough, MA 01581
1-800-868-6985
www.exagrid.com

2012 ExaGrid Systems, Inc. All rights reserved. ExaGrid is a registered trademark of ExaGrid Systems, Inc.