Deduplication and Beyond: Optimizing Performance for Backup and Recovery


www.i365.com © 2010 i365, Inc. All Rights Reserved.

"Gartner clients using deduplication for backups typically report seven times to 25 times the reductions (7:1 to 25:1) in the size of their data, and sometimes higher than 100:1 for file system data or server virtualized images when data deduplication is used."
Hype Cycle for Storage Technologies, Gartner Research Report (2010)

The real point: gain optimized performance with minimal sacrifice.

With cost cutting so important for most organizations, data deduplication, with its allure of significantly reduced storage and bandwidth usage, has become the hot topic in data protection circles. But deduplication is just one component of performance optimization for data protection systems. This paper describes the benefits and potential risks of three optimization technologies commonly used for disk-based backup and recovery, enabling you to more confidently optimize your own data protection environment in the following ways:

- Reducing costs through optimized use of your backup storage
- Minimizing backup data transmitted and the amount of network bandwidth consumed
- Backing up and recovering data quickly and reliably
- Using encryption with less impact on performance

Key Optimization Technologies: Deduplication, Compression, Encryption

Data deduplication, compression, and encryption are three key interrelated technologies used for efficiently and securely transferring and storing data. Each vendor's approach has implications for an organization's environment. Some vendors focus heavily on only one method or technology, which creates additional risks. Understanding each of these technologies will help you ask the right questions as you evaluate specific solutions.

Evaluating Deduplication

Data deduplication is one of the primary optimization technologies used by today's disk-based data protection vendors. Deduplication reduces the overall size of backup data (or backup data "footprints") to be transmitted or stored to a secondary disk target by identifying and then removing or omitting any duplicate data in an organization's backup data sets. Duplicate data removed or omitted is often replaced by some type of pointer to the original file or data block. The following questions highlight areas to be aware of when exploring different deduplication implementations.
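Before turning to those questions, the pointer mechanism described above can be made concrete with a minimal sketch. It is an illustration only, not any vendor's implementation: the function names `deduplicate`, `restore`, and `reduction_percent` are hypothetical, and the use of SHA-256 digests as pointers is an assumption chosen for simplicity.

```python
import hashlib


def deduplicate(blocks):
    """Keep one copy of each unique block and describe the stream as pointers.

    Assumption: a SHA-256 digest serves as the pointer to the retained copy.
    """
    store = {}     # digest -> the single retained copy of the block
    pointers = []  # one digest per input block, in original order
    for block in blocks:
        digest = hashlib.sha256(block).hexdigest()
        store.setdefault(digest, block)  # only the first copy is kept
        pointers.append(digest)
    return store, pointers


def restore(store, pointers):
    """Rebuild the original stream by following each pointer."""
    return b"".join(store[digest] for digest in pointers)


def reduction_percent(ratio):
    """Convert a deduplication ratio such as 25 (meaning 25:1) to percent saved."""
    return 100.0 * (1.0 - 1.0 / ratio)


blocks = [b"alpha", b"beta", b"alpha", b"alpha", b"gamma"]
store, pointers = deduplicate(blocks)
print(len(blocks), "blocks in,", len(store), "unique blocks stored")
print(restore(store, pointers) == b"".join(blocks))
```

The same arithmetic puts the quoted ratios in perspective: a 7:1 ratio corresponds to roughly an 86 percent reduction and 25:1 to about 96 percent, so each further increase in ratio saves progressively less additional space.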

Six Key Questions to Ask About Deduplication

1) What type of data deduplication is used in the backup process?

There are two main types of deduplication used in disk-based backup and recovery.

Time-based deduplication: This method compares today's planned backup data against the last backup job's data. Because most data remains unchanged from backup to backup, time-based deduplication finds, transmits, and stores only the new blocks or differences (deltas) found since the last backup.

Horizontal deduplication: This type of deduplication compares differences and similarities across data targeted for backup, either within a single data set or across a set of volumes containing backup data. The comparison can happen between two or more files or two or more data blocks. If two or more duplicate files or blocks are found, the solution will retain one copy of the file or block and replace the others with pointers to the retained copy.

2) At what point in the backup process does deduplication occur?

Deduplication typically occurs at the following junctures of a backup process.

Source-side deduplication (via server-side software or an inline appliance): This process typically occurs prior to the backup data being compressed and transmitted to its ultimate target. Watch for impact on CPU and network resources, nightly backup or hot backup windows, and recovery time objectives. Also inquire about the relative ease of restoring data, and the need for specialized hardware to sustain performance.

Back-end deduplication: Often referred to as post-process deduplication, this usually occurs on the backup disk target after a backup process has already been completed. Watch for negative impact on LAN/WAN resources when sending larger backup data sets over the wire, and the ease of restoring individual files or full data sets.

Mixed source-side and back-end deduplication: Some solutions blend source-side and back-end deduplication, which can result in additional footprint savings.

3) What is the size of data processed for deduplication?

Backup solutions can deduplicate different data sizes, which can impact the optimization and efficiency of the solution.

File-level deduplication: This method finds duplicate files only, which can be simpler and less resource-intensive than other methods. But it can require significantly more storage space than more granular methods; a small change to one file requires backing up the entire changed file, instead of just the changed portion.

Block-level deduplication: Blocks are chunks of data that are smaller than a file, but there is wide variation in the block sizes different vendors deploy with deduplication. Overall, block-level deduplication will save more storage space for backup than file-level deduplication. Watch for performance impact based on smaller, more granular block sizes in use; the smaller the size, the greater the performance impact on production systems. Check for overhead and extra CPU resources needed to process and manage data blocks. Additionally, oversized blocks may adversely affect database backups. Some database page or chunk sizes may perform better when the deduplication process uses 32K or 64K data blocks (which can be more readily compressed), although deduplication savings may be reduced.

Byte-level deduplication: The most granular data size and potentially the largest storage savings. Watch for a large performance hit and data fragmentation of large, growing files that now may be dispersed across multiple disk targets or backup tapes. Also watch for any validation and recovery checks needed, and more time needed to rebuild or restore when something goes wrong.

4) Where is the deduplicated data stored?

Deduplicated data can be stored on a local disk target (located on premise), transmitted to a remote disk target (either at a physical site or a third-party cloud or data center), or sent to a disk-based virtual tape library before being offloaded to tape. Watch for ease of restoring data, especially in tape environments where data may be fragmented, offsite, dispersed, and subject to loss.

5) How will deduplication impact performance?

In some cases, the deduplication process may reduce performance and extend your backup window. Watch for performance bottlenecks and lack of optimization for LAN or WAN implementations, especially for solutions that must first transmit full backup data sets prior to deduplication. Also watch for over-dependence on local deduplication hardware, such as the need to transmit deduplicated data to a local appliance prior to replicating it over the WAN. Repeated checks and verifications of local nodes to ensure existence of deduplicated blocks as part of each backup process may also slow performance.

6) What are the costs to deploy or grow the deduplication environment?

Some solutions require licensed hardware nodes or controllers; keep in mind that as the data environment grows, the need for extra nodes may grow as well. Watch for excessive investment in underlying hardware to support deduplication. With disk prices continuing to drop, you may want to weigh the storage space each solution reclaims against the ROI it delivers compared with the alternatives.
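The granularity trade-off raised in question 3 can be illustrated with a small sketch that counts the bytes a store would keep under file-level and block-level deduplication. Everything here is an assumption made for illustration: the helper names, the fixed 32 KB block size, and the synthetic 1 MB file are hypothetical and do not describe any particular product.

```python
import hashlib

BLOCK_SIZE = 32 * 1024  # 32 KB; an illustrative choice, not a recommendation


def file_level_bytes(versions):
    """Bytes stored when a whole file is the deduplication unit."""
    seen, total = set(), 0
    for data in versions:
        digest = hashlib.sha256(data).digest()
        if digest not in seen:
            seen.add(digest)
            total += len(data)
    return total


def block_level_bytes(versions, block_size=BLOCK_SIZE):
    """Bytes stored when fixed-size blocks are the deduplication unit."""
    seen, total = set(), 0
    for data in versions:
        for offset in range(0, len(data), block_size):
            block = data[offset:offset + block_size]
            digest = hashlib.sha256(block).digest()
            if digest not in seen:
                seen.add(digest)
                total += len(block)
    return total


# A synthetic 1 MB file made of 32 distinct blocks, then a copy with one byte changed.
original = b"".join(i.to_bytes(4, "big") * (BLOCK_SIZE // 4) for i in range(32))
changed = bytearray(original)
changed[5] ^= 0xFF
versions = [original, bytes(changed), original]

print("file-level bytes stored: ", file_level_bytes(versions))
print("block-level bytes stored:", block_level_bytes(versions))
```

In this example the one-byte change costs file-level deduplication a second full copy of the file, while block-level deduplication stores only one additional 32 KB block. Fixed-size blocks are used here for brevity; they are sensitive to insertions that shift block boundaries, which is one reason block-handling approaches differ between vendors.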
Evaluating Compression

Data compression is often performed as part of deduplication, or somewhere else on the front end. Its purpose is to compress backup data before the data is transmitted across a LAN or WAN connection, thereby reducing impact on your network traffic and storage requirements.

Three Key Questions to Ask About Compression

1) How will compression impact the rest of the production environment?

Compression reduces the amount of backup data being pushed across your network, so it can reduce the amount of network bandwidth needed, which may allow room for other common network activities. However, data compression may soak up other computational resources to compress and uncompress the data.

2) How much can you control the bandwidth used for backup?

Compression doesn't operate in isolation from the rest of your environment. Watch for how well it allows you to adjust the amount of bandwidth used for backup. Some solutions are preset to maintain backup speed and give the customer less control.

3) How well does compression technology adapt to the amount of available bandwidth in your environment?

Some networks have large amounts of bandwidth available for backup, while others are seriously bandwidth-constrained, and bandwidth may vary during off-hours and peak production times. Explore whether transmission processes can adapt to different bandwidths or spikes in usage, or let you define how much bandwidth may be used for backup during various times.

Evaluating Encryption

Encryption is key to any data protection solution. Many data breaches occur when unencrypted backup data ends up in the wrong hands, either accidentally or intentionally. In addition, regulatory obligations require organizations to use encryption. Ask the right questions to optimize its use and ensure the process meets all your security goals.

Two Key Questions to Ask About Encryption

1) Where in the backup process can encryption be applied?

Determine where you are most likely to need your backup data encrypted and how well the vendor solution meets that need. Is backup data encrypted at the start of a backup job? As it is transmitted over the wire (in flight)? While the data resides at rest on the target disk device? Some vendors offer only one or two encryption options. Some offer all three.

2) Who holds the keys?

In information security circles, a "Trust No One" security paradigm means that only the customer holds and manages the encryption key and is capable of decrypting backup data. Your data will be more secure if your vendor follows this paradigm even when the data has been replicated or transmitted to a data center for added retention and off-site disaster recovery. Learn how security roles and encryption key safeguards are applied.

EVault: Meeting Customer Needs for End-to-End Optimization

EVault is a highly efficient, adaptable backup and recovery solution that is fast and easy to use, affordable to implement and grow, with a simple architecture for low-key maintenance. EVault offers a number of technologies and strategies to optimize the backup process at the source, in transit, and at rest. The result is end-to-end efficiencies that minimize traffic and reduce data footprints, striking the right balance between security, efficiency, affordability and adaptability.

The EVault Approach to Deduplication, Compression, Encryption

EVault's WAN-optimized data transfer method uses minimal bandwidth and produces the smallest possible local footprint. Its simplified, single recovery path avoids use of differentials and incrementals as part of the recovery process. Building synthetic full backups dynamically each time a backup job arrives at the vault, EVault restores even deduplicated data from its disk-based storage pools in a straightforward way.

EVault's time-based deduplication technology has been actively developed and refined for efficiency gains, both over the wire and at rest, since 1997, long before deduplication took center stage. EVault ensures the best use of network resources, while its encryption approach has also proven effective and secure for a wide range of customers. The result is a balanced, affordable solution that readily adapts to an organization's needs: no overhauls, costly upgrades, or re-architecting required.
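The idea of a synthetic full backup, where each arriving set of changed blocks is merged into a complete image so that a restore follows a single path, can be sketched as follows. This is a generic illustration under stated assumptions, not a description of how the EVault vault is built: the block-index dictionaries and the names `apply_delta` and `restore` are hypothetical.

```python
def apply_delta(previous_full, delta):
    """Return a new synthetic full: the previous full with changed blocks overlaid.

    Both arguments map a block index to that block's bytes (an assumed layout).
    Unchanged blocks are shared by reference rather than copied.
    """
    full = dict(previous_full)
    full.update(delta)
    return full


def restore(full):
    """Restore from one synthetic full without replaying any incrementals."""
    return b"".join(full[index] for index in sorted(full))


seed = {0: b"AAAA", 1: b"BBBB", 2: b"CCCC"}  # initial full (seed) backup
day2_delta = {1: b"bbbb"}                    # block 1 changed
day3_delta = {3: b"DDDD"}                    # block 3 added

fulls = [seed]
for delta in (day2_delta, day3_delta):
    fulls.append(apply_delta(fulls[-1], delta))

print(restore(fulls[-1]))  # the latest state, read in one pass
print(restore(fulls[0]))   # the seed remains restorable on its own
```

The design point is that the merge cost is paid once, when the backup job arrives, so a restore reads one complete image instead of chaining a full backup with a series of incrementals.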

EVault Backup and Recovery Highlights

- Time-based deduplication with extremely low up-front processing time to reduce backup windows
- Source-side and back-end, block-level deduplication to minimize backup footprints
- Adaptive compression and dynamic bandwidth throttling to optimize LAN/WAN network performance
- End-to-end data encryption with a strict "Trust No One" security approach and choice of 128- or 256-bit AES encryption to provide the highest level of security
- Balanced approach to data protection offering speed, efficiency, affordability and adaptability

Figure 1. EVault data transfer process, including two-part deduplication, compression and encryption for optimized backup, transmission and data storage.

EVault Optimized at the Source and In Transit: Time-based Deduplication with Adaptive Compression

EVault can dramatically reduce the amount of backup data transmitted to the disk target, or vault. EVault technology has been continually enhanced and field-proven to provide the best balance between storage optimization and backup performance.

EVault uses a source-side, time-based deduplication technology that searches, finds, and transmits only those data blocks that are new or changed since the last backup job. This up-front functionality offers storage capacity savings of upwards of 50:1 over traditional file-based backup methods. Some solutions that claim to process block-level changes are working with blocks closer in size to traditional files. In contrast, EVault works with 32KB data blocks, the most efficient block size for reducing the amount of data to be transferred while speeding block-level processing.
[Figure 2 panels. Day 1, Initial Seed Backup: all new data blocks (A, B, C, D) are sent from the system with the EVault Agent to the EVault Director vault; Adaptive Compression reduces the size of blocks transmitted by an average of 50%. Day 2 Backup: the QFS process identifies and sends only new data block E to the EVault Director; Adaptive Compression reduces the size of the block transmitted by an average of 50%.]

Figure 2. Example of EVault deduplication techniques and Adaptive Compression.
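The Day 1 and Day 2 flow in Figure 2 can be approximated in a few lines. The sketch below is a generic time-based (delta) backup and makes several assumptions: it detects changes by comparing SHA-256 digests against an index from the previous job, and it compresses with zlib. It does not reproduce the patented QFS process or EVault's Adaptive Compression, whose internals are not described in this paper; `backup` and `make_block` are hypothetical names.

```python
import hashlib
import zlib


def backup(blocks, previous_index):
    """Send only blocks that are new or changed since the previous job.

    blocks: dict of block name -> bytes for the current state.
    previous_index: dict of block name -> digest saved by the last job.
    Returns (payload to transmit, index to keep for the next job).
    """
    payload, new_index = {}, {}
    for name, data in blocks.items():
        digest = hashlib.sha256(data).hexdigest()
        new_index[name] = digest
        if previous_index.get(name) != digest:
            payload[name] = zlib.compress(data)  # compress before transmission
    return payload, new_index


def make_block(label):
    """Build a 32 KB block of repetitive sample data (illustrative only)."""
    return (label * 64).encode() * 512


# Day 1: initial seed backup, every block is new.
day1 = {name: make_block(name) for name in "ABCD"}
payload1, index1 = backup(day1, {})

# Day 2: block E is added, A to D are unchanged.
day2 = dict(day1)
day2["E"] = make_block("E")
payload2, index2 = backup(day2, index1)

print(sorted(payload1))  # blocks sent on Day 1
print(sorted(payload2))  # blocks sent on Day 2
```

The sample blocks are highly repetitive, so zlib shrinks them far more than the 50 percent average quoted in the figure; real backup data will compress less predictably.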

EVault incorporates a number of optimized processes to ensure rapid, front-end processing and reductions in data footprints.

Quick File Scan (QFS): Patented process rapidly scans files on every system or server containing an EVault agent to identify any data blocks added or changed since the prior backup. Up-front processing time is low, ensuring optimal backup windows.

Adaptive Compression: Reduces data blocks transmitted by an average of 50 to 90 percent. EVault selects the best compression algorithm based on available CPU and network bandwidth.

Enhanced CPU Utilization: Automatically splits backup jobs across multiple CPUs.

Self-Healing Functionality: Automatically recreates the delta index file if it is corrupted or missing. This way, EVault continues to function and identify only new or changed blocks.

Additional EVault optimizations include the following.

Dynamic Bandwidth Throttling: Enables customers to control the amount of network bandwidth used for backup jobs. Especially useful for more frequent backups of more critical data, or for environments with limited bandwidth available.

Backup/Restore Transfer Protocol (BRTP): This custom-developed, secure protocol runs transparently on top of TCP/IP. BRTP offers many levels of error checking, error recovery, and connection recovery for ensured data integrity and continuation, or resumption, of backup sessions after a network outage. If a backup can't be completed, the data backed up to that point is valid. BRTP also ensures data encryption in transit.

EVault Post-Process Deduplication Technology for Additional Optimization

Beyond its front-end capabilities to reduce backup data footprints, EVault goes one step further by also performing deduplication at the target. This ensures backup jobs run optimally while looking for duplicate data within each backup job pool that may have been renamed, but was not subsequently identified as a duplicate during the source-side deduplication process. This back-end deduplication can help reclaim an additional 20 percent of storage space on the vault. Data deduplication at the target uniquely identifies matching blocks across all files in a backup job and eliminates duplicates. This back-end process does not adversely impact the backup window or slow performance of backup jobs.

Other target-level EVault optimization technologies include the following:

Storage Pool Optimization: Enables longer data retention in smaller footprints. Storage pool management includes:

- Automatic removal/reclamation of expired blocks: Per the customer's retention policy, the oldest backup blocks no longer required are deleted or moved to a different storage tier, with related storage space reclaimed.
- Defragmentation of the vault data: This further tunes the storage pool.
- Verification of backup data integrity

Migration to Secondary Storage Pools: EVault can automatically migrate or copy data to less expensive, secondary storage pools. This functionality also enables agents to recover data directly from a secondary pool. Some organizations may choose to store long-term archival backups on a less-costly system that can be detached and turned off when not in use to cut utility expenses.
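Reclaiming expired blocks from a deduplicated pool requires knowing when no retained backup still points at a block. One common way to track this is reference counting, sketched below. The `StoragePool` class, its methods, and the day-number retention rule are hypothetical and assumed for illustration; the paper does not say how EVault implements reclamation, and this sketch only deletes blocks rather than moving them to another tier.

```python
from collections import Counter


class StoragePool:
    """Toy deduplicated pool with retention-based block reclamation."""

    def __init__(self):
        self.blocks = {}           # block id -> data (one copy per unique block)
        self.refcount = Counter()  # block id -> number of retained backups using it
        self.backups = []          # (day, [block ids]) in arrival order

    def add_backup(self, day, block_map):
        """Record a backup; block_map maps block id -> data for all its blocks."""
        for block_id, data in block_map.items():
            self.blocks.setdefault(block_id, data)
            self.refcount[block_id] += 1
        self.backups.append((day, list(block_map)))

    def expire(self, today, retention_days):
        """Drop backups older than the retention window; return bytes reclaimed."""
        reclaimed, kept = 0, []
        for day, block_ids in self.backups:
            if today - day < retention_days:
                kept.append((day, block_ids))
                continue
            for block_id in block_ids:
                self.refcount[block_id] -= 1
                if self.refcount[block_id] == 0:  # no retained backup needs it
                    reclaimed += len(self.blocks.pop(block_id))
                    del self.refcount[block_id]
        self.backups = kept
        return reclaimed


pool = StoragePool()
pool.add_backup(1, {"a": b"1111", "b": b"2222"})
pool.add_backup(2, {"a": b"1111", "c": b"3333"})
pool.add_backup(3, {"a": b"1111", "c": b"3333", "d": b"44"})
reclaimed = pool.expire(today=4, retention_days=2)
print(reclaimed, sorted(pool.blocks))
```

Only block "b" is reclaimed here: blocks "a" and "c" also belonged to expired backups, but the retained Day 3 backup still references them, which is why expiry in a deduplicated store cannot simply delete everything an old backup contained.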

EVault End-to-End Encryption Technology

EVault customers enjoy the ability to easily encrypt all aspects of their backup data transactions, end to end, from the point of backup job creation through data transmission to data at rest on the vault. Role-based security options enable EVault customers to assign the ultimate authority to restore, encrypt, and decrypt data, or perform other backup and restore functions.

Only the EVault customer holds the encryption key. Once the customer sets the encryption password and settings during a backup job, backup data cannot be decrypted without the same encryption password. No i365 personnel ever have access to a customer's encryption key. Maintaining a strict "Trust No One" approach to encryption ensures that, even as i365 manages data in the EVault cloud or at the customer's offsite location, no i365 personnel can ever view the encrypted data.

With EVault, customers can choose to encrypt their backup data at different phases of the backup process: at creation, in transit, and at rest. When configuring backup jobs, customers can choose from either 128-bit or 256-bit AES encryption. Encrypted data also remains encrypted while in the vault. Most data is transmitted as block-level changes, offering further security during transmission. Customers transmitting data across external WAN connections can choose 128-bit AES encryption.

Conclusion

A vendor's approach to data deduplication, compression and encryption has implications for an organization's data protection environment. EVault has been carefully designed and improved throughout the years to offer the best mix of optimization technologies, resulting in a balanced, highly efficient, adaptable solution that is fast and easy to use, affordable to implement and grow, with a simple architecture that doesn't require significant extra investments. EVault data protection technology helps over 30,000 customers protect over 35 petabytes of data and has become a successful, field-proven example of what works when it comes to affordable, efficient disk-based data protection and recovery.

For More Information

To learn more about i365 and EVault storage solutions, please:

- Visit us at www.i365.com
- Email us at concierge@i365.com
- In North America, call us at 1.877.901.DATA (3282)
- In France, call us at +33 (0) 1 55 27 35 24
- In Germany, call us at +49 (0) 89 28890 434
- In the Netherlands, call us at +31 (0) 20 6556 474
- In the United Kingdom, and elsewhere in Europe, call us at +44 (0) 8452 585 500

i365 marks are either trademarks or registered trademarks of i365 Inc. or one of its affiliated companies in the United States and/or other countries. All other trademarks or registered trademarks are the property of their respective owners.