ADVANCED DEDUPLICATION CONCEPTS. Larry Freeman, NetApp Inc Tom Pearce, Four-Colour IT Solutions

Similar documents
UNDERSTANDING DATA DEDUPLICATION. Tom Sas Hewlett-Packard

Deduplication s Role in Disaster Recovery. Thomas Rivera, SEPATON

Deduplication s Role in Disaster Recovery. Gene Nagle, EXAR Thomas Rivera, SEPATON

UNDERSTANDING DATA DEDUPLICATION. Thomas Rivera SEPATON

UNDERSTANDING DATA DEDUPLICATION. Jiří Král, ředitel pro technický rozvoj STORYFLEX a.s.

Deduplication s Role in Disaster Recovery. Thomas Rivera, SEPATON

PROTECTING DATA IN THE BIG DATA WORLD. Thomas Rivera, Hitachi Data Systems

15 Best Practices For Comparing Data Protection With Cloud Media Server

The Business Value of Data Deduplication DDSR SIG

Thomas Rivera / Hitachi Data Systems

Restoration Technologies. Mike Fishman / EMC Corp.

Trends in Data Protection and Restoration Technologies. Jason Iehl, NetApp

in Transition to the Cloud

Introduction to Data Protection: Backup to Tape, Disk and Beyond

Introduction to Data Protection: Backup to Tape, Disk and Beyond. Michael Fishman, EMC Corporation

Thomas Rivera / Hitachi Data Systems. Author: SNIA - Data Protection & Capacity Optimization (DPCO) Committee

Trends in Data Protection and Restoration Technologies. Mike Fishman, EMC 2 Corporation (Author and Presenter)

Understanding Enterprise NAS

Introduction to Data Protection: Backup to Tape, Disk and Beyond. Michael Fishman, EMC Corporation

Backup to Tape, Disk and Beyond. Jason Iehl, NetApp

VDI Optimization Real World Learnings. Russ Fellows, Evaluator Group

WAN Optimization and Thin Client: Complementary or Competitive Application Delivery Methods? Josh Tseng, Riverbed

Today s Agile, Complex and Heterogeneous Data Centers

How To Migrate To A Network (Wan) From A Server To A Server (Wlan)

Accelerating Applications and File Systems with Solid State Storage. Jacob Farmer, Cambridge Computer

Server and Storage Virtualization with IP Storage. David Dale, NetApp

Trends in Application Recovery. Andreas Schwegmann, HP

How to Cost Effectively Retain Reference Data for Analytics and Big Data. Molly Rector, EVP Product Management & WW Marketing, Spectra Logic

Cloud-integrated Storage What & Why

Creating a Catalog for ILM Services. Bob Mister Rogers, Application Matrix Paul Field, Independent Consultant Terry Yoshii, Intel

3Gen Data Deduplication Technical

Tiered Data Protection Strategy Data Deduplication. Thomas Störr Sales Director Central Europe November 8, 2007

Fundamental Approaches to WAN Optimization. Josh Tseng, Riverbed

EMC Disk Library with EMC Data Domain Deployment Scenario

How it can benefit your enterprise. Dejan Kocic Hitachi Data Systems (HDS)

June Blade.org 2009 ALL RIGHTS RESERVED

SSD and Deduplication The End of Disk?

The Role of WAN Optimization in Cloud Infrastructures

How it can benefit your enterprise. Dejan Kocic Netapp

Cloud-integrated Enterprise Storage. Cloud-integrated Storage What & Why. Marc Farley

Cloud Archive & Long Term Preservation Challenges and Best Practices

Big Data Storage Options for Hadoop Sam Fineberg, HP Storage

STORAGE. Buying Guide: TARGET DATA DEDUPLICATION BACKUP SYSTEMS. inside

WHITE PAPER. Dedupe-Centric Storage. Hugo Patterson, Chief Architect, Data Domain. Storage. Deduplication. September 2007

Cloud File Services: October 1, 2014

Dell Data Protection. Marek Istok Ŋ Dell Slovakia

Business Benefits of Data Footprint Reduction

WAN Optimization and Cloud Computing. Josh Tseng, Riverbed

ARCHIVING FOR DATA PROTECTION IN THE MODERN DATA CENTER. Tony Walker, Dell, Inc. Molly Rector, Spectra Logic

Reducing Backups with Data Deduplication

Building the Business Case for the Cloud

EMC DATA PROTECTION. Backup ed Archivio su cui fare affidamento

How To Write On A Flash Memory Flash Memory (Mlc) On A Solid State Drive (Samsung)

Understanding EMC Avamar with EMC Data Protection Advisor

Redefining Microsoft SQL Server Data Management. PAS Specification

High Performance Computing OpenStack Options. September 22, 2015

Real-time Compression: Achieving storage efficiency throughout the data lifecycle

Archive and Preservation in the Cloud - Business Case, Challenges and Best Practices. Chad Thibodeau, Cleversafe, Inc. Sebastian Zangaro, HP

Get Success in Passing Your Certification Exam at first attempt!

Data Domain Overview. Jason Schaaf Senior Account Executive. Troy Schuler Systems Engineer. Copyright 2009 EMC Corporation. All rights reserved.

Understanding EMC Avamar with EMC Data Protection Advisor

Cloud Archiving. Paul Field Consultant

Best Practice and Deployment of the Network for iscsi, NAS and DAS in the Data Center

Rethinking Archiving: Exploring the path to improved IT efficiency and maximizing value of archiving solution investments

Storage Cloud Environments. Alex McDonald NetApp

Deduplication has been around for several

This presentation contains some information about future Veeam product releases, the timing and content of which are subject to change without

SERVER VIRTUALIZATION AND STORAGE DISASTER RECOVERY. Ray Lucchesi, Silverton Consulting

TOP CONSIDERATIONS FOR BUSINESS ONLINE BACKUP

Server and Storage Consolidation with iscsi Arrays. David Dale, NetApp

EMC PERSPECTIVE. An EMC Perspective on Data De-Duplication for Backup

Cloud OS Vision. Modern platform for the world s apps

EMC AVAMAR. a reason for Cloud. Deduplication backup software Replication for Disaster Recovery

How To Create A Cloud Backup Service

WHITE PAPER. Permabit Albireo Data Optimization Software. Benefits of Albireo for Virtual Servers. January Permabit Technology Corporation

Redefining Microsoft SQL Server Data Management

Symantec NetBackup Appliances

Understanding data deduplication ratios June 2008

Overcoming Backup & Recovery Challenges in Enterprise VMware Environments

IBM TSM DISASTER RECOVERY BEST PRACTICES WITH EMC DATA DOMAIN DEDUPLICATION STORAGE

Optimizing Backup and Data Protection in Virtualized Environments. January 2009

Redefining Microsoft Exchange Data Management

Data Center Transformation. Russ Fellows, Managing Partner Evaluator Group Inc.

The safer, easier way to help you pass any IT exams. Exam : Storage Sales V2. Title : Version : Demo 1 / 5

Three Virtualization Management Myths Busted. Rich Corley, NetApp

EMC DATA DOMAIN OVERVIEW. Copyright 2011 EMC Corporation. All rights reserved.

Protecting Information in a Smarter Data Center with the Performance of Flash

DEDUPLICATION NOW AND WHERE IT S HEADING. Lauren Whitehouse Senior Analyst, Enterprise Strategy Group

Writing Storage RFP s in John Webster, Evaluator Group

OmniCube. SimpliVity OmniCube and Multi Federation ROBO Reference Architecture. White Paper. Authors: Bob Gropman

Addendum No. 1 to Packet No Enterprise Data Storage Solution and Strategy for the Ingham County MIS Department

DEDUPLICATION BASICS

Introduction to NetApp Infinite Volume

in the Cloud - What To Do and What Not To Do Chad Thibodeau / Cleversafe Sebastian Zangaro / HP

Optimizing Your Storage for Server Virtualization. Chris Lionetti, NetApp

Scale and Availability Considerations for Cluster File Systems. David Noy, Symantec Corporation

DPAD Introduction. EMC Data Protection and Availability Division. Copyright 2011 EMC Corporation. All rights reserved.

NETAPP SYNCSORT INTEGRATED BACKUP. Technical Overview. Peter Eicher Syncsort Product Management

HP StoreOnce & Deduplication Solutions Zdenek Duchoň Pre-sales consultant

Backup and Recovery Redesign with Deduplication

Transcription:

ADVANCED DEDUPLICATION CONCEPTS Larry Freeman, NetApp Inc Tom Pearce, Four-Colour IT Solutions

SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individual members may use this material in presentations and literature under the following conditions: Any slide or slides used must be reproduced in their entirety without modification The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations. This presentation is a project of the SNIA Education Committee. Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney. The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information. NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK. 2

About the SNIA DPCO This tutorial has been developed, reviewed and approved by members of the Data Protection and Capacity Optimization (DPCO) committee which any SNIA member can join for free The mission of the DPCO is to foster the growth and success of the market for data protection and capacity optimization technologies 2011 goals include educating the vendor and user communities, market outreach, and advocacy and support of any technical work associated with data protection and capacity optimization Check out these SNIA Tutorials: Understanding Data Deduplication Deduplication s Role in Disaster Recovery 3

Abstract Since arriving on the scene 10 years ago, the adoption of data deduplication has become widespread throughout the storage and data protection community. This tutorial assumes a basic understanding of deduplication and covers topics that attendees will find helpful in understanding today s expanded use of this technology. Topics will include: Trends in vendor deduplication design Practical deduplication of primary storage Using deduplication to reduce storage network traffic Pervasive deduplication across storage tiers Deduplication implications with storage array cache and SSD s Integration of Data Compression and deduplication 4

Terminology Capacity Optimization Methods [Storage System] Methods which reduce the consumption of space required to store a data set, such as compression, data deduplication, thin provisioning, and delta snapshots Data Deduplication [Storage System] The replacement of multiple copies of data at variable levels of granularity with references to a shared copy in order to save storage space and/or bandwidth. Compression [General] The process of encoding data to reduce its size. Lossy compression (i.e., compression using a technique in which a portion of the original information is lost) is acceptable for some forms of data (e.g., digital images) in some applications, but for most IT applications, lossless compression (i.e., compression using a technique that preserves the entire content of the original data, and from which the original data can be reconstructed exactly) is required. 5

Vendor Deduplication Trends Original value and justification has not changed: Satisfy ROI/TCO requirements Manage data growth Increase efficiency of storage and backup Reduce overall cost of storage Reduce network bandwidth Reduce operational costs including: Infrastructure costs required space, power and cooling Movement toward a greener data center Reduce administrative costs 6

Deduplication Scope The scope of deduplication is broadening: Primary Storage Reduced capacity requirement for storage of active data Replication Reduced capacity requirement for DR and business continuity Data Protection Reduced capacity requirement for backup and/or longer retention periods Archivals Reduced capacity requirement for data retention and preservation Movement/Migration of data Reduced bandwidth requirements for data-in-transit 7

Deduplication Savings Expectation Deduplication Savings Depend on Use Case and Time Primary/replicated storage has less duplicate data Periodic archives have moderate duplicate data Repeated backups have significant duplicate data Logical Capacity Dedupe Primary Dedupe Archive Savings Time Dedupe Backup 8

Deduplication and Primary Storage Effective for specific workloads Acceptable performance Post-processing Inline ingestion Primary storage applications with high data redundancy Virtual servers and desktops Collaborative file sharing Email (Software SIS replacement) LAN Deduplicated Data Application Server SAN Primary Storage 9

Primary Storage Array Cache/SSD Array Cache Intelligent cache can be dedupe-aware Hot data is cached with dedupe attributes Reduces rotating media latencies Virtual Desktop boot storms is one example Solid State Drives Deduplication helps offset the higher cost/gb of SSD s High performance applications with highly redundant data will benefit 10

Primary Storage Considerations Balance the tradeoff between cost savings and performance impact VM Clone Copies APP APP DATA DATA OS OS APP DATA OS APP DATA OS APP DATA OS Walk before you run Use estimation tools Perform POCs Implement one workload at a time APP DATA OS VMs After Deduplication Duplicate Data Is Eliminated 11

Deduplication and Replication Remote Sites Primary Data Center Secondary Data Center Deduplicated Data Deduplicated Data Can be one-way, two-way, multi-node, or multi-hop Deduplication points can be determined by Users Bandwidth cost savings / smaller pipes For very small pipes, dedupe can make replication possible 12

Replication Considerations Focus on your Service Level Agreements (SLAs) first Needs to meet window for Replication Needs to meet SLA for System Recovery or Data Restore Is it Necessary to Dedupe All Data? Mission-critical applications May have regulatory issues for some data Some data types not conducive to deduplication 13

Deduplication and Backups The Original Promise Faster data recovery from disk Reduction in D2D cost per terabyte stored Reduction in D2D backup storage footprint Less network bandwidth required for D2D backups What s New? Deduplication embedded in backup software Scalability of deduplication appliances Deduplication across appliances Deduplication to tape 14

Backup Considerations Hardware or software deduplication? Workload burden placement e.g. a hardware appliance or one or more server/storage devices in software. Source or target deduplication? Locates the work-load. Dedupe at source conserves bandwidth. Variable or fixed-length deduplication? Describes the segmentation of data for evaluation and influences the ability to detect data changes. File or sub-file deduplication? Sub-file is more granular, has potentially further reductions but must be consistent with compliance requirements. Answers depend on the problem you are trying to solve 15

Deduplication and Archival Application Servers Primary Storage Deduplicated Archive Deduplication can reduce the cost of online archive repositories These repositories are often required for regulatory compliance No standard exists today for approved use of data reduction techniques with regulatory data Vendors should provide assurances that the ability to retrieve data in its original form is not impaired 16

Deduplication and Compression Dedupe and compression are similar Both are dependant on data patterns Both consume system resources Both can optimize required storage capacity Dedupe and compression are different Some data can only be optimized via dedupe Some data can only be optimized via compression Some data can be optimized via dedupe and compression Some data cannot be optimized at all Dedupe and compression are complementary But some knowledge about the data pattern is required 17

Deduplication and Network Traffic SAN/LAN/WAN Data in Transit Remote Office Servers Application Servers Production Storage DR/Backup Storage Archival Storage Increased SAN/LAN/WAN Efficiency Transfer data references instead of data objects Shorten data transfer times by sending less data Increasing WAN efficiency More applications per pipe 18

Consideration Matrix Application-specific protocol CIFS, NFS, FC, iscsi, VTL WAN Virtual Gateway Backup Software Agent Network Gateway Primary Storage DR/Backup Storage Archival Storage Reduce Network Traffic Reduce Physical Capacity Reduce Backup Time Reduce Recovery Time Reduce Replication Time Reduce Media Latency 19

Summary The original value of deduplication has not changed The scope of deduplication is broadening All storage tiers Bandwidth reduction Regulatory data New use cases and new technologies bring new challenges And new opportunities 20

Refer to the Hands-On Lab Check out the Hands-On Lab Data Deduplication 21

Q&A / Feedback Please send any questions or comments on this presentation to SNIA: trackdatamgmt@snia.org Many thanks to the following individuals for their contributions to this tutorial. - SNIA Education Committee Data Protection and Capacity Optimization (DPCO) Committee: Mike Dutch Larry Freeman Gene Nagle Tom Pearce Thomas Rivera Tom Sas It s easy to get involved with the DPCO! - Find a passion - Join a committee - Gain knowledge & influence - Make a difference www.s nia.org 22