File Systems for Data Protection. Windsor Hsu, Ph.D. EMC


SNIA Legal Notice
The material contained in this tutorial is copyrighted by the SNIA unless otherwise noted. Member companies and individual members may use this material in presentations and literature under the following conditions:
- Any slide or slides used must be reproduced in their entirety without modification.
- The SNIA must be acknowledged as the source of any material used in the body of any document containing material from these presentations.
This presentation is a project of the SNIA Education Committee. Neither the author nor the presenter is an attorney and nothing in this presentation is intended to be, or should be construed as legal advice or an opinion of counsel. If you need legal advice or a legal opinion please contact your attorney. The information presented herein represents the author's personal opinion and current understanding of the relevant issues involved. The author, the presenter, and the SNIA do not assume any responsibility or liability for damages arising out of any reliance on or use of this information.
NO WARRANTIES, EXPRESS OR IMPLIED. USE AT YOUR OWN RISK.

Abstract: File Systems for Data Protection
There is a strong industry trend to adopt disk-based platforms for data protection, but not all such systems were designed to satisfy the requirements of data protection. This tutorial explains some of the differences between a general-purpose file system and a file system designed for data protection. Participants will:
- appreciate that a file system targeted at data protection has unique requirements
- understand, at an overview level, the different techniques for satisfying these requirements and their operational implications
- become equipped to make informed decisions about selecting a file system for data protection

Data Protection Transformation
[Diagram: backup clients, backup/media servers, onsite retention storage, replication over the WAN, offsite disaster recovery storage]
- Deduplicating storage systems take the role of tape libraries
- Replication of reduced data for WAN vaulting

Tape Use Decline
[Bar chart: Impact on Use of Tape among Dedupe Backup Users, % of respondents]
- Reduced / will reduce the amount of tape used ("Tape Reduced"): 59.3
- Eliminated or will eliminate tape ("Tape Gone"): 32.4
- No impact on tape use: 5.4
- Increased / will increase use of tape: 1
- Don't use tape: 2
Notes: N=204, data is not weighted. Base = all respondents who have seen or expect to see specific results with tape reduction/elimination.
Source: IDC Deduplication Study, November 2009

Tape Automation Market Decline
[Chart: tape automation factory revenue as a percentage of external storage factory revenue; labeled point 1999: 11.9%]
- Tape use in re-evaluation
- Restore is moving to disk
- DR will be from WANs
Source: IDC Tape Automation Worldwide Forecast 2010-2014, April 2010

Protection Storage versus Primary Storage
Workload
- Primary storage: continuous random accesses; mostly reads; lots of metadata accesses
- Protection storage: large batches of accesses; mostly writes; few metadata accesses
Cost
- Primary storage: dollars / IOPS
- Protection storage: dollars / TB
Performance
- Primary storage: latency and throughput, IOPS
- Protection storage: sequential throughput
Safety
- Primary storage: backup to protection storage
- Protection storage: storage of last resort

Storage for Data Protection
- Cost (low $/TB): compete with tape; think "Tier Z"
- Performance (high sequential throughput): finish backups within the shrinking backup window
- Scale and scalability: effectively handle large and growing data sets
- Safety: must work; last line of defense against data loss

Cost: Low $/TB

Data Reduction
Technique                                     Typical reduction
Regular storage array                         1:1
LZ compression                                ~2:1*
Single-instance storage                       ~3:1*
Fixed block                                   ~3:1*
Variable-size segment (data deduplication)    ~20:1*
For backups, data deduplication achieves massive data reduction and savings in:
- Replication WAN bandwidth
- Power
- Heat
- Cooling
- Management
* Typical data reduction rate for backups; actual rate depends on data and algorithm

Data Deduplication: Technology Overview
[Diagram: segment streams per backup, one letter per segment]
- Friday full backup: A B C D A E F G
- Mon incremental: A B H
- Tues incremental: C B I
- Weds incremental: E G J
- Thurs incremental: A C K
- Second Friday full backup: B C D E F L G H A
- Segments stored: A B C D E F G H I J K L
Backup                   Logical   Estimated data reduction   Physical
Friday full              1 TB      2-4x                       250 GB
Monday incremental       100 GB    7-10x                      10 GB
Tuesday incremental      100 GB    7-10x                      10 GB
Wednesday incremental    100 GB    7-10x                      10 GB
Thursday incremental     100 GB    7-10x                      10 GB
Second Friday full       1 TB      50-60x                     18 GB
TOTAL                    2.4 TB    7.8x                       308 GB
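The segment diagram on this slide can be replayed in a few lines. The sketch below is only an illustration: it treats each letter as one segment, keeps the first copy of every segment, and reports what each backup adds to the store. The function name `dedupe` and the data layout are assumptions, and the letter order of the second full backup is taken as it appears on the slide.

```python
# Segment streams copied from the slide; each letter stands for one segment.
backups = [
    ("Friday full", "ABCDAEFG"),
    ("Mon incremental", "ABH"),
    ("Tues incremental", "CBI"),
    ("Weds incremental", "EGJ"),
    ("Thurs incremental", "ACK"),
    ("Second Friday full", "BCDEFLGHA"),
]


def dedupe(backups):
    """Return (per-backup report, set of stored segments).

    Each report entry is (backup name, segments written, newly stored segments).
    """
    store = set()
    report = []
    for name, stream in backups:
        new = []
        for segment in stream:
            if segment not in store:  # only the first copy is stored
                store.add(segment)
                new.append(segment)
        report.append((name, len(stream), new))
    return report, store


report, store = dedupe(backups)
for name, written, new in report:
    print(f"{name}: {written} segments written, {len(new)} newly stored {new}")
print("stored segments:", "".join(sorted(store)))
```

Each incremental adds a single new segment, and the second full backup adds only L, which is why its estimated reduction on the slide is so much higher than that of the first full.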

Data Deduplication: Store More for Longer with Less
Period     Backup       Cumulative logical   Estimated data reduction   Physical
           First full   1 TB                 4x                         250 GB
Week 1     April 7      2.4 TB               8x                         308 GB
Week 2     April 14     3.8 TB               10x                        366 GB
Week 3     April 21     5.2 TB               12x                        424 GB
Month 1    April 28     6.6 TB               14x                        482 GB
Month 2    May 31       12.2 TB              17x                        714 GB
Month 3    June 30      17.8 TB              19x                        946 GB
Month 4    July 31      23.4 TB              20x                        1,178 GB
TOTAL                   23.4 TB              20x                        1,178 GB
Rule of thumb: the physical capacity needed for 3 months of data protection is roughly the logical size of the data protected.
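The cumulative figures on this slide follow from simple arithmetic. The sketch below assumes that every week after the first full adds four 100 GB incrementals and one 1 TB full (1.4 TB logical, 58 GB physical, as on the previous slide) and that each month in the table is a four-week step; both are assumptions that happen to reproduce the slide's numbers. The names `FIRST_FULL`, `WEEKLY`, and `cumulative` are illustrative.

```python
FIRST_FULL = (1000, 250)  # (logical GB, physical GB) for the first full backup
WEEKLY = (1400, 58)       # assumed weekly growth: 4 incrementals + 1 full


def cumulative(weeks):
    """Return (logical GB, physical GB, reduction ratio) after `weeks` weeks."""
    logical = FIRST_FULL[0] + weeks * WEEKLY[0]
    physical = FIRST_FULL[1] + weeks * WEEKLY[1]
    return logical, physical, logical / physical


for weeks in (0, 1, 2, 3, 4, 8, 12, 16):
    logical, physical, ratio = cumulative(weeks)
    print(f"week {weeks:2d}: {logical / 1000:5.1f} TB logical, "
          f"{physical:5d} GB physical, about {ratio:.0f}x")
```

After 12 weeks the physical footprint is 946 GB for 1 TB of protected data, which is the rule of thumb stated on the slide.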

Data Deduplication: Store More for Longer with Less
10 to 30 times less data stored versus fulls + incrementals with typical retention policies.
[Chart: data stored versus weeks in use (1 to 20), comparing deduplication storage with traditional storage]

Inline vs Post-Process Deduplication
Inline (deduplication before storing):
- Other activities unimpeded
- Predictable
- Simpler
Post-process (deduplication after storing):
- 3x disk accesses to shared store
- 2x capacity: current backup before dedupe plus 3 months of deduped backups
- The more processes, the more resource contention:
  - Copy to tape: too slow to stream tape
  - Recovery: unpredictability in meeting SLA
  - Replication: poor time-to-disaster-recovery
  - Deduplication: delayed if interleaved with backup/restore
- More administration to fight these issues

Performance: High Sequential Throughput

Speed Matters
[Chart: data protected (TB) versus throughput (TB/hour) for a 12-hour backup window]
A faster protection storage system can protect larger data sets.
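The relationship behind this chart is a single multiplication: the data that can be protected equals sustained throughput times the length of the backup window. A minimal sketch, with `protectable_tb` as an illustrative name and the 12-hour window taken from the slide:

```python
def protectable_tb(throughput_tb_per_hour, window_hours=12):
    """Largest data set (TB) that fits in the backup window at a given throughput."""
    return throughput_tb_per_hour * window_hours


for rate in (5, 10, 15, 20, 25, 30):
    print(f"{rate:2d} TB/hour -> {protectable_tb(rate)} TB in a 12-hour window")
```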

Challenge in Fast Deduplication
The bottleneck is looking up the index to determine whether a segment is already in the system.
- Each disk: about 1 MB/sec
- E.g., a 7.2K RPM SATA drive delivers <120 seeks/sec
- 120 seeks/sec with 8 KB segments: 0.96 MB/sec/disk
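The per-disk figure can be checked directly. The sketch below assumes one index seek per incoming segment and decimal units (1 MB = 1000 KB), and then asks how many such disks a 1 GB/sec target would need if lookups were simply spread across spindles. The function names are illustrative.

```python
import math


def index_bound_mb_per_sec(seeks_per_sec=120, segment_kb=8):
    """Throughput of one disk when every segment costs one index seek."""
    return seeks_per_sec * segment_kb / 1000


def disks_needed(target_mb_per_sec, per_disk_mb_per_sec):
    """Number of disks required to reach the target with parallel lookups."""
    return math.ceil(target_mb_per_sec / per_disk_mb_per_sec)


per_disk = index_bound_mb_per_sec()
print(f"{per_disk:.2f} MB/sec per disk")
print(disks_needed(1000, per_disk), "disks for 1 GB/sec")
```

The result is on the order of a thousand spindles, which is why a seek per segment is not a practical design.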

How to Dedupe Fast?
- Parallel disk lookups
  - Need 1000 disks to go 1 GB/sec
- Fast disks for the index of segments already in the system
  - Use FC disks, short-stroke disks, SSDs, etc.
- Index in RAM
  - RAM needs to be ~1% of storage
- Cache index in RAM
  - Needs temporal or spatial locality to work
- Stream Informed Segment Layout (SISL)
  - Segments tend to arrive in the same order, so store segments in the order received
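The last two ideas on this slide can be illustrated with a toy model. The sketch below is not the SISL implementation; it is a hypothetical `LocalityStore` that groups segments into containers in arrival order and, on an index lookup that finds a duplicate, caches the fingerprints of the whole container. The container size, cache size, and all names are assumptions chosen to keep the example small.

```python
import hashlib
from collections import OrderedDict


def fingerprint(data: bytes) -> str:
    """Content fingerprint used to identify a segment."""
    return hashlib.sha256(data).hexdigest()


class LocalityStore:
    def __init__(self, container_size=4, cache_slots=2):
        self.container_size = container_size
        self.containers = [[]]      # fingerprints grouped in arrival order
        self.index = {}             # fingerprint -> container id (stands in for the on-disk index)
        self.cache = OrderedDict()  # LRU of container ids whose fingerprints are held in RAM
        self.cache_slots = cache_slots
        self.index_lookups = 0      # counts the expensive lookups

    def _cache_hit(self, fp):
        cid = next((c for c in self.cache if fp in self.containers[c]), None)
        if cid is None:
            return False
        self.cache.move_to_end(cid)  # mark container as recently used
        return True

    def _load(self, cid):
        self.cache[cid] = True
        self.cache.move_to_end(cid)
        while len(self.cache) > self.cache_slots:
            self.cache.popitem(last=False)  # evict least recently used container

    def write(self, fp):
        """Store a segment fingerprint; return True if it was a duplicate."""
        if self._cache_hit(fp):
            return True
        self.index_lookups += 1      # cache miss: consult the full index
        cid = self.index.get(fp)
        if cid is not None:
            self._load(cid)          # neighbours will probably arrive next
            return True
        if len(self.containers[-1]) == self.container_size:
            self.containers.append([])
        self.containers[-1].append(fp)
        self.index[fp] = len(self.containers) - 1
        return False


backup = [fingerprint(f"segment-{i}".encode()) for i in range(16)]
store = LocalityStore()
first = [store.write(fp) for fp in backup]
lookups_first = store.index_lookups
second = [store.write(fp) for fp in backup]
lookups_second = store.index_lookups - lookups_first
print("index lookups, first backup:", lookups_first)
print("index lookups, second backup:", lookups_second)
```

In this toy run the repeated backup needs one index lookup per container rather than one per segment, because the segments arrive in the order in which they were stored.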

Scale and Scalability

Proportionate Performance Scaling
[Chart: performance versus capacity, with lines for 4 weeks and 8 weeks of retention]
For a fixed retention period, performance must scale with capacity.

CPU-Centric versus Spindle-Bound
[Chart: throughput (MB/s) versus number of disk spindles]
- CPU-centric architecture (e.g. SISL-based): throughput scales with CPU performance
- Spindle-bound architecture: throughput gated by disk IOPS; curves shown with FC disks and with SATA disks
Data sets are growing rapidly; a CPU-centric architecture provides the required scalability.

Challenge of Large Consolidated Data Centers
[Diagram: a media manager pool sending backups to dedupe pools, days 1-3 to one pool and day 4 to another; chart of capacity used on days 1 through 4]
If a backup is rerouted to a new dedupe domain, the capacity required is the same as for a first full.

Large Dedupe Domain
- Best practice: send the same data set consistently to the same dedupe domain
- Challenge: as the data set grows, it is difficult to rebalance workloads among dedupe domains
- Solution: a large dedupe domain
- Results:
  - Simplified capacity management
  - Higher storage utilization
  - Higher backup success rate

Safety

Storage of Last Resort
- When primary storage fails: restore from backup
- When backup fails: no copy to restore from; data is lost

Design for Data Protection
- Make sure data is stored correctly when originally backed up
- Make sure new backups do not put old backups at risk
- Make sure old backups stay correct and recoverable
- Be prepared for the worst: design the system for recoverability
- Maintain a copy in another location

End-to-End Verification
- Test recoverability asynchronously after backups
  - File system consistency
  - Data integrity on disk
- Primary storage can't verify after write: it would be too slow
- Primary storage discovers problems during restore
- The end-to-end check verifies all file system data and metadata
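One way to picture an end-to-end check is a pass that runs after the backup completes, re-reads every stored segment, and confirms both data integrity and that every file can still be reassembled. The sketch below is a toy model under stated assumptions: segments live in a dictionary keyed by their SHA-256 fingerprint, each backup is a "recipe" listing fingerprints, and the names `write_backup` and `verify` are hypothetical.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def write_backup(store, recipes, name, segments):
    """Store each segment once and record the list of fingerprints for the backup."""
    recipes[name] = []
    for data in segments:
        key = fingerprint(data)
        store.setdefault(key, data)
        recipes[name].append(key)


def verify(store, recipes):
    """Return a list of problems found by re-checking all data and metadata."""
    problems = []
    for key, data in store.items():
        if fingerprint(data) != key:  # data integrity: content must match its fingerprint
            problems.append(f"corrupt segment {key[:8]}")
    for name, keys in recipes.items():
        for key in keys:              # consistency: every reference must resolve
            if key not in store:
                problems.append(f"{name}: missing segment {key[:8]}")
    return problems


store, recipes = {}, {}
write_backup(store, recipes, "friday-full", [b"alpha", b"beta", b"alpha"])
print("problems after backup:", verify(store, recipes))

store[fingerprint(b"beta")] = b"bet4"  # simulate silent corruption on disk
print("problems after corruption:", verify(store, recipes))
```

Running the check asynchronously, as the slide suggests, keeps it off the backup's critical path while still catching corruption before a restore is needed.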

Fault Avoidance and Containment
- Overwriting good data with new data puts previous backups at risk
- Use a log-structured file system
- Eliminate partial-RAID-stripe writes
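The two techniques on this slide can be sketched together: new data is only ever appended, and it reaches storage only as complete stripes. The class below is a hypothetical in-memory model, not a file system; `AppendOnlyLog`, the stripe width of 4, and padding with `None` are all assumptions made for illustration.

```python
class AppendOnlyLog:
    def __init__(self, stripe_width=4):
        self.stripe_width = stripe_width
        self.stripes = []  # sealed stripes, kept as tuples so they cannot be modified
        self.pending = []  # segments waiting until a full stripe is available

    def append(self, segment):
        """Queue a segment and return the (stripe, offset) where it will live."""
        location = (len(self.stripes), len(self.pending))
        self.pending.append(segment)
        if len(self.pending) == self.stripe_width:
            self._seal()
        return location

    def _seal(self):
        self.stripes.append(tuple(self.pending))
        self.pending = []

    def flush(self):
        """Pad and seal a partly filled stripe so that only full stripes are written."""
        if self.pending:
            self.pending += [None] * (self.stripe_width - len(self.pending))
            self._seal()

    def read(self, location):
        stripe, offset = location
        return self.stripes[stripe][offset]


log = AppendOnlyLog()
old = [log.append(f"old-{i}") for i in range(4)]  # an earlier backup
new = [log.append(f"new-{i}") for i in range(2)]  # a later backup
log.flush()
print(log.stripes)
print(log.read(old[0]), log.read(new[1]))
```

Because the later backup lands in a fresh stripe, a failure while writing it cannot damage the stripe that holds the earlier backup.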

Fault Detection and Recoverability
- Verify that all data read is correct
- Verify that all data, even data not read, remains correct
- Ensure all metadata structures can be rebuilt
- Ensure file system check (FSCK) is fast when needed

Flexible and Efficient WAN Vaulting
[Diagrams: three replication topologies over a WAN]
- Data center (DC) disaster recovery: primary DC, WAN, remote DC
- Remote office data protection: remote sites, WAN, primary DC, remote DC
- Multi-site tape consolidation

Time to DR-Ready
[Timeline diagram: time to reach DR-ready for four approaches, measured against the backup window]
- Inline dedupe replication: replicate during backup, then DR-ready
- Adaptive post-process dedupe replication: backup to cache; dedupe, replicate & present at <50% ingest speed; then DR-ready
- Scheduled post-process dedupe replication: backup to cache; dedupe, replicate & present at <50% ingest speed; then DR-ready
- VTL/tape/truck: backup to VTL, copy to tape, truck to storage, recall tapes, truck from storage; DR-ready time unknown (?)

Summary
A file system for data protection is different from a file system for primary storage. A file system for data protection must offer:
- Fast and effective data reduction
- CPU-centric inline data deduplication
- Scalable performance
- Large deduplication domain
- Resiliency fitting storage of last resort
- Efficient and flexible replication

Q&A / Feedback
Please send any questions or comments on this presentation to SNIA: trackfilemgmt@snia.org
Many thanks to the following individuals for their contributions to this tutorial. - SNIA Education Committee
- Windsor Hsu
- Hugo Patterson
- Joseph White
- Nancy Clay