A Deduplication File System & Course Review



Similar documents
COS 318: Operating Systems. Virtual Machine Monitors

Trends in Enterprise Backup Deduplication

COS 318: Operating Systems

Data Backup and Archiving with Enterprise Storage Systems

Avoiding the Disk Bottleneck in the Data Domain Deduplication File System

Get Success in Passing Your Certification Exam at first attempt!

Data Domain Overview. Jason Schaaf Senior Account Executive. Troy Schuler Systems Engineer. Copyright 2009 EMC Corporation. All rights reserved.

Long term retention and archiving the challenges and the solution

EMC Data de-duplication not ONLY for IBM i

EMC DATA DOMAIN OPERATING SYSTEM

Theoretical Aspects of Storage Systems Autumn 2009

Speeding Up Cloud/Server Applications Using Flash Memory

EMC DATA DOMAIN OPERATING SYSTEM

Chapter 12: Mass-Storage Systems

Backup and Recovery Redesign with Deduplication

HP StoreOnce & Deduplication Solutions Zdenek Duchoň Pre-sales consultant

Hardware Configuration Guide

Efficient Backup with Data Deduplication Which Strategy is Right for You?

UNDERSTANDING DATA DEDUPLICATION. Thomas Rivera SEPATON

Promise of Low-Latency Stable Storage for Enterprise Solutions

EMC DATA DOMAIN PRODUCT OvERvIEW

UNDERSTANDING DATA DEDUPLICATION. Tom Sas Hewlett-Packard

How To Make A Backup System More Efficient

A SCALABLE DEDUPLICATION AND GARBAGE COLLECTION ENGINE FOR INCREMENTAL BACKUP

UNDERSTANDING DATA DEDUPLICATION. Jiří Král, ředitel pro technický rozvoj STORYFLEX a.s.

Identifying the Hidden Risk of Data Deduplication: How the HYDRAstor TM Solution Proactively Solves the Problem

June Blade.org 2009 ALL RIGHTS RESERVED

<Insert Picture Here> Refreshing Your Data Protection Environment with Next-Generation Architectures

COS 318: Operating Systems. Virtual Memory and Address Translation

EMC DATA DOMAIN OVERVIEW. Copyright 2011 EMC Corporation. All rights reserved.

e Number: Passing Score: 800 Time Limit: 120 min File Version: 1.0

Bigdata High Availability (HA) Architecture

09'Linux Plumbers Conference

MAD2: A Scalable High-Throughput Exact Deduplication Approach for Network Backup Services

SAP HANA - Main Memory Technology: A Challenge for Development of Business Applications. Jürgen Primsch, SAP AG July 2011

3Gen Data Deduplication Technical

RAMCloud and the Low- Latency Datacenter. John Ousterhout Stanford University

IBM TSM DISASTER RECOVERY BEST PRACTICES WITH EMC DATA DOMAIN DEDUPLICATION STORAGE

STORAGE. Buying Guide: TARGET DATA DEDUPLICATION BACKUP SYSTEMS. inside

Chapter 11 I/O Management and Disk Scheduling

Overcoming Backup & Recovery Challenges in Enterprise VMware Environments

CIGRE 2014: Udaljena zaštita podataka

COSC 6374 Parallel Computation. Parallel I/O (I) I/O basics. Concept of a clusters

EMC BACKUP MEETS BIG DATA

We look beyond IT. Cloud Offerings

WHITE PAPER. Dedupe-Centric Storage. Hugo Patterson, Chief Architect, Data Domain. Storage. Deduplication. September 2007

Storage Design for High Capacity and Long Term Storage. DLF Spring Forum, Raleigh, NC May 6, Balancing Cost, Complexity, and Fault Tolerance

WAN Optimized Replication of Backup Datasets Using Stream-Informed Delta Compression

Scalable Storage for Life Sciences

Lecture 18: Reliable Storage

FAST 11. Yongseok Oh University of Seoul. Mobile Embedded System Laboratory

WHITE PAPER Improving Storage Efficiencies with Data Deduplication and Compression

Chapter Introduction. Storage and Other I/O Topics. p. 570( 頁 585) Fig I/O devices can be characterized by. I/O bus connections

E4 UNIFIED STORAGE powered by Syneto

Design and Implementation of a Storage Repository Using Commonality Factoring. IEEE/NASA MSST2003 April 7-10, 2003 Eric W. Olsen

File System & Device Drive. Overview of Mass Storage Structure. Moving head Disk Mechanism. HDD Pictures 11/13/2014. CS341: Operating System

IOmark- VDI. Nimbus Data Gemini Test Report: VDI a Test Report Date: 6, September

Storage Backup and Disaster Recovery: Using New Technology to Develop Best Practices

Real-time Compression: Achieving storage efficiency throughout the data lifecycle

Alternatives to Big Backup

HP StoreOnce: reinventing data deduplication

Price/performance Modern Memory Hierarchy

WHITE PAPER. DATA DEDUPLICATION BACKGROUND: A Technical White Paper

Introduction Disks RAID Tertiary storage. Mass Storage. CMSC 412, University of Maryland. Guest lecturer: David Hovemeyer.

Data Deduplication Background: A Technical White Paper

Tiered Data Protection Strategy Data Deduplication. Thomas Störr Sales Director Central Europe November 8, 2007

ntier Verde Simply Affordable File Storage

Protect Data... in the Cloud

Configuring Apache Derby for Performance and Durability Olav Sandstå

Chapter 11: File System Implementation. Operating System Concepts with Java 8 th Edition

Weighted Total Mark. Weighted Exam Mark

VMware Virtual SAN Backup Using VMware vsphere Data Protection Advanced SEPTEMBER 2014

Protect Microsoft Exchange databases, achieve long-term data retention

Data Deduplication and Tivoli Storage Manager

DXi Accent Technical Background

efficient protection, and impact-less!!

Cloud Storage. Parallels. Performance Benchmark Results. White Paper.

Redefining Backup for VMware Environment. Copyright 2009 EMC Corporation. All rights reserved.

Chapter 6, The Operating System Machine Level

BlueArc unified network storage systems 7th TF-Storage Meeting. Scale Bigger, Store Smarter, Accelerate Everything

Restoration Technologies. Mike Fishman / EMC Corp.

Deduplication has been around for several

Outline. Failure Types

VMware vsphere Data Protection 6.0

Effective Planning and Use of IBM Tivoli Storage Manager V6 and V7 Deduplication

Trends in Data Protection and Restoration Technologies. Mike Fishman, EMC 2 Corporation (Author and Presenter)

For Hyper-V Edition Practical Operation Seminar. 4th Edition

AIX NFS Client Performance Improvements for Databases on NAS

RAID. RAID 0 No redundancy ( AID?) Just stripe data over multiple disks But it does improve performance. Chapter 6 Storage and Other I/O Topics 29

File Systems Management and Examples

The Modern Virtualized Data Center

Best Practices Guide. Symantec NetBackup with ExaGrid Disk Backup with Deduplication ExaGrid Systems, Inc. All rights reserved.

Top Ten Questions. to Ask Your Primary Storage Provider About Their Data Efficiency. May Copyright 2014 Permabit Technology Corporation

Effective Planning and Use of TSM V6 Deduplication

VERITAS NetBackup & Data Domain DD200 Restorer

DEDUPLICATION BASICS

DELL TM PowerEdge TM T Mailbox Resiliency Exchange 2010 Storage Solution

Geospatial Server Performance Colin Bertram UK User Group Meeting 23-Sep-2014

Building a High Performance Deduplication System Fanglu Guo and Petros Efstathopoulos

WHITE PAPER. Permabit Albireo Data Optimization Software. Benefits of Albireo for Virtual Servers. January Permabit Technology Corporation

Transcription:

A Deduplication File System & Course Review Kai Li 12/13/12

Topics A Deduplication File System Review 12/13/12 2

Traditional Data Center Storage Hierarchy Clients Network Server SAN Storage Remote mirror Onsite Backup Storage Offsite backup 3

Evolved Data Center Storage Hierarchy Clients Network Network Attached Storage (NAS) w/ snapshots to protect data Remote mirror Onsite Backup Storage Offsite backup 4

A Data Center with Deduplication Storage Mirrored storage Clients Server Primary storage Promises Onsite Purchase: ~Tape libraries Space: 10-30X reduction WAN BW: 10-100X reduction Power: ~10X reduction WAN Remote 12/13/12 5

What Is Deduplication? Deduplication is global compression that removes the redundant segments globally (across many files) Local compression tools (gzip, winzip, ) encode redundant strings in a small window (within a file) 12/13/12 6

Idea of Deduplication Traditional local compression Deduplication Encode a sliding window of bytes (e.g. 100K) [Ziv&Lempel77] ~2-3X compression ~10-50X compression Large window more redundant data 12/13/12 7

Main Deduplication Methods Fingerprinting Computing a fingerprint as the ID for each segment Use an index to lookup if the segment is a duplicate Is the fingerprint in the index? Yes: duplicate No: new segment Deltas Computing a sketch for each segment [Broder97] Find the most similar segment by comparing sketches Yes: Compute deltas with the most similar segment Write delta and a pointer to the similar segment No: new segment 12/13/12 8

Backup Data Example View from Backup Software (tar or similar format) Data Stream First Full Backup Incr 1 Incr 2 Second Full Backup A B C D A E F G A B H A E I B J C D E F G H Deduplicated Storage: Redundancies pooled, compressed A B C D E F G H I J = Unique variable segments = Redundant data segments = Compressed unique segments 12/13/12 9

Two Segmentation Methods Fixed size 4k 4k 4k... Content-based, variable size... A X C D A Y C D A B C D A B Cannot handle adds, deletes (shifts) well Independent of adds, deletes (shifts) fp = 10110110 fp = 10110100... fp = 10110000 Rolling fingerprinting 12/13/12 10

Segment Sizes Double segment size Increase space for unique segments by 15% Decrease most metadata by about 50% Reduce disk I/Os for writes and reads Use the right size for compression ratio and speed 12/13/12 11

Components in Data Domain Deduplication File System Interfaces (NFS, CIFS, VTL, ) Object-Oriented File System Deduplication GC & Verification Data Layout Replication RAID-6 Disk Shelves 12/13/12 12

Design Challenges Extremely reliable and self-healing Corrupting a segment may corrupt multiple files NVRAM to store log (transactions) Invulnerability features: Frequent verifications Metadata reconstruction from self-describing containers Self-correction from RAID-6 High-speed high-compression at low HW cost Why high speed: data 2X/18 months and 24 hours/day Why high compression: low cost and fewer disks Use commodity server hardware 12/13/12 13

Revisit the Deduplication Process (Fingerprinting) Divide data streams into segments Lookup Fingerprint Index Index size for 80TB w/ 8KB segments = (80TB/8KB) * 20B = 200GB! Yes: Fingerprint No: pack segment into container, apply local compression, write out to disk 12/13/12 14

Problematic Alternative 1: Caching Divide data streams into segments Lookup Index Cache Miss Fingerprint Index Problem: No locality. 12/13/12 15

Problematic Alternative 2: Parallel Index [Venti02] Divide data streams into segments Lookup Index Cache Problem: Need a lot of disks. 7200RPM disk does 120 lookups/sec. 1MB/sec with 8KB segment per disk 1GB/sec needs 1,000 disks! Miss... Fingerprint Index 12/13/12 16

Problematic Alternative 3: Staging Data Streams Divide data streams into segments Very Big Disk Buffer Lookup... Fingerprint Index Problem: The Buffer needs to be as large or larger than the full backup! Big delay and may still never catch up 12/13/12 17

High-Speed High Compression at Low HW Cost Layout data on disk with duplicate locality A sophisticated cache for the fingerprint index Summary data structure for new data locality-preserved caching for old data Parallelized deduplication architecture to take full advantage of multicore processors Benjamin Zhu, Kai Li and Hugo Patterson. Avoiding the Disk Bottleneck in the Data Domain Deduplication File System. In Proceedings of The 6 th USENIX Conference on File and Storage Technologies (FAST 08). February 2008 12/13/12 18

Summary Vector Goal: Use minimal memory to test for new data Summarize what segments have been stored, with Bloom filter (Bloom 70) in RAM If Summary Vector says no, it s new segment Summary Vector Approximation Index Data Structure 12/13/12 19

Known Analysis Results Bloom filter with m bits k independent hash functions After inserting n keys, the probability of a false positive is: Examples: m/n = 6, k = 4: p = 0.0561 m/n = 8, k = 6: p = 0.0215 Experimental data validate the analysis results 12/13/12 20

Stream Informed Segment Layout Goal: Capture duplicate locality on disk Segments from the same stream are stored in the same containers Metadata (index data) are also in the containers 12/13/12 21

Locality Preserved Caching (LPC) Goal: Maintain duplicate locality in the cache Disk Index has all <fingerprint, containerid> pairs Index Cache caches a subset of such pairs On a miss, lookup Disk Index to find containerid Load the metadata of a container into Index Cache, replace if needed Miss Disk Index ContainerID Metadata Data Load metadata Index Cache Replacement Container 12/13/12 22

Putting Them Together A fingerprint Duplicate New Index Cache No Summary Vector Replacement Maybe Disk Index metadata metadata metadata data metadata data data data 12/13/12 23

Evaluation What to evaluate Disk I/O reduction Write and read throughput Deduplication results Platform (DD880) 4 Quad 2.9Ghz XeonCPUs, 32GB RAM, 10GE NIC, 2 x 1GB NVRAM, 96TB 7,200 RPM ATA disks 12/13/12 24

Disk I/O Reduction Results Exchange data (2.56TB) 135-daily full backups Engineering data (2.39TB) 100-day daily inc, weekly full # disk I/Os % of total # disk I/Os % of total No summary, No SISL/LPC 328,613,503 100.00% 318,236,712 100.00% Summary only 274,364,788 83.49% 259,135,171 81.43% SISL/LPC only 57,725,844 17.57% 60,358,875 18.97% Summary & SISL/LPC 3,477,129 1.06% 1,257,316 0.40% 12/13/12 25

NFS Deduplication Write Throughput (MB/sec) Backup Generations 12/13/12 26

NetBackup OST Deduplication Write Throughput (MB/sec) Backup Generations 12/13/12 27

NFS Deduplication Read Throughput (MB/sec) Backup Generations 12/13/12 28

NetBackup OST Deduplication Read Throughput (MB/sec) Backup Generations 12/13/12 29

Real World Example at Datacenter A 12/13/12 30

Real World Compression at Datacenter A 12/13/12 31

Real World Example at Datacenter B 12/13/12 32

Real World Compression at Datacenter B 12/13/12 33

Summary Deduplication removes redundant data globally Advanced deduplication file system Has become a de facto standard to store highly redundant data because of reduction in cost, performance, power, space, Scalable performance with multicore CPUs Use cases Backup, nearline, archival and flash 12/13/12 34

Review Topics OS structure Process management CPU scheduling I/O devices Virtual memory Disks and file systems General concepts 35

Operating System Structure Abstraction Protection and security Kernel structure Layered Monolithic Micro-kernel Virtualization Virtual machine monitor 36

Process Management Implementation State, creation, context switch Threads and processes Synchronization Race conditions and inconsistencies Mutual exclusion and critical sections Semaphores: P() and V() Atomic operations: interrupt disable, test-and-set. Monitors and Condition Variables Mesa-style monitor Deadlocks How deadlocks occur? How to prevent deadlocks? 37

CPU Scheduling Allocation Non-preemptible resources Scheduling -- Preemptible resources FIFO Round-robin STCF Lottery 38

I/O Devices Latency and bandwidth Interrupts and exceptions DMA mechanisms Synchronous I/O operations Asynchronous I/O operations Message passing 12/13/12 39

Virtual Memory Mechanisms Paging Segmentation Page and segmentation TLB and its management Page replacement FIFO with second chance Working sets WSClock 40

Disks and File Systems Disks Disk behavior and disk scheduling RAID5 and RAID6 Flash memory Write performance Wear leveling Flash translation layer Directories and implementation File layout Buffer cache Transaction and its implementation NFS and Stateless file system Snapshot Deduplication file system 41

Implementation BeginTransaction Commit Rollback Start using a write-ahead log on disk Log all updates Write commit at the end of the log Then write-behind to disk by writing updates to disk Clear the log Clear the log Crash recovery If there is no commit in the log, do nothing If there is commit, replay the log and clear the log Assumptions Writing to disk is correct (recall the error detection and correction) Disk is in a good state before we start 42

An Example: Atomic Money Transfer Move $100 from account S to C (1 thread): BeginTransaction S = S - $100; C = C + $100; Commit Steps: 1: Write new value of S to log 2: Write new value of C to log 3: Write commit 4: Write S to disk 5: Write C to disk 6: Clear the log Possible crashes After 1 After 2 After 3 before 4 and 5 Questions Can we swap 3 with 4? Can we swap 4 and 5? C = 110 S = 700 C = = 11010 S = 700 800 S=700 C=110 Commit 43

Questions Do the following transactions behave the same? BeginTransaction C = C + $100; S = S - $100; Commit BeginTransaction S = S - $100; C = C + $100; Commit Yes, this is why transactions are good Flushing buffer cache without worrying about the order of writes Group transactions in data base systems Many more convenient programming 12/13/12 44

Major Concepts Locality Spatial and temporal locality Scheduling Use the past to predict the future Layered abstractions Synchronization, transactions, file systems, etc Caching TLB, VM, buffer cache, etc 45

Operating System as Illusionist Physical reality Single CPU Interrupts Limited memory No protection Raw storage device Abstraction Infinite number of CPUs Cooperating sequential threads Unlimited virtual memory Each address has its own machine Organized and reliable storage system 46

Future Courses Spring COS 461: computer networks COS 598C: Analytics and systems of big data Fall: COS 432: computer security COS 561: Advance computer networks or (or COS 518: Advanced OS) Some grad seminars in systems 12/13/12 47