SDFS Overview By Sam Silverberg

Why did I do this? I had an idea and I needed to see if it worked.

Design Goals
- Create a dedup file system capable of effective inline deduplication for virtual machines
  - Support small block sizes (4k)
  - File based cloning
  - Support NFS and CIFS
- Global deduplication
  - Between volumes
  - Between hosts
- High performance
  - Multi-threaded
  - High IO concurrency
  - Fast reads and writes
- Scalable
  - Able to store a petabyte of data
  - Distributed architecture

Why User Space and Object Based
- Quicker for my iterative development cycle
- Platform independence
- Easier to scale and cluster
- Interesting opportunities for data manipulation
- Provides flexibility for integrating with other user space services such as S3
- Leverages existing file system technologies, such as replication and snapshotting, without re-inventing the wheel
- Deduplication lends itself well to an object based file system
- Little or no performance impact at 128k chunks:
  - 150 MB/s+ write
  - 150 MB/s+ read
  - 340 MB/s re-write
  - 3 TB of data with 2 GB of RAM

Why Java
- Cross platform portability
- Threading and garbage collection work well for a large number of objects
- Good integration with web/cloud based storage
- No performance impact: using direct ByteBuffers makes it as fast as native code (see the sketch below)
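As a concrete illustration of the direct ByteBuffer point, here is a minimal, hypothetical sketch (not SDFS code) of reading one 128k chunk into a direct buffer through a FileChannel; the file path is made up for the example.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class DirectBufferExample {
    public static void main(String[] args) throws IOException {
        // Direct buffers live outside the Java heap, so the JVM can hand the
        // underlying memory straight to the OS without an extra copy.
        ByteBuffer chunk = ByteBuffer.allocateDirect(128 * 1024); // one 128k chunk

        try (FileChannel ch = FileChannel.open(Paths.get("/tmp/example.dat"),
                StandardOpenOption.READ)) {
            int read = ch.read(chunk);   // data lands directly in native memory
            chunk.flip();                // prepare the buffer for consumers
            System.out.println("read " + read + " bytes");
        }
    }
}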

The Architecture

Terminology
- SDFS Volume: the mounted deduplicated volume. A volume contains one Dedup File Engine and one Fuse Based File System.
- Dedup File Engine: the client side service, within a mounted SDFS Volume, that manages file level activities such as file reads, writes, and deletes.
- Fuse Based File System: user level file system that is used to present files, contained within the Dedup File Engine, as a file system volume.
- Dedup Storage Engine: the server side service that stores and retrieves chunks of deduped data.

Overview (architecture diagram). Components, from client to storage: Fuse Based File System (written in C) -> JNI -> Dedup File Engine (written in Java) -> TCP/HTTP -> Dedup Storage Engine (written in Java) -> Native IO -> Local File System or Amazon S3.

Physical Architecture
- Dedup File Engines communicate with Dedup Storage Engines over TCP/UDP.
- Hashes are routed to Dedup Storage Engines based on the first byte of the Tiger hash.
- Up to 256 Storage Engines are allowed.
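A minimal sketch of this routing rule, assuming a hypothetical HashRouter and StorageEngine interface rather than SDFS's actual classes: the first byte of the hash selects one of up to 256 engines.

import java.util.List;

// Illustrative sketch only: routes a chunk hash to one of up to 256 storage
// engines using the hash's first byte, as described above. The StorageEngine
// type and engine list are assumptions, not SDFS classes.
public class HashRouter {
    private final List<StorageEngine> engines; // size 1..256

    public HashRouter(List<StorageEngine> engines) {
        this.engines = engines;
    }

    public StorageEngine route(byte[] tigerHash) {
        int firstByte = tigerHash[0] & 0xFF;     // 0..255
        int index = firstByte % engines.size();  // collapse onto the available engines
        return engines.get(index);
    }

    public interface StorageEngine {
        boolean hashExists(byte[] hash);
        void writeChunk(byte[] hash, byte[] data);
    }
}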

Fuse Interface
- Uses the Fuse-J JNI interface.
- File statistics and advanced file manipulation are done through getting and setting extended file attributes (getfattr, setfattr); see the sketch below.
- Each mounted volume uses its own Fuse-J interface and Dedup File Engine instance.
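For illustration, extended attributes can also be read programmatically. The following hypothetical Java sketch uses the standard UserDefinedFileAttributeView; the file path and attribute name are assumptions for the example, not documented SDFS attributes.

import java.io.IOException;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.attribute.UserDefinedFileAttributeView;

public class XattrExample {
    public static void main(String[] args) throws IOException {
        Path file = Paths.get("/media/dedup/somefile.vmdk"); // illustrative path

        // Access the "user." extended attribute namespace on the mounted volume.
        UserDefinedFileAttributeView view =
                Files.getFileAttributeView(file, UserDefinedFileAttributeView.class);

        String name = "example.dedup.stats"; // hypothetical attribute name
        ByteBuffer buf = ByteBuffer.allocate(view.size(name));
        view.read(name, buf);
        buf.flip();
        System.out.println(StandardCharsets.UTF_8.decode(buf));
    }
}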

Dedup File Engine
The Dedup File Engine is made of 4 basic components:
- MetaDataDedupFile: contains all the metadata about a file in the mounted volume (file path, length, last accessed, permissions).
- DedupFile: contains a map of hashes to file locations for dedup data. Persisted using a custom persistent hashtable (really fast and light on memory).
- Dedup File Channel: interfaces between Fuse-J and the DedupFile for IO commands.
- WritableCacheBuffer: contains a chunk of data that is being read or written inline. DedupFile(s) cache many of these during IO activity.
MetaDataDedupFiles are stored in the MetaFileStore, which persists using JDBM. DedupFiles are stored in the DedupFileStore and are persisted independently, each as an individual disk based hashtable.
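A rough, in-memory sketch of the structures described above, purely to make the roles concrete. The class and field names are illustrative; SDFS actually persists these with a custom hashtable and JDBM, not a plain HashMap.

import java.util.HashMap;
import java.util.Map;

// Illustrative stand-in for the per-file structures on this slide; every name
// here is an assumption made for the sketch.
public class DedupFileSketch {

    // Metadata kept by a MetaDataDedupFile-like object.
    public static class FileMetadata {
        String path;
        long length;
        long lastAccessed;
        int permissions;
    }

    // Mirrors the slide's "map of hashes to file locations": a chunk hash
    // (hex string) mapped to the logical offset where its data belongs.
    private final Map<String, Long> hashToOffset = new HashMap<>();

    public void recordChunk(String hash, long fileOffset) {
        hashToOffset.put(hash, fileOffset);
    }

    public Long locate(String hash) {
        return hashToOffset.get(hash);
    }
}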

Steps to Persisting a Chunk of Data (Dedup File Engine side)
1. A WritableCacheBuffer is aged out of the DedupFile write cache (LRU) and is tagged for writing.
2. The byte array within the WritableCacheBuffer is hashed and passed to the HCServiceProxy.
3. The HCServiceProxy looks up the route to the appropriate Dedup Storage Engine for the hashed chunk (WritableCacheBuffer) based on the first byte of the hash.
4. Based on the route, the HCServiceProxy queries the appropriate Dedup Storage Engine to see if the hash is already persisted.
5. If the hash is persisted, no other action is performed. If not, the chunk is sent to the Dedup Storage Engine, compressed or not based on configuration settings.
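The five steps above might look roughly like the following sketch. The class names follow the slide (WritableCacheBuffer, HCServiceProxy, a storage engine), but every method signature here is an assumption, not the real SDFS code.

// Illustrative write path only; placeholders stand in for the real hashing
// and compression implementations.
public class ChunkWritePath {

    void persist(WritableCacheBuffer buffer, HCServiceProxy proxy, boolean compress) {
        byte[] data = buffer.getChunk();            // 1. buffer aged out of the write cache
        byte[] hash = Hashes.hash(data);            // 2. hash the chunk's byte array
        StorageEngine engine = proxy.route(hash);   // 3. route on the first byte of the hash
        if (!engine.hashExists(hash)) {             // 4. ask the engine whether it already has it
            byte[] payload = compress ? Compress.compress(data) : data;
            engine.writeChunk(hash, payload);       // 5. send only new, possibly compressed, chunks
        }
    }

    // Minimal interfaces so the sketch is self-contained.
    interface WritableCacheBuffer { byte[] getChunk(); }
    interface HCServiceProxy { StorageEngine route(byte[] hash); }
    interface StorageEngine {
        boolean hashExists(byte[] hash);
        void writeChunk(byte[] hash, byte[] data);
    }
    static class Hashes { static byte[] hash(byte[] d) { return new byte[24]; /* placeholder */ } }
    static class Compress { static byte[] compress(byte[] d) { return d; /* placeholder */ } }
}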

Dedup Storage Engine
There are 3 basic pieces to the architecture:
- HashStore: contains a list of all hashes and where the data represented by each hash is located. The HashStore is persisted using a custom hashtable that is fast and memory efficient.
- ChunkStore: responsible for storing and retrieving the data associated with a specific hash. Currently three ChunkStores have been implemented (see the interface sketch below):
  - FileChunkStore: saves all data in one large file
  - FileBasedChunkStore: saves all chunks as individual files
  - S3ChunkStore: saves all data to AWS (S3)
- The chunk of data itself: data is passed as byte arrays between objects within the Dedup Storage Engine.
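A hypothetical interface sketch for the ChunkStore role described above; the three implementation names come from the slide, while the method signatures are assumptions for illustration.

// Illustrative interface only, not the real SDFS ChunkStore API.
public interface ChunkStore {
    /** Persist the chunk and return a locator (for example, a file offset). */
    long writeChunk(byte[] hash, byte[] data);

    /** Fetch the chunk previously stored at the given locator. */
    byte[] readChunk(byte[] hash, long locator, int length);
}

// FileChunkStore      - appends every chunk to one large backing file
// FileBasedChunkStore - writes each chunk as an individual file
// S3ChunkStore        - puts each chunk into an AWS S3 bucket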

Steps to Persisting a Chunk of Data (Dedup Storage Engine side)
1. A ClientThread receives a WRITE_HASH_CMD or a WRITE_COMPRESSED_CMD and reads the md5/sha hash and data.
2. The ClientThread passes the hash to the HashChunkService.
3. The HashChunkService passes the hash and data to the appropriate HashStore.
4. The HashStore stores the hash and sends the data to the assigned ChunkStore.
5. The ChunkStore stores the data. Only one type of ChunkStore can be used per Dedup Storage Engine instance.
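The server side flow above, sketched with hypothetical types; the command constants and method names mirror the slide's wording but are assumptions, not the real SDFS protocol.

// Illustrative server-side flow for the steps above.
public class DedupStorageEngineSketch {

    static final int WRITE_HASH_CMD = 1;
    static final int WRITE_COMPRESSED_CMD = 2;

    // Steps 1-2: a ClientThread reads the command, hash and data off the wire
    // and hands them to the HashChunkService.
    void handleWrite(int cmd, byte[] hash, byte[] data, HashChunkService service) {
        boolean compressed = (cmd == WRITE_COMPRESSED_CMD);
        service.write(hash, data, compressed);
    }

    interface HashChunkService {
        // Step 3: pick the HashStore responsible for this hash and pass it on.
        void write(byte[] hash, byte[] data, boolean compressed);
    }

    interface HashStore {
        // Step 4: record the hash and forward the data to its ChunkStore.
        void put(byte[] hash, byte[] data);
    }

    interface ChunkStore {
        // Step 5: persist the data; one ChunkStore type per engine instance.
        long writeChunk(byte[] hash, byte[] data);
    }
}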

Deduplication Concerns
- Data corruption: files can be corrupted if the Dedup File Engine is killed unexpectedly, because written data is cached in memory until it is persisted to a backend Dedup Storage Engine. Solution: cache data in memory and also write it locally until the write to the Dedup Storage Engine is complete, then flush both. This option is enabled using safe-sync=true.
- Memory footprint: the Dedup Storage Engine uses about 8 GB of memory for every TB stored at 4k blocks. This scales linearly with chunk count, so the same 8 GB covers roughly 32 TB at 128k blocks. Solution: scale out with a RAIN configuration for the Dedup Storage Engine.
- Files with a high rate of change: files with a high rate of change and lots of unique data, such as page files or transaction logs, will not dedup effectively. Implemented solution: selective block level deduplication.
- Performance suffers when copying a large number of small files (< 5 MB): a metadata dedup file instance is created per file. This is fast, but not as fast as IO. Possible solution: create a pool of hash tables and allocate them on the fly as new files are needed.

Today's Solutions - Considerations
- Scalability: most dedupe solutions are per volume or per host. SDFS can share dedupe data across hosts and volumes, providing expanded storage.
- Most store at 128k chunks and are not tuned for small block size deduplication. Moving to small blocks would affect memory usage considerably and could increase memory requirements 32x. Large block sizes do not work for VMDKs.
- Today's dedup solutions only solve the problem of storing dedup data, not reading and writing it inline.

Native File System Considerations
- A volume is monolithic, which would make inter-host deduplication difficult.
- Deduplication requires a hash table. Scalability could not be maintained if a hash table were required per instance of a volume.
- Hashes would be difficult to dedup between volumes.
- File level compression would be difficult with file system deduplication.