Efficient File Storage Using Content-based Indexing


João Barreto, Paulo Ferreira
Distributed Systems Group - INESC-ID Lisbon, Technical University of Lisbon

Why Use Content-based Indexing for File Storage?

Extending content-based indexing (e.g. as used by LBFS [MCM01]) from the network transfer of file contents to local file storage is a natural step.

It is particularly interesting as storage-efficient support for:
- Versioning file systems
- Resource-constrained embedded file systems

However, access performance must be acceptable and storage gains significant.

Existing Solutions: Chunk Repository Storage Model

To some extent, all existing file storage architectures based on content-based indexing share a core storage model [CN02, QD02, BF04]. File contents are divided into disjoint chunks of data, each stored individually, together with a unique hash of its contents, in a repository of chunks. The actual files are then stored as sequences of references to chunks in the repository, and those references may be shared.

[Figure: Example using the Chunk Repository Model. Files 1, 2 and 3 are stored as sequences of chunk references; the chunk repository holds each chunk's hash next to its contents. Legend: c_i = chunk contents, h_i = chunk hash, r_i = chunk reference.]
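The chunk repository model described above can be sketched in a few lines. This is only an illustration under stated assumptions: fixed-size chunks, SHA-1 as the chunk hash, and the names `ChunkRepository`, `store_file` and `read_file` are hypothetical and do not come from any of the cited systems.

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical and tiny, so the example is easy to follow


class ChunkRepository:
    """Maps a chunk hash (h_i) to the chunk contents (c_i)."""

    def __init__(self):
        self.chunks = {}

    def put(self, data: bytes) -> bytes:
        digest = hashlib.sha1(data).digest()
        self.chunks.setdefault(digest, data)  # identical chunks are stored once
        return digest  # in this sketch the hash doubles as the reference (r_i)

    def get(self, ref: bytes) -> bytes:
        return self.chunks[ref]


def store_file(repo: ChunkRepository, contents: bytes) -> list:
    """Split contents into disjoint chunks and return the list of references."""
    return [repo.put(contents[i:i + CHUNK_SIZE])
            for i in range(0, len(contents), CHUNK_SIZE)]


def read_file(repo: ChunkRepository, refs: list) -> bytes:
    """Rebuild a file by following every reference into the repository."""
    return b"".join(repo.get(r) for r in refs)


repo = ChunkRepository()
file1 = store_file(repo, b"AAAABBBBCCCC")
file2 = store_file(repo, b"BBBBDDDD")  # shares the chunk b"BBBB" with file1
```

Note that even `file1`, which was written first and shared nothing at the time, is read back through one repository lookup per chunk.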

Problems of the Chunk Repository Storage Model

Storage penalties grow as the chunk size is decreased:
1. Increased chunk meta-data overhead (mainly chunk hashes);
2. Increased internal fragmentation, if the chunk repository is stored on a block-based device;
3. Lower chunk compression ratios.

This trade-off restricts the expected chunk size to relatively high values, so existing solutions do not fully exploit the similarity that may exist in a file system.

Sequential read performance is penalized even if the file being accessed shares no chunk with other files, since chunks are stored in a randomly organized repository.
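To give a feel for the first two penalties, the following back-of-the-envelope sketch computes them for a few chunk sizes. The numbers are assumptions chosen for illustration, not measurements from the slides: a 20-byte hash per chunk (the size of a SHA-1 digest), a 4096-byte device block, and the worst case in which every chunk occupies its own whole blocks.

```python
import math

HASH_BYTES = 20     # assumption: one SHA-1 digest stored per chunk
BLOCK_BYTES = 4096  # assumption: block size of the underlying device


def metadata_overhead(chunk_bytes: int) -> float:
    """Space spent on the chunk hash, as a fraction of the chunk data."""
    return HASH_BYTES / chunk_bytes


def fragmentation_overhead(chunk_bytes: int) -> float:
    """Fraction of allocated space wasted when a chunk fills whole blocks."""
    allocated = math.ceil(chunk_bytes / BLOCK_BYTES) * BLOCK_BYTES
    return (allocated - chunk_bytes) / allocated


for size in (8192, 2048, 128):
    print(f"chunk={size:5d} B  "
          f"hash overhead={metadata_overhead(size):.4f}  "
          f"fragmentation={fragmentation_overhead(size):.4f}")
```

Under these assumptions both fractions rise as the chunk size falls below the block size, which is the trade-off the slide refers to.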

The Proposed Storage Model

Two distinguishing principles:
1. If a file shares no chunks with the rest of the file system, it is stored in plain form. Subsequent files that share any chunks with that file reference those portions of its plain contents.
2. Hashes of chunks are not stored permanently along with the contents of the file system. Detecting content similarity for new file system data therefore requires indexing the whole contents of the file system, which is performed periodically in the background.

[Figure: The previous example using the Proposed Model. Files 1, 2 and 3 hold chunk contents in plain form; shared chunks appear once as contents (c_3, c_n) and elsewhere as references (r_3, r_n). Legend: c_i = chunk contents, r_i = chunk reference.]
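A minimal sketch of these two principles follows. It is not the authors' implementation: the class names, the fixed chunk size, the use of SHA-1 and the `(owner, offset, length)` extent format are all assumptions made for illustration. Unshared contents stay in the file's own `plain` buffer, shared portions become references into another file's plain contents, and the hash index is a transient structure rebuilt by `reindex`, which stands in for the periodic background pass.

```python
import hashlib
from dataclasses import dataclass, field

CHUNK_SIZE = 4  # hypothetical and tiny, so the example is easy to follow


@dataclass
class StoredFile:
    plain: bytes = b""                          # unshared contents, stored as-is
    layout: list = field(default_factory=list)  # extents: (owner, offset, length)


class SimilarityStore:
    def __init__(self):
        self.files = {}
        self.index = {}  # transient: chunk hash -> (owner, offset); never persisted

    def reindex(self):
        """Stand-in for the background pass: hash the plain contents of all files."""
        self.index.clear()
        for name, f in self.files.items():
            for off in range(0, len(f.plain), CHUNK_SIZE):
                digest = hashlib.sha1(f.plain[off:off + CHUNK_SIZE]).digest()
                self.index.setdefault(digest, (name, off))

    def write(self, name: str, contents: bytes):
        f = StoredFile()
        for i in range(0, len(contents), CHUNK_SIZE):
            chunk = contents[i:i + CHUNK_SIZE]
            hit = self.index.get(hashlib.sha1(chunk).digest())
            if hit is not None:
                owner, off = hit
                f.layout.append((owner, off, len(chunk)))  # reference, no copy
            else:
                f.layout.append((name, len(f.plain), len(chunk)))
                f.plain += chunk                            # kept in plain form
        self.files[name] = f

    def read(self, name: str) -> bytes:
        f = self.files[name]
        return b"".join(self.files[owner].plain[off:off + n]
                        for owner, off, n in f.layout)


store = SimilarityStore()
store.write("file1", b"AAAABBBBCCCC")  # nothing indexed yet: stored fully plain
store.reindex()
store.write("file2", b"XXXXBBBBYYYY")  # b"BBBB" becomes a reference into file1
```

The sketch deliberately ignores what happens when a file that others reference is modified or deleted; a real design would need to handle that case.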

Chunk Coalescing

Chunk coalescing optimises the case where consecutive pointers to contiguous shared chunks are detected. Such pointers are replaced by a single multiple-chunk pointer, which reduces the storage overhead and allows faster access. In practice, this is comparable (though not always equivalent) to using a larger chunk size wherever a smaller size would yield no additional similarity gains.

[Figure: Example of Chunk Coalescing. Before coalescing, a file holds separate consecutive references to contiguous shared chunks; after coalescing, they are replaced by a single multiple-chunk reference r_{3,n}.]
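Chunk coalescing can be illustrated with a small function. The sketch assumes, hypothetically, that each pointer is a tuple `(owner, offset, length)` naming the file that holds the data, the byte offset inside it and the number of bytes; the slides do not specify the pointer format.

```python
def coalesce(pointers):
    """Merge consecutive pointers that cover contiguous bytes of the same file."""
    merged = []
    for owner, offset, length in pointers:
        if merged:
            prev_owner, prev_offset, prev_length = merged[-1]
            if prev_owner == owner and prev_offset + prev_length == offset:
                # The new pointer starts exactly where the previous one ends,
                # so one longer pointer can replace both.
                merged[-1] = (owner, prev_offset, prev_length + length)
                continue
        merged.append((owner, offset, length))
    return merged


before = [("file1", 0, 4), ("file1", 4, 4), ("file1", 8, 4),
          ("file2", 0, 4), ("file1", 16, 4)]
after = coalesce(before)
```

Here the first three pointers collapse into `("file1", 0, 12)`, while the last pointer stays separate because it is not contiguous with the run that precedes it.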

Advantages

- When there is no similarity, no storage overhead is imposed and read performance is identical to that of a regular file system.
- The storage penalties that result from dividing files into smaller chunks are eliminated:
  - If a chunk is not shared, it incurs no chunk storage overhead;
  - If a chunk is shared, the storage overhead is negligible and always compensated by the gains from the increased chunk sharing:
    - Hash values are not stored along with contents;
    - Chunk coalescing optimises chunk reference overhead;
  - With an underlying block-based file system, internal fragmentation is, on average, not affected;
  - Data compression may be applied to the unshared portions of each file as a whole, achieving higher compression ratios than compressing individual smaller chunks.
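The compression point can be checked with a quick experiment. The sketch below uses zlib and synthetic word data as stand-ins; the compressor, the data and the chunk sizes are assumptions for illustration and are not taken from the slides.

```python
import random
import zlib

# Synthetic, deterministic "text": random words from a small vocabulary.
rng = random.Random(0)
words = [b"chunk", b"hash", b"file", b"storage",
         b"index", b"content", b"block", b"system"]
data = b" ".join(rng.choice(words) for _ in range(8000))


def compressed_size_per_chunk(payload: bytes, chunk_bytes: int) -> int:
    """Total size when every chunk is compressed independently."""
    return sum(len(zlib.compress(payload[i:i + chunk_bytes]))
               for i in range(0, len(payload), chunk_bytes))


whole = len(zlib.compress(data))              # one stream for the whole file
small = compressed_size_per_chunk(data, 128)  # many small chunks
large = compressed_size_per_chunk(data, 4096) # fewer, larger chunks

print("original:", len(data))
print("whole file:", whole)
print("4096-byte chunks:", large)
print("128-byte chunks:", small)
```

Each independently compressed chunk pays its own stream overhead and cannot reuse repetition found in other chunks, which is why the small-chunk total comes out larger than compressing the data as a whole.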

Current Status

- Partially functional in a simulator.
- Currently being implemented as a Linux Virtual File System.

References

[BF04] J. Barreto and P. Ferreira. A replicated file system for resource constrained mobile devices. In Proceedings of IADIS International Conference on Applied Computing.
[CN02] L. Cox and B. Noble. Pastiche: Making backup cheap and easy. In Proceedings of the Fifth ACM/USENIX Symposium on Operating Systems Design and Implementation, Boston, MA, December.
[MCM01] A. Muthitacharoen, B. Chen, and D. Mazieres. A low-bandwidth network file system. In Symposium on Operating Systems Principles.
[QD02] S. Quinlan and S. Dorward. Venti: a new approach to archival storage. In First USENIX Conference on File and Storage Technologies, Monterey, CA.

De-duplication-based Archival Storage System

De-duplication-based Archival Storage System De-duplication-based Archival Storage System Than Than Sint Abstract This paper presents the disk-based backup system in which only relational database files are stored by using data deduplication technology.

More information

Multi-level Metadata Management Scheme for Cloud Storage System

Multi-level Metadata Management Scheme for Cloud Storage System , pp.231-240 http://dx.doi.org/10.14257/ijmue.2014.9.1.22 Multi-level Metadata Management Scheme for Cloud Storage System Jin San Kong 1, Min Ja Kim 2, Wan Yeon Lee 3, Chuck Yoo 2 and Young Woong Ko 1

More information

The assignment of chunk size according to the target data characteristics in deduplication backup system

The assignment of chunk size according to the target data characteristics in deduplication backup system The assignment of chunk size according to the target data characteristics in deduplication backup system Mikito Ogata Norihisa Komoda Hitachi Information and Telecommunication Engineering, Ltd. 781 Sakai,

More information

A Highly Available Replicated File System for Resource-Constrained Windows CE.Net Devices 1

A Highly Available Replicated File System for Resource-Constrained Windows CE.Net Devices 1 A Highly Available Replicated File System for Resource-Constrained Windows CE.Net Devices 1 João Barreto 2 and Paulo Ferreira INESC-ID/IST Rua Alves Redol N.º 9 1000-029 Lisboa, Portugal {joao.barreto,

More information

Byte-index Chunking Algorithm for Data Deduplication System

Byte-index Chunking Algorithm for Data Deduplication System , pp.415-424 http://dx.doi.org/10.14257/ijsia.2013.7.5.38 Byte-index Chunking Algorithm for Data Deduplication System Ider Lkhagvasuren 1, Jung Min So 1, Jeong Gun Lee 1, Chuck Yoo 2 and Young Woong Ko

More information

DEXT3: Block Level Inline Deduplication for EXT3 File System

DEXT3: Block Level Inline Deduplication for EXT3 File System DEXT3: Block Level Inline Deduplication for EXT3 File System Amar More M.A.E. Alandi, Pune, India ahmore@comp.maepune.ac.in Zishan Shaikh M.A.E. Alandi, Pune, India zishan366shaikh@gmail.com Vishal Salve

More information

Cumulus: filesystem backup to the Cloud

Cumulus: filesystem backup to the Cloud Michael Vrable, Stefan Savage, a n d G e o f f r e y M. V o e l k e r Cumulus: filesystem backup to the Cloud Michael Vrable is pursuing a Ph.D. in computer science at the University of California, San

More information

Read Performance Enhancement In Data Deduplication For Secondary Storage

Read Performance Enhancement In Data Deduplication For Secondary Storage Read Performance Enhancement In Data Deduplication For Secondary Storage A THESIS SUBMITTED TO THE FACULTY OF THE GRADUATE SCHOOL OF THE UNIVERSITY OF MINNESOTA BY Pradeep Ganesan IN PARTIAL FULFILLMENT

More information

Authorized data deduplication check in hybrid cloud With Cluster as a Service

Authorized data deduplication check in hybrid cloud With Cluster as a Service Authorized data deduplication check in hybrid cloud With Cluster as a Service X.ALPHONSEINBARAJ PG Scholar, Department of Computer Science and Engineering, Anna University, Coimbatore. Abstract Data deduplication

More information

Data Deduplication Scheme for Cloud Storage

Data Deduplication Scheme for Cloud Storage 26 Data Deduplication Scheme for Cloud Storage 1 Iuon-Chang Lin and 2 Po-Ching Chien Abstract Nowadays, the utilization of storage capacity becomes an important issue in cloud storage. In this paper, we

More information

Efficient Locally Trackable Deduplication in Replicated Systems

Efficient Locally Trackable Deduplication in Replicated Systems Efficient Locally Trackable Deduplication in Replicated Systems João Barreto and Paulo Ferreira Distributed Systems Group - INESC-ID/Technical University of Lisbon {joao.barreto,paulo.ferreira}@inesc-id.pt

More information

Quanqing XU Quanqing.Xu@nicta.com.au. YuruBackup: A Highly Scalable and Space-Efficient Incremental Backup System in the Cloud

Quanqing XU Quanqing.Xu@nicta.com.au. YuruBackup: A Highly Scalable and Space-Efficient Incremental Backup System in the Cloud Quanqing XU Quanqing.Xu@nicta.com.au YuruBackup: A Highly Scalable and Space-Efficient Incremental Backup System in the Cloud Outline Motivation YuruBackup s Architecture Backup Client File Scan, Data

More information

3 Taking Advantage of Diversity

3 Taking Advantage of Diversity The Phoenix Recovery System: Rebuilding from the ashes of an Internet catastrophe Flavio Junqueira Ranjita Bhagwan Keith Marzullo Stefan Savage Geoffrey M. Voelker Department of Computer Science and Engineering

More information

A Data De-duplication Access Framework for Solid State Drives

A Data De-duplication Access Framework for Solid State Drives JOURNAL OF INFORMATION SCIENCE AND ENGINEERING 28, 941-954 (2012) A Data De-duplication Access Framework for Solid State Drives Department of Electronic Engineering National Taiwan University of Science

More information

A Deduplication-based Data Archiving System

A Deduplication-based Data Archiving System 2012 International Conference on Image, Vision and Computing (ICIVC 2012) IPCSIT vol. 50 (2012) (2012) IACSIT Press, Singapore DOI: 10.7763/IPCSIT.2012.V50.20 A Deduplication-based Data Archiving System

More information

Efficiently Storing Virtual Machine Backups

Efficiently Storing Virtual Machine Backups Efficiently Storing Virtual Machine Backups Stephen Smaldone, Grant Wallace, and Windsor Hsu Backup Recovery Systems Division EMC Corporation Abstract Physical level backups offer increased performance

More information

Two-Level Metadata Management for Data Deduplication System

Two-Level Metadata Management for Data Deduplication System Two-Level Metadata Management for Data Deduplication System Jin San Kong 1, Min Ja Kim 2, Wan Yeon Lee 3.,Young Woong Ko 1 1 Dept. of Computer Engineering, Hallym University Chuncheon, Korea { kongjs,

More information

DEDISbench: A Benchmark for Deduplicated Storage Systems

DEDISbench: A Benchmark for Deduplicated Storage Systems DEDISbench: A Benchmark for Deduplicated Storage Systems J. Paulo, P. Reis, J. Pereira and A. Sousa High-Assurance Software Lab (HASLab) INESC TEC & University of Minho Abstract. Deduplication is widely

More information

Chapter 12 File Management

Chapter 12 File Management Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 12 File Management Dave Bremer Otago Polytechnic, N.Z. 2008, Prentice Hall Roadmap Overview File organisation and Access

More information

Chapter 12 File Management. Roadmap

Chapter 12 File Management. Roadmap Operating Systems: Internals and Design Principles, 6/E William Stallings Chapter 12 File Management Dave Bremer Otago Polytechnic, N.Z. 2008, Prentice Hall Overview Roadmap File organisation and Access

More information

Side channels in cloud services, the case of deduplication in cloud storage

Side channels in cloud services, the case of deduplication in cloud storage Side channels in cloud services, the case of deduplication in cloud storage Danny Harnik IBM Haifa Research Lab dannyh@il.ibm.com Benny Pinkas Bar Ilan University benny@pinkas.net Alexandra Shulman-Peleg

More information

Assessing Data Deduplication Trade-offs from an Energy and Performance Perspective

Assessing Data Deduplication Trade-offs from an Energy and Performance Perspective Assessing Data Deduplication Trade-offs from an Energy and Performance Perspective Lauro Beltrão Costa, Samer Al-Kiswany, Raquel Vigolvino Lopes and Matei Ripeanu Electrical and Computer Engineering Department

More information

Efficient and Safe Data Backup with Arrow

Efficient and Safe Data Backup with Arrow Efficient and Safe Data Backup with Arrow Technical Report UCSC-SSRC-8-2 June 28 Casey Marshall csm@soe.ucsc.edu Storage Systems Research Center Baskin School of Engineering University of California, Santa

More information

A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique

A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique Jyoti Malhotra 1,Priya Ghyare 2 Associate Professor, Dept. of Information Technology, MIT College of

More information

Physical Data Organization

Physical Data Organization Physical Data Organization Database design using logical model of the database - appropriate level for users to focus on - user independence from implementation details Performance - other major factor

More information

HP StoreOnce D2D. Understanding the challenges associated with NetApp s deduplication. Business white paper

HP StoreOnce D2D. Understanding the challenges associated with NetApp s deduplication. Business white paper HP StoreOnce D2D Understanding the challenges associated with NetApp s deduplication Business white paper Table of contents Challenge #1: Primary deduplication: Understanding the tradeoffs...4 Not all

More information

A DHT-based Backup System

A DHT-based Backup System A DHT-based Backup System Emil Sit, Josh Cates, and Russ Cox MIT Laboratory for Computer Science 10 August 2003 1 Introduction Distributed hash tables have been proposed as a way to simplify the construction

More information

Deduplication Demystified: How to determine the right approach for your business

Deduplication Demystified: How to determine the right approach for your business Deduplication Demystified: How to determine the right approach for your business Presented by Charles Keiper Senior Product Manager, Data Protection Quest Software Session Objective: To answer burning

More information

Deploying De-Duplication on Ext4 File System

Deploying De-Duplication on Ext4 File System Deploying De-Duplication on Ext4 File System Usha A. Joglekar 1, Bhushan M. Jagtap 2, Koninika B. Patil 3, 1. Asst. Prof., 2, 3 Students Department of Computer Engineering Smt. Kashibai Navale College

More information

Duplicate Data Elimination in a SAN File System

Duplicate Data Elimination in a SAN File System Duplicate Data Elimination in a SAN File System Bo Hong Univ. of California, Santa Cruz hongbo@cs.ucsc.edu Darrell D.E. Long Univ. of California, Santa Cruz darrell@cs.ucsc.edu Demyn Plantenberg IBM Almaden

More information

ABSTRACT 1 INTRODUCTION

ABSTRACT 1 INTRODUCTION DEDUPLICATION IN YAFFS Karthik Narayan {knarayan@cs.wisc.edu}, Pavithra Seshadri Vijayakrishnan{pavithra@cs.wisc.edu} Department of Computer Sciences, University of Wisconsin Madison ABSTRACT NAND flash

More information

PIONEER RESEARCH & DEVELOPMENT GROUP

PIONEER RESEARCH & DEVELOPMENT GROUP SURVEY ON RAID Aishwarya Airen 1, Aarsh Pandit 2, Anshul Sogani 3 1,2,3 A.I.T.R, Indore. Abstract RAID stands for Redundant Array of Independent Disk that is a concept which provides an efficient way for

More information

HTTP-Level Deduplication with HTML5

HTTP-Level Deduplication with HTML5 HTTP-Level Deduplication with HTML5 Franziska Roesner and Ivayla Dermendjieva Networks Class Project, Spring 2010 Abstract In this project, we examine HTTP-level duplication. We first report on our initial

More information

DeltaStor Data Deduplication: A Technical Review

DeltaStor Data Deduplication: A Technical Review White Paper DeltaStor Data Deduplication: A Technical Review DeltaStor software is a next-generation data deduplication application for the SEPATON S2100 -ES2 virtual tape library that enables enterprises

More information

Improvement of Network Optimization and Cost Reduction in End To End Process Implementing in Clouds

Improvement of Network Optimization and Cost Reduction in End To End Process Implementing in Clouds Improvement of Network Optimization and Cost Reduction in End To End Process Implementing in Clouds A. Sree Valli 1, R. Chandrasekhar 2 PG Scholar, Department of C.S.E, KIET College, JNTUK A.P 1 Assistant

More information

File Management Chapters 10, 11, 12

File Management Chapters 10, 11, 12 File Management Chapters 10, 11, 12 Requirements For long-term storage: possible to store large amount of info. info must survive termination of processes multiple processes must be able to access concurrently

More information

Data De-duplication Methodologies: Comparing ExaGrid s Byte-level Data De-duplication To Block Level Data De-duplication

Data De-duplication Methodologies: Comparing ExaGrid s Byte-level Data De-duplication To Block Level Data De-duplication Data De-duplication Methodologies: Comparing ExaGrid s Byte-level Data De-duplication To Block Level Data De-duplication Table of Contents Introduction... 3 Shortest Possible Backup Window... 3 Instant

More information

Data Deduplication in BitTorrent

Data Deduplication in BitTorrent Data Deduplication in BitTorrent João Pedro Amaral Nunes October 14, 213 Abstract BitTorrent is the most used P2P file sharing platform today, with hundreds of millions of files shared. The system works

More information

Online De-duplication in a Log-Structured File System for Primary Storage

Online De-duplication in a Log-Structured File System for Primary Storage Online De-duplication in a Log-Structured File System for Primary Storage Technical Report UCSC-SSRC-11-03 May 2011 Stephanie N. Jones snjones@cs.ucsc.edu Storage Systems Research Center Baskin School

More information

FLASH IMPLICATIONS IN ENTERPRISE STORAGE ARRAY DESIGNS

FLASH IMPLICATIONS IN ENTERPRISE STORAGE ARRAY DESIGNS FLASH IMPLICATIONS IN ENTERPRISE STORAGE ARRAY DESIGNS ABSTRACT This white paper examines some common practices in enterprise storage array design and their resulting trade-offs and limitations. The goal

More information

Efficient Deduplication in Disk- and RAM-based Data Storage Systems

Efficient Deduplication in Disk- and RAM-based Data Storage Systems Efficient Deduplication in Disk- and RAM-based Data Storage Systems Andrej Tolič and Andrej Brodnik University of Ljubljana, Faculty of Computer and Information Science, Slovenia {andrej.tolic,andrej.brodnik}@fri.uni-lj.si

More information

Reliability-Aware Deduplication Storage: Assuring Chunk Reliability and Chunk Loss Severity

Reliability-Aware Deduplication Storage: Assuring Chunk Reliability and Chunk Loss Severity Reliability-Aware Deduplication Storage: Assuring Chunk Reliability and Chunk Loss Severity Youngjin Nam School of Computer and Information Technology Daegu University Gyeongsan, Gyeongbuk, KOREA 712-714

More information

Extreme Binning: Scalable, Parallel Deduplication for Chunk-based File Backup

Extreme Binning: Scalable, Parallel Deduplication for Chunk-based File Backup Extreme Binning: Scalable, Parallel Deduplication for Chunk-based File Backup Deepavali Bhagwat University of California 1156 High Street Santa Cruz, CA 9564 dbhagwat@soe.ucsc.edu Kave Eshghi Hewlett-Packard

More information

A Novel Approach for Calculation Based Cloud Band Width and Cost Diminution Method

A Novel Approach for Calculation Based Cloud Band Width and Cost Diminution Method A Novel Approach for Calculation Based Cloud Band Width and Cost Diminution Method Radhika Chowdary G PG Scholar, M.Lavanya Assistant professor, P.Satish Reddy HOD, Abstract: In this paper, we present

More information

OPTIMIZING VIRTUAL TAPE PERFORMANCE: IMPROVING EFFICIENCY WITH DISK STORAGE SYSTEMS

OPTIMIZING VIRTUAL TAPE PERFORMANCE: IMPROVING EFFICIENCY WITH DISK STORAGE SYSTEMS W H I T E P A P E R OPTIMIZING VIRTUAL TAPE PERFORMANCE: IMPROVING EFFICIENCY WITH DISK STORAGE SYSTEMS By: David J. Cuddihy Principal Engineer Embedded Software Group June, 2007 155 CrossPoint Parkway

More information

Data Backup and Archiving with Enterprise Storage Systems

Data Backup and Archiving with Enterprise Storage Systems Data Backup and Archiving with Enterprise Storage Systems Slavjan Ivanov 1, Igor Mishkovski 1 1 Faculty of Computer Science and Engineering Ss. Cyril and Methodius University Skopje, Macedonia slavjan_ivanov@yahoo.com,

More information

WHITE PAPER Improving Storage Efficiencies with Data Deduplication and Compression

WHITE PAPER Improving Storage Efficiencies with Data Deduplication and Compression WHITE PAPER Improving Storage Efficiencies with Data Deduplication and Compression Sponsored by: Oracle Steven Scully May 2010 Benjamin Woo IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA

More information

Fossil an archival file server

Fossil an archival file server Fossil an archival file server Russ Cox rsc@mit.edu PDOS Group Meeting January 7, 2003 http://pdos/~rsc/talks History... Cached WORM file server (Quinlan and Thompson): active file system on magnetic disk

More information

Prediction System for Reducing the Cloud Bandwidth and Cost

Prediction System for Reducing the Cloud Bandwidth and Cost ISSN (e): 2250 3005 Vol, 04 Issue, 8 August 2014 International Journal of Computational Engineering Research (IJCER) Prediction System for Reducing the Cloud Bandwidth and Cost 1 G Bhuvaneswari, 2 Mr.

More information

Availability Digest. www.availabilitydigest.com. Data Deduplication February 2011

Availability Digest. www.availabilitydigest.com. Data Deduplication February 2011 the Availability Digest Data Deduplication February 2011 What is Data Deduplication? Data deduplication is a technology that can reduce disk storage-capacity requirements and replication bandwidth requirements

More information

Venti: a new approach to archival storage

Venti: a new approach to archival storage Venti: a new approach to archival storage Sean Quinlan and Sean Dorward Bell Labs, Lucent Technologies Abstract This paper describes a network storage system, called Venti, intended for archival data.

More information

A Survey on Aware of Local-Global Cloud Backup Storage for Personal Purpose

A Survey on Aware of Local-Global Cloud Backup Storage for Personal Purpose A Survey on Aware of Local-Global Cloud Backup Storage for Personal Purpose Abhirupa Chatterjee 1, Divya. R. Krishnan 2, P. Kalamani 3 1,2 UG Scholar, Sri Sairam College Of Engineering, Bangalore. India

More information

Cumulus: Filesystem Backup to the Cloud

Cumulus: Filesystem Backup to the Cloud Cumulus: Filesystem Backup to the Cloud Michael Vrable, Stefan Savage, and Geoffrey M. Voelker Department of Computer Science and Engineering University of California, San Diego Abstract In this paper

More information

Frequency Based Chunking for Data De-Duplication

Frequency Based Chunking for Data De-Duplication Frequency Based Chunking for Data De-Duplication Guanlin Lu, Yu Jin, and David H.C. Du Department of Computer Science and Engineering University of Minnesota, Twin-Cities Minneapolis, Minnesota, USA (lv,

More information

IMPLEMENTATION OF SOURCE DEDUPLICATION FOR CLOUD BACKUP SERVICES BY EXPLOITING APPLICATION AWARENESS

IMPLEMENTATION OF SOURCE DEDUPLICATION FOR CLOUD BACKUP SERVICES BY EXPLOITING APPLICATION AWARENESS IMPLEMENTATION OF SOURCE DEDUPLICATION FOR CLOUD BACKUP SERVICES BY EXPLOITING APPLICATION AWARENESS Nehal Markandeya 1, Sandip Khillare 2, Rekha Bagate 3, Sayali Badave 4 Vaishali Barkade 5 12 3 4 5 (Department

More information

CHAPTER 17: File Management

CHAPTER 17: File Management CHAPTER 17: File Management The Architecture of Computer Hardware, Systems Software & Networking: An Information Technology Approach 4th Edition, Irv Englander John Wiley and Sons 2010 PowerPoint slides

More information

WHITE PAPER. DATA DEDUPLICATION BACKGROUND: A Technical White Paper

WHITE PAPER. DATA DEDUPLICATION BACKGROUND: A Technical White Paper WHITE PAPER DATA DEDUPLICATION BACKGROUND: A Technical White Paper CONTENTS Data Deduplication Multiple Data Sets from a Common Storage Pool.......................3 Fixed-Length Blocks vs. Variable-Length

More information

Data Reduction Methodologies: Comparing ExaGrid s Byte-Level-Delta Data Reduction to Data De-duplication. February 2007

Data Reduction Methodologies: Comparing ExaGrid s Byte-Level-Delta Data Reduction to Data De-duplication. February 2007 Data Reduction Methodologies: Comparing ExaGrid s Byte-Level-Delta Data Reduction to Data De-duplication February 2007 Though data reduction technologies have been around for years, there is a renewed

More information

WHITE PAPER. Permabit Albireo Data Optimization Software. Benefits of Albireo for Virtual Servers. January 2012. Permabit Technology Corporation

WHITE PAPER. Permabit Albireo Data Optimization Software. Benefits of Albireo for Virtual Servers. January 2012. Permabit Technology Corporation WHITE PAPER Permabit Albireo Data Optimization Software Benefits of Albireo for Virtual Servers January 2012 Permabit Technology Corporation Ten Canal Park Cambridge, MA 02141 USA Phone: 617.252.9600 FAX:

More information

Identifying the Hidden Risk of Data Deduplication: How the HYDRAstor TM Solution Proactively Solves the Problem

Identifying the Hidden Risk of Data Deduplication: How the HYDRAstor TM Solution Proactively Solves the Problem Identifying the Hidden Risk of Data Deduplication: How the HYDRAstor TM Solution Proactively Solves the Problem Advanced Storage Products Group Table of Contents 1 - Introduction 2 Data Deduplication 3

More information

Data Deduplication Background: A Technical White Paper

Data Deduplication Background: A Technical White Paper Data Deduplication Background: A Technical White Paper NOTICE This White Paper may contain proprietary information protected by copyright. Information in this White Paper is subject to change without notice

More information

RevDedup: A Reverse Deduplication Storage System Optimized for Reads to Latest Backups

RevDedup: A Reverse Deduplication Storage System Optimized for Reads to Latest Backups RevDedup: A Reverse Deduplication Storage System Optimized for Reads to Latest Backups Chun-Ho Ng and Patrick P. C. Lee Department of Computer Science and Engineering The Chinese University of Hong Kong,

More information

Reference Guide WindSpring Data Management Technology (DMT) Solving Today s Storage Optimization Challenges

Reference Guide WindSpring Data Management Technology (DMT) Solving Today s Storage Optimization Challenges Reference Guide WindSpring Data Management Technology (DMT) Solving Today s Storage Optimization Challenges September 2011 Table of Contents The Enterprise and Mobile Storage Landscapes... 3 Increased

More information

Avoiding the Disk Bottleneck in the Data Domain Deduplication File System

Avoiding the Disk Bottleneck in the Data Domain Deduplication File System Avoiding the Disk Bottleneck in the Data Domain Deduplication File System Benjamin Zhu Data Domain, Inc. Kai Li Data Domain, Inc. and Princeton University Hugo Patterson Data Domain, Inc. Abstract Disk-based

More information

File-System Implementation

File-System Implementation File-System Implementation 11 CHAPTER In this chapter we discuss various methods for storing information on secondary storage. The basic issues are device directory, free space management, and space allocation

More information

Speeding Up Cloud/Server Applications Using Flash Memory

Speeding Up Cloud/Server Applications Using Flash Memory Speeding Up Cloud/Server Applications Using Flash Memory Sudipta Sengupta Microsoft Research, Redmond, WA, USA Contains work that is joint with B. Debnath (Univ. of Minnesota) and J. Li (Microsoft Research,

More information

3Gen Data Deduplication Technical

3Gen Data Deduplication Technical 3Gen Data Deduplication Technical Discussion NOTICE: This White Paper may contain proprietary information protected by copyright. Information in this White Paper is subject to change without notice and

More information

How To Make A Backup System More Efficient

How To Make A Backup System More Efficient Identifying the Hidden Risk of Data De-duplication: How the HYDRAstor Solution Proactively Solves the Problem October, 2006 Introduction Data de-duplication has recently gained significant industry attention,

More information

Data Deduplication and Tivoli Storage Manager

Data Deduplication and Tivoli Storage Manager Data Deduplication and Tivoli Storage Manager Dave Cannon Tivoli Storage Manager rchitect Oxford University TSM Symposium September 2007 Disclaimer This presentation describes potential future enhancements

More information

A SIGNIFICANT REDUCTION OF CLOUD STORAGE BY ELIMINATION OF REPETITIVE DATA

A SIGNIFICANT REDUCTION OF CLOUD STORAGE BY ELIMINATION OF REPETITIVE DATA INTERNATIONAL JOURNAL OF ADVANCED RESEARCH IN ENGINEERING AND SCIENCE A SIGNIFICANT REDUCTION OF CLOUD STORAGE BY ELIMINATION OF REPETITIVE DATA M.Rajashekar Reddy 1, B.Ramya 2 1 M.Tech Student, Dept of

More information

Recovery Protocols For Flash File Systems

Recovery Protocols For Flash File Systems Recovery Protocols For Flash File Systems Ravi Tandon and Gautam Barua Indian Institute of Technology Guwahati, Department of Computer Science and Engineering, Guwahati - 781039, Assam, India {r.tandon}@alumni.iitg.ernet.in

More information

NEXT-GENERATION STORAGE EFFICIENCY WITH EMC ISILON SMARTDEDUPE

NEXT-GENERATION STORAGE EFFICIENCY WITH EMC ISILON SMARTDEDUPE White Paper NEXT-GENERATION STORAGE EFFICIENCY WITH EMC ISILON SMARTDEDUPE Abstract Most file systems are a thin layer of organization on top of a block device and cannot efficiently address data on a

More information

Snapshots in Hadoop Distributed File System

Snapshots in Hadoop Distributed File System Snapshots in Hadoop Distributed File System Sameer Agarwal UC Berkeley Dhruba Borthakur Facebook Inc. Ion Stoica UC Berkeley Abstract The ability to take snapshots is an essential functionality of any

More information

Cumulus: Filesystem Backup to the Cloud

Cumulus: Filesystem Backup to the Cloud Cumulus: Filesystem Backup to the Cloud MICHAEL VRABLE, STEFAN SAVAGE, and GEOFFREY M. VOELKER University of California, San Diego Cumulus is a system for efficiently implementing filesystem backups over

More information

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines

Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Reconfigurable Architecture Requirements for Co-Designed Virtual Machines Kenneth B. Kent University of New Brunswick Faculty of Computer Science Fredericton, New Brunswick, Canada ken@unb.ca Micaela Serra

More information

Backup and Recovery 1

Backup and Recovery 1 Backup and Recovery What is a Backup? Backup is an additional copy of data that can be used for restore and recovery purposes. The Backup copy is used when the primary copy is lost or corrupted. This Backup

More information

FAST 11. Yongseok Oh <ysoh@uos.ac.kr> University of Seoul. Mobile Embedded System Laboratory

FAST 11. Yongseok Oh <ysoh@uos.ac.kr> University of Seoul. Mobile Embedded System Laboratory CAFTL: A Content-Aware Flash Translation Layer Enhancing the Lifespan of flash Memory based Solid State Drives FAST 11 Yongseok Oh University of Seoul Mobile Embedded System Laboratory

More information

STUDY AND SIMULATION OF A DISTRIBUTED REAL-TIME FAULT-TOLERANCE WEB MONITORING SYSTEM

Albert M. K. Cheng, Shaohong Fang, Department of Computer Science, University of Houston, Houston, TX, 77204, USA. http://www.cs.uh.edu

Analysis of Disk Access Patterns on File Systems for Content Addressable Storage

Kuniyasu Suzaki, Kengo Iijima, Toshiki Yagi, and Cyrille Artho, National Institute of Advanced Industrial Science and Technology

Demystifying Deduplication for Backup with the Dell DR4000

This Dell Technical White Paper explains how deduplication with the DR4000 can help your organization save time, space, and money. John Bassett

Key Considerations for Managing Big Data in the Life Science Industry

The Big Data Bottleneck in Life Science: Faster, cheaper technology outpacing Moore's law. Lower costs and increasing speeds leading

Data Deduplication HTBackup

HTBackup and its deduplication technology are touted as one of the best ways to manage today's explosive data growth. If you're new to the technology, these key facts will help

MAD2: A Scalable High-Throughput Exact Deduplication Approach for Network Backup Services

Jiansheng Wei, Hong Jiang, Ke Zhou, Dan Feng, School of Computer, Huazhong University of Science and Technology,

A Method of Deduplication for Data Remote Backup

Jingyu Liu (1,2), Yu-an Tan (1), Yuanzhang Li (1), Xuelan Zhang (1), and Zexiang Zhou (3). (1) School of Computer Science and Technology, Beijing Institute of Technology,

Wide-area Network Acceleration for the Developing World. Sunghwan Ihm (Princeton) KyoungSoo Park (KAIST) Vivek S. Pai (Princeton)

Poor Internet Access in the Developing World: Internet access is a scarce

ESG REPORT. Data Deduplication Diversity: Evaluating Software- vs. Hardware-Based Approaches. By Lauren Whitehouse. April, 2009

Table of Contents: Introduction; External Forces Contribute to IT Challenges

Inline Deduplication

binarywarriors5@gmail.com. 1.1 Inline vs. Post-process Deduplication: In target-based deduplication, the deduplication engine can either process data for duplicates in real time (i.e.

An Oracle White Paper December 2013. Advanced Network Compression

Disclaimer: The following is intended to outline our general product direction. It is intended for information purposes only, and may not

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

Planet Size Data!? Gartner's 10 key IT trends for 2012: unstructured data will grow some 80% over the course of the next

Understanding Data Locality in VMware Virtual SAN

July 2014 Edition. Technical Marketing Documentation. Table of Contents: Introduction; Virtual SAN Design Goals; Data

Sparse Indexing: Large Scale, Inline Deduplication Using Sampling and Locality

Mark Lillibridge, Kave Eshghi, Deepavali Bhagwat, Vinay Deolalikar, Greg Trezise, and Peter Camble. HP Labs, UC Santa Cruz, HP

PRUN : Eliminating Information Redundancy for Large Scale Data Backup System

Youjip Won (1), Rakie Kim (1), Jongmyeong Ban (1), Jungpil Hur (2), Sangkyu Oh (2), Jangsun Lee (2). (1) Department of Electronics and Computer

ECE 7650 Scalable and Secure Internet Services and Architecture ---- A Systems Perspective

Part II: Data Center Software Architecture. Topic 1: Distributed File Systems. Finding a needle in Haystack: Facebook

Constrained Clustering of Territories in the Context of Car Insurance

Samuel Perreault, Jean-Philippe Le Cavalier, Laval University, July 2014

EMC EXAM - E20-598. Backup and Recovery - Avamar Specialist Exam for Storage Administrators. Buy Full Product. http://www.examskey.com/e20-598.

Full product: http://www.examskey.com/e20-598.html Examskey EMC E20-598 exam demo product is here for you to

Target Deduplication Metrics and Risk Analysis Using Post Processing Methods

Gayathri R. (1), Dr. Malathi A. (2). (1) Assistant Professor, School of IT and Science; (2) Assistant Professor, PG and Research Department

Low-Cost Data Deduplication for Virtual Machine Backup in Cloud Storage

Wei Zhang, Tao Yang, Gautham Narayanasamy, and Hong Tang, University of California at Santa Barbara, Alibaba Inc. Abstract: In a virtualized

Berkeley Ninja Architecture

ACID vs. BASE. ACID: 1. Strong consistency 2. Availability not considered 3. Conservative. BASE: 1. Weak consistency 2. Availability is a primary design element 3. Aggressive --> Traditional

Efficient Cooperative Backup with Decentralized Trust Management

Nguyen Tran, Frank Chiang, and Jinyang Li, New York University. Existing backup systems are unsatisfactory: commercial backup services