Efficient File Storage Using Content-based Indexing
1 Efficient File Storage Using Content-based Indexing
João Barreto, Paulo Ferreira
Distributed Systems Group - INESC-ID Lisbon, Technical University of Lisbon
2 Why Use Content-based Indexing for File Storage?
Extending content-based indexing (e.g. as used by LBFS [MCM01]) from the network transfer of file contents to local file storage is a natural step.
It is particularly interesting as storage-efficient support for:
- Versioning file systems
- Resource-constrained embedded file systems
However, access performance must be acceptable and the storage gains significant.
3 Existing Solutions: Chunk Repository Storage Model
To some extent, all existing file storage architectures that are based on content-based indexing share a core storage model [CN02, QD02, BF04].
File contents are divided into disjoint chunks of data, each individually stored with a unique hash of its contents in a repository of chunks.
The actual files are then stored as sequences of possibly shared references to chunks in the repository.
[Figure: example using the Chunk Repository Model. Files 1 to 3 are stored as sequences of chunk references (r_i) into a chunk repository that holds (h_i, c_i) pairs. Legend: c_i = chunk contents, h_i = chunk hash, r_i = chunk reference.]
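The chunk repository model described above can be illustrated with a minimal Python sketch. It is not taken from any of the cited systems: the fixed 4-byte chunk size, the use of SHA-1, and the names `ChunkRepository`, `store_file` and `read_file` are assumptions made for illustration only (LBFS, for instance, derives chunk boundaries from the content rather than using a fixed size).

```python
import hashlib

CHUNK_SIZE = 4  # hypothetical fixed chunk size in bytes, tiny for illustration


class ChunkRepository:
    """Stores each distinct chunk once, keyed by a hash of its contents."""

    def __init__(self):
        self.chunks = {}  # hash (h_i) -> chunk contents (c_i)

    def put(self, chunk: bytes) -> str:
        h = hashlib.sha1(chunk).hexdigest()
        self.chunks.setdefault(h, chunk)  # identical chunks are stored only once
        return h  # the hash doubles as the chunk reference (r_i)

    def get(self, ref: str) -> bytes:
        return self.chunks[ref]


def store_file(repo: ChunkRepository, data: bytes) -> list:
    """Split data into chunks and return the file as a list of references."""
    return [repo.put(data[i:i + CHUNK_SIZE])
            for i in range(0, len(data), CHUNK_SIZE)]


def read_file(repo: ChunkRepository, refs: list) -> bytes:
    """Rebuild file contents by looking up every reference in the repository."""
    return b"".join(repo.get(r) for r in refs)


repo = ChunkRepository()
file1 = store_file(repo, b"aaaabbbbcccc")
file2 = store_file(repo, b"bbbbddddaaaa")  # shares two chunks with file1
```

In this sketch every read, even of a file that shares nothing, goes through one repository lookup per chunk.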
4 Problems of the Chunk Repository Storage Model
Storage penalties grow as the chunk size decreases:
1. Increased chunk meta-data overhead (mainly chunk hashes);
2. Increased internal fragmentation, if the chunk repository is stored on a block-based device;
3. Lower chunk compression ratios.
This trade-off restricts the expected chunk size to relatively high values, so existing solutions do not fully exploit the similarity that may exist in a file system.
Sequential read performance is penalized even if the file being accessed shares no chunk with other files, since chunks are stored in a randomly organized repository.
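To make the first two penalties concrete, here is a small back-of-the-envelope calculation in Python. The figures are assumptions, not measurements from the slides: a 20-byte (SHA-1-sized) hash stored per chunk, a 4096-byte device block, and each chunk padded to whole blocks.

```python
HASH_BYTES = 20    # assumption: one SHA-1-sized hash stored per chunk
BLOCK_BYTES = 4096  # assumption: block size of the underlying device


def hash_overhead(chunk_size: int) -> float:
    """Fraction of extra space spent on the hash, relative to the chunk."""
    return HASH_BYTES / chunk_size


def block_slack(chunk_size: int, block_size: int = BLOCK_BYTES) -> int:
    """Bytes wasted when a chunk is padded up to a whole number of blocks."""
    return -chunk_size % block_size


for size in (8192, 2048, 512, 128):
    print(f"{size:5d}-byte chunks: hash overhead {hash_overhead(size):.2%}, "
          f"slack {block_slack(size)} bytes")
```

Under these assumptions the relative cost of the hash grows in inverse proportion to the chunk size, which is the trade-off the slide refers to.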
5 The Proposed Storage Model
Two distinguishing principles:
1. If a file shares no chunks with the rest of the file system, it is stored in plain form. Subsequent files that share chunks with that file reference those portions of its plain contents.
2. Hashes of chunks are not stored permanently along with the contents of the file system.
   - Detecting content similarity in new file system data therefore requires indexing the whole contents of the file system.
   - This indexing is performed periodically in the background.
[Figure: the previous example using the proposed model. Files 1 to 3 hold plain chunk contents (c_i), with chunk references (r_i) only where a chunk is shared. Legend: c_i = chunk contents, r_i = chunk reference.]
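One possible way to picture the first principle is a file represented as a list of extents, each either plain bytes or a reference into another file's contents. This Python sketch is an illustration under stated assumptions, not the authors' on-disk format: the `Plain` and `Ref` types, the byte-offset addressing, and the dictionary standing in for the file system are all hypothetical, and references are assumed never to form a cycle.

```python
from dataclasses import dataclass
from typing import Dict, List, Union


@dataclass
class Plain:
    data: bytes  # chunk contents stored in place (c_i)


@dataclass
class Ref:
    file: str    # name of the file that holds the shared contents
    offset: int  # byte offset into that file's logical contents
    length: int  # number of shared bytes


Extent = Union[Plain, Ref]


def read_file(fs: Dict[str, List[Extent]], name: str) -> bytes:
    """Rebuild a file by concatenating plain data and resolved references."""
    out = bytearray()
    for ext in fs[name]:
        if isinstance(ext, Plain):
            out += ext.data
        else:
            # Simple but inefficient: materialise the target, then slice it.
            target = read_file(fs, ext.file)
            out += target[ext.offset:ext.offset + ext.length]
    return bytes(out)


fs = {
    "file1": [Plain(b"hello world")],  # shares nothing: stored plain
    "file2": [Plain(b"say "), Ref("file1", 6, 5), Plain(b"!")],
}
```

A file such as `file1`, which references nothing, is read exactly as a regular file would be; only `file2` pays for resolving a reference.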
6 Chunk Coalescing
Optimises cases where consecutive pointers to contiguous shared chunks are detected.
Such pointers are replaced by a single multiple-chunk pointer, thus reducing the storage overhead and allowing faster access.
In practice, this is comparable (though not always equivalent) to using a larger chunk size whenever a smaller size would yield no additional similarity gains.
[Figure: example of chunk coalescing. Before coalescing, a file holds the consecutive references r_3 and r_n; after coalescing, they are replaced by a single multiple-chunk reference r_3,n.]
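The coalescing step can be sketched in a few lines of Python. The `Ref` type (target file, byte offset, length) and the `coalesce` function are hypothetical names introduced for this illustration; the slides do not specify how references are encoded.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Ref:
    file: str    # file that holds the shared contents
    offset: int  # byte offset of the shared range in that file
    length: int  # length of the shared range in bytes


def coalesce(refs: List[Ref]) -> List[Ref]:
    """Merge consecutive references that point at contiguous ranges."""
    merged: List[Ref] = []
    for r in refs:
        last = merged[-1] if merged else None
        if (last is not None and last.file == r.file
                and last.offset + last.length == r.offset):
            # r starts exactly where the previous reference ends: extend it.
            merged[-1] = Ref(last.file, last.offset, last.length + r.length)
        else:
            merged.append(r)
    return merged


refs = [Ref("f", 0, 4), Ref("f", 4, 4), Ref("f", 12, 4), Ref("g", 16, 4)]
result = coalesce(refs)
```

Only the first two references are merged here: the third leaves a gap in `f`, and the fourth points into a different file.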
7 Advantages
- With no similarity, no storage overhead is imposed and read access performance is identical to that of a regular file system.
- Storage penalties that resulted from dividing files into smaller chunks are eliminated:
  - If a chunk is not shared, there is no chunk storage overhead;
  - If a chunk is shared, the storage overhead is negligible and always compensated by the gains from the increased chunk sharing:
    - Hash values are not stored along with contents;
    - Chunk coalescing reduces chunk reference overhead;
  - With an underlying block-based file system, internal fragmentation is, on average, not affected;
  - Data compression may be applied to the unshared portions of each file as a whole, thus achieving higher compression ratios than compressing individual smaller chunks.
8 Current Status
- Partially functional in a simulator.
- Currently being implemented as a Linux Virtual File System.
References
[BF04] J. Barreto and P. Ferreira. A replicated file system for resource constrained mobile devices. In Proceedings of IADIS International Conference on Applied Computing.
[CN02] L. Cox and B. Noble. Pastiche: Making backup cheap and easy. In Proceedings of the Fifth ACM/USENIX Symposium on Operating Systems Design and Implementation, Boston, MA, December.
[MCM01] Athicha Muthitacharoen, Benjie Chen, and David Mazieres. A low-bandwidth network file system. In Symposium on Operating Systems Principles.
[QD02] S. Quinlan and S. Dorward. Venti: a new approach to archival storage. In First USENIX Conference on File and Storage Technologies, Monterey, CA.