Quanqing XU YuruBackup: A Highly Scalable and Space-Efficient Incremental Backup System in the Cloud

1 YuruBackup: A Highly Scalable and Space-Efficient Incremental Backup System in the Cloud
Quanqing XU (Quanqing.Xu@nicta.com.au)

2 Outline
- Motivation
- YuruBackup's Architecture
  - Backup Client: File Scan, Data De-duplication and Data Transmission
  - Metadata Server: Communication with Clients, Global Fingerprint Lookup and Store, and Highly Scalable Cluster of Metadata Servers
- Demo
- Preliminary experimental results
- Development status

3 Motivation
- Yuruware needs incremental backup in the cloud
- Cloud storage providers
  - High reliability and scalability at low cost
  - Ultra-large-scale storage space: 905 billion objects in Amazon S3, Q1/2012
- Customers
  - Back up and restore progressive data within a short time
  - Back up up to petabytes of data in total
- To build a large-scale cloud backup system
  - System scalability
  - Storage efficiency
  - Backup and restoration performance
NICTA Copyright [1]

4 The Architecture of YuruBackup
- To increase scalability to accommodate PB-scale data
- To improve space efficiency to reduce costs
- To save bandwidth to adapt to the low bandwidth of WAN
[Figure: architecture diagram. Labels: Backup Agent; Source-side De-duplication; Metadata Agent; A cluster of metadata servers (master, slave, slave; Write, Read, Read); Target-side De-duplication; Metadata of PB-scale data; Snapshots; Cloud Storage; PB-scale space.]
- RPC, parallel transmission, data/metadata separation

5 Storage Hierarchy
[Figure: storage hierarchy. Labels: Snapshot; A virtual file; Collection; Block; Chunk. Snapshot A and Snapshot B are drawn over Collection, Block and Chunk.]

6 Mapping blocks from memory to disk
- A block: <collectionuuid, blockno, checksum, start, length>
- Components: Memory Block, Block Proxy and TAR Store
[Figure: several Memory Blocks (in memory) feed the Block Proxy and the TAR Store, which holds Collections (on disk).]
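The block tuple on this slide can be illustrated with a small sketch. Everything beyond the five field names is an assumption made for the example: `start` is taken to be the byte offset of the block inside its collection, the checksum is CRC-32, and each collection is written as a single TAR member named after its UUID. `BlockRef`, `Collection` and `read_block` are hypothetical names, not YuruBackup's API.

```python
import io
import tarfile
import uuid
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockRef:
    """<collectionuuid, blockno, checksum, start, length> from the slide."""
    collection_uuid: str
    block_no: int
    checksum: int   # assumption: CRC-32 of the block payload
    start: int      # assumption: byte offset inside the collection
    length: int


class Collection:
    """In-memory collection that accumulates blocks before going to disk."""

    def __init__(self):
        self.uuid = str(uuid.uuid4())
        self.data = bytearray()
        self.blocks = 0

    def append(self, block: bytes) -> BlockRef:
        ref = BlockRef(self.uuid, self.blocks, zlib.crc32(block),
                       len(self.data), len(block))
        self.data += block
        self.blocks += 1
        return ref

    def to_tar(self) -> bytes:
        """Serialize the collection as one TAR member (stand-in for the TAR store)."""
        out = io.BytesIO()
        with tarfile.open(fileobj=out, mode="w") as tar:
            info = tarfile.TarInfo(name=self.uuid)
            info.size = len(self.data)
            tar.addfile(info, io.BytesIO(bytes(self.data)))
        return out.getvalue()


def read_block(tar_bytes: bytes, ref: BlockRef) -> bytes:
    """Locate a block on 'disk' from its reference and verify its checksum."""
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r") as tar:
        payload = tar.extractfile(ref.collection_uuid).read()
    block = payload[ref.start:ref.start + ref.length]
    if zlib.crc32(block) != ref.checksum:
        raise ValueError("checksum mismatch")
    return block
```

With this layout a block reference stays valid after the collection leaves memory, because it only names the collection and a byte range inside it.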

7 The Flow Chart of Backup Process
1. Create DB connection to metadata catalog
2. Initialize the TAR store T
3. Initialize the Metadata Manager
4. Scan a directory to get a file list
5. Is the file list empty?
   - Yes: release the Metadata Manager, release the TAR store, close the DB connection to the metadata catalog, and finish
   - No: remove a file from the list and write its incremental backup into T
6. Is T's size >= a given size?
   - Yes: write T to disk and clear it, then go back to step 5
   - No: go back to step 5
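The loop in this flow chart can be sketched as follows. This is a minimal illustration, not the actual client: the TAR store T is modelled as a byte buffer, the metadata catalog and Metadata Manager are left out, and `run_backup` and `incremental_data` are hypothetical names. Flushing the leftover content of T when the store is released is an assumption; the slide does not say what happens to a partially filled T.

```python
def run_backup(files, flush_size, incremental_data):
    """Drain the file list, flushing the in-memory store T when it is large enough.

    files            -- paths returned by the directory scan
    flush_size       -- the "given size" threshold from the flow chart
    incremental_data -- callable mapping a path to its incremental backup bytes
    Returns the list of TAR-store images written to "disk".
    """
    written = []            # stands in for TAR files written to disk
    store = bytearray()     # stands in for the TAR store T
    pending = list(files)

    while pending:                          # "The file list is empty?" -> No
        path = pending.pop(0)               # remove a file from the list
        store += incremental_data(path)     # write its incremental backup into T
        if len(store) >= flush_size:        # "T's size >= a given size?" -> Yes
            written.append(bytes(store))    # write T to disk ...
            store.clear()                   # ... and clear it

    if store:                               # assumption: release flushes the rest
        written.append(bytes(store))
    return written
```

For example, with a threshold of 4 bytes and three files that each contribute 3 bytes, the second file triggers a flush and the third is flushed on release.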

8 Backup Client
- It provides a functional interface to users: backup and restoration
- To reduce I/O requests: Read/Write Buffer
- To locate items: Compressed BF, Berkeley DB
- Source-side dedup: CD (content-defined) chunking
- Transmission: batched RPC, parallel uploading

9 Source-side de-duplication
Rabin's Fingerprinting
- Given a string $A = a_m a_{m-1} \cdots a_1$, a k-bit Rabin fingerprint is computed as follows:
  - Let $A(t) = a_m t^{m-1} + a_{m-1} t^{m-2} + \cdots + a_1$
  - Choose an irreducible polynomial $P(t) = p_k t^k + p_{k-1} t^{k-1} + \cdots + p_0$
  - Compute Rabin's fingerprint $f(A) = A(t) \bmod P(t)$
Content-defined Chunking (SOSP 01)
- A chunk boundary is declared where low_order(f, k) = c
[Figure: a window w slides over the data, which is cut into chunks C_1, C_2, C_3, ...]
[1] Muthitacharoen A, Chen B, Mazières D. A low-bandwidth network file system. In: Proc. of the 18th ACM Symp. on Operating System Principles (SOSP 2001). New York: ACM Press.
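The fingerprint f(A) = A(t) mod P(t) and the boundary test low_order(f, k) = c can be sketched in a few lines. The sketch makes several assumptions for readability: the polynomial is the degree-8 irreducible t^8 + t^4 + t^3 + t + 1 (real systems use a much higher degree), the window is 4 bytes, and the fingerprint is recomputed for every window instead of being updated incrementally. `poly_mod`, `rabin_fingerprint` and `cdc_chunks` are hypothetical names.

```python
P = 0x11B  # t^8 + t^4 + t^3 + t + 1, irreducible over GF(2); toy degree k = 8


def poly_mod(a: int, p: int = P) -> int:
    """Remainder of polynomial a modulo p, with coefficients in GF(2).

    Bit i of an integer is the coefficient of t^i; subtraction is XOR.
    """
    deg_p = p.bit_length() - 1
    while a.bit_length() > deg_p:
        a ^= p << (a.bit_length() - 1 - deg_p)  # cancel the leading term
    return a


def rabin_fingerprint(data: bytes, p: int = P) -> int:
    """f(A) = A(t) mod P(t), reading the bytes as one big-endian bit string."""
    return poly_mod(int.from_bytes(data, "big"), p)


def cdc_chunks(data: bytes, window: int = 4, low_bits: int = 3, c: int = 0):
    """Cut data wherever the low-order bits of the window fingerprint equal c."""
    mask = (1 << low_bits) - 1
    chunks, start = [], 0
    for end in range(window, len(data) + 1):
        f = rabin_fingerprint(data[end - window:end])
        if (f & mask) == c:                 # low_order(f, k) = c -> boundary
            chunks.append(data[start:end])
            start = end
    if start < len(data):                   # trailing bytes form the last chunk
        chunks.append(data[start:])
    return chunks
```

Because a boundary depends only on the bytes inside the window, an insertion early in a file shifts the following boundaries along with the content instead of changing every later chunk, which is what lets unchanged regions keep their old fingerprints.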

10 Duplication Detection based on Bloom filter
- Observations
  - Most files are never changed after their creation (ATC 04)
  - Over 2/3 of files have not been modified (FAST 07)
- Index Summary based on Compressed BF (ACM 70, PODC 01)
  - Approximate set membership problem
  - Trade-off between space and false positive probability
- Three functions
  1) Initialize(initElementCount, desiredfpp)
  2) Insert(fingerprint)
  3) Lookup(fingerprint)
[1] Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7), 1970.
[2] M. Mitzenmacher. Compressed Bloom Filters. In Twentieth ACM Symposium on Principles of Distributed Computing, August 2001.
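The three functions listed on this slide map naturally onto a small class. The sketch below is a plain (uncompressed) Bloom filter, not the compressed variant the slide cites. The bit-array size and hash count come from the standard formulas m = -n ln p / (ln 2)^2 and k = (m / n) ln 2. Deriving the k positions from one SHA-256 digest by double hashing is a choice made for the example, and the class and method names are hypothetical.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, init_element_count: int, desired_fpp: float):
        """Initialize(initElementCount, desiredfpp): size the filter."""
        n, p = init_element_count, desired_fpp
        self.m = max(1, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))
        self.k = max(1, round(self.m / n * math.log(2)))
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, fingerprint: bytes):
        # Double hashing: position_i = h1 + i * h2 (mod m).
        digest = hashlib.sha256(fingerprint).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def insert(self, fingerprint: bytes) -> None:
        """Insert(fingerprint): set the k bits for this fingerprint."""
        for pos in self._positions(fingerprint):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def lookup(self, fingerprint: bytes) -> bool:
        """Lookup(fingerprint): False means definitely new; True means maybe seen."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(fingerprint))
```

A negative lookup is always correct, so a client can upload such a chunk without consulting the full index; only positive answers need to be confirmed against the real fingerprint store.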

11 Metadata Server
- Communication with Clients
  - A single, batched and asynchronous lookup RPC for n FPs
  - The callback function enqueues the updated request
- Global FP Lookup and Store
  - Global Index Summary
  - Global target-side deduplication
[Figure: FP Lookup and FP Store]
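The batched, asynchronous lookup with an enqueueing callback might look roughly like the sketch below. The remote call is replaced by a local function running in a thread pool, and the server-side index is a hard-coded set, so `KNOWN_FPS`, `lookup_rpc`, `batched_lookup` and `new_fingerprints` are all hypothetical stand-ins rather than YuruBackup's RPC interface.

```python
import queue
from concurrent.futures import ThreadPoolExecutor

KNOWN_FPS = {"fp-a", "fp-c"}  # hypothetical content of the global FP index


def lookup_rpc(batch):
    """Stand-in for the remote call: one request answers n fingerprints."""
    return {fp: fp in KNOWN_FPS for fp in batch}


def batched_lookup(fps, batch_size, pending):
    """Send fingerprints in batches; each reply is enqueued by a callback."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        for i in range(0, len(fps), batch_size):
            future = pool.submit(lookup_rpc, fps[i:i + batch_size])
            # The callback runs when the reply arrives and enqueues it.
            future.add_done_callback(lambda f: pending.put(f.result()))
    # Leaving the with-block waits for every outstanding call.


def new_fingerprints(fps, batch_size=2):
    """Return the fingerprints the server has not stored yet, in input order."""
    pending = queue.Queue()
    batched_lookup(fps, batch_size, pending)
    answers = {}
    while not pending.empty():
        answers.update(pending.get())
    return [fp for fp in fps if not answers[fp]]
```

Batching trades a little latency per fingerprint for far fewer round trips, which matters on the low-bandwidth WAN links the architecture slide mentions.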

12 Highly Scalable Cluster of MDSs
- Data replication: to make reads scalable (MySQL replication, failover)
- Data partitioning: to make writes scalable (MySQL Cluster)
- Load balancing: to be aware of which nodes are readable and writable
[Figure: cluster diagram. Labels: YuruBackup Clients; Load Balancer; Masters; Slaves; SQL Nodes with NDB; SQL Nodes with NDB+InnoDB; DataNodes; Read, Write and Replication paths.]
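The load-balancing point, knowing which nodes are readable and which are writable, can be illustrated with a toy router. This is only a sketch under simple assumptions: statements starting with SELECT count as reads, everything else goes to a master, and nodes are picked round-robin. `MetadataRouter` and the node names are hypothetical; a real deployment would also need health checks and failover.

```python
import itertools


class MetadataRouter:
    """Send reads to slaves and writes to masters, round-robin within each group."""

    def __init__(self, masters, slaves):
        self._writers = itertools.cycle(masters)
        # Assumption: with no slaves available, masters also serve reads.
        self._readers = itertools.cycle(slaves or masters)

    def route(self, statement: str) -> str:
        is_read = statement.lstrip().upper().startswith("SELECT")
        return next(self._readers if is_read else self._writers)
```

For instance, with one master and two slaves, consecutive SELECT statements alternate between the slaves while every INSERT lands on the master.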

13 Demo of YuruBackup
- Chunk Partition
- Duplication Detection

14 An example of a snapshot (5 new blocks)
[Figure: new blocks B_1, B_3, B_5, B_7, B_12]

15 An example of incremental backup
[Figure: emacs-23.2a and emacs-23.3a]

16 Comparison
$\mathrm{ReducedRatio} = \dfrac{\#\mathrm{BytesSentByRsync} - \#\mathrm{BytesOfData} - \#\mathrm{BytesOfMetadata}}{\#\mathrm{BytesSentByRsync}}$
Table 1. Dataset
Column headers: Datasets; Non-overlap data size (MB); rsync: Transferred data size (MB); YuruBackup: Transferred data size (MB) (Data, Metadata); # chunks (# old chunks, # new chunks); ReducedRatio (%)
[Most cell values did not survive extraction. Surviving row fragments:]
Emacs (155.9) ,731 11,
Eclipse (234.9) ,
GCC (428.6) ,386 9,
Hadoop-src (214.1) ,365 15,
Hadoop-bin (110.5) ,
Lucene-src (64.8) ,
Lucene-bin (156.4) ,191 26,
Hive-src (144.0) ,072 7,
Hive-bin (21.7) ,
Hbase (97.5) ,462 4,
Average (162.8) ,144 17,

17 Others
- YuruBackup
  - is deployed atop Amazon S3
  - metadata servers are running in EC2
  - will be deployed in other cloud platforms
- Performance evaluation
  - De-duplication Efficiency
  - De-duplication Overhead
  - Scalability
  - Backup Window
  - Fine-granularity Restoration, etc.

18 Current Development Status
- Program directories (~12,000 LOC)
  - include: header files, ~1,200 LOC
  - src: source files, ~5,200 LOC

19 Thank you! Q&A

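20 Dataset
$\mathrm{OverlapRatio} = \dfrac{\mathrm{OverlapDataSize}}{\mathrm{TransferredDataSize}}$
Column headers: Dataset; Objects; # Files; Data size (MB); # Overlap Files; Overlap data size (MB) (%)
Datasets: Emacs, eclipse, gcc, Hadoop-src, Hadoop-bin
[Most cell values did not survive extraction. Surviving fragments, in extracted order:]
Objects, # Files, Data size (MB): 23.2a 4, a 4, galileo 2, Helios-SR2 2, , , , ,
# Overlap Files, Overlap data size (MB) (%): (10.09) (0.21) 70, (74.86) 3, (56.56) (66.36)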

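21 Dataset
Column headers: Dataset; Objects; # Files; Data size (MB); # Overlap Files; Overlap data size (MB) (%)
Datasets: lucene-src, Lucene-bin, Hive-src, Hive-bin, hbase
[Most cell values did not survive extraction. Surviving fragments, in extracted order:]
# Overlap Files, Overlap data size (MB) (%): 2, (73.58) (8.51) 3, (34.10) (73.88) 1, (50.81)
Linux shell: diff urnas v1 v2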

22 The rsync Algorithm
[Figure: host A holds f.old; host B holds f.new.]
1. A computes the checksums of each block S_i in file f.old.
2. A sends the checksums to B.
3. B searches the file f.new and finds the differences between f.old and f.new.
4. B tells A how to construct file f.new from f.old and the literal data.
The checksums consist of a rolling 32-bit checksum (based on Adler-32) and a 128-bit MD4 checksum.
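The four steps above can be sketched end to end. This is an illustration only and differs from real rsync in ways that are assumptions of the example: the block size is 4 bytes, the weak checksum is recomputed at each offset rather than rolled, and MD5 stands in for MD4 because MD4 is not always available in Python's hashlib. `signature`, `delta` and `patch` are hypothetical names.

```python
import hashlib

BLOCK = 4  # tiny block size, chosen only to keep the example readable


def weak(block: bytes) -> int:
    """32-bit Adler-32-style checksum: two 16-bit sums packed together."""
    a = sum(block) & 0xFFFF
    b = sum((len(block) - i) * x for i, x in enumerate(block)) & 0xFFFF
    return (b << 16) | a


def strong(block: bytes) -> bytes:
    return hashlib.md5(block).digest()  # stand-in for the 128-bit MD4


def signature(old: bytes):
    """Steps 1-2: A checksums each block of f.old and sends the table to B."""
    sig = {}
    for i in range(0, len(old), BLOCK):
        blk = old[i:i + BLOCK]
        sig.setdefault(weak(blk), {})[strong(blk)] = i // BLOCK
    return sig


def delta(new: bytes, sig):
    """Step 3: B scans f.new, emitting block copies or literal bytes."""
    ops, i = [], 0
    while i < len(new):
        blk = new[i:i + BLOCK]
        candidates = sig.get(weak(blk))        # cheap check first
        idx = candidates.get(strong(blk)) if candidates else None
        if idx is not None:
            ops.append(("copy", idx))
            i += len(blk)
        else:
            ops.append(("literal", new[i:i + 1]))
            i += 1
    return ops


def patch(old: bytes, ops) -> bytes:
    """Step 4: A rebuilds f.new from f.old plus the literal data."""
    out = bytearray()
    for kind, arg in ops:
        out += old[arg * BLOCK:(arg + 1) * BLOCK] if kind == "copy" else arg
    return bytes(out)
```

Inserting one byte into a 12-byte file yields three block copies and a single literal, so only that byte plus the copy instructions has to cross the network.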


More information

Top Ten Questions. to Ask Your Primary Storage Provider About Their Data Efficiency. May 2014. Copyright 2014 Permabit Technology Corporation

Top Ten Questions. to Ask Your Primary Storage Provider About Their Data Efficiency. May 2014. Copyright 2014 Permabit Technology Corporation Top Ten Questions to Ask Your Primary Storage Provider About Their Data Efficiency May 2014 Copyright 2014 Permabit Technology Corporation Introduction The value of data efficiency technologies, namely

More information

Index Terms : Load rebalance, distributed file systems, clouds, movement cost, load imbalance, chunk.

Index Terms : Load rebalance, distributed file systems, clouds, movement cost, load imbalance, chunk. Load Rebalancing for Distributed File Systems in Clouds. Smita Salunkhe, S. S. Sannakki Department of Computer Science and Engineering KLS Gogte Institute of Technology, Belgaum, Karnataka, India Affiliated

More information

A Web Site Protection Oriented Remote Backup and Recovery Method

A Web Site Protection Oriented Remote Backup and Recovery Method 2013 8th International Conference on Communications and Networking in China (CHINACOM) A Web Site Protection Oriented Remote Backup and Recovery Method He Qian 1,2, Guo Yafeng 1, Wang Yong 1, Qiang Baohua

More information

sulbhaghadling@gmail.com

sulbhaghadling@gmail.com www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 4 Issue 3 March 2015, Page No. 10715-10720 Data DeDuplication Using Optimized Fingerprint Lookup Method for

More information

Release Notes. LiveVault. Contents. Version 7.65. Revision 0

Release Notes. LiveVault. Contents. Version 7.65. Revision 0 R E L E A S E N O T E S LiveVault Version 7.65 Release Notes Revision 0 This document describes new features and resolved issues for LiveVault 7.65. You can retrieve the latest available product documentation

More information

Disk-based Backup for Virtualized Environment via Infortrend EonStor DS, ESVA, EonNAS 3000 / 5000 and Veeam Backup and Replication Application Note

Disk-based Backup for Virtualized Environment via Infortrend EonStor DS, ESVA, EonNAS 3000 / 5000 and Veeam Backup and Replication Application Note Disk-based Backup for Virtualized Environment via Infortrend EonStor DS, ESVA, EonNAS 3000 / 5000 and Veeam Backup and Replication Application Note Abstract The document describes, as an example the usage

More information

DXi Accent Technical Background

DXi Accent Technical Background TECHNOLOGY BRIEF NOTICE This Technology Brief contains information protected by copyright. Information in this Technology Brief is subject to change without notice and does not represent a commitment on

More information

Redefining Microsoft SQL Server Data Management. PAS Specification

Redefining Microsoft SQL Server Data Management. PAS Specification Redefining Microsoft SQL Server Data Management APRIL Actifio 11, 2013 PAS Specification Table of Contents Introduction.... 3 Background.... 3 Virtualizing Microsoft SQL Server Data Management.... 4 Virtualizing

More information

HP StoreOnce & Deduplication Solutions Zdenek Duchoň Pre-sales consultant

HP StoreOnce & Deduplication Solutions Zdenek Duchoň Pre-sales consultant DISCOVER HP StoreOnce & Deduplication Solutions Zdenek Duchoň Pre-sales consultant HP StorageWorks Data Protection Solutions HP has it covered Near continuous data protection Disk Mirroring Advanced Backup

More information

THE HADOOP DISTRIBUTED FILE SYSTEM

THE HADOOP DISTRIBUTED FILE SYSTEM THE HADOOP DISTRIBUTED FILE SYSTEM Konstantin Shvachko, Hairong Kuang, Sanjay Radia, Robert Chansler Presented by Alexander Pokluda October 7, 2013 Outline Motivation and Overview of Hadoop Architecture,

More information

STORAGE. Buying Guide: TARGET DATA DEDUPLICATION BACKUP SYSTEMS. inside

STORAGE. Buying Guide: TARGET DATA DEDUPLICATION BACKUP SYSTEMS. inside Managing the information that drives the enterprise STORAGE Buying Guide: DEDUPLICATION inside What you need to know about target data deduplication Special factors to consider One key difference among

More information
