Quanqing XU YuruBackup: A Highly Scalable and Space-Efficient Incremental Backup System in the Cloud
Transcription
1 Quanqing XU YuruBackup: A Highly Scalable and Space-Efficient Incremental Backup System in the Cloud
2 Outline
- Motivation
- YuruBackup's Architecture
- Backup Client: File Scan, Data De-duplication and Data Transmission
- Metadata Server: Communication with Clients, Global Fingerprint Lookup and Store, and Highly Scalable Cluster of Metadata Servers
- Demo
- Preliminary experimental results
- Development status
3 Motivation
- Yuruware needs incremental backup in the cloud
- Cloud storage providers
  - High reliability and scalability at low cost
  - Ultra-large-scale storage space: 905 billion objects in Amazon S3, Q1/2012
- Customers
  - Back up and restore progressive data within a short time
  - Back up up to petabytes of data in total
- To build a large-scale cloud backup system
  - System scalability
  - Storage efficiency
  - Backup and restoration performance
NICTA Copyright [1]
4 The Architecture of YuruBackup
- To increase scalability to accommodate PB-scale data
- To improve space efficiency to reduce costs
- To save bandwidth to adapt to the low bandwidth of WAN
[Figure: architecture diagram. Labels: Backup Agent; Source-side De-duplication; Metadata Agent; Target-side De-duplication; a cluster of metadata servers (master, slave, slave); Write; Read; Metadata of PB-scale data; Snapshots; Cloud Storage; PB-scale space; RPC, parallel transmission, data/metadata separation]
5 Storage Hierarchy
- Snapshot: a virtual file
- Collection
- Block
- Chunk
[Figure: Snapshot A and Snapshot B over Collection, Block and Chunk layers]
6 Mapping blocks from memory to disk
- A block: <collectionuuid, blockno, checksum, start, length>
- Components: Memory Block, Block Proxy and TAR Store
[Figure: Memory Blocks (in memory), Block Proxy, TAR Store, Collections (on disk)]
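The block tuple above can be read as a locator record: which collection holds the block, where it starts, and how long it is. The following is a minimal Python sketch under stated assumptions, not YuruBackup's actual code: the names `BlockRecord`, `append_block` and `read_block` are hypothetical, the collection is modelled as an in-memory byte buffer, and CRC-32 stands in for the checksum, whose real type the slides do not state.

```python
import zlib
from dataclasses import dataclass


@dataclass(frozen=True)
class BlockRecord:
    """Hypothetical mirror of <collectionuuid, blockno, checksum, start, length>."""
    collectionuuid: str
    blockno: int
    checksum: int   # assumption: CRC-32 of the block data
    start: int      # byte offset of the block inside its collection
    length: int     # number of bytes in the block


def append_block(collection: bytearray, collectionuuid: str, blockno: int, data: bytes) -> BlockRecord:
    """Append a memory block to a collection buffer and return its locator record."""
    record = BlockRecord(collectionuuid, blockno, zlib.crc32(data), len(collection), len(data))
    collection.extend(data)
    return record


def read_block(collection: bytes, record: BlockRecord) -> bytes:
    """Read a block back by offset and length, verifying its checksum."""
    data = bytes(collection[record.start:record.start + record.length])
    if zlib.crc32(data) != record.checksum:
        raise ValueError(f"checksum mismatch for block {record.blockno}")
    return data
```

Keeping only `start` and `length` in the record means a block can be fetched from a collection with a single ranged read.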
7 The Flow Chart of Backup Process
1. Create DB connection to metadata catalog
2. Initialize the TAR store T
3. Initialize the Metadata Manager
4. Scan a directory to get a file list
5. The file list is empty?
   - Yes: release the Metadata Manager, release the TAR store, and close the DB connection to the metadata catalog
   - No: remove a file from the list and write its incremental backup into T
6. T's size >= a given size?
   - Yes: write T to disk and clear it, then return to step 5
   - No: return to step 5
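The loop in the flow chart can be sketched in a few lines of Python. This is an illustration only: `run_backup` and its parameters are hypothetical names, file sizes stand in for the incremental backup data, and the final flush of a partly filled store is an assumption, since the flow chart does not show what happens to leftover data when the TAR store is released.

```python
from collections import deque


def run_backup(file_sizes, flush_threshold):
    """Walk the backup flow chart over a {file name: size} mapping.

    Returns the batches written to disk, each a list of file names.
    """
    pending = deque(sorted(file_sizes))     # step 4: the scanned file list
    store, store_size = [], 0               # the in-memory TAR store T
    flushed = []
    while pending:                          # step 5: is the file list empty?
        name = pending.popleft()            # remove a file from the list
        store.append(name)                  # write its incremental backup into T
        store_size += file_sizes[name]
        if store_size >= flush_threshold:   # step 6: has T reached the given size?
            flushed.append(store)           # write T to disk ...
            store, store_size = [], 0       # ... and clear it
    if store:
        # Assumption: leftover data is written out when the store is released.
        flushed.append(store)
    return flushed
```

For example, three 4 MB files with an 8 MB threshold produce one full batch of two files and a final batch holding the third.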
8 Backup Client
- It provides a functional interface to users: backup and restoration
- To reduce I/O requests: Read/Write Buffer
- To locate items: Compressed BF, Berkeley DB
- Source-side dedup: CD Chunking
- Transmission: Batched RPC, Parallel uploading
9 Source-side de-duplication
- Rabin's Fingerprinting
  - Given a string $A = [a_m a_{m-1} \dots a_1]$, a k-bit Rabin fingerprint is computed as follows:
  - Let $A(t) = a_m t^{m-1} + a_{m-1} t^{m-2} + \dots + a_2 t + a_1$
  - Choose an irreducible polynomial $P(t) = p_k t^k + p_{k-1} t^{k-1} + \dots + p_0$
  - Compute Rabin's fingerprint $f(A) = A(t) \bmod P(t)$
- Content-defined Chunking (SOSP '01) [1]: low_order(f, k) = c
[Figure: data stream divided into chunks C1, C2, C3, ... by a sliding window w]
[1] Muthitacharoen A, Chen B, Mazières D. A low-bandwidth network file system. In: Proc. of the 18th ACM Symp. on Operating System Principles (SOSP 2001). New York: ACM Press.
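The two ideas on this slide can be sketched together: compute A(t) mod P(t) over GF(2), then slide a window over the data and cut a chunk wherever the low-order bits of the window's fingerprint equal a constant c. The Python below is a minimal sketch under stated assumptions, not YuruBackup's implementation: the polynomial x^64 + x^4 + x^3 + x + 1, the window size, the number of low-order bits, the minimum chunk size and all function names are illustrative choices, and the fingerprint is recomputed per window instead of being rolled, for clarity.

```python
# Illustrative degree-64 polynomial over GF(2): x^64 + x^4 + x^3 + x + 1.
POLY = (1 << 64) | 0x1B
DEGREE = 64


def rabin_fingerprint(data: bytes, poly: int = POLY, degree: int = DEGREE) -> int:
    """Return A(t) mod P(t), reading the bits of `data` as GF(2) coefficients."""
    f = 0
    for byte in data:
        for shift in range(7, -1, -1):
            f = (f << 1) | ((byte >> shift) & 1)  # multiply by t, add next coefficient
            if f >> degree:                       # degree reached k: subtract (XOR) P(t)
                f ^= poly
    return f


def chunk_boundaries(data: bytes, window: int = 16, low_bits: int = 6,
                     c: int = 0, min_size: int = 32) -> list:
    """Return chunk end offsets; a cut is made where low_order(f, low_bits) == c."""
    mask = (1 << low_bits) - 1
    cuts, start = [], 0
    for end in range(window, len(data) + 1):
        if end - start < min_size:
            continue                              # assumption: enforce a minimum chunk size
        if rabin_fingerprint(data[end - window:end]) & mask == c:
            cuts.append(end)
            start = end
    if start < len(data):
        cuts.append(len(data))                    # the tail forms the last chunk
    return cuts
```

Because a boundary depends only on the bytes inside the window, an insertion early in a file shifts later boundaries along with the content instead of re-cutting every chunk, which is what makes the chunks reusable across versions.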
10 Duplication Detection based on Bloom filter
- Observations
  - Most files are never changed after their creation (ATC '04)
  - Over 2/3 of files have not been modified (FAST '07)
- Index Summary based on Compressed BF (ACM '70 [1], PODC '01 [2])
  - Approximate set membership problem
  - Trade-off between space and false positive probability
- Three functions
  1) Initialize(initElementCount, desiredfpp)
  2) Insert(fingerprint)
  3) Lookup(fingerprint)
[1] Burton H. Bloom. Space/time trade-offs in hash coding with allowable errors. Communications of the ACM, 13(7).
[2] Mitzenmacher. Compressed Bloom Filters. In Twentieth ACM Symposium on Principles of Distributed Computing, August.
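The three functions listed above map naturally onto a small class. The sketch below is a plain (uncompressed) Bloom filter in Python, offered as an illustration and not as the compressed variant the slide cites: the class and method names are hypothetical, the bit-array size and hash count come from the standard formulas m = -n ln p / (ln 2)^2 and k = (m / n) ln 2, and the k hash positions are derived from one SHA-256 digest by double hashing, which is an assumption of this example.

```python
import hashlib
import math


class BloomFilter:
    def __init__(self, init_element_count: int, desired_fpp: float):
        """Initialize: size the filter for n elements at false positive probability p."""
        n, p = init_element_count, desired_fpp
        self.m = max(1, math.ceil(-n * math.log(p) / (math.log(2) ** 2)))  # bits
        self.k = max(1, round(self.m / n * math.log(2)))                   # hash functions
        self.bits = bytearray((self.m + 7) // 8)

    def _positions(self, fingerprint: bytes):
        # Double hashing: position_i = h1 + i * h2 (mod m), both taken from one digest.
        digest = hashlib.sha256(fingerprint).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        return [(h1 + i * h2) % self.m for i in range(self.k)]

    def insert(self, fingerprint: bytes) -> None:
        """Insert: set the k bits selected by the fingerprint."""
        for pos in self._positions(fingerprint):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def lookup(self, fingerprint: bytes) -> bool:
        """Lookup: False means definitely absent; True means possibly present."""
        return all(self.bits[pos // 8] & (1 << (pos % 8))
                   for pos in self._positions(fingerprint))
```

A negative lookup is definitive, so a client can skip the metadata-server query for fingerprints the filter rejects; a positive lookup still needs confirmation because of false positives.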
11 Metadata Server
- Communication with Clients
  - A single, batched and asynchronous lookup RPC for n FPs
  - The callback function enqueues the updated request
- Global FP Lookup and Store
  - Global Index Summary
  - Global target-side deduplication
  - FP Lookup
  - FP Store
12 Highly Scalable Cluster of MDSs
[Figure: cluster diagram. Labels: YuruBackup Clients; Load Balancer; Masters; Slaves; SQL Nodes with NDB; SQL Nodes with NDB+InnoDB; DataNodes; Read; Write; Replication]
- Data replication
  - To make reads scalable
  - MySQL replication
  - Failover
- Data partitioning
  - To make writes scalable
  - MySQL cluster
- Load balancing
  - To be aware of which nodes are readable and writable
13 Demo of YuruBackup
- Chunk Partition
- Duplication Detection
14 An example of a snapshot (5 new blocks): B1, B3, B5, B7, B12
15 An example of incremental backup: emacs-23.2a, emacs-23.3a
16 Comparison
$\text{ReducedRatio} = \dfrac{\#\text{BytesSentByRsync} - \#\text{BytesOfData} - \#\text{BytesOfMetadata}}{\#\text{BytesSentByRsync}}$
Table 1. Dataset
Columns: Datasets; Nonoverlap data size (MB); rsync: Transferred data size (MB); YuruBackup: Transferred data size (MB) (Data, Metadata), # chunks (# old chunks, # new chunks); ReducedRatio (%)
Emacs (155.9) ,731 11,
Eclipse (234.9) ,
GCC (428.6) ,386 9,
Hadoop-src (214.1) ,365 15,
Hadoop-bin (110.5) ,
Lucene-src (64.8) ,
Lucene-bin (156.4) ,191 26,
Hive-src (144.0) ,072 7,
Hive-bin (21.7) ,
Hbase (97.5) ,462 4,
Average (162.8) ,144 17,
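As a worked reading of the ReducedRatio definition, the small Python helper below computes the fraction of rsync's traffic that is saved once both the data and the metadata sent by the backup client are counted. The function name and the sample byte counts are hypothetical and are not taken from the table.

```python
def reduced_ratio(bytes_sent_by_rsync: int, bytes_of_data: int, bytes_of_metadata: int) -> float:
    """(#BytesSentByRsync - #BytesOfData - #BytesOfMetadata) / #BytesSentByRsync."""
    return (bytes_sent_by_rsync - bytes_of_data - bytes_of_metadata) / bytes_sent_by_rsync


# Hypothetical numbers: rsync sends 100 MB; the backup client sends 60 MB of data
# plus 5 MB of metadata, so 35% of rsync's traffic is saved.
example = reduced_ratio(100 * 2**20, 60 * 2**20, 5 * 2**20)
```

A negative result would mean the client transferred more than rsync did, which can happen when metadata overhead outweighs the de-duplication savings.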
17 Others
- YuruBackup
  - is deployed atop Amazon S3
  - metadata servers are running in EC2
  - will be deployed in other cloud platforms
- Performance evaluation
  - De-duplication Efficiency
  - De-duplication Overhead
  - Scalability
  - Backup Window
  - Fine-granularity Restoration, etc.
18 Current Development Status
- Program directories (~12,000 LOC)
  - include: header files, ~1,200 LOC
  - src: source files, ~5,200 LOC
19 Thank you! Q&A
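20 Dataset
$\text{OverlapRatio} = \dfrac{\text{OverlapDataSize}}{\text{TransferredDataSize}}$
Datasets: Emacs, eclipse, gcc, Hadoop-src, Hadoop-bin
Columns: Objects; # Files; Data size (MB); # Overlap Files; Overlap data size (MB) (%)
Objects, # Files, Data size (MB): 23.2a 4, a 4, galileo 2, Helios-SR2 2, , , , ,
# Overlap Files, Overlap data size (MB) (%): (10.09) (0.21) 70, (74.86) 3, (56.56) (66.36)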
21 Dataset
Datasets: Lucene-src, Lucene-bin, Hive-src, Hive-bin, hbase
Columns: Objects; # Files; Data size (MB); # Overlap Files; Overlap data size (MB) (%)
Objects, # Files, Data size (MB): , , , , , , hbase , ,
# Overlap Files, Overlap data size (MB) (%): 2, (73.58) (8.51) 3, (34.10) (73.88) 1, (50.81)
Linux shell: diff -urnas v1 v2
22 The rsync Algorithm
[Figure: host A holds f.old, host B holds f.new]
1. A computes the checksum of each block S_i in file f.old.
2. A sends the checksums to B.
3. B searches the file f.new and finds the differences between f.old and f.new.
4. B tells A how to construct file f.new from f.old and the literal data.
The checksums consist of a rolling 32-bit checksum (adler-32 checksum) and a 128-bit MD4 checksum.
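The four steps above can be sketched end to end in Python. This is a simplified illustration, not rsync's code: all function names are hypothetical, the weak checksum follows the two-part rolling sum described for rsync (an Adler-32-style pair modulo 2^16), and MD5 from `hashlib` stands in for the 128-bit MD4 strong checksum because MD4 is not available in every Python build.

```python
import hashlib

M = 1 << 16


def weak_checksum(block: bytes):
    """Two-part rolling checksum (a, b) of a block, each part modulo 2^16."""
    a = sum(block) % M
    b = sum((len(block) - i) * x for i, x in enumerate(block)) % M
    return a, b


def roll(a, b, out_byte, in_byte, block_len):
    """Slide the window one byte: drop out_byte on the left, add in_byte on the right."""
    a = (a - out_byte + in_byte) % M
    b = (b - block_len * out_byte + a) % M
    return a, b


def signatures(old: bytes, block_len: int):
    """Steps 1-2: weak and strong checksums for each full block of f.old."""
    table = {}
    for offset in range(0, len(old) - block_len + 1, block_len):
        block = old[offset:offset + block_len]
        table.setdefault(weak_checksum(block), {})[hashlib.md5(block).digest()] = offset
    return table


def delta(new: bytes, table, block_len: int):
    """Step 3: scan f.new and emit ("copy", offset in f.old) or ("literal", bytes)."""
    ops, literal, pos = [], bytearray(), 0
    weak = weak_checksum(new[:block_len]) if len(new) >= block_len else None
    while pos + block_len <= len(new):
        block = new[pos:pos + block_len]
        candidates = table.get(weak)            # cheap test first, strong hash only on a hit
        match = candidates.get(hashlib.md5(block).digest()) if candidates else None
        if match is not None:
            if literal:
                ops.append(("literal", bytes(literal)))
                literal = bytearray()
            ops.append(("copy", match))
            pos += block_len
            if pos + block_len <= len(new):
                weak = weak_checksum(new[pos:pos + block_len])
        else:
            literal.append(new[pos])
            if pos + block_len < len(new):
                weak = roll(*weak, new[pos], new[pos + block_len], block_len)
            pos += 1
    literal.extend(new[pos:])
    if literal:
        ops.append(("literal", bytes(literal)))
    return ops


def patch(old: bytes, ops, block_len: int) -> bytes:
    """Step 4: rebuild f.new from f.old blocks and the literal data."""
    out = bytearray()
    for kind, arg in ops:
        out += old[arg:arg + block_len] if kind == "copy" else arg
    return bytes(out)
```

The rolling update is what lets the holder of f.new test every byte offset cheaply; the strong checksum is computed only when the weak one already matches.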
