Target Deduplication Metrics and Risk Analysis Using Post Processing Methods


Gayathri R.¹, Dr. Malathi A.²
¹ Assistant Professor, School of IT and Science, Dr. G.R.D. College of Science, Coimbatore, India
² Assistant Professor, PG and Research Department of Computer Science, Government Arts College, Coimbatore, India
¹ gayathri.manalan@gmail.com, ² malathi.arunachalam@yahoo.com

Abstract

In modern intelligent storage systems, data deduplication is a compression technique for discarding duplicate copies of repeating data. It improves storage utilization and, when applied to large network transfers, reduces the number of bytes that must be transferred. In this process, data chunks (patterned byte sequences) are identified and stored; incoming chunks are then compared against the already stored data, and whenever a match is identified the redundant chunk is replaced with a small reference that denotes the stored chunk. The process is repeated over the whole data stream, discarding every duplicate entry. Target deduplication is the removal of duplicate data in the secondary store, generally a backup store such as a data repository or a virtual space. In this paper, post processing of target data is implemented on data collected from multiple data centers over a period of 50 days and analyzed for the efficiency and throughput of global and local deduplication.

Keywords: post processing in target data, target deduplication, post-processing deduplication methods

1. INTRODUCTION

Target deduplication systems can be implemented in several modes, including the post-process approach (PPA), the in-line approach (ILA), and the source-based approach. Most commercial systems today are implemented and operated in ILA or PPA mode, and very few researchers focus on the other approaches. As data deduplication systems become widely used, choosing an appropriate mode for the operating environment becomes more important than ever. Because the overhead and resource usage of each mode have not been fully studied, the chosen deduplication mode can lead to inefficiency and poor performance in some operating environments. In this study, we propose a data deduplication system supporting multiple modes.
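To make the chunk-matching process described in the abstract concrete, the following is a minimal sketch of hash-based deduplication over fixed-size chunks. It is an illustration only, not the system evaluated in this paper; the 8 KB chunk size and SHA-1 fingerprints are assumptions.

```python
import hashlib

CHUNK_SIZE = 8 * 1024  # assumed chunk size; this paper later uses 8 KB averages

def deduplicate(data: bytes):
    """Split data into chunks, storing each unique chunk only once."""
    store = {}    # fingerprint -> unique chunk bytes (the "stored data")
    recipe = []   # ordered fingerprints that reconstruct the input
    for i in range(0, len(data), CHUNK_SIZE):
        chunk = data[i:i + CHUNK_SIZE]
        fp = hashlib.sha1(chunk).hexdigest()   # fingerprint the chunk
        if fp not in store:
            store[fp] = chunk                  # first occurrence: keep the bytes
        recipe.append(fp)                      # duplicates cost only a reference
    return store, recipe

def restore(store: dict, recipe: list) -> bytes:
    """Rebuild the original stream from stored chunks and references."""
    return b"".join(store[fp] for fp in recipe)
```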

Due to the relatively low bandwidth of the WAN (Wide Area Network) links that support cloud backup services, both the backup time and the restore time in a cloud backup environment are in desperate need of reduction if cloud backup is to be a practical and affordable service for small businesses and telecommuters alike. Existing solutions that employ deduplication technology for cloud backup services focus only on removing redundant data from transmission during backup operations to reduce the backup time, while paying little attention to the restore time, which we argue is an important aspect that affects the overall quality of service of cloud backup. As the volume of data increases, so does the demand for online storage services, from simple backup services to cloud storage infrastructures. Although deduplication is most effective when applied across multiple users, cross-user deduplication has serious privacy implications; some simple mechanisms can enable cross-user deduplication while greatly reducing the risk of data leakage. Cloud storage refers to scalable and elastic storage capabilities delivered as a service using Internet technologies, with elastic provisioning and usage-based pricing that does not penalize users for changing their storage consumption without notice.

Figure 1: Post processing in deduplication

In this paper, post processing of target data is implemented on data collected from multiple data centers over a period of 50 days and analyzed for the efficiency and throughput of global and local deduplication. These techniques enable a modern two-socket dual-core system to run at full CPU utilization with the required disks and achieve 100 MB/sec for single-stream throughput and 210 MB/sec for multi-stream throughput, making target-based post processing an effective approach.

2. ANALYSIS IN DATA CENTERS

Large storage requirements for data protection have been a serious problem for data-centric companies. Most data centers perform a weekly full backup of all the data on their primary storage systems to secondary storage devices, where they keep these backups for weeks to months. In addition, they may perform daily incremental backups that copy only the data which has changed since the last backup. The frequency, type, and retention of backups vary for different kinds of data, but it is common for the secondary storage to hold 10 to 20 times more data than the primary storage. For disaster recovery, additional offsite copies may double the secondary storage capacity needed, and if the data is transferred offsite over a wide area network, the network bandwidth requirement can be enormous. Given the data protection use case, there are two main requirements for a secondary storage system storing backup data. The first is low cost, so that storing backups and moving copies offsite does not cost significantly more than storing the primary data. The second is high performance, so that backups can complete in a timely fashion; in many cases, backups must complete overnight so that the load of performing backups does not interfere with normal daytime usage. The traditional solution has been to use tape libraries as secondary storage devices and to transfer physical tapes for disaster recovery.

Tape cartridges cost a small fraction of disk storage systems and have good sequential transfer rates in the neighborhood of 100 MB/sec. But managing cartridges is a manual process that is expensive and error prone; it is quite common for restores to fail because a tape cartridge cannot be located or has been damaged during handling. Further, random access performance, which is needed for data restores, is extremely poor. Disk-based storage systems and network replication would be much preferred if they were affordable.

2.1 Challenges and Observations

A target deduplication system using post processing can use either fixed-length segments or variable-length segments created in a content-dependent manner. Fixed-length segments are the same as the fixed-size blocks of many non-deduplicating file systems; for the purposes of this discussion, extents that are multiples of some underlying fixed-size unit such as a disk sector are treated the same as fixed-size blocks. Variable-length segments can be any number of bytes in length within some range and are the result of partitioning a file or data stream in a content-dependent manner [Man93, BDH94]. The main advantage of a fixed segment size is simplicity: a conventional file system can create fixed-size blocks in the usual way, and a deduplication process can then be applied to those fixed-size blocks or segments. The approach is effective at deduplicating whole files that are identical, because every block of identical files will of course be identical. In backup applications, however, single files are backup images made up of large numbers of component files, and these files are rarely entirely identical even when they are successive backups of the same file system. A single addition, deletion, or change to any component file can easily shift the remaining image content. Even if no other file has changed, the shift would cause each fixed-size segment to differ from its previous version, containing some bytes from one neighbor and giving up some bytes to its other neighbor. Partitioning the data into variable-length segments based on content allows a segment to grow or shrink as needed, so the remaining segments can be identical to previously stored segments. Even for storing individual files, variable-length segments have an advantage: many files are very similar to, but not identical to, other versions of the same file, and variable-length segments can accommodate these differences and maximize the number of identical segments. Because variable-length segments are essential for deduplicating the shifted content of backup images, we have chosen them over fixed-length segments.
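The content-dependent partitioning of Section 2.1 can be illustrated with a simple rolling-hash chunker: a boundary is declared wherever the hash of a small sliding window matches a fixed bit pattern, so an insertion shifts boundaries only locally. This is a minimal sketch under assumed parameters (a 48-byte window and roughly 8 KB expected segments), not the segmenter used in the system described here.

```python
WINDOW = 48                              # sliding-window width in bytes (assumed)
MASK = 0x1FFF                            # boundary when low 13 bits are zero -> ~8 KB average
MIN_SEG, MAX_SEG = 2 * 1024, 64 * 1024   # clamp segment lengths to a range
BASE, MOD = 257, (1 << 61) - 1
POW = pow(BASE, WINDOW - 1, MOD)         # weight of the byte leaving the window

def segments(data: bytes):
    """Yield variable-length, content-defined segments of data."""
    h, start = 0, 0
    for i, b in enumerate(data):
        if i - start >= WINDOW:                    # window full: evict oldest byte
            h = (h - data[i - WINDOW] * POW) % MOD
        h = (h * BASE + b) % MOD                   # slide the hash forward
        length = i - start + 1
        if ((h & MASK) == 0 and length >= MIN_SEG) or length >= MAX_SEG:
            yield data[start:i + 1]                # content-defined boundary
            start, h = i + 1, 0
    if start < len(data):
        yield data[start:]                         # trailing partial segment
```

Because boundaries depend only on local window contents, inserting a byte early in a backup image changes at most a segment or two; all later segments still match their previously stored copies, which is exactly the property fixed-size blocks lack.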

2.2 Segment Analysis

Whether fixed or variable sized, the choice of average segment size is difficult because of its impact on compression and performance. The smaller the segments, the more duplicate segments there will be. Put another way, if there is a small modification to a file, the smaller the segment, the smaller the amount of new data that must be stored and the more of the file's bytes will be in duplicate segments; within limits, smaller segments result in a better compression ratio. On the other hand, with smaller segments there are more segments to process, which reduces performance. At a minimum, more segments mean more passes through the deduplication loop, and they are also likely to mean more on-disk index lookups. With smaller segments, there are also more segments to manage. Since each segment requires the same amount of metadata, smaller segments require a larger storage footprint for their metadata, and the segment fingerprints for fewer total user bytes can be cached in a given amount of memory. The segment index is larger and receives more updates, and to the extent that any data structures scale with the number of segments, they will limit the overall capacity of the system. Since commodity servers typically have a hard limit on the amount of physical memory, the decision on segment size can greatly affect the cost of the system. A well-designed deduplication storage system should have the smallest segment size possible given the throughput and capacity requirements for the product. After several design iterations, we have chosen 8 KB as the average size for the variable-sized data segments in our deduplication storage system.

2.3 Segmentation Process Steps

Segment filtering determines whether a segment is a duplicate. This is the key deduplication operation and may trigger disk I/Os, so its overhead can significantly impact throughput performance. Container packing adds segments to be stored to a container, which is the unit of storage in the system; the packing operation also compresses segment data with a local compression algorithm, and a container, when fully packed, is appended to the Container Manager. Segment indexing updates the segment index that maps segment descriptors to the container holding the segment, after the container has been appended to the Container Manager. Segment lookup finds the container storing a requested segment; this operation may trigger disk I/Os to consult the on-disk index and is therefore throughput sensitive. Container retrieval reads the relevant portion of the indicated container by invoking the Container Manager. Container unpacking decompresses the retrieved portion of the container and returns the requested data segment.
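A compact sketch of how these six steps fit together on the write and read paths follows, assuming an in-memory dict for the segment index and a 4 MB container. Real systems keep the index on disk with caching, so treat all names and sizes here as illustrative.

```python
import hashlib
import zlib

CONTAINER_SIZE = 4 * 1024 * 1024   # assumed container capacity

class DedupStore:
    def __init__(self):
        self.index = {}        # fingerprint -> container id (segment index)
        self.containers = []   # sealed containers held by the "Container Manager"
        self.open = {}         # container currently being packed

    def write(self, segment: bytes):
        fp = hashlib.sha1(segment).digest()
        if fp in self.index or fp in self.open:    # segment filtering
            return                                 # duplicate: store nothing new
        self.open[fp] = zlib.compress(segment)     # container packing + local compression
        if sum(len(s) for s in self.open.values()) >= CONTAINER_SIZE:
            self._seal()

    def _seal(self):
        cid = len(self.containers)
        self.containers.append(self.open)          # append to the Container Manager
        for fp in self.open:
            self.index[fp] = cid                   # segment indexing
        self.open = {}

    def read(self, fp: bytes) -> bytes:
        if fp in self.open:                        # segment not yet sealed
            return zlib.decompress(self.open[fp])
        cid = self.index[fp]                       # segment lookup
        container = self.containers[cid]           # container retrieval
        return zlib.decompress(container[fp])      # container unpacking
```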

3. PERFORMANCE & CAPACITY ANALYSIS

A secondary storage system used for data protection must support a reasonable balance between capacity and performance. Since backups must complete within a fixed backup window, a system with a given performance can only back up so much data within that window. Further, given a fixed retention period for the data being backed up, the storage system needs only so much capacity to retain the backups that can complete within the backup window. Conversely, given a particular storage capacity, backup policy, and deduplication efficiency, it is possible to compute the throughput that the system must sustain to justify the capacity. This balance between performance and capacity motivates the need to achieve good system performance with only a small number of disk drives. Assuming a backup policy of weekly full backups and daily incrementals, a retention period of 15 weeks, and a system that achieves a 20x compression ratio storing backups for such a policy, a rough rule of thumb is that storing all the backup images requires approximately as much capacity as the primary data. That is, for 1 TB of primary data, the deduplicating secondary storage would consume approximately 1 TB of physical capacity to store the 15 weeks of backups.

3.1 Experimental Setup

The experimental setup addresses three questions: how well a deduplication storage system using post processing of target data works with real-world datasets, how effective the techniques are at reducing disk I/O operations, and what throughput a deduplication storage system using these techniques can achieve. These three are taken as the primary metrics (betterment, efficiency, and throughput). For the first metric, we report results with real-world data from two customer data centers; for the other two, we conducted experiments with several internal datasets. Our experiments use a Data Domain deduplication storage server featuring two dual-core CPUs running at 2.4 GHz, a total of 6 GB of system memory, two gigabit NICs, and a 15-drive disk subsystem running the deduplication software. We use one and four backup client computers running as servers. Data center A backs up structured database data over the course of 50 days during the initial deployment of a deduplication system. The backup policy is to do daily full backups, where each full backup produces over 200 GB at steady state. There are two exceptions.

Table 1: Days, capacity, and data volume
Average capacity: 3000 GB
Total no. of days: 50
Total data volume: 14000 GB

First, during the seeding phase, different types of data are rolled into the backup set as backup administrators figure out how they want to use the deduplication system; a low rate of duplicate segment identification and elimination is typically associated with this phase. Second, there are a few days on which no backup is generated.

3.2 Real-World Data Analysis

The system described in this paper has been used at over 100 data centers. The following paragraphs report the deduplication results from two data centers, generated by the auto-support mechanism of the system.
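Before turning to the measured results, the rule of thumb from the start of this section can be checked with a short calculation. The numbers below restate the stated policy (weekly fulls, daily incrementals, 15-week retention, 20x compression); the 10% incremental size is an illustrative assumption, not a figure from this paper.

```python
primary_tb = 1.0          # primary data to protect
weeks = 15                # retention period
incr_fraction = 0.10      # assumed daily incremental ~10% of a full backup
compression = 20.0        # stated 20x compression for this policy

# Logical bytes retained: one full plus six incrementals per week.
logical_tb = weeks * (primary_tb + 6 * incr_fraction * primary_tb)
physical_tb = logical_tb / compression

print(f"logical: {logical_tb:.1f} TB, physical: {physical_tb:.1f} TB")
# -> logical: 24.0 TB, physical: 1.2 TB, i.e. roughly the size of the
#    primary data, matching the rule of thumb in the text.
```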

Figure 2: Comparison chart of days, capacity, and data volume from the data centers
Figure 3: Compression ratios of different data centers for global and local compression

Figure 2 shows the logical capacity and the physical capacity of the system over time at data center A. By the end of the 50th day, the data center has backed up about 1.4 TB, and the corresponding physical capacity is far lower, reaching a high total compression ratio.

Min, max, avg, and SD of the daily local and daily global compression ratios (values rounded off).

Figure 3 shows the daily global compression ratio (the daily rate of data reduction due to duplicate segment elimination), the daily local compression ratio (the daily rate of data reduction due to post-processing compression of new segments), the cumulative global compression ratio (the cumulative data reduction due to duplicate segment elimination), and the cumulative total compression ratio (the cumulative data reduction due to duplicate segment elimination plus compression of new segments) over time. By the end of the 50th day, the cumulative global compression ratio reaches 23 to 1 and the cumulative total compression ratio reaches 39 to 1. In these experiments, the average per-generation global compression ratio is about 50 to 1 and the average per-generation local compression ratio is 2 to 1 for each backup stream. These compression numbers seem plausible given the real-world examples. We measure write and read throughput for one backup stream using one client computer and for four backup streams using two client computers, over 10 generations of the backup datasets.
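The ratios just defined are related multiplicatively: global compression comes from duplicate elimination, local compression from compressing the surviving new segments, and the total ratio is their product. The small helper below, which is illustrative and not from the paper, makes the definitions explicit.

```python
def compression_ratios(logical_bytes: float, post_dedup_bytes: float,
                       physical_bytes: float):
    """logical_bytes: bytes written by backups; post_dedup_bytes: new bytes
    left after duplicate elimination; physical_bytes: bytes remaining after
    local compression of those new segments."""
    global_ratio = logical_bytes / post_dedup_bytes   # duplicate elimination
    local_ratio = post_dedup_bytes / physical_bytes   # compression of new data
    total_ratio = logical_bytes / physical_bytes      # equals global * local
    return global_ratio, local_ratio, total_ratio

# With the cumulative figures reported for data center A, a 23:1 global
# ratio and a 39:1 total ratio imply roughly a 39/23 ~= 1.7:1 local ratio:
g, l, t = compression_ratios(logical_bytes=39.0,
                             post_dedup_bytes=39.0 / 23.0,
                             physical_bytes=1.0)
print(f"global {g:.1f}:1, local {l:.1f}:1, total {t:.1f}:1")
```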

Figure 4: Throughput comparison chart

4. CONCLUSION

This paper presents a set of techniques for post processing of target data that substantially reduce disk I/Os in high-throughput deduplication storage systems. The experiments show that the combination of these techniques achieves high read and write stream throughput on a storage server with two dual-core processors and a single shelf of a few drives. These techniques are general methods for improving the throughput performance of deduplication storage systems, and our approach of minimizing disk I/Os to achieve good deduplication performance matches well with the industry trend of building many-core processors. Compared with the other deduplication methods discussed here, we have shown a reduction in disk index lookups of about 15%, preventing duplicate items from entering the target system.

REFERENCES

[1] G. R. Goodson, J. J. Wylie, G. R. Ganger, and M. K. Reiter. Efficient Byzantine-tolerant erasure-coded storage. In Proceedings of the 2004 Int'l Conference on Dependable Systems and Networks (DSN 2004), June 2004.
[2] Jeffrey Dean and Sanjay Ghemawat. MapReduce: Simplified Data Processing on Large Clusters. OSDI 2004.
[3] Carlos Alvarez. NetApp Technical Report: NetApp Deduplication for FAS and V-Series Deployment and Implementation Guide.
[4] William D. Norcott and Don Capps. Iozone Filesystem Benchmark.
[5] Sean Quinlan and Sean Dorward. Venti: A new approach to archival storage. In FAST '02: Proceedings of the Conference on File and Storage Technologies, Berkeley, CA, USA, 2002. USENIX Association.
[6] OpenDedup. A userspace deduplication file system (SDFS), March 2010.
[7] B. Zhu, K. Li, and H. Patterson. Avoiding the disk bottleneck in the Data Domain deduplication file system. In File and Storage Technology Conference (FAST), 2008.
[8] Shai Halevi, Danny Harnik, Benny Pinkas, and Alexandra Shulman-Peleg. Proofs of Ownership in Remote Storage Systems. CCS 2011.
[9] Konstantin Shvachko, Hairong Kuang, Sanjay Radia, and Robert Chansler. The Hadoop Distributed File System. MSST 2010.
[10] J. R. Douceur, A. Adya, W. J. Bolosky, D. Simon, and M. Theimer. Reclaiming space from duplicate files in a serverless distributed file system. In Proceedings of the 22nd International Conference on Distributed Computing Systems (ICDCS '02), Vienna, Austria, July 2002.
[11] F. Douglis and A. Iyengar. Application-specific delta-encoding via resemblance detection. In Proceedings of the 2003 USENIX Annual Technical Conference. USENIX, June 2003.
[12] D. Goldschlag, M. Reed, and P. Syverson. Onion routing. Communications of the ACM, 42(2), 1999.
[13] Petros Efstathopoulos and Fanglu Guo. Rethinking Deduplication Scalability. HotStorage 2010.
[14] H. S. Gunawi, N. Agrawal, A. C. Arpaci-Dusseau, R. H. Arpaci-Dusseau, and J. Schindler. Deconstructing commodity storage clusters. In Proceedings of the 32nd Int'l Symposium on Computer Architecture, pages 60-71, June 2005.
[15] S. Hand and T. Roscoe. Mnemosyne: Peer-to-peer steganographic storage. Lecture Notes in Computer Science, 2429, March 2002.
[16] Health Insurance Portability and Accountability Act, October 1996.
[17] D. Hitz, J. Lau, and M. Malcolm. File system design for an NFS file server appliance. In Proceedings of the Winter 1994 USENIX Technical Conference, San Francisco, CA, January 1994.
[18] R. J. Honicky and E. L. Miller. Replication under scalable hashing: A family of algorithms for scalable decentralized data distribution. In Proceedings of the 18th International Parallel & Distributed Processing Symposium (IPDPS 2004), Santa Fe, NM, April 2004. IEEE.
[19] A. Iyengar, R. Cahn, J. A. Garay, and C. Jutla. Design and implementation of a secure distributed data repository. In Proceedings of the 14th IFIP International Information Security Conference (SEC '98), September 1998.
