The assignment of chunk size according to the target data characteristics in deduplication backup system


Mikito Ogata, Norihisa Komoda
Hitachi Information and Telecommunication Engineering, Ltd., 781 Sakai, Nakai-machi, Ashigarakami-gun, Kanagawa
Osaka University, 2-1 Yamadaoka, Suita, Osaka

Abstract

This paper focuses on the trade-off between the deduplication rate and the processing penalty in a backup system that uses a conventional variable chunking method. The trade-off is a nonlinear negative correlation when the chunk size is fixed. In order to analyze the trade-off quantitatively across all factors, a simulation approach is taken; it clarifies several correlations among chunk sizes and the densities and average lengths of the differed parts. It then shows that dynamically assigning an appropriate chunk size based on the data characteristics is effective in weakening the trade-off and provides higher efficiency than the conventional way.

Keywords: Deduplication, Backup, Archive, Capacity Optimization, Enterprise Storage

1 Introduction

Due to the explosive increase of data in IT systems, backup operations are becoming a burden because of resource usage, processing time and management cost, while they remain indispensable for recovering data after an unpredictable disaster. The major requirements on backup operations are to shorten the processing time and to reduce resource usage, especially the storage capacity needed to keep data for a long term. Recently, a technology called deduplication has become popular to reduce this burden. Deduplication eliminates duplication within the backup target data and stores only unique data in the storage. The reduction of stored data reduces not only the backup storage cost but also the workload on other resources. Various techniques have been proposed to provide more reduction with less processing time from the viewpoint of cost-effective backup operation [1][2].
However, because the deduplication rate and the processing time have a non-linear negative correlation, it is difficult to improve both simultaneously if only one chunking size is used. This paper shows how to assign an appropriate chunking size according to the data characteristics. The assignment weakens the trade-off of the single-chunk-size method and therefore provides more efficient deduplication than the conventional approach.

2 Deduplication backup system

2.1 Variable chunking algorithm

In a typical customer environment, the backup target data include various types of files, and how much duplication remains and where it is located depend on the customer's environment and applications: some data are scarcely duplicated because they were heavily edited, changed or updated; some are densely duplicated because they were rarely changed, or were just replicated or copied. In this paper, all changed or updated parts of the data are called differed areas and the unchanged parts are called identical areas.

A deduplication backup system has a processing module which eliminates the duplication within the target data. The module reads the target data, divides it into many small segments, distinguishes the duplicated areas from unique or newly updated areas, then transfers and stores only the unique data in the storage. Figure 1 shows the typical operational flow of the deduplication process, from reading the target data to storing the unique data. (The sizes of the boxes are not to scale.)

Figure 1. Deduplication process.

The module divides the target data into small segments called chunks, which are the unit of reduction and storing (Chunking). Two dividing algorithms are widely implemented: fixed length chunking and variable length chunking. Fixed length chunking divides the data into chunks of the same, predefined length. Variable length chunking divides the data into chunks whose variable length is determined by the data patterns. A typical variable length chunking algorithm scans the data from the beginning of the file to the end, one byte at a time, by shifting a fixed length window [3][4]. The algorithm generates a special value, called a signature, from the window contents using Rabin's algorithm [5]. When the signature matches a predefined value, called an anchor, the end byte of the window is set as the end of the chunk; otherwise the window is shifted by one byte and a signature is generated again. The average chunk size is determined by how many bits are taken as the signature. For example, if the signature length is 12 bits and the signature calculation generates completely random values, 2^12 patterns are possible, which results in a 4KB average chunk size.

Next, the module calculates a unique code from the chunk data to analyze the similarity of chunks. A hashing algorithm such as SHA-1 or SHA-256 is commonly used to calculate the code, which is called a fingerprint (Fingerprinting). Next, the module decides the uniqueness of each chunk using the fingerprint (Decision): a chunk is duplicated when the same chunk has already been stored, and not duplicated when all previously stored chunks differ from it. Finally, the module writes only the non-duplicated chunks into the storage (Writing).

2.2 Issues

Many approaches have been proposed and implemented to increase the efficiency of deduplication. For customers, two criteria are important when discussing efficiency: one is the deduplication rate and the other is the processing time.
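The anchor-based variable chunking described above can be sketched as follows. This is a simplified illustration, not the paper's implementation: a byte-wise polynomial rolling hash stands in for Rabin fingerprints, and the parameter values (48-byte window, 12-bit signature, 2KB/12KB bounds) are assumptions for the example.

```python
def chunk_lengths(data: bytes, window: int = 48, sig_bits: int = 12,
                  min_size: int = 2048, max_size: int = 12288) -> list[int]:
    """Split data into variable-size chunks with a rolling-hash anchor.

    A sig_bits-bit signature makes an anchor fire on average every
    2**sig_bits bytes (here 4KB); min_size/max_size bound the result.
    """
    BASE, MOD = 257, (1 << 61) - 1
    drop = pow(BASE, window, MOD)          # weight of the byte leaving the window
    anchor = (1 << sig_bits) - 1           # predefined signature value to match
    lengths, start, h = [], 0, 0
    for i, b in enumerate(data):
        h = (h * BASE + b) % MOD           # push the new byte into the hash
        if i >= window:
            h = (h - data[i - window] * drop) % MOD  # pop the oldest byte
        size = i - start + 1
        # cut at an anchor once min_size is reached, or force a cut at max_size
        if (size >= min_size and (h & anchor) == anchor) or size >= max_size:
            lengths.append(size)
            start = i + 1
    if start < len(data):
        lengths.append(len(data) - start)  # trailing partial chunk
    return lengths
```

Each resulting chunk would then be fingerprinted, for example with `hashlib.sha1(chunk).digest()`, and looked up in an index of stored fingerprints (the Decision step).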
The deduplication rate should be as high as possible and the processing time as short as possible. Choosing an adequate chunk size for the algorithm is one key factor, because it heavily affects the efficiency. In many cases, the chunk size is determined so as to balance performance and deduplication capability for an assumed, hypothetical environment [6][7].

To gain more reduction, assigning a smaller chunk size is generally effective: a smaller chunk size can pick up the differed areas more precisely than a bigger one. However, the effectiveness of a chunk size, that is, how sharply it cuts off the duplication, depends strongly on the distribution of differed areas within the data, such as the length of each differed area and the distance between differed areas. When a small chunk size is applied to long differed areas, many consecutive chunks may be wasted covering non-duplicated data. When the differed areas are located at distances similar to the chunk size, many chunks may include both a differed area and an identical area, which results in a poor deduplication rate. When the differed areas are distributed at distances much smaller or much larger than the chunk size, the effectiveness improves.

The processing time, that is, how much time or resource utilization is necessary to reduce the duplication, also depends on the chunk size, among other factors. A smaller chunk size requires more CPU- and IO-intensive processing; a bigger one requires less. In addition, the more duplication is eliminated, the fewer storing operations go to the storage. Further, the correlation between processing time and chunk size is non-linear; for example, for small chunk sizes the processing time increases steeply as the size decreases. This non-linearity makes the trade-off hard to overcome. In the conventional way, only one chunk size is assigned for all data and all environments, which is not effective.
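The precision argument can be made concrete with a toy model of our own, not the paper's: assume fixed-size chunks of size m and a single differed region of length d at a uniformly random alignment. Every chunk that overlaps the region must be retained whole, so on average about d + m bytes are kept; the overhead beyond the unavoidable d bytes is roughly one chunk size m, which is why smaller chunks cut off duplication more sharply.

```python
import random

def expected_retained(d: int, m: int, trials: int = 20000, seed: int = 1) -> float:
    """Monte-Carlo estimate of bytes retained for one differed region of
    length d under fixed chunks of size m and random alignment
    (an illustrative model, not the paper's simulator)."""
    random.seed(seed)
    total = 0
    for _ in range(trials):
        offset = random.randrange(m)            # region start within its first chunk
        touched = (offset + d + m - 1) // m     # chunks overlapping the region
        total += touched * m                    # each touched chunk is kept whole
    return total / trials
```

For a 32KB differed region, `expected_retained(32768, 4096)` comes out near 36KB while `expected_retained(32768, 32768)` is near 64KB, so small chunks roughly halve the retained data here. The same model also hints at the penalty side: 4KB chunks mean eight times as many chunks to fingerprint and look up.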
If the chunk size is assigned to gain a better deduplication rate with a smaller processing time according to the characteristics of the data or environment, it is beneficial for the system.

3 Simulation

3.1 Implementation

The following are the operations and assumptions of our simulator. For convenience, we call the simulator SIM.
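The behavior described in the rest of this section can be sketched as a minimal stand-in for SIM. This is illustrative Python, not the authors' R code; the stream layout (evenly spaced differed areas), the exponential approximation of the geometric chunk-size distribution and all names are our assumptions.

```python
import random

def simulate(total=1 << 22, avg_chunk=4096, dup_rate=0.6,
             diff_len=32768, seed=7):
    """Return (deduplication rate, number of reduced chunks) for a stream
    where a fraction (1 - dup_rate) is differed, in areas of diff_len bytes
    spread with equal density. A chunk is removed only if it overlaps no
    differed area, mirroring SIM's Decision step."""
    random.seed(seed)
    period = int(diff_len / (1 - dup_rate))   # one differed area per period

    def overlaps_differed(lo, hi):            # does [lo, hi) touch a differed area?
        s = (lo // period) * period
        while s < hi:
            if s + diff_len > lo:
                return True
            s += period
        return False

    # geometric chunk sizes clamped to [avg/2, 1.5*avg], as in the text
    lo_sz, hi_sz = avg_chunk // 2, avg_chunk + avg_chunk // 2
    pos = removed_bytes = removed_chunks = 0
    while pos < total:
        size = min(max(int(random.expovariate(1.0 / avg_chunk)), lo_sz), hi_sz)
        size = min(size, total - pos)
        if not overlaps_differed(pos, pos + size):
            removed_bytes += size
            removed_chunks += 1
        pos += size
    return removed_bytes / total, removed_chunks
```

With these defaults the model reproduces the qualitative behavior discussed below: shrinking the average chunk size pushes the deduplication rate toward the duplication rate while multiplying the number of removed, and hence processed, chunks.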

(1) Backup data. The target backup data is implemented as a list of length 1024*1024, where one entry represents 64 bytes of real data; in total the list represents 64MB of real data. In the stream there are many areas that represent identical data, identical to a previously stored area, and many areas that represent differed data, not identical to any previously stored area. SIM reduces the identical areas as much as it can and always retains all the differed areas. The ratio of the amount of all identical areas to the total amount of data is defined as the duplication rate r, with domain 0 <= r <= 1; the ratio of all differed areas is therefore 1 - r. The differed areas are spread with equal density over the stream.

(2) Chunking. SIM assumes an anchor algorithm to separate the data into chunks. The signature length that controls the average chunk size is set to 11 to 15 bits, corresponding to average chunk sizes of 2 to 32KB respectively. In this case the chunk sizes follow a geometric distribution [8] with parameter p = 1/2^n, where n is the signature length in bits. We denote the case where the average chunk size is m as (m) for convenience. For practical industrial reasons, SIM sets two boundaries, a minimum and a maximum size: the minimum is half the average chunk size and the maximum is one and a half times the average chunk size. For example, when the average chunk size is 8KB, the minimum size is 4KB and the maximum size is 12KB. Table 1 lists the parameters of the simulation.

Table 1. Parameters of simulation
  Parameter                       Values
  Duplication rate: r             0.2, 0.3, 0.4, ...
  Length of differed area [KB]    2, 4, 8, ...
  Average chunk size: m [KB]      2, 3, 4, ...

(3) Decision. SIM checks whether a chunk includes a differed area. When the chunk includes no differed area at all, SIM decides the chunk should be removed.
When the chunk includes a fully or partially differed area, SIM decides the chunk should be retained. The deduplication rate is calculated as the ratio of the total amount of chunks decided to be removed to the total amount of data. In addition, the fulfillment rate is calculated as the ratio of the deduplication rate to the duplication rate. The number of reduced chunks is counted as the number of identical, and therefore removed, chunks. Since the processing time of deduplication varies greatly with the resource configuration and its usage in the system, SIM evaluates the total number of removed chunks as a uniform substitute for processing time.

(4) Environment. SIM is coded in R and runs on hardware with an Intel Core i7 2.6GHz CPU and 8GB (1600MHz, DDR3) of memory.

3.2 Evaluation of the validity of SIM against measured data

First, the validity of SIM is evaluated by comparison with equivalent data gathered by measurement on real hardware. Our experimental hardware is based on a Linux-based NAS (Network Attached Storage) product connected to HDD storage, with an NFS (Network File System) interface and a variable chunking mechanism coded in. The chunk size is controlled as a parameter of the average chunk size for the NAS and as 1/p for SIM; these parameters are called operational chunk sizes here. The total amount of data in the measurement is 1GB and the length of the differed areas is fixed at 32KB. The differed areas are randomly distributed over the data with the duplication rate as a control parameter, assumed to reflect practical data characteristics. The NAS reports the deduplication rate, the average chunk size and the number of reduced chunks for each experiment.

Table 2 shows the comparison of the two chunk sizes gathered from the experiments: the chunk size that the NAS produces in the measurement and the one that SIM produces in the simulation. The simulated chunk sizes match the measured chunk sizes well for smaller operational chunk sizes.
The simulated chunk size becomes gradually shorter than the measured one as the operational chunk size increases. This comes from the difference that the NAS has multiple chunking algorithms installed, whereas SIM uses only one variable chunking algorithm.

Table 3 and Figure 2 show the comparison of deduplication rates. The rates match for smaller chunk sizes, with a 2% difference in the case of 8KB.

Table 2. Comparison of simulated and measured chunk size (columns: operational chunk size, measured [KB], simulated [KB], difference [%]).

Table 3. Comparison of simulated and measured deduplication rate (columns: operational chunk size, measured, simulated, difference [%]).

The difference becomes larger as the chunk size increases, and the rate becomes worse for the NAS as the duplication rate increases. This comes from the difference that the NAS embeds some additional information into the stream to improve data integrity, while SIM does not; the embedded information spoils the deduplication efficiency and lowers the rate as the chunk size increases. The rate is better for the NAS in the case where the duplication rate equals 0.5. This comes from the fact that the NAS uses multiple chunking algorithms, resulting in a higher deduplication rate, while at the same time it pays the penalty of a lower rate due to the embedded information, which grows as the duplication increases.

Figure 2. Deduplication rate by single layer deduplication system.

4 Evaluation

4.1 Dependency on duplication rate

Figure 3 shows the correlation between the deduplication rate and the number of reduced chunks when the duplication rate is varied over 0.2, 0.4, 0.5, 0.6 and 0.8 with the same differed-area length of 32KB. The number of chunks removed by SIM increases as the duplication rate increases or the chunk size decreases. A criterion is useful to analyze the effect of changing the chunk size: as shown in the figure, as the chunk size decreases starting from 32KB, the number of reduced chunks becomes 3.2 times larger going from (32) to (16) and the deduplication rate improves by 60%; then the number of reduced chunks becomes 2.5 times larger going from (16) to (8) and the deduplication rate improves by 30%; successively, it becomes 2.0 times larger with only a 3% improvement going from (4) to (2). The improvement ratio decreases as the chunk size decreases. This means that decrementing the chunk size in a large-chunk situation gains more improvement with less penalty. At the same time, the improvement is clearer as the duplication rate decreases.

Figure 3. Reduced number of chunks by duplication rate (length of differed area = 32KB).

The deduplication rate and the fulfillment rate for each chunk size increase as the duplication rate increases. This is because the deduplication works more efficiently as the probability of a mixture of identical and differed areas in the same chunk becomes lower with increasing duplication rate.

4.2 Dependency on length of differed area

Figure 4 shows the correlation between the deduplication rate and the number of reduced chunks when the length of the differed areas is varied over 2, 4, 8, 16 and 32KB with the same duplication rate of 0.6. As the length of the differed areas increases, the corresponding number of reduced chunks increases and the deduplication rate also increases for the same chunk size. This comes from the decrease of deduplication inefficiency, because the probability of a mixture of identical and differed areas in a chunk becomes lower as the length of the differed areas increases.

The improvement of the deduplication rate from decreasing the chunk size becomes smaller as the length of the differed areas increases, because the precision advantage of reduction by small chunks degrades. The number of reduced chunks, and therefore the deduplication rate, increases with the length of the differed areas for all chunk sizes, and the improvement is smaller for small chunk sizes. For example, when the length of the differed areas increases from 2KB to 32KB, the improvement of the deduplication rate is 1.8 times in the case of (2), 3.3 times for (4), 9.5 times for (8), and so on. This indicates that assigning a bigger chunk size is more effective than a smaller one when the length of the differed areas varies widely within the data.

Figure 4. Reduced number of chunks by length of differed area (duplication rate = 0.6).

4.3 Assignment of optimal chunk size

As described, there is a trade-off between the number of reduced chunks and the deduplication rate. Therefore, when the problem of assigning the optimal chunk size is formulated as a program with two objectives, maximizing the deduplication rate and minimizing the number of reduced chunks, it generates Pareto-optimal solutions, because there exists no feasible solution in which an improvement in one objective does not lead to a simultaneous degradation in the other. In Figure 3 and Figure 4, each solid line represents one corresponding set of Pareto-optimal solutions.

In general, which chunk size to assign as the optimal one depends on the customer's choice, considering various factors and proprietary priorities [9]. Without neglecting practical priorities, this paper simplifies the customer's factors and priorities into two costs. One factor is the operational cost, which includes the cost of resources such as servers, LAN (Local Area Network), SAN (Storage Area Network) and utilities, and the cost of management. The other factor is the capital cost, such as storage and maintenance. The priority is to minimize the total cost. Here the cost is defined by the following function, and the optimal chunk size is the one that minimizes it:

C = A × ReducedChunks - B × DedupedRate

where ReducedChunks is the number of reduced chunks and DedupedRate is the deduplication rate.

Figure 3 shows an example of the cost function: the dotted line indicates the cost for the corresponding deduplication rate and number of reduced chunks, with A set to 1 and B to a fixed value. In the case of a duplication rate of 0.6, the chunk size that provides the minimum cost is 8KB.

Figure 5 shows the chunk size that minimizes the cost. The optimal chunk size increases as the duplication rate increases; this comes from the characteristic of bigger chunks that they reduce coarsely with a short processing time. Further, the optimal chunk size increases as the length of the differed areas increases, for the same reason. In addition, the difference between the optimal chunk sizes increases as the duplication rate increases, because the effectiveness of using bigger chunks weakens as the duplication rate increases when the differed areas are short.

Figure 5. Optimal chunk size to minimize the cost.

In commercially available products, 4KB is a popular chunk size. Table 4 shows the improvement for duplication rates of 0.2, 0.4, 0.6 and 0.8 and differed-area lengths of 8, 32 and 64KB. The improvement from the optimal chunk size increases as the duplication rate and the length of the differed areas increase, reaching 33% in the case of 0.8 and 64KB.

Table 4. Improvement of cost value by the optimal chunk size over the conventional 4KB.
  Length of        Duplication rate
  differed area    0.2    0.4    0.6    0.8
  8KB   Optimal    (3)    (4)    (5)    (8)
  32KB  Optimal    (4)    (7)    (12)   (15)
  64KB  Optimal    (6)    (8)    (13)   (19)

5 Conclusion

This paper simulates a variable chunking algorithm in a backup deduplication system. First, the simulation clarifies that changing the chunk size is more beneficial for bigger chunks than for smaller chunks, gaining more improvement with less penalty; in addition, the improvement increases as the duplication rate decreases. Next, assigning a bigger chunk size is more effective than a smaller one when the length of the differed areas varies widely within the data. Finally, under the reasonable assumption of minimizing the cost, the optimal chunk size is 3KB in the case of a 0.2 duplication rate and 8KB differed-area length, with a 27% improvement over a fixed 4KB chunk size; similarly, the optimal chunk size is 19KB in the case of 0.8 and 64KB, with an improvement of 33%.

References

[1] Y. Tan et al. DAM: A data ownership-aware multi-layered de-duplication scheme. In 2010 Fifth IEEE International Conference on Networking, Architecture and Storage.
[2] Y. Won, J. Ban, J. Min, L. Hur, S. Oh, and J. Lee. Efficient index lookup for de-duplication backup system. In IEEE International Symposium on Modeling, Analysis and Simulation of Computers and Telecommunication Systems (MASCOTS), poster presentation.
[3] U. Manber. Finding similar files in a large file system. In Proceedings of the USENIX Winter 1994 Technical Conference.
[4] A. Muthitacharoen, B. Chen, and D. Mazières. A low-bandwidth network file system. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP), Banff, Canada.
[5] M. O. Rabin. Fingerprinting by random polynomials. Technical report, Department of Computer Science, Harvard University.
[6] D. Meister and A. Brinkmann. Multi-level comparison of data deduplication in a backup scenario. In Proceedings of SYSTOR 2009, The 2nd Annual International Systems and Storage Conference. ACM.
[7] G. Wallace, F. Douglis, H. Qian, P. Shilane, S. Smaldone, M. Chamness, and W. Hsu. Characteristics of backup workloads in production systems. In Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST).
[8] J. Min, D. Yoon, and Y. Won. Efficient deduplication techniques for modern backup operation. IEEE Transactions on Computers, Vol. 60, No. 6.
[9] M. Ogata and N. Komoda. Optimized assignment of deduplication backup methods using integer programming. In Proceedings of JCIS 2011: The 4th Japan-China Joint Symposium on Information Systems.


IBM TSM DISASTER RECOVERY BEST PRACTICES WITH EMC DATA DOMAIN DEDUPLICATION STORAGE White Paper IBM TSM DISASTER RECOVERY BEST PRACTICES WITH EMC DATA DOMAIN DEDUPLICATION STORAGE Abstract This white paper focuses on recovery of an IBM Tivoli Storage Manager (TSM) server and explores

More information

How To Make A Backup System More Efficient

How To Make A Backup System More Efficient Identifying the Hidden Risk of Data De-duplication: How the HYDRAstor Solution Proactively Solves the Problem October, 2006 Introduction Data de-duplication has recently gained significant industry attention,

More information

Contents. WD Arkeia Page 2 of 14

Contents. WD Arkeia Page 2 of 14 Contents Contents...2 Executive Summary...3 What Is Data Deduplication?...4 Traditional Data Deduplication Strategies...5 Deduplication Challenges...5 Single-Instance Storage...5 Fixed-Block Deduplication...6

More information

INTENSIVE FIXED CHUNKING (IFC) DE-DUPLICATION FOR SPACE OPTIMIZATION IN PRIVATE CLOUD STORAGE BACKUP

INTENSIVE FIXED CHUNKING (IFC) DE-DUPLICATION FOR SPACE OPTIMIZATION IN PRIVATE CLOUD STORAGE BACKUP INTENSIVE FIXED CHUNKING (IFC) DE-DUPLICATION FOR SPACE OPTIMIZATION IN PRIVATE CLOUD STORAGE BACKUP 1 M.SHYAMALA DEVI, 2 V.VIMAL KHANNA, 3 M.SHAHEEN SHAH 1 Assistant Professor, Department of CSE, R.M.D.

More information

Data Deduplication and Tivoli Storage Manager

Data Deduplication and Tivoli Storage Manager Data Deduplication and Tivoli Storage Manager Dave Cannon Tivoli Storage Manager rchitect Oxford University TSM Symposium September 2007 Disclaimer This presentation describes potential future enhancements

More information

Exploring RAID Configurations

Exploring RAID Configurations Exploring RAID Configurations J. Ryan Fishel Florida State University August 6, 2008 Abstract To address the limits of today s slow mechanical disks, we explored a number of data layouts to improve RAID

More information

Cumulus: filesystem backup to the Cloud

Cumulus: filesystem backup to the Cloud Michael Vrable, Stefan Savage, a n d G e o f f r e y M. V o e l k e r Cumulus: filesystem backup to the Cloud Michael Vrable is pursuing a Ph.D. in computer science at the University of California, San

More information

An Authorized Duplicate Check Scheme for Removing Duplicate Copies of Repeating Data in The Cloud Environment to Reduce Amount of Storage Space

An Authorized Duplicate Check Scheme for Removing Duplicate Copies of Repeating Data in The Cloud Environment to Reduce Amount of Storage Space An Authorized Duplicate Check Scheme for Removing Duplicate Copies of Repeating Data in The Cloud Environment to Reduce Amount of Storage Space Jannu.Prasanna Krishna M.Tech Student, Department of CSE,

More information

A Network Differential Backup and Restore System based on a Novel Duplicate Data Detection algorithm

A Network Differential Backup and Restore System based on a Novel Duplicate Data Detection algorithm A Network Differential Backup and Restore System based on a Novel Duplicate Data Detection algorithm GUIPING WANG 1, SHUYU CHEN 2*, AND JUN LIU 1 1 College of Computer Science Chongqing University No.

More information

WAN Optimized Replication of Backup Datasets Using Stream-Informed Delta Compression

WAN Optimized Replication of Backup Datasets Using Stream-Informed Delta Compression WAN Optimized Replication of Backup Datasets Using Stream-Informed Delta Compression Philip Shilane, Mark Huang, Grant Wallace, and Windsor Hsu Backup Recovery Systems Division EMC Corporation Abstract

More information

EMC BACKUP-AS-A-SERVICE

EMC BACKUP-AS-A-SERVICE Reference Architecture EMC BACKUP-AS-A-SERVICE EMC AVAMAR, EMC DATA PROTECTION ADVISOR, AND EMC HOMEBASE Deliver backup services for cloud and traditional hosted environments Reduce storage space and increase

More information

Data Deduplication HTBackup

Data Deduplication HTBackup Data Deduplication HTBackup HTBackup and it s Deduplication technology is touted as one of the best ways to manage today's explosive data growth. If you're new to the technology, these key facts will help

More information

How To Test For Performance And Scalability On A Server With A Multi-Core Computer (For A Large Server)

How To Test For Performance And Scalability On A Server With A Multi-Core Computer (For A Large Server) Scalability Results Select the right hardware configuration for your organization to optimize performance Table of Contents Introduction... 1 Scalability... 2 Definition... 2 CPU and Memory Usage... 2

More information

Efficiently Storing Virtual Machine Backups

Efficiently Storing Virtual Machine Backups Efficiently Storing Virtual Machine Backups Stephen Smaldone, Grant Wallace, and Windsor Hsu Backup Recovery Systems Division EMC Corporation Abstract Physical level backups offer increased performance

More information

Offline Deduplication for Solid State Disk Using a Lightweight Hash Algorithm

Offline Deduplication for Solid State Disk Using a Lightweight Hash Algorithm JOURNAL OF SEMICONDUCTOR TECHNOLOGY AND SCIENCE, VOL.15, NO.5, OCTOBER, 2015 ISSN(Print) 1598-1657 http://dx.doi.org/10.5573/jsts.2015.15.5.539 ISSN(Online) 2233-4866 Offline Deduplication for Solid State

More information

HTTP-Level Deduplication with HTML5

HTTP-Level Deduplication with HTML5 HTTP-Level Deduplication with HTML5 Franziska Roesner and Ivayla Dermendjieva Networks Class Project, Spring 2010 Abstract In this project, we examine HTTP-level duplication. We first report on our initial

More information

Veeam Best Practices with Exablox

Veeam Best Practices with Exablox Veeam Best Practices with Exablox Overview Exablox has worked closely with the team at Veeam to provide the best recommendations when using the the Veeam Backup & Replication software with OneBlox appliances.

More information

Technology Fueling the Next Phase of Storage Optimization

Technology Fueling the Next Phase of Storage Optimization White Paper HP StoreOnce Deduplication Software Technology Fueling the Next Phase of Storage Optimization By Lauren Whitehouse June, 2010 This ESG White Paper was commissioned by Hewlett-Packard and is

More information

Tradeoffs in Scalable Data Routing for Deduplication Clusters

Tradeoffs in Scalable Data Routing for Deduplication Clusters Tradeoffs in Scalable Data Routing for Deduplication Clusters Wei Dong Princeton University Fred Douglis EMC Kai Li Princeton University and EMC Hugo Patterson EMC Sazzala Reddy EMC Philip Shilane EMC

More information

PIONEER RESEARCH & DEVELOPMENT GROUP

PIONEER RESEARCH & DEVELOPMENT GROUP SURVEY ON RAID Aishwarya Airen 1, Aarsh Pandit 2, Anshul Sogani 3 1,2,3 A.I.T.R, Indore. Abstract RAID stands for Redundant Array of Independent Disk that is a concept which provides an efficient way for

More information

Performance evaluation of Web Information Retrieval Systems and its application to e-business

Performance evaluation of Web Information Retrieval Systems and its application to e-business Performance evaluation of Web Information Retrieval Systems and its application to e-business Fidel Cacheda, Angel Viña Departament of Information and Comunications Technologies Facultad de Informática,

More information

A Method of Deduplication for Data Remote Backup

A Method of Deduplication for Data Remote Backup A Method of Deduplication for Data Remote Backup Jingyu Liu 1,2, Yu-an Tan 1, Yuanzhang Li 1, Xuelan Zhang 1, Zexiang Zhou 3 1 School of Computer Science and Technology, Beijing Institute of Technology,

More information

Design of a NAND Flash Memory File System to Improve System Boot Time

Design of a NAND Flash Memory File System to Improve System Boot Time International Journal of Information Processing Systems, Vol.2, No.3, December 2006 147 Design of a NAND Flash Memory File System to Improve System Boot Time Song-Hwa Park*, Tae-Hoon Lee*, and Ki-Dong

More information

Protect Microsoft Exchange databases, achieve long-term data retention

Protect Microsoft Exchange databases, achieve long-term data retention Technical white paper Protect Microsoft Exchange databases, achieve long-term data retention HP StoreOnce Backup systems, HP StoreOnce Catalyst, and Symantec NetBackup OpenStorage Table of contents Introduction...

More information

Cloud De-duplication Cost Model THESIS

Cloud De-duplication Cost Model THESIS Cloud De-duplication Cost Model THESIS Presented in Partial Fulfillment of the Requirements for the Degree Master of Science in the Graduate School of The Ohio State University By Christopher Scott Hocker

More information

Data Compression and Deduplication. LOC 2010 2010 Cisco Systems, Inc. All rights reserved.

Data Compression and Deduplication. LOC 2010 2010 Cisco Systems, Inc. All rights reserved. Data Compression and Deduplication LOC 2010 2010 Systems, Inc. All rights reserved. 1 Data Redundancy Elimination Landscape VMWARE DeDE IBM DDE for Tank Solaris ZFS Hosts (Inline and Offline) MDS + Network

More information

Using Synology SSD Technology to Enhance System Performance Synology Inc.

Using Synology SSD Technology to Enhance System Performance Synology Inc. Using Synology SSD Technology to Enhance System Performance Synology Inc. Synology_SSD_Cache_WP_ 20140512 Table of Contents Chapter 1: Enterprise Challenges and SSD Cache as Solution Enterprise Challenges...

More information

HP StoreOnce D2D. Understanding the challenges associated with NetApp s deduplication. Business white paper

HP StoreOnce D2D. Understanding the challenges associated with NetApp s deduplication. Business white paper HP StoreOnce D2D Understanding the challenges associated with NetApp s deduplication Business white paper Table of contents Challenge #1: Primary deduplication: Understanding the tradeoffs...4 Not all

More information

STORAGE. 2015 Arka Service s.r.l.

STORAGE. 2015 Arka Service s.r.l. STORAGE STORAGE MEDIA independently from the repository model used, data must be saved on a support (data storage media). Arka Service uses the most common methods used as market standard such as: MAGNETIC

More information

OFFLOADING THE CLIENT-SERVER TRE EFFORT FOR MINIMIZING CLOUD BANDWITH AND COST

OFFLOADING THE CLIENT-SERVER TRE EFFORT FOR MINIMIZING CLOUD BANDWITH AND COST OFFLOADING THE CLIENT-SERVER TRE EFFORT FOR MINIMIZING CLOUD BANDWITH AND COST Akshata B Korwar #1,Ashwini B Korwar #2,Sharanabasappa D Hannure #3 #1 Karnataka Kalburgi, 9742090637, korwar9.aksha ta@ gmail.com.

More information

Evaluating HDFS I/O Performance on Virtualized Systems

Evaluating HDFS I/O Performance on Virtualized Systems Evaluating HDFS I/O Performance on Virtualized Systems Xin Tang xtang@cs.wisc.edu University of Wisconsin-Madison Department of Computer Sciences Abstract Hadoop as a Service (HaaS) has received increasing

More information

MINIMIZING STORAGE COST IN CLOUD COMPUTING ENVIRONMENT

MINIMIZING STORAGE COST IN CLOUD COMPUTING ENVIRONMENT MINIMIZING STORAGE COST IN CLOUD COMPUTING ENVIRONMENT 1 SARIKA K B, 2 S SUBASREE 1 Department of Computer Science, Nehru College of Engineering and Research Centre, Thrissur, Kerala 2 Professor and Head,

More information

Binary search tree with SIMD bandwidth optimization using SSE

Binary search tree with SIMD bandwidth optimization using SSE Binary search tree with SIMD bandwidth optimization using SSE Bowen Zhang, Xinwei Li 1.ABSTRACT In-memory tree structured index search is a fundamental database operation. Modern processors provide tremendous

More information

Understanding EMC Avamar with EMC Data Protection Advisor

Understanding EMC Avamar with EMC Data Protection Advisor Understanding EMC Avamar with EMC Data Protection Advisor Applied Technology Abstract EMC Data Protection Advisor provides a comprehensive set of features that reduce the complexity of managing data protection

More information

Partition Alignment Dramatically Increases System Performance

Partition Alignment Dramatically Increases System Performance Partition Alignment Dramatically Increases System Performance Information for anyone in IT that manages large storage environments, data centers or virtual servers. Paragon Software Group Paragon Alignment

More information

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software

Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software WHITEPAPER Accelerating Enterprise Applications and Reducing TCO with SanDisk ZetaScale Software SanDisk ZetaScale software unlocks the full benefits of flash for In-Memory Compute and NoSQL applications

More information

Understanding EMC Avamar with EMC Data Protection Advisor

Understanding EMC Avamar with EMC Data Protection Advisor Understanding EMC Avamar with EMC Data Protection Advisor Applied Technology Abstract EMC Data Protection Advisor provides a comprehensive set of features to reduce the complexity of managing data protection

More information

OPTIMIZING SERVER VIRTUALIZATION

OPTIMIZING SERVER VIRTUALIZATION OPTIMIZING SERVER VIRTUALIZATION HP MULTI-PORT SERVER ADAPTERS BASED ON INTEL ETHERNET TECHNOLOGY As enterprise-class server infrastructures adopt virtualization to improve total cost of ownership (TCO)

More information

Solid State Storage in Massive Data Environments Erik Eyberg

Solid State Storage in Massive Data Environments Erik Eyberg Solid State Storage in Massive Data Environments Erik Eyberg Senior Analyst Texas Memory Systems, Inc. Agenda Taxonomy Performance Considerations Reliability Considerations Q&A Solid State Storage Taxonomy

More information

Google File System. Web and scalability

Google File System. Web and scalability Google File System Web and scalability The web: - How big is the Web right now? No one knows. - Number of pages that are crawled: o 100,000 pages in 1994 o 8 million pages in 2005 - Crawlable pages might

More information

Security Ensured Redundant Data Management under Cloud Environment

Security Ensured Redundant Data Management under Cloud Environment Security Ensured Redundant Data Management under Cloud Environment K. Malathi 1 M. Saratha 2 1 PG Scholar, Dept. of CSE, Vivekanandha College of Technology for Women, Namakkal. 2 Assistant Professor, Dept.

More information

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 349 ISSN 2229-5518

International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 349 ISSN 2229-5518 International Journal of Scientific & Engineering Research, Volume 4, Issue 11, November-2013 349 Load Balancing Heterogeneous Request in DHT-based P2P Systems Mrs. Yogita A. Dalvi Dr. R. Shankar Mr. Atesh

More information

WHITE PAPER. Permabit Albireo Data Optimization Software. Benefits of Albireo for Virtual Servers. January 2012. Permabit Technology Corporation

WHITE PAPER. Permabit Albireo Data Optimization Software. Benefits of Albireo for Virtual Servers. January 2012. Permabit Technology Corporation WHITE PAPER Permabit Albireo Data Optimization Software Benefits of Albireo for Virtual Servers January 2012 Permabit Technology Corporation Ten Canal Park Cambridge, MA 02141 USA Phone: 617.252.9600 FAX:

More information

RevoScaleR Speed and Scalability

RevoScaleR Speed and Scalability EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution

More information

Energy Efficiency in Secure and Dynamic Cloud Storage

Energy Efficiency in Secure and Dynamic Cloud Storage Energy Efficiency in Secure and Dynamic Cloud Storage Adilet Kachkeev Ertem Esiner Alptekin Küpçü Öznur Özkasap Koç University Department of Computer Science and Engineering, İstanbul, Turkey {akachkeev,eesiner,akupcu,oozkasap}@ku.edu.tr

More information

Cyber Forensic for Hadoop based Cloud System

Cyber Forensic for Hadoop based Cloud System Cyber Forensic for Hadoop based Cloud System ChaeHo Cho 1, SungHo Chin 2 and * Kwang Sik Chung 3 1 Korea National Open University graduate school Dept. of Computer Science 2 LG Electronics CTO Division

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

A Efficient Hybrid Inline and Out-of-line Deduplication for Backup Storage

A Efficient Hybrid Inline and Out-of-line Deduplication for Backup Storage A Efficient Hybrid Inline and Out-of-line Deduplication for Backup Storage YAN-KIT Li, MIN XU, CHUN-HO NG, and PATRICK P. C. LEE The Chinese University of Hong Kong Backup storage systems often remove

More information

How swift is your Swift? Ning Zhang, OpenStack Engineer at Zmanda Chander Kant, CEO at Zmanda

How swift is your Swift? Ning Zhang, OpenStack Engineer at Zmanda Chander Kant, CEO at Zmanda How swift is your Swift? Ning Zhang, OpenStack Engineer at Zmanda Chander Kant, CEO at Zmanda 1 Outline Build a cost-efficient Swift cluster with expected performance Background & Problem Solution Experiments

More information

CrashPlan PRO Enterprise Backup

CrashPlan PRO Enterprise Backup CrashPlan PRO Enterprise Backup People Friendly, Enterprise Tough CrashPlan PRO is a high performance, cross-platform backup solution that provides continuous protection onsite, offsite, and online for

More information

Low-Cost Data Deduplication for Virtual Machine Backup in Cloud Storage

Low-Cost Data Deduplication for Virtual Machine Backup in Cloud Storage Low-Cost Data Deduplication for Virtual Machine Backup in Cloud Storage Wei Zhang, Tao Yang, Gautham Narayanasamy, and Hong Tang University of California at Santa Barbara, Alibaba Inc. Abstract In a virtualized

More information

Booting from NAND Flash Memory

Booting from NAND Flash Memory Booting from NAND Flash Memory Introduction NAND flash memory technology differs from NOR flash memory which has dominated the embedded flash memory market in the past. Traditional applications for NOR

More information

ALG De-dupe for Cloud Backup Services of personal Storage Uma Maheswari.M, umajamu30@gmail.com DEPARTMENT OF ECE, IFET College of Engineering

ALG De-dupe for Cloud Backup Services of personal Storage Uma Maheswari.M, umajamu30@gmail.com DEPARTMENT OF ECE, IFET College of Engineering ALG De-dupe for Cloud Backup Services of personal Storage Uma Maheswari.M, umajamu30@gmail.com DEPARTMENT OF ECE, IFET College of Engineering ABSTRACT Deduplication due to combination of resource intensive

More information

EMC VNXe File Deduplication and Compression

EMC VNXe File Deduplication and Compression White Paper EMC VNXe File Deduplication and Compression Overview Abstract This white paper describes EMC VNXe File Deduplication and Compression, a VNXe system feature that increases the efficiency with

More information

Protecting Information in a Smarter Data Center with the Performance of Flash

Protecting Information in a Smarter Data Center with the Performance of Flash 89 Fifth Avenue, 7th Floor New York, NY 10003 www.theedison.com 212.367.7400 Protecting Information in a Smarter Data Center with the Performance of Flash IBM FlashSystem and IBM ProtecTIER Printed in

More information

Reclaiming Primary Storage with Managed Server HSM

Reclaiming Primary Storage with Managed Server HSM White Paper Reclaiming Primary Storage with Managed Server HSM November, 2013 RECLAIMING PRIMARY STORAGE According to Forrester Research Inc., the total amount of data warehoused by enterprises is doubling

More information

TEST REPORT SUMMARY MAY 2010 Symantec Backup Exec 2010: Source deduplication advantages in database server, file server, and mail server scenarios

TEST REPORT SUMMARY MAY 2010 Symantec Backup Exec 2010: Source deduplication advantages in database server, file server, and mail server scenarios TEST REPORT SUMMARY MAY 21 Symantec Backup Exec 21: Source deduplication advantages in database server, file server, and mail server scenarios Executive summary Symantec commissioned Principled Technologies

More information

Dynamic Load Balancing of Virtual Machines using QEMU-KVM

Dynamic Load Balancing of Virtual Machines using QEMU-KVM Dynamic Load Balancing of Virtual Machines using QEMU-KVM Akshay Chandak Krishnakant Jaju Technology, College of Engineering, Pune. Maharashtra, India. Akshay Kanfade Pushkar Lohiya Technology, College

More information

Acronis Backup Deduplication. Technical Whitepaper

Acronis Backup Deduplication. Technical Whitepaper Acronis Backup Deduplication Technical Whitepaper Table of Contents Table of Contents Table of Contents... 1 Introduction... 3 Storage Challenges... 4 How Deduplication Helps... 5 How It Works... 6 Deduplication

More information

Managing Storage Space in a Flash and Disk Hybrid Storage System

Managing Storage Space in a Flash and Disk Hybrid Storage System Managing Storage Space in a Flash and Disk Hybrid Storage System Xiaojian Wu, and A. L. Narasimha Reddy Dept. of Electrical and Computer Engineering Texas A&M University IEEE International Symposium on

More information

Deduplication has been around for several

Deduplication has been around for several Demystifying Deduplication By Joe Colucci Kay Benaroch Deduplication holds the promise of efficient storage and bandwidth utilization, accelerated backup and recovery, reduced costs, and more. Understanding

More information

Data Deduplication in a Hybrid Architecture for Improving Write Performance

Data Deduplication in a Hybrid Architecture for Improving Write Performance Data Deduplication in a Hybrid Architecture for Improving Write Performance Data-intensive Salable Computing Laboratory Department of Computer Science Texas Tech University Lubbock, Texas June 10th, 2013

More information

Data Deduplication Scheme for Cloud Storage

Data Deduplication Scheme for Cloud Storage 26 Data Deduplication Scheme for Cloud Storage 1 Iuon-Chang Lin and 2 Po-Ching Chien Abstract Nowadays, the utilization of storage capacity becomes an important issue in cloud storage. In this paper, we

More information

STORAGE SOURCE DATA DEDUPLICATION PRODUCTS. Buying Guide: inside

STORAGE SOURCE DATA DEDUPLICATION PRODUCTS. Buying Guide: inside Managing the information that drives the enterprise STORAGE Buying Guide: inside 2 Key features of source data deduplication products 5 Special considerations Source dedupe products can efficiently protect

More information

A Novel Deduplication Avoiding Chunk Index in RAM

A Novel Deduplication Avoiding Chunk Index in RAM A Novel Deduplication Avoiding Chunk Index in RAM 1 Zhike Zhang, 2 Zejun Jiang, 3 Xiaobin Cai, 4 Chengzhang Peng 1, First Author Northwestern Polytehnical University, 127 Youyixilu, Xi an, Shaanxi, P.R.

More information

WHITE PAPER. How Deduplication Benefits Companies of All Sizes An Acronis White Paper

WHITE PAPER. How Deduplication Benefits Companies of All Sizes An Acronis White Paper How Deduplication Benefits Companies of All Sizes An Acronis White Paper Copyright Acronis, Inc., 2000 2009 Table of contents Executive Summary... 3 What is deduplication?... 4 File-level deduplication

More information