The assignment of chunk size according to the target data characteristics in deduplication backup system


Mikito Ogata
Hitachi Information and Telecommunication Engineering, Ltd.
781 Sakai, Nakai-machi, Ashigarakami-gun, Kanagawa

Norihisa Komoda
Osaka University
2-1, Yamadaoka, Suita, Osaka

Abstract

This paper focuses on the trade-off between the deduplication rate and the processing penalty in a backup system that uses a conventional variable-length chunking method. When the chunk size is fixed, the trade-off is a non-linear negative correlation. To analyze the trade-off quantitatively across all factors, we take a simulation approach and clarify several correlations among chunk sizes and the densities and average lengths of the differed parts. We then show that dynamically assigning an appropriate chunk size based on the data characteristics weakens the trade-off and provides higher efficiency than the conventional approach.

Keywords: Deduplication, Backup, Archive, Capacity Optimization, Enterprise Storage

1 Introduction

Due to the explosive increase of data in IT systems, backup operations are becoming a burden because of their resource usage, processing time, and management cost, even though backup is an indispensable operation for recovering data after an unpredictable disaster. The major requirements for backup operations are to shorten the processing time and to reduce resource usage, especially the storage capacity needed to keep data for the long term.

Recently, a technology called deduplication has become popular as a way to reduce this burden. Deduplication eliminates duplicates within the backup target data and stores only the unique data in the storage. The reduction of stored data lowers not only the backup storage cost but also the load on other resources. Various techniques have been proposed so far to provide more reduction with less processing time, aiming at more cost-effective backup operation [1][2].
However, because the deduplication rate and the processing time have a non-linear negative correlation, it is difficult to improve both simultaneously when only one chunk size is used. This paper shows how to assign an appropriate chunk size according to the data characteristics. The assignment weakens the trade-off of the single-chunk-size method and therefore provides more efficient deduplication than the conventional approaches.

2 Deduplication backup system

2.1 Variable chunking algorithm

In typical customer environments, the backup target data includes various types of files, and how much duplication remains and where it is located depend on the environment and the applications: some data is scarcely duplicated because it has been heavily edited or updated, while other data is densely duplicated because it was rarely changed, or simply replicated or copied. In this paper, all changed or updated parts of the data are called differed areas and the unchanged parts are called identical areas.

A deduplication backup system has a processing module that eliminates duplication within the target data. The module reads the target data, divides it into many small segments, distinguishes duplicated areas from unique or newly updated areas, then transfers and stores only the unique data in the storage. Figure 1 shows the typical operational flow of the deduplication process, from reading the target data to storing the unique data. The sizes of the boxes are not to scale.

Figure 1. Deduplication process.

The module divides the target data into small segments called chunks, which are the units of reduction and storing (Chunking). Two dividing algorithms are widely implemented: fixed-length chunking and variable-length chunking. Fixed-length chunking divides the data into chunks of the same, predefined length. Variable-length chunking divides the data into chunks whose lengths are determined by the data patterns. A typical variable-length chunking algorithm scans the data from the beginning of the file to the end, shifting a fixed-length window one byte at a time [3][4]. The algorithm generates a special value, called a signature, from the window contents using Rabin's algorithm [5]. When the signature matches a predefined value, called the anchor, the end byte of the window is set as the end of the chunk; otherwise the window is shifted by one byte and a new signature is generated. The average chunk size is determined by how many bits are taken as the signature. For example, if the signature length is 12 bits and the signature values are assumed to be completely random, 2^12 patterns are possible, which results in a 4KB average chunk size.

Next, the module computes a unique code from each chunk's data to judge the similarity of chunks. A hashing algorithm such as SHA-1 or SHA-256 is commonly used; the resulting code is called a fingerprint (Fingerprinting). The module then decides the uniqueness of each chunk using its fingerprint (Decision): the chunk is duplicated if the same chunk has already been stored, and not duplicated if it differs from all previously stored chunks. Finally, the module writes only the non-duplicated chunks to the storage (Writing).

2.2 Issues

Many approaches have been proposed and implemented to increase the efficiency of deduplication. From the customer's viewpoint, two criteria are important: the deduplication rate and the processing time.
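The chunking, fingerprinting, decision, and writing steps described in Section 2.1 can be sketched as follows. This is a minimal illustration, not the paper's implementation: the hash-based rolling signature stands in for Rabin's algorithm, and the window size, signature length, and anchor value are assumed for the example.

```python
import hashlib

WINDOW = 48       # sliding-window size in bytes (assumed)
SIG_BITS = 12     # 12 signature bits -> 2^12 = 4096, i.e. ~4KB average chunks
MASK = (1 << SIG_BITS) - 1
ANCHOR = MASK     # predefined anchor value the signature must match

def rolling_chunks(data: bytes):
    """Chunking: declare a boundary wherever the window's signature
    matches the anchor. The per-window hash here is a stand-in for a
    Rabin fingerprint, which a real system would update incrementally
    instead of recomputing at every position."""
    start = 0
    for i in range(WINDOW, len(data) + 1):
        window = data[i - WINDOW:i]
        sig = int.from_bytes(hashlib.md5(window).digest()[:4], "big") & MASK
        if sig == ANCHOR:
            yield data[start:i]
            start = i
    if start < len(data):
        yield data[start:]        # trailing partial chunk

def deduplicate(data: bytes):
    """Fingerprinting, Decision, Writing: keep a chunk only if its
    SHA-1 fingerprint has not been seen before."""
    seen, unique = set(), []
    for chunk in rolling_chunks(data):
        fp = hashlib.sha1(chunk).hexdigest()   # Fingerprinting
        if fp not in seen:                     # Decision
            seen.add(fp)
            unique.append(chunk)               # Writing
    return unique
```

Because boundaries depend only on content, an insertion early in a file shifts only the chunks around the edit; the later chunk boundaries re-align, which is why variable-length chunking deduplicates shifted data better than fixed-length chunking.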
The deduplication rate should be as high as possible and the processing time as short as possible. Choosing an adequate chunk size for the algorithm is one key factor, because it strongly affects both. In many cases, the chunk size is determined so as to balance performance and deduplication capability for an assumed, hypothetical environment [6][7].

To gain more reduction, assigning a smaller chunk size is generally effective, since a smaller chunk size can pick out the differed areas more precisely than a bigger one. However, the effectiveness of a chunk size, that is, how sharply it cuts off the duplication, depends heavily on the distribution of the differed areas within the data, such as the length of each differed area and the distance between differed areas. When a small chunk size is applied to long differed areas, many consecutive chunks may be wasted covering the non-duplicated area. When the differed areas are spaced at distances similar to the chunk size, many chunks will include both differed and identical areas, which results in a poor deduplication rate. When the differed areas are spaced at distances much smaller or much larger than the chunk size, the effectiveness improves.

The processing time, that is, the time and resource utilization needed to reduce the duplication, also depends on the chunk size, among other factors. A smaller chunk size requires more CPU- and IO-intensive processing; a bigger one requires less. The less the data is duplicated, the fewer storing operations are needed. Furthermore, the correlation between processing time and chunk size is non-linear: for a small chunk size, the processing time increases steeply as the size decreases. This non-linearity makes the trade-off hard to overcome. In the conventional approaches, a single chunk size is assigned for all data and all environments, which is not effective.
If the chunk size is assigned to gain a better deduplication rate with a smaller processing time according to the characteristics of the data or the environment, the system benefits.

3 Simulation

3.1 Implementation

The following describes the operation and assumptions of our simulator, which we call SIM for convenience.

(1) Backup data

The target backup data is implemented as a list of 1024*1024 entries, where one entry represents 64 bytes of real data; in total the list represents 64MB. The stream contains areas that are identical, i.e., identical to some previously stored area, and areas that are differed, i.e., identical to no previously stored area. SIM reduces as much of the identical areas as it can and always retains all of the differed areas. The ratio of the total amount of identical areas to the total amount of data is defined as the duplication rate r, with 0 <= r <= 1; the ratio of the differed areas is therefore 1 - r. The differed areas are spread with equal density over the stream.

(2) Chunking

SIM assumes an anchor algorithm to separate the data into chunks. The signature length that controls the average chunk size is set from 11 to 15 bits, which corresponds to average chunk sizes of 2 to 32KB respectively. In this case, the chunk sizes follow a geometric distribution [8] with parameter p = 1/2^n, where n is the signature length in bits. For convenience, we denote the case where the average chunk size is m by the notation (m). For practical industrial reasons, SIM sets two boundaries, a minimum and a maximum chunk size: the minimum is half the average chunk size and the maximum is one and a half times the average. For example, when the average chunk size is 8KB, the minimum is 4KB and the maximum is 12KB. Table 1 lists the parameters of the simulation.

Table 1. Parameters of the simulation

  Parameter                      Values
  Duplication rate r             0.2, 0.3, 0.4, ...
  Length of differed area [KB]   2, 4, 8, ...
  Average chunk size m [KB]      2, 3, 4, ...

(3) Decision

SIM checks whether each chunk includes a differed area. When the chunk includes no differed area at all, SIM decides that the chunk should be removed. When the chunk includes a fully or partially differed area, SIM decides that the chunk should be retained. The deduplication rate is calculated as the ratio of the total amount of chunks decided to be removed to the total amount of data. In addition, the fulfillment rate is calculated as the ratio of the deduplication rate to the duplication rate. The number of reduced chunks is counted as the number of chunks removed as identical by SIM. Since the processing time of deduplication varies greatly with the resource configuration and the load on the system, SIM uses the total number of removed chunks as a uniform substitute for the processing time.

(4) Environment

SIM is coded in the language R and runs on hardware with an Intel Core i7 2.6GHz CPU and 8GB (1600MHz, DDR3) memory.

3.2 Evaluation of the validity of SIM against measured data

First, the validity of SIM is evaluated by comparing it against equivalent data gathered from measurements on real hardware. Our experimental hardware is a Linux-based NAS (Network Attached Storage) product connected to HDD storage, with an NFS (Network File System) interface and a variable chunking mechanism. The chunk size is controlled as an average-chunk-size parameter for the NAS and as 1/p for SIM; these parameters are called the operational chunk sizes here. The total amount of data in the measurement is 1GB and the length of the differed areas is fixed at 32KB. The differed areas are randomly distributed over the data, with the duplication rate as a control parameter, chosen to reflect practical data characteristics. The NAS reports the deduplication rate, the average chunk size, and the number of reduced chunks for each experiment.

Table 2 compares the chunk sizes gathered from each experiment: the chunk size into which the NAS divides the data in the measurement, and the chunk size into which SIM divides the data in the simulation. The simulated chunk sizes match the measured chunk sizes well for smaller operational chunk sizes.
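SIM's model can be sketched as follows, under our reading of Section 3.1; the function names and the equal-spacing layout of the differed areas are our assumptions. The stream is a list of 64-byte entities, chunk sizes are drawn from a geometric distribution clipped to the stated bounds, and a chunk is removed only when it contains no differed entity.

```python
import math
import random

ENTITY = 64                    # one list entry represents 64 bytes
STREAM_ENTITIES = 1024 * 1024  # 64MB of simulated target data

def make_stream(r, differed_len_kb):
    """True = entity lies in a differed area; differed areas of fixed
    length are spread with equal density over the stream."""
    area = differed_len_kb * 1024 // ENTITY
    n_areas = int(STREAM_ENTITIES * (1 - r) / area)
    stream = [False] * STREAM_ENTITIES
    gap = STREAM_ENTITIES // max(n_areas, 1)
    for a in range(n_areas):
        for i in range(a * gap, min(a * gap + area, STREAM_ENTITIES)):
            stream[i] = True
    return stream

def chunk_sizes(avg_kb, total, rng):
    """Geometric chunk sizes with p = 1/average, clipped to the
    industrial bounds [average/2, 1.5*average]."""
    avg = avg_kb * 1024 // ENTITY
    p = 1.0 / avg
    sizes, used = [], 0
    while used < total:
        s = 1 + int(math.log(1.0 - rng.random()) / math.log(1.0 - p))
        s = max(avg // 2, min(s, avg * 3 // 2))
        s = min(s, total - used)    # final chunk is truncated
        sizes.append(s)
        used += s
    return sizes

def simulate(r, differed_len_kb, avg_chunk_kb, seed=1):
    """Return (deduplication rate, number of reduced chunks)."""
    rng = random.Random(seed)
    stream = make_stream(r, differed_len_kb)
    removed_entities = removed_chunks = pos = 0
    for s in chunk_sizes(avg_chunk_kb, len(stream), rng):
        if not any(stream[pos:pos + s]):   # Decision: fully identical chunk
            removed_entities += s
            removed_chunks += 1
        pos += s
    return removed_entities / len(stream), removed_chunks
```

Running simulate(0.6, 32, 2) against simulate(0.6, 32, 8) reproduces the qualitative behaviour of Section 4: the smaller chunk size removes far more chunks and achieves a higher deduplication rate.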
The simulated chunk sizes become gradually shorter than the measured ones as the operational chunk size increases. This comes from the difference that the NAS has multiple chunking algorithms installed, while SIM uses only a single variable chunking algorithm.

Table 2. Comparison of simulated and measured chunk size (operational chunk size; measured [KB]; simulated [KB]; difference [%])

Table 3 and Figure 2 compare the deduplication rates. The rates match for smaller chunk sizes, with a 2% difference in the case of 8KB. The difference becomes larger as the chunk size increases, and the rate becomes worse for the NAS as the duplication rate increases. This comes from the difference that the NAS embeds some additional information into the stream to improve data integrity, while SIM does not; the embedded information spoils the deduplication efficiency and lowers the rate as the chunk size increases. The rate is better for the NAS in the case where the duplication rate equals 0.5. This comes from the difference that the NAS uses multiple chunking algorithms, which results in a higher deduplication rate, while at the same time the NAS suffers the penalty of a lower rate due to the embedded information, which grows as the duplication increases.

Table 3. Comparison of simulated and measured deduplication rate (operational chunk size; measured; simulated; difference [%])

Figure 2. Deduplication rate of the single-layer deduplication system (measured at 4KB to 64KB chunk sizes vs. simulated, by duplication rate)

4 Evaluation

4.1 Dependency on duplication rate

Figure 3 shows the correlation between the deduplication rate and the number of reduced chunks when the duplication rate is varied over 0.2, 0.4, 0.5, 0.6, 0.8 with the same differed-area length of 32KB. The number of chunks removed by SIM increases as the duplication rate increases or as the chunk size decreases.

Figure 3. Number of reduced chunks by duplication rate (length of differed area = 32KB)

A criterion is useful to analyze the effects of changing the chunk size. As shown in the figure, as the chunk size decreases starting from 32KB, the number of reduced chunks becomes 3.2 times greater going from (32) to (16), while the deduplication rate improves by 60%. The number of reduced chunks then becomes 2.5 times greater from (16) to (8) with a 30% improvement in the deduplication rate, and successively 2.0 times greater with only a 3% improvement from (4) to (2). The improvement ratio decreases as the chunk size decreases. This means that decrementing the chunk size in a big-chunk configuration gains more improvement with less penalty. At the same time, the improvement is clearer as the duplication rate decreases.
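Section 4.3 below formalizes this trade-off as a cost to minimize, C = A x ReducedChunks - B x DedupedRate. A sketch of the resulting chunk-size assignment follows; the sign convention, the weights, and the numeric (rate, reduced-chunk) values are illustrative assumptions, not the paper's measurements.

```python
def optimal_chunk_size(candidates, a=1.0, b=1.0):
    """candidates maps an average chunk size [KB] to a pair
    (deduplication rate, number of reduced chunks); the chunk size
    minimizing C = a*ReducedChunks - b*DedupedRate is returned."""
    def cost(kb):
        rate, chunks = candidates[kb]
        return a * chunks - b * rate
    return min(candidates, key=cost)

# Hypothetical simulation results for one data characteristic:
results = {
    2:  (0.55, 18000),   # small chunks: best rate, most processing
    4:  (0.52, 9000),
    8:  (0.42, 3300),
    16: (0.33, 1500),
    32: (0.21, 650),
}

# Weighting the processing cost heavily favours big chunks; weighting
# the deduplication rate heavily favours small ones.
print(optimal_chunk_size(results, a=1.0, b=1.0))      # -> 32
print(optimal_chunk_size(results, a=0.001, b=1000.0)) # -> 2
```

The choice of A and B is where the customer's operational and capital cost priorities enter: the same (rate, chunks) table yields different optimal chunk sizes under different weightings.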

The deduplication rate and the fulfillment rate for each chunk size increase as the duplication rate increases. This is because the deduplication works more efficiently as the probability of a mixture of identical and differed areas within the same chunk becomes lower with increasing duplication rate.

4.2 Dependency on length of differed area

Figure 4 shows the correlation between the deduplication rate and the number of reduced chunks when the length of the differed areas is varied over 2, 4, 8, 16, 32KB with the same duplication rate of 0.6. As the length of the differed areas increases, the corresponding number of reduced chunks and the deduplication rate both increase for the same chunk size. This comes from the decrease of deduplication inefficiency: the probability of a mixture of identical and differed areas within a chunk becomes lower as the length of the differed areas increases. The improvement of the deduplication rate from decreasing the chunk size becomes smaller as the length of the differed areas increases, because the precision advantage of reduction with small chunks degrades.

Figure 4. Number of reduced chunks by length of differed area (duplication rate = 0.6)

The number of reduced chunks, and therefore the deduplication rate, increases with the length of the differed areas for all chunk sizes, and the improvement is smaller for small chunk sizes. For example, when the length of the differed areas increases from 2KB to 32KB, the improvement of the deduplication rate is 1.8 times in the case of (2), 3.3 times for (4), 9.5 times for (8), and so on. This indicates that assigning a bigger chunk size is more effective than a smaller one when the length of the differed areas varies widely within the data.

4.3 Assignment of optimal chunk size

As described above, there is a trade-off between the number of reduced chunks and the deduplication rate. Therefore, when the problem of assigning the optimal chunk size is defined as an optimization problem with two objectives, maximizing the deduplication rate and minimizing the number of reduced chunks, it yields Pareto-optimal solutions, because there exists no feasible solution in which one objective improves without a simultaneous degradation of the other. In Figure 3 and Figure 4, each solid line represents one corresponding set of Pareto-optimal solutions.

In general, which chunk size to assign as the optimal one depends on the customer's choice, considering various factors and proprietary priorities [9]. Without neglecting practical priorities, this paper condenses the customer's factors into two costs. One is the operational cost, which includes the cost of resources such as servers, LAN (Local Area Network), SAN (Storage Area Network) and utilities, as well as the cost of management. The other is the capital cost, such as storage and maintenance. The priority is to minimize the total cost. Here the cost is defined by the following function, and the optimal chunk size is assigned so as to minimize it:

  C = A x ReducedChunks - B x DedupedRate

where ReducedChunks is the number of reduced chunks and DedupedRate is the deduplication rate.

Figure 3 shows an example of the cost function: the dotted line indicates the cost for the corresponding deduplication rate and number of reduced chunks, with A set to 1. In the case of a duplication rate of 0.6, the chunk size that provides the minimum cost is 8KB.

Figure 5 shows the chunk size that minimizes the cost. The optimal chunk size increases as the duplication rate increases; this follows from the characteristic of bigger chunks, which reduce coarsely but with a short processing time. Further, the optimal chunk size increases as the length of the differed areas increases, for the same reason. In addition, the difference between the optimal chunk sizes grows as the duplication rate increases, because the effectiveness of using bigger chunks weakens, with increasing duplication rate, when the differed areas are short.

Figure 5. Optimal chunk size to minimize the cost, by duplication rate (lengths of differed area = 8KB and 32KB)

In commercially available products, 4KB is a popular chunk size. Table 4 shows the improvement for duplication rates of 0.2, 0.4, 0.6, 0.8 and differed-area lengths of 8, 32, 64KB. The improvement of the optimal chunk size increases as the duplication rate and the length of the differed areas increase, reaching 33% in the case of 0.8 and 64KB.

Table 4. Improvement of the cost value by the optimal chunk size over the conventional 4KB

  Length of          Duplication rate
  differed area      0.2    0.4    0.6    0.8
  8KB   Optimal      (3)    (4)    (5)    (8)
  32KB  Optimal      (4)    (7)    (12)   (15)
  64KB  Optimal      (6)    (8)    (13)   (19)

5 Conclusion

This paper simulates a variable chunking algorithm in a backup deduplication system. First, the simulation clarifies that changing the chunk size is more beneficial for bigger chunks than for smaller chunks, gaining more improvement with less penalty; in addition, the improvement increases as the duplication rate decreases. Next, assigning a bigger chunk size is more effective than a smaller one when the length of the differed areas varies widely within the data. Finally, under the reasonable assumption of minimizing the cost, the optimal chunk size is 3KB in the case of a 0.2 duplication rate and an 8KB differed-area length, a 27% improvement over a fixed 4KB chunk size; similarly, the optimal chunk size is 19KB in the case of 0.8 and 64KB, with an improvement of 33%.

References

[1] Y. Tan et al.
Dam: A data ownership-aware multi-layered de-duplication scheme. In 2010 Fifth IEEE International Conference on Networking, Architecture and Storage.

[2] Y. Won, J. Ban, J. Min, L. Hur, S. Oh, and J. Lee. Efficient index lookup for de-duplication backup system. In IEEE International Symposium on Modeling, Analysis and Simulation of Computers and Telecommunication Systems (MASCOTS), poster presentation, Sept. 2008.

[3] U. Manber. Finding similar files in a large file system. In Proceedings of the USENIX Winter 1994 Technical Conference, 1994.

[4] A. Muthitacharoen, B. Chen, and D. Mazières. A low-bandwidth network file system. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP), Banff, Canada, 2001.

[5] M. O. Rabin. Fingerprinting by random polynomials. Technical report, Department of Computer Science, Harvard University, 1981.

[6] D. Meister and A. Brinkmann. Multi-level comparison of data deduplication in a backup scenario. In Proceedings of SYSTOR 2009: The 2nd Annual International Systems and Storage Conference. ACM, May 2009.

[7] G. Wallace, F. Douglis, H. Qian, P. Shilane, S. Smaldone, M. Chamness, and W. Hsu. Characteristics of backup workloads in production systems. In Proceedings of the 10th USENIX Conference on File and Storage Technologies (FAST), 2012.

[8] J. Min, D. Yoon, and Y. Won. Efficient deduplication techniques for modern backup operation. IEEE Transactions on Computers, Vol. 60, No. 6, 2011.

[9] M. Ogata and N. Komoda. Optimized assignment of deduplication backup methods using integer programming. In Proceedings of JCIS 2011: The 4th Japan-China Joint Symposium on Information Systems, Apr. 2011.


More information

Identifying the Hidden Risk of Data De-duplication: How the HYDRAstor Solution Proactively Solves the Problem

Identifying the Hidden Risk of Data De-duplication: How the HYDRAstor Solution Proactively Solves the Problem Identifying the Hidden Risk of Data De-duplication: How the HYDRAstor Solution Proactively Solves the Problem October, 2006 Introduction Data de-duplication has recently gained significant industry attention,

More information

Data Deduplication and Tivoli Storage Manager

Data Deduplication and Tivoli Storage Manager Data Deduplication and Tivoli Storage Manager Dave Cannon Tivoli Storage Manager rchitect Oxford University TSM Symposium September 2007 Disclaimer This presentation describes potential future enhancements

More information

Deduplicating Compressed Contents in Cloud Storage Environment

Deduplicating Compressed Contents in Cloud Storage Environment Deduplicating Compressed Contents in Cloud Storage Environment Zhichao Yan, Hong Jiang University of Texas Arlington zhichao.yan@mavs.uta.edu hong.jiang@uta.edu Abstract Data compression and deduplication

More information

Keywords De-duplication, block level de-duplication, hash, Inline parallel de-duplication.

Keywords De-duplication, block level de-duplication, hash, Inline parallel de-duplication. Volume 5, Issue 4, April 2015 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com A Parallel Architecture

More information

VMware Virtual SAN Backup Using VMware vsphere Data Protection Advanced SEPTEMBER 2014

VMware Virtual SAN Backup Using VMware vsphere Data Protection Advanced SEPTEMBER 2014 VMware SAN Backup Using VMware vsphere Data Protection Advanced SEPTEMBER 2014 VMware SAN Backup Using VMware vsphere Table of Contents Introduction.... 3 vsphere Architectural Overview... 4 SAN Backup

More information

Hardware Configuration Guide

Hardware Configuration Guide Hardware Configuration Guide Contents Contents... 1 Annotation... 1 Factors to consider... 2 Machine Count... 2 Data Size... 2 Data Size Total... 2 Daily Backup Data Size... 2 Unique Data Percentage...

More information

Data Deduplication HTBackup

Data Deduplication HTBackup Data Deduplication HTBackup HTBackup and it s Deduplication technology is touted as one of the best ways to manage today's explosive data growth. If you're new to the technology, these key facts will help

More information

BALANCING FOR DISTRIBUTED BACKUP

BALANCING FOR DISTRIBUTED BACKUP CONTENT-AWARE LOAD BALANCING FOR DISTRIBUTED BACKUP Fred Douglis 1, Deepti Bhardwaj 1, Hangwei Qian 2, and Philip Shilane 1 1 EMC 2 Case Western Reserve University 1 Starting Point Deduplicating disk-based

More information

Quanqing XU Quanqing.Xu@nicta.com.au. YuruBackup: A Highly Scalable and Space-Efficient Incremental Backup System in the Cloud

Quanqing XU Quanqing.Xu@nicta.com.au. YuruBackup: A Highly Scalable and Space-Efficient Incremental Backup System in the Cloud Quanqing XU Quanqing.Xu@nicta.com.au YuruBackup: A Highly Scalable and Space-Efficient Incremental Backup System in the Cloud Outline Motivation YuruBackup s Architecture Backup Client File Scan, Data

More information

File Management. Chapter 12

File Management. Chapter 12 Chapter 12 File Management File is the basic element of most of the applications, since the input to an application, as well as its output, is usually a file. They also typically outlive the execution

More information

Whitepaper: Back Up SAP HANA and SUSE Linux Enterprise Server with SEP sesam. info@sepusa.com www.sepusa.com Copyright 2014 SEP

Whitepaper: Back Up SAP HANA and SUSE Linux Enterprise Server with SEP sesam. info@sepusa.com www.sepusa.com Copyright 2014 SEP Whitepaper: Back Up SAP HANA and SUSE Linux Enterprise Server with SEP sesam info@sepusa.com www.sepusa.com Table of Contents INTRODUCTION AND OVERVIEW... 3 SOLUTION COMPONENTS... 4-5 SAP HANA... 6 SEP

More information

Using Synology SSD Technology to Enhance System Performance Synology Inc.

Using Synology SSD Technology to Enhance System Performance Synology Inc. Using Synology SSD Technology to Enhance System Performance Synology Inc. Synology_WP_ 20121112 Table of Contents Chapter 1: Enterprise Challenges and SSD Cache as Solution Enterprise Challenges... 3 SSD

More information

IBM Tivoli Storage Manager for Windows Version Installation Guide IBM

IBM Tivoli Storage Manager for Windows Version Installation Guide IBM IBM Tivoli Storage Manager for Windows Version 7.1.5 Installation Guide IBM IBM Tivoli Storage Manager for Windows Version 7.1.5 Installation Guide IBM Note: Before you use this information and the product

More information

System Software for Flash Memory: A Survey

System Software for Flash Memory: A Survey System Software for Flash Memory: A Survey Tae-Sun Chung 1, Dong-Joo Park 2, Sangwon Park 3, Dong-Ho Lee 4, Sang-Won Lee 5, and Ha-Joo Song 6 1 College of Information Technoloty, Ajou University, Korea

More information

MINIMIZING STORAGE COST IN CLOUD COMPUTING ENVIRONMENT

MINIMIZING STORAGE COST IN CLOUD COMPUTING ENVIRONMENT MINIMIZING STORAGE COST IN CLOUD COMPUTING ENVIRONMENT 1 SARIKA K B, 2 S SUBASREE 1 Department of Computer Science, Nehru College of Engineering and Research Centre, Thrissur, Kerala 2 Professor and Head,

More information

PIONEER RESEARCH & DEVELOPMENT GROUP

PIONEER RESEARCH & DEVELOPMENT GROUP SURVEY ON RAID Aishwarya Airen 1, Aarsh Pandit 2, Anshul Sogani 3 1,2,3 A.I.T.R, Indore. Abstract RAID stands for Redundant Array of Independent Disk that is a concept which provides an efficient way for

More information

EMC BACKUP-AS-A-SERVICE

EMC BACKUP-AS-A-SERVICE Reference Architecture EMC BACKUP-AS-A-SERVICE EMC AVAMAR, EMC DATA PROTECTION ADVISOR, AND EMC HOMEBASE Deliver backup services for cloud and traditional hosted environments Reduce storage space and increase

More information

features at a glance

features at a glance hp availability stats and performance software network and system monitoring for hp NonStop servers a product description from hp features at a glance Online monitoring of object status and performance

More information

Using Synology SSD Technology to Enhance System Performance Synology Inc.

Using Synology SSD Technology to Enhance System Performance Synology Inc. Using Synology SSD Technology to Enhance System Performance Synology Inc. Synology_SSD_Cache_WP_ 20140512 Table of Contents Chapter 1: Enterprise Challenges and SSD Cache as Solution Enterprise Challenges...

More information

Protecting Information in a Smarter Data Center with the Performance of Flash

Protecting Information in a Smarter Data Center with the Performance of Flash 89 Fifth Avenue, 7th Floor New York, NY 10003 www.theedison.com 212.367.7400 Protecting Information in a Smarter Data Center with the Performance of Flash IBM FlashSystem and IBM ProtecTIER Printed in

More information

Reclaiming Primary Storage with Managed Server HSM

Reclaiming Primary Storage with Managed Server HSM White Paper Reclaiming Primary Storage with Managed Server HSM November, 2013 RECLAIMING PRIMARY STORAGE According to Forrester Research Inc., the total amount of data warehoused by enterprises is doubling

More information

Protect Microsoft Exchange databases, achieve long-term data retention

Protect Microsoft Exchange databases, achieve long-term data retention Technical white paper Protect Microsoft Exchange databases, achieve long-term data retention HP StoreOnce Backup systems, HP StoreOnce Catalyst, and Symantec NetBackup OpenStorage Table of contents Introduction...

More information

WHITE PAPER Improving Storage Efficiencies with Data Deduplication and Compression

WHITE PAPER Improving Storage Efficiencies with Data Deduplication and Compression WHITE PAPER Improving Storage Efficiencies with Data Deduplication and Compression Sponsored by: Oracle Steven Scully May 2010 Benjamin Woo IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA

More information

Demystifying Deduplication for Backup with the Dell DR4000

Demystifying Deduplication for Backup with the Dell DR4000 Demystifying Deduplication for Backup with the Dell DR4000 This Dell Technical White Paper explains how deduplication with the DR4000 can help your organization save time, space, and money. John Bassett

More information

The Classical Architecture. Storage 1 / 36

The Classical Architecture. Storage 1 / 36 1 / 36 The Problem Application Data? Filesystem Logical Drive Physical Drive 2 / 36 Requirements There are different classes of requirements: Data Independence application is shielded from physical storage

More information

Rackspace Cloud Databases and Container-based Virtualization

Rackspace Cloud Databases and Container-based Virtualization Rackspace Cloud Databases and Container-based Virtualization August 2012 J.R. Arredondo @jrarredondo Page 1 of 6 INTRODUCTION When Rackspace set out to build the Cloud Databases product, we asked many

More information

Solid State Storage in Massive Data Environments Erik Eyberg

Solid State Storage in Massive Data Environments Erik Eyberg Solid State Storage in Massive Data Environments Erik Eyberg Senior Analyst Texas Memory Systems, Inc. Agenda Taxonomy Performance Considerations Reliability Considerations Q&A Solid State Storage Taxonomy

More information

Quantum StorNext. Product Brief: Distributed LAN Client

Quantum StorNext. Product Brief: Distributed LAN Client Quantum StorNext Product Brief: Distributed LAN Client NOTICE This product brief may contain proprietary information protected by copyright. Information in this product brief is subject to change without

More information

Performance evaluation of Web Information Retrieval Systems and its application to e-business

Performance evaluation of Web Information Retrieval Systems and its application to e-business Performance evaluation of Web Information Retrieval Systems and its application to e-business Fidel Cacheda, Angel Viña Departament of Information and Comunications Technologies Facultad de Informática,

More information

Hitachi Storage Solution for Cloud Computing

Hitachi Storage Solution for Cloud Computing Storage Solution for Cloud Computing Virtual Storage Platform Review Vol. 61 (2012), No. 2 85 Tsutomu Sukigara Naoko Kumagai OVERVIEW: With growing interest in cloud computing driven by major changes in

More information

Best Practices for Deploying Citrix XenDesktop on NexentaStor Open Storage

Best Practices for Deploying Citrix XenDesktop on NexentaStor Open Storage Best Practices for Deploying Citrix XenDesktop on NexentaStor Open Storage White Paper July, 2011 Deploying Citrix XenDesktop on NexentaStor Open Storage Table of Contents The Challenges of VDI Storage

More information

A Dell Technical White Paper Dell Compellent

A Dell Technical White Paper Dell Compellent The Architectural Advantages of Dell Compellent Automated Tiered Storage A Dell Technical White Paper Dell Compellent THIS WHITE PAPER IS FOR INFORMATIONAL PURPOSES ONLY, AND MAY CONTAIN TYPOGRAPHICAL

More information

Exploring RAID Configurations

Exploring RAID Configurations Exploring RAID Configurations J. Ryan Fishel Florida State University August 6, 2008 Abstract To address the limits of today s slow mechanical disks, we explored a number of data layouts to improve RAID

More information

NETAPP WHITE PAPER Looking Beyond the Hype: Evaluating Data Deduplication Solutions

NETAPP WHITE PAPER Looking Beyond the Hype: Evaluating Data Deduplication Solutions NETAPP WHITE PAPER Looking Beyond the Hype: Evaluating Data Deduplication Solutions Larry Freeman, Network Appliance, Inc. September 2007 WP-7028-0907 Table of Contents The Deduplication Hype 3 What Is

More information

Google File System. Web and scalability

Google File System. Web and scalability Google File System Web and scalability The web: - How big is the Web right now? No one knows. - Number of pages that are crawled: o 100,000 pages in 1994 o 8 million pages in 2005 - Crawlable pages might

More information

RevoScaleR Speed and Scalability

RevoScaleR Speed and Scalability EXECUTIVE WHITE PAPER RevoScaleR Speed and Scalability By Lee Edlefsen Ph.D., Chief Scientist, Revolution Analytics Abstract RevoScaleR, the Big Data predictive analytics library included with Revolution

More information

Get Success in Passing Your Certification Exam at first attempt!

Get Success in Passing Your Certification Exam at first attempt! Get Success in Passing Your Certification Exam at first attempt! Exam : E22-290 Title : EMC Data Domain Deduplication, Backup and Recovery Exam Version : DEMO 1.A customer has a Data Domain system with

More information

An Authorized Duplicate Check Scheme for Removing Duplicate Copies of Repeating Data in The Cloud Environment to Reduce Amount of Storage Space

An Authorized Duplicate Check Scheme for Removing Duplicate Copies of Repeating Data in The Cloud Environment to Reduce Amount of Storage Space An Authorized Duplicate Check Scheme for Removing Duplicate Copies of Repeating Data in The Cloud Environment to Reduce Amount of Storage Space Jannu.Prasanna Krishna M.Tech Student, Department of CSE,

More information

Similarity Evaluation Scheme Using Fixed-length VLC Based Representative Hash

Similarity Evaluation Scheme Using Fixed-length VLC Based Representative Hash Similarity Evaluation Scheme Using Fixed-length VLC Based Representative Hash Young Jun Yoo 1, Jin Kim, Jung min So, Jeong Gun Lee, Sun Jung Kim,Young Woong Ko 1 Dept. of Computer Engineering, Hallym University

More information

Identifying the Hidden Risk of Data Deduplication: How the HYDRAstor TM Solution Proactively Solves the Problem

Identifying the Hidden Risk of Data Deduplication: How the HYDRAstor TM Solution Proactively Solves the Problem Identifying the Hidden Risk of Data Deduplication: How the HYDRAstor TM Solution Proactively Solves the Problem Advanced Storage Products Group Table of Contents 1 - Introduction 2 Data Deduplication 3

More information

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging

Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging Achieving Nanosecond Latency Between Applications with IPC Shared Memory Messaging In some markets and scenarios where competitive advantage is all about speed, speed is measured in micro- and even nano-seconds.

More information

Entrust IdentityGuard Comprehensive

Entrust IdentityGuard Comprehensive Entrust IdentityGuard Comprehensive Entrust IdentityGuard Comprehensive is a five-day, hands-on overview of Entrust Course participants will gain experience planning, installing and configuring Entrust

More information

Understanding EMC Avamar with EMC Data Protection Advisor

Understanding EMC Avamar with EMC Data Protection Advisor Understanding EMC Avamar with EMC Data Protection Advisor Applied Technology Abstract EMC Data Protection Advisor provides a comprehensive set of features that reduce the complexity of managing data protection

More information

WHITE PAPER The End of Fragmentation: How to Keep Windows Systems Running Like New

WHITE PAPER The End of Fragmentation: How to Keep Windows Systems Running Like New WHITE PAPER The End of Fragmentation: How to Keep Windows Systems Running Like New Think Faster. Visit us at Condusiv.com THE END OF FRAGMENTATION: 1 Diskeeper data performance technology makes it easier

More information

Efficiently Storing Virtual Machine Backups

Efficiently Storing Virtual Machine Backups Efficiently Storing Virtual Machine Backups Stephen Smaldone, Grant Wallace, and Windsor Hsu Backup Recovery Systems Division EMC Corporation Abstract Physical level backups offer increased performance

More information

Scala Storage Scale-Out Clustered Storage White Paper

Scala Storage Scale-Out Clustered Storage White Paper White Paper Scala Storage Scale-Out Clustered Storage White Paper Chapter 1 Introduction... 3 Capacity - Explosive Growth of Unstructured Data... 3 Performance - Cluster Computing... 3 Chapter 2 Current

More information

Contents. WD Arkeia Page 2 of 14

Contents. WD Arkeia Page 2 of 14 Contents Contents...2 Executive Summary...3 What Is Data Deduplication?...4 Traditional Data Deduplication Strategies...5 Deduplication Challenges...5 Single-Instance Storage...5 Fixed-Block Deduplication...6

More information

Keywords Cloud Storage, Error Identification, Partitioning, Cloud Storage Integrity Checking, Digital Signature Extraction, Encryption, Decryption

Keywords Cloud Storage, Error Identification, Partitioning, Cloud Storage Integrity Checking, Digital Signature Extraction, Encryption, Decryption Partitioning Data and Domain Integrity Checking for Storage - Improving Cloud Storage Security Using Data Partitioning Technique Santosh Jogade *, Ravi Sharma, Prof. Rajani Kadam Department Of Computer

More information

Security Ensured Redundant Data Management under Cloud Environment

Security Ensured Redundant Data Management under Cloud Environment Security Ensured Redundant Data Management under Cloud Environment K. Malathi 1 M. Saratha 2 1 PG Scholar, Dept. of CSE, Vivekanandha College of Technology for Women, Namakkal. 2 Assistant Professor, Dept.

More information

The Architectural Advantages of Dell SC Series Automated Tiered Storage

The Architectural Advantages of Dell SC Series Automated Tiered Storage The Architectural Advantages of Dell SC Series Automated Tiered Storage Dell Engineering January 2016 A Dell Technical White Paper Revisions Date February 2011 January 2016 Description Initial release

More information

Evaluating HDFS I/O Performance on Virtualized Systems

Evaluating HDFS I/O Performance on Virtualized Systems Evaluating HDFS I/O Performance on Virtualized Systems Xin Tang xtang@cs.wisc.edu University of Wisconsin-Madison Department of Computer Sciences Abstract Hadoop as a Service (HaaS) has received increasing

More information

INTENSIVE FIXED CHUNKING (IFC) DE-DUPLICATION FOR SPACE OPTIMIZATION IN PRIVATE CLOUD STORAGE BACKUP

INTENSIVE FIXED CHUNKING (IFC) DE-DUPLICATION FOR SPACE OPTIMIZATION IN PRIVATE CLOUD STORAGE BACKUP INTENSIVE FIXED CHUNKING (IFC) DE-DUPLICATION FOR SPACE OPTIMIZATION IN PRIVATE CLOUD STORAGE BACKUP 1 M.SHYAMALA DEVI, 2 V.VIMAL KHANNA, 3 M.SHAHEEN SHAH 1 Assistant Professor, Department of CSE, R.M.D.

More information

Design of a NAND Flash Memory File System to Improve System Boot Time

Design of a NAND Flash Memory File System to Improve System Boot Time International Journal of Information Processing Systems, Vol.2, No.3, December 2006 147 Design of a NAND Flash Memory File System to Improve System Boot Time Song-Hwa Park*, Tae-Hoon Lee*, and Ki-Dong

More information

Partition Alignment Dramatically Increases System Performance

Partition Alignment Dramatically Increases System Performance Partition Alignment Dramatically Increases System Performance Information for anyone in IT that manages large storage environments, data centers or virtual servers. Paragon Software Group Paragon Alignment

More information

File-System Structure

File-System Structure Chapter 12: File System Implementation File System Structure File System Implementation Directory Implementation Allocation Methods Free-Space Management Efficiency and Performance Recovery Log-Structured

More information

HP StoreOnce D2D. Understanding the challenges associated with NetApp s deduplication. Business white paper

HP StoreOnce D2D. Understanding the challenges associated with NetApp s deduplication. Business white paper HP StoreOnce D2D Understanding the challenges associated with NetApp s deduplication Business white paper Table of contents Challenge #1: Primary deduplication: Understanding the tradeoffs...4 Not all

More information

Top Ten Questions. to Ask Your Primary Storage Provider About Their Data Efficiency. May 2014. Copyright 2014 Permabit Technology Corporation

Top Ten Questions. to Ask Your Primary Storage Provider About Their Data Efficiency. May 2014. Copyright 2014 Permabit Technology Corporation Top Ten Questions to Ask Your Primary Storage Provider About Their Data Efficiency May 2014 Copyright 2014 Permabit Technology Corporation Introduction The value of data efficiency technologies, namely

More information

Optimized And Secure Data Backup Solution For Cloud Using Data Deduplication

Optimized And Secure Data Backup Solution For Cloud Using Data Deduplication RESEARCH ARTICLE OPEN ACCESS Optimized And Secure Data Backup Solution For Cloud Using Data Deduplication Siva Ramakrishnan S( M.Tech ) 1,Vinoth Kumar P (M.E) 2 1 ( Department Of Computer Science Engineering,

More information

Technology Fueling the Next Phase of Storage Optimization

Technology Fueling the Next Phase of Storage Optimization White Paper HP StoreOnce Deduplication Software Technology Fueling the Next Phase of Storage Optimization By Lauren Whitehouse June, 2010 This ESG White Paper was commissioned by Hewlett-Packard and is

More information

P2P support for web proxy caching with web characteristics

P2P support for web proxy caching with web characteristics P2P support for web proxy caching with web characteristics Kyungbaek Kim 1 and Daeyeon Park 1 Department of Electrical Engineering & Computer Science, Division of Electrical Engineering, Korea Advanced

More information

Performance Testing of a Cloud Service

Performance Testing of a Cloud Service Performance Testing of a Cloud Service Trilesh Bhurtun, Junior Consultant, Capacitas Ltd Capacitas 2012 1 Introduction Objectives Environment Tests and Results Issues Summary Agenda Capacitas 2012 2 1

More information

Tandberg Data AccuVault RDX

Tandberg Data AccuVault RDX Tandberg Data AccuVault RDX Binary Testing conducts an independent evaluation and performance test of Tandberg Data s latest small business backup appliance. Data backup is essential to their survival

More information

Filesystems Performance in GNU/Linux Multi-Disk Data Storage

Filesystems Performance in GNU/Linux Multi-Disk Data Storage JOURNAL OF APPLIED COMPUTER SCIENCE Vol. 22 No. 2 (2014), pp. 65-80 Filesystems Performance in GNU/Linux Multi-Disk Data Storage Mateusz Smoliński 1 1 Lodz University of Technology Faculty of Technical

More information

ExaGrid Product Description. Cost-Effective Disk-Based Backup with Data Deduplication

ExaGrid Product Description. Cost-Effective Disk-Based Backup with Data Deduplication ExaGrid Product Description Cost-Effective Disk-Based Backup with Data Deduplication 1 Contents Introduction... 3 Considerations When Examining Disk-Based Backup Approaches... 3 ExaGrid A Disk-Based Backup

More information

SiLo: A Similarity-Locality based Near- Exact Deduplication Scheme with Low RAM Overhead and High Throughput

SiLo: A Similarity-Locality based Near- Exact Deduplication Scheme with Low RAM Overhead and High Throughput SiLo: A Similarity-Locality based Near- Exact Deduplication Scheme with Low RAM Overhead and High Throughput Wen Xia Hong Jiang Dan Feng Yu Hua, Huazhong University of Science and Technology University

More information

A Network Differential Backup and Restore System based on a Novel Duplicate Data Detection algorithm

A Network Differential Backup and Restore System based on a Novel Duplicate Data Detection algorithm A Network Differential Backup and Restore System based on a Novel Duplicate Data Detection algorithm GUIPING WANG 1, SHUYU CHEN 2*, AND JUN LIU 1 1 College of Computer Science Chongqing University No.

More information