Security Ensured Redundant Data Management under Cloud Environment

K. Malathi 1, M. Saratha 2
1 PG Scholar, Dept. of CSE, Vivekanandha College of Technology for Women, Namakkal.
2 Assistant Professor, Dept. of CSE, Vivekanandha College of Technology for Women, Namakkal.

Abstract: Cloud backup services provide offsite storage for users with disaster recovery support. Deduplication methods are used to control the high data redundancy in backup datasets. Data deduplication is a data compression approach applied in communication and storage environments, and the deduplication process must respect the limited resource levels and I/O overhead of client devices. Source deduplication strategies fall into two categories: local source deduplication (LSD) and global source deduplication (GSD). LSD detects redundancy only in backup datasets from the same device at the client side and sends only the unique data chunks to cloud storage. GSD performs the duplicate check across backup datasets from all clients on the cloud side before data transfer over the WAN. Redundant data management is achieved using an Application-aware Local-Global source Deduplication scheme. A file size filter separates out the small files. An application-aware chunking strategy is used in the intelligent chunker to break up the backup data streams. The application-aware deduplicator deduplicates data chunks from the same type of files. A hash engine generates chunk fingerprints. The data redundancy check is carried out in application-aware indices in both the local client and the remote cloud. File metadata is updated with redundant chunk location details. Segments and the corresponding fingerprints are stored in the cloud data center using a self-describing data structure. The deduplication scheme is enhanced with encrypted data handling features: an encrypted cloud storage model secures personal data values. The scheme is also adapted to control data redundancy in the smart phone environment, and a file-level deduplication scheme is designed for the global-level deduplication process.

I. INTRODUCTION

Cloud computing is a novel paradigm that provides infrastructure, platform and software as a service. A cloud platform can be either virtualized or not. Virtualizing the cloud platform increases resource availability and the flexibility of resource management; it also reduces cost through hardware multiplexing and helps save energy. Virtualization is thus a key enabling technology of cloud computing. System virtualization refers to the software and hardware techniques that allow partitioning one physical machine into multiple virtual instances that run concurrently and share the underlying physical resources and devices.

The recent introduction of digital TV, digital camcorders and other communication technologies has rapidly accelerated the amount of data being maintained in digital form. In 2007, for the first time ever, the total volume of digital content exceeded the global storage capacity, and it was estimated that by 2011 only half of the digital information produced could be stored. Further, the volume of automatically generated information exceeds the volume of human-generated digital information. Compounding the problem of storage space, digitized information has a more fundamental problem: it is more vulnerable to error than information in legacy media, e.g., paper, books and film. When data is stored in a computer storage system, a single storage error or power failure can put a large amount of information in danger.

To protect against such problems, a number of technologies have been used to strengthen the availability and reliability of digital data, including mirroring, replication and adding parity information. In the application layer, the administrator replicates the data onto additional copies called backups so that the original information can be restored in case of data loss. Due to the exponential growth in the volume of digital data, the backup operation is no longer routine, and exploiting commonalities within a file or among a set of files when storing and transmitting content is no longer optional. By properly removing information redundancy in a file system, the amount of information to manage is effectively reduced, significantly lowering the time and space requirements of managing it.

The deduplication module partitions a file into chunks, generates the respective summary information, which we call a fingerprint, and looks up the Fingerprint Table to determine whether the respective chunk already exists. If it does not, the fingerprint value is inserted into the Fingerprint Table. Chunking and fingerprint management are the key technical constituents that govern overall deduplication performance. There are a number of ways of chunking, e.g., variable-size chunking, fixed-size chunking, or a mixture of both, and a number of ways of managing fingerprints. Legacy index structures, e.g., B+trees and hashing, do not fit the deduplication workload: a sequence of fingerprints generated from a single file does not yield any spatial locality in the Fingerprint Table. By the same token, a sequence of fingerprint lookup operations can result in random reads on the Fingerprint Table, so each fingerprint lookup can result in a disk access. Given that most deduplication operations need to be performed online, it is critical that fingerprint lookup and insertion are performed with minimal disk access.
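
A minimal sketch of this lookup-or-insert cycle, assuming fixed-size chunking, SHA-1 fingerprints and an in-memory Fingerprint Table; the names and the 8 KB chunk size are illustrative choices, not taken from the paper:

```python
import hashlib

CHUNK_SIZE = 8 * 1024  # illustrative fixed chunk size

def fingerprint(chunk: bytes) -> str:
    """Summary information for a chunk (SHA-1, common in dedupe systems)."""
    return hashlib.sha1(chunk).hexdigest()

def deduplicate(path: str, fingerprint_table: dict) -> list:
    """Return the file as a list of fingerprints; store only unique chunks."""
    recipe = []
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK_SIZE):
            fp = fingerprint(chunk)
            if fp not in fingerprint_table:      # lookup in Fingerprint Table
                fingerprint_table[fp] = chunk    # insert the unique chunk
            recipe.append(fp)                    # duplicates become references
    return recipe
```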

II. RELATED WORK

Chunk-based deduplication is the most widely used deduplication method for secondary storage. Such a system breaks a data file or stream into contiguous chunks and eliminates duplicate copies by recording references to previous, identical chunks. Numerous studies have investigated content-addressable storage using whole files, fixed-size blocks, content-defined chunks and combinations or comparisons of these approaches [3]; generally, these have found that using content-defined chunks improves deduplication rates when small file modifications are stored. Once the data are divided into chunks, each chunk is represented by a secure fingerprint used for deduplication.

A technique to decrease the in-memory index requirements is presented in Sparse Indexing, which uses a sampling technique to reduce the size of the fingerprint index. The backup set is broken into relatively large regions in a content-defined manner similar to our super-chunks, each containing thousands of chunks. Regions are then deduplicated against a few of the most similar previously stored regions using a sparse, in-memory index, with only a small loss of deduplication. While Sparse Indexing is used in a single system to reduce its memory footprint, the notion of sampling within a region of chunks to identify other chunks against which new data may be deduplicated is similar to our sampling approach in stateful routing. We use those matches to direct data to a specific node, while they use matches to load a cache for deduplication.

Several other deduplication clusters have been presented in the literature. Bhagwat et al. [2] describe a distributed deduplication system based on Extreme Binning: data are forwarded and stored on a file basis, and the representative chunk ID is used to determine the destination. An incoming file is only deduplicated against a file with a matching representative chunk ID rather than against all data in the system. Note that Extreme Binning is intended for operations on individual files, not aggregates of all files being backed up together. In the latter case, this approach limits deduplication when inter-file locality is poor, suffers from increased cache misses and data skew, and requires multiple passes over the data when these aggregates are too big to fit in memory. DEBAR [10] also deduplicates individual files written to its cluster. Unlike our system, DEBAR deduplicates files partially as they are written to disk and completes deduplication during post-processing by sharing fingerprints between nodes. HYDRAstor [8] is a cluster deduplication storage system that creates chunks from a backup stream and routes them to storage nodes, and HydraFS [5] is a file system built on top of the underlying HYDRAstor architecture. Throughput of hundreds of MB/s is achieved on 4-12 storage nodes while using 64 KB chunks. Individual chunks are routed by evenly partitioning the fingerprint space across storage nodes, which is similar to the routing techniques used by Avamar [11] and PureDisk [7]. In comparison, our system uses larger super-chunks for routing to maximize cache locality and throughput, but smaller chunks for deduplication to achieve higher deduplication. Choosing the right chunking granularity presents a tradeoff between deduplication and system capacity and throughput even in a single-node system.

Bimodal chunking [9] is based on the observation that using large chunks reduces metadata overhead and improves throughput, but large chunks fail to recover some deduplication opportunities when they straddle the point where new data are added to the stream. Bimodal chunking tries to identify such points and uses a smaller chunk size around them for better deduplication.

III. DATA REDUNDANCY CONTROL SCHEMES FOR CLOUD SERVERS

Data deduplication is an effective data compression approach that exploits data redundancy: it partitions large data objects into smaller parts, called chunks, represents these chunks by their fingerprints, replaces duplicate chunks with their fingerprints after a chunk fingerprint index lookup, and only transfers or stores the unique chunks for the purpose of communication or storage efficiency. Source deduplication, which eliminates redundant data at the client site, is clearly preferred to target deduplication due to its ability to significantly reduce the amount of data transferred over a wide area network (WAN) with low communication bandwidth [1]. For a dataset with logical size L and physical size P, source deduplication can reduce the data transfer time to P/L of that of traditional cloud backup, as illustrated below. Data deduplication is, however, a resource-intensive process, entailing CPU-intensive hash calculations for chunking and fingerprinting and I/O-intensive operations for identifying and eliminating duplicate data. Such resources are limited in a typical personal computing device. Therefore, it is desirable to achieve a tradeoff between deduplication effectiveness and system overhead for personal computing devices with limited system resources.
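
As a worked illustration of the P/L claim (the sizes below are invented for the example, not taken from the paper's dataset):

```latex
\[
  T_{\mathrm{dedup}} \approx \frac{P}{L}\, T_{\mathrm{full}},
  \qquad \text{e.g., } L = 100~\mathrm{GB},\; P = 25~\mathrm{GB}
  \;\Rightarrow\; T_{\mathrm{dedup}} \approx 0.25\, T_{\mathrm{full}},
\]
```

i.e., a backup window roughly four times shorter at the same WAN bandwidth.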

In the traditional storage stack comprising applications, file systems and storage hardware, each layer holds different kinds of information about the data it manages, and such information in one layer is typically not available to the other layers. Co-design of storage and application makes it possible to optimize a deduplication-based storage system when the lower-level storage layer has extensive knowledge about the data structures and their access characteristics in the higher-level application layer. ADMAD improves redundancy detection by application-specific chunking methods that exploit knowledge about concrete file formats. ViDeDup [4] is a framework for video deduplication based on an application-level view of redundancy at the content level rather than at the byte level. But all this prior work focuses only on the effectiveness of deduplication in removing more redundancy, without considering the system overheads that determine the efficiency of the deduplication process.

In this paper, we propose ALG-Dedupe, an Application-aware Local-Global source deduplication scheme that not only exploits application awareness but also combines local and global duplicate detection, achieving high deduplication efficiency by reducing the deduplication latency to as low as that of application-aware local deduplication while saving as much cloud storage cost as application-aware global deduplication [6]. Our application-aware deduplication design is motivated by a systematic deduplication analysis of personal storage. We observe that there are significant differences among application types in the personal computing environment in terms of data redundancy, sensitivity to different chunking methods and independence in the deduplication process. Thus, the basic idea of ALG-Dedupe is to exploit these differences by treating different types of applications independently and adaptively during the local and global duplicate check processes, significantly improving deduplication efficiency and reducing system overhead.

We make several contributions in this paper. We propose a new metric, bytes saved per second, to measure the efficiency of different deduplication schemes on the same platform. We design an application-aware deduplication scheme that employs an intelligent data chunking method and an adaptive use of hash functions to minimize computational overhead and maximize deduplication effectiveness by exploiting application awareness. We combine local deduplication and global deduplication to balance the effectiveness and latency of deduplication. To relieve the disk index lookup bottleneck, we provide an application-aware index structure that suppresses redundancy independently and in parallel by dividing a central index into many independent small indices to optimize lookup performance. We also propose a data aggregation strategy at the client side that improves data transfer efficiency by grouping many small data packets into a single larger one for cloud storage. Our prototype implementation and real-dataset-driven evaluations show that ALG-Dedupe outperforms existing state-of-the-art source deduplication schemes in terms of backup window, energy efficiency and cost saving, owing to its high deduplication efficiency and low system overhead.

IV. APPLICATION-AWARE DEDUPLICATION PROCESS

ALG-Dedupe is designed to meet the requirement of deduplication efficiency with high deduplication effectiveness and low system overhead. The main idea of ALG-Dedupe is 1) to reduce computational overhead by exploiting both low-overhead local resources and high-overhead cloud resources, employing an intelligent data chunking scheme and an adaptive use of hash functions based on application awareness, and 2) to mitigate the on-disk index lookup bottleneck by dividing the full index into small, independent and application-specific indices in an application-aware index structure.
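
A sketch of that index-partitioning idea, assuming a simple extension-based application classifier; the class names and extension lists are invented for illustration:

```python
from collections import defaultdict

def app_class(filename: str) -> str:
    """Map a file to an application type by extension (simplified)."""
    ext = filename.rsplit(".", 1)[-1].lower()
    if ext in ("zip", "jpg", "mp3", "mp4"):
        return "compressed"
    if ext in ("exe", "dll", "iso"):
        return "static-uncompressed"
    return "dynamic-uncompressed"   # documents, mail, source code, ...

class AppAwareIndex:
    """One small independent index per application type, so duplicate
    checks for different file types can proceed independently and in
    parallel instead of contending on a single central index."""

    def __init__(self):
        self._indices = defaultdict(dict)   # app type -> small index

    def is_duplicate(self, filename: str, fp: str) -> bool:
        return fp in self._indices[app_class(filename)]

    def insert(self, filename: str, fp: str, location) -> None:
        self._indices[app_class(filename)][fp] = location
```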

It combines local-global source deduplication with application awareness to improve deduplication effectiveness with low system overhead on the client side.

A. File Size Analysis

Most of the files in the PC dataset are tiny files of less than 10 KB, which account for a negligibly small percentage of the storage capacity. As our statistical evidence shows, about 60.3 percent of all files are tiny files, yet they account for only 1.7 percent of the total storage capacity of the dataset. To reduce metadata overhead, ALG-Dedupe filters out these tiny files in the file size filter before the deduplication process and, in the segment store, groups data from many tiny files together into larger units of about 1 MB each to increase data transfer efficiency over the WAN.

B. Data Chunking Process

The deduplication efficiency of a data chunking scheme differs greatly among applications. Depending on whether the file type is compressed and on whether SC can outperform CDC in deduplication efficiency, we divide files into three main categories: compressed files, static uncompressed files and dynamic uncompressed files. Dynamic files are routinely editable, while static files are generally uneditable. To strike a better tradeoff between duplicate elimination ratio and deduplication overhead, we deduplicate compressed files with whole file chunking (WFC), separate static uncompressed files into fixed-size chunks by static chunking (SC) with an ideal chunk size, and break dynamic uncompressed files into variable-sized chunks with an optimal average chunk size using content-defined chunking (CDC) based on Rabin fingerprinting to identify chunk boundaries; a CDC sketch follows below.
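
The sketch below shows CDC boundary detection. A real implementation would use a polynomial Rabin fingerprint; a simple additive rolling hash stands in for it here, and the window, mask and size limits are illustrative choices:

```python
WINDOW = 48                    # bytes in the sliding window
MASK = (1 << 13) - 1           # boundary pattern, ~8 KB average chunk
MIN_CHUNK, MAX_CHUNK = 2048, 65536

def cdc_chunks(data: bytes):
    """Yield variable-sized chunks whose boundaries depend on content,
    so an insertion only shifts boundaries near the edited region."""
    start, rolling = 0, 0
    for i, byte in enumerate(data):
        rolling += byte
        if i >= WINDOW:
            rolling -= data[i - WINDOW]   # slide the window forward
        size = i - start + 1
        if (size >= MIN_CHUNK and (rolling & MASK) == MASK) or size >= MAX_CHUNK:
            yield data[start : i + 1]     # content-defined boundary
            start = i + 1
    if start < len(data):
        yield data[start:]                # trailing partial chunk
```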

C. Application-Aware Deduplicator

After data chunking in the intelligent chunker module, data chunks are deduplicated in the application-aware deduplicator by generating chunk fingerprints in the hash engine and detecting duplicate chunks in both the local client and the remote cloud. ALG-Dedupe strikes a good balance between alleviating computation overhead on the client side and avoiding hash collisions to preserve data integrity. For compressed files chunked with WFC, we employ an extended 12-byte Rabin hash value as the chunk fingerprint for local duplicate detection and an MD5 value for global duplicate detection. In both local and global detection, a SHA-1 value of the chunk serves as the chunk fingerprint of SC in static uncompressed files, while an MD5 value is used as the chunk fingerprint of dynamic uncompressed files, since chunk length provides another dimension for duplicate detection in CDC-based deduplication. To achieve high deduplication efficiency, the application-aware deduplicator first detects duplicate data in the application-aware local index corresponding to the local dataset, with low deduplication latency, at the PC client, and then compares locally deduplicated data chunks with all data stored in the cloud by looking up fingerprints in the application-aware global index on the cloud side for a high data reduction ratio. Only the data chunks that remain unique after global duplicate detection are stored in cloud storage with parallel container management.

D. Modified AES Algorithm

The Advanced Encryption Standard (AES) is an encryption standard adopted by the U.S. government. The standard comprises three block ciphers, AES-128, AES-192 and AES-256, adopted from a larger collection originally published as Rijndael. Each AES cipher has a 128-bit block size, with key sizes of 128, 192 and 256 bits, respectively. The AES ciphers have been analyzed extensively and are now used worldwide, as was the case with their predecessor, the Data Encryption Standard (DES); AES has now replaced DES as the preferred encryption standard. AES is a cryptographically secure encryption algorithm: a brute-force attack requires 2^128 trials for the 128-bit key size, and the structure of the algorithm and its round functions ensure high immunity to linear and differential cryptanalysis. No practical attacks against AES have succeeded to date, and it remains the current encryption standard. The AES design can be used in any application that requires protection of data during transmission through a communication network, including electronic commerce transactions, ATM machines and wireless communication. To increase the robustness of the AES algorithm, we use longer encryption keys and a larger data matrix; to keep the processing time low, we leave the complexity of the AES algorithm unchanged. The modified AES algorithm (MAES) works on data matrices. The encryption key has an equivalent length of about 384, 512, 768 or 1024 bits, and the modified algorithm is denoted accordingly: MAES-384, MAES-512, MAES-768 and MAES-1024.
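
The MAES construction (384- to 1024-bit keys over larger data matrices) is not specified in enough detail here to implement, so the sketch below uses standard AES-256-GCM from the Python cryptography package as a stand-in, showing only where encryption sits in the pipeline: chunks are encrypted after duplicate detection, so only unique data is ever encrypted and uploaded.

```python
import os
from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_chunk(key: bytes, chunk: bytes) -> bytes:
    """Encrypt one deduplicated chunk; the 12-byte nonce is prepended."""
    nonce = os.urandom(12)
    return nonce + AESGCM(key).encrypt(nonce, chunk, None)

def decrypt_chunk(key: bytes, blob: bytes) -> bytes:
    """Recover a chunk during restore from the cloud backup."""
    nonce, ciphertext = blob[:12], blob[12:]
    return AESGCM(key).decrypt(nonce, ciphertext, None)

key = AESGCM.generate_key(bit_length=256)   # client-held key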

V. ISSUES ON REDUNDANT DATA CONTROL SCHEMES

The Application-aware Local-Global source Deduplication (ALG-Dedupe) scheme is used to control redundancy in cloud backups. A file size filter separates out the small files. An application-aware chunking strategy is used in the intelligent chunker to break up the backup data streams. The application-aware deduplicator deduplicates data chunks from the same type of files. The hash engine generates chunk fingerprints. The data redundancy check is carried out in application-aware indices in both the local client and the remote cloud. File metadata is updated with redundant chunk location details. Segments and the corresponding fingerprints are stored in the cloud data center using a self-describing data structure (container). The following problems are identified in the existing system: resource-constrained mobile devices are not supported, data security is not considered, deduplication is not applied to small files, and backup window size selection is not optimized.

VI. SECURITY ENSURED REDUNDANT DATA MANAGEMENT

The deduplication system is adapted for computer and smart phone clients. The system provides security for the backup data values, and small files are also included in the deduplication process. The system is divided into six major modules: Cloud Backup Server, Chunking Process, Block Level Deduplication, File Level Deduplication, Security Process and Deduplication in Smart Phones. The chunking process module splits each file into blocks; block signature generation and deduplication operations are carried out in the block level deduplication module; the file level deduplication module performs deduplication at the file level; the data security module protects the backup data values; and deduplication on mobile phones is handled by the Deduplication in Smart Phones module.

A. Cloud Backup Server

The cloud backup server module is designed to maintain the backup data for the clients.

Figure 1: Security Ensured Redundant Data Management Scheme (PC clients perform local deduplication and a security process before uploading chunks; the cloud server performs global deduplication across clients).

B. Chunking Process

The file size filter is used to separate out the tiny files, and the intelligent chunker breaks up the large files into chunks. Backup files are divided into three categories: compressed files, static uncompressed files and dynamic uncompressed files. Static files are uneditable and dynamic files are editable. Compressed files are chunked with the whole file chunking (WFC) mechanism. Static uncompressed files are partitioned into fixed-size chunks by static chunking (SC). Dynamic uncompressed files are broken into variable-sized chunks by content-defined chunking (CDC).

C. Block Level Deduplication

Chunk fingerprints are generated in the hash engine. A 12-byte Rabin hash value is used as the chunk fingerprint for local duplicate detection of compressed files, while the Message Digest (MD5) algorithm is used for their global deduplication. The Secure Hash Algorithm (SHA-1) is used for deduplication of uncompressed static files, and dynamic uncompressed files are hashed using MD5. Duplicate detection is carried out in the local client and the remote cloud: fingerprints are indexed at the local and global levels, and deduplication is performed by verifying the fingerprint index values, as the sketch below illustrates.
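
A sketch of this two-level check; the index objects and the upload call are placeholders rather than a real cloud API:

```python
def backup_chunk(fp: str, chunk: bytes, local_index, global_index, cloud):
    """First consult the client's local index; only locally unique chunks
    are checked against the cloud-side global index before upload."""
    if fp in local_index:                 # local duplicate: nothing to send
        return "local-duplicate"
    local_index[fp] = True
    if fp in global_index:                # another client already stored it
        return "global-duplicate"
    global_index[fp] = True
    cloud.upload(fp, chunk)               # truly unique: transfer over WAN
    return "unique"
```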

D. File Level Deduplication

Tiny files are maintained in the segment store. File level deduplication is performed on files smaller than 10 KB. File level fingerprints are generated using the Rabin hash function, and deduplication is performed by verifying the file level fingerprint index.

E. Security Process

The backup data values are maintained in encrypted form. The Modified Advanced Encryption Standard (MAES) algorithm is used in the encryption/decryption process. Encryption is performed after the deduplication process, and local and global keys are used for the data security process.

F. Deduplication in Smart Phones

The deduplication process is tuned for the smart phone environment, with smart phones acting as clients for cloud backup services. File level and block level deduplication tasks are supported by the system, and data security is also provided in the smart phone environment.

VII. CONCLUSION

Cloud storage media are used to manage public and private data values, and source deduplication methods are applied to limit the storage and communication requirements. The Application-aware Local-Global source Deduplication (ALG-Dedupe) mechanism performs redundancy filtering in both same-client and all-client environments. The Security ensured Application aware Local-Global source Deduplication (SALG-Dedupe) scheme is designed with security and mobile device support features. Deduplication and power efficiency are improved in the computer and smart device environments, the system reduces the cost of cloud backup services, and the data access rate is increased. The system achieves intra-client and inter-client redundancy elimination with high deduplication effectiveness.

REFERENCES

[1] P. Shilane, M. Huang, G. Wallace and W. Hsu, "WAN Optimized Replication of Backup Datasets Using Stream-Informed Delta Compression," in Proc. 10th USENIX Conf. on File and Storage Technologies (FAST), Feb. 2012.
[2] D. Bhagwat, K. Eshghi, D. D. Long and M. Lillibridge, "Extreme Binning: Scalable, Parallel Deduplication for Chunk-Based File Backup," in Proc. 17th IEEE Int. Symp. on Modeling, Analysis and Simulation of Computer and Telecommunication Systems (MASCOTS), Sept. 2009.
[3] D. T. Meyer and W. J. Bolosky, "A Study of Practical Deduplication," in Proc. 9th USENIX Conf. on File and Storage Technologies (FAST), Feb. 2011.
[4] A. Katiyar and J. Weissman, "ViDeDup: An Application-Aware Framework for Video De-Duplication," in Proc. 3rd USENIX Workshop on Hot Topics in Storage and File Systems (HotStorage), 2011.
[5] C. Ungureanu, A. Aranya and A. Bohra, "HydraFS: A High-Throughput File System for the HYDRAstor Content-Addressable Storage System," in Proc. 8th USENIX Conf. on File and Storage Technologies (FAST), Feb. 2010.
[6] C.-K. Chu, S. S. M. Chow and R. H. Deng, "Key-Aggregate Cryptosystem for Scalable Data Sharing in Cloud Storage," IEEE Trans. on Parallel and Distributed Systems, vol. 25, no. 2, Feb. 2014.
[7] M. Dewaikar, "Symantec NetBackup PureDisk: Optimizing Backups with Deduplication for Remote Offices, Data Center and Virtual Machines," Symantec white paper, tbackup_puredisk_wp-en-us.pdf, September.
[8] C. Dubnicki, L. Gryz, L. Heldt, C. Ungureanu and M. Welnicki, "HYDRAstor: A Scalable Secondary Storage," in Proc. 7th USENIX Conf. on File and Storage Technologies (FAST), Feb. 2009.
[9] E. Kruus, C. Ungureanu and C. Dubnicki, "Bimodal Content Defined Chunking for Backup Streams," in Proc. 8th USENIX Conf. on File and Storage Technologies (FAST), Feb. 2010.
[10] T. Yang, D. Feng, Z. Niu, K. Zhou and Y. Wan, "DEBAR: A Scalable High-Performance Deduplication Storage System for Backup and Archiving," in Proc. IEEE Int. Symp. on Parallel & Distributed Processing (IPDPS), 2010.
[11] EMC Corporation, "Efficient Data Protection with EMC Avamar Global Deduplication Software," white paper, collateral/software/white-papers/h2681-efdta-prot-avamar.pdf, July.
