Comprehensive study of data de-duplication


International Conference on Cloud, Big Data and Trust 2013, Nov 13-15, RGPV

Deepak Mishra, School of Information Technology, RGPV Bhopal, India
Dr. Sanjeev Sharma, School of Information Technology, RGPV Bhopal, India

Abstract: Cloud computing is an emerging computing paradigm in which the resources of the computing infrastructure are provided as services over the Internet. Storage and network capacity in a cloud system are limited, so data de-duplication plays an important role in the cloud infrastructure. In this paper the data de-duplication process is discussed in detail: several methods make de-duplication practical to implement, and we examine the methods and processes used in data de-duplication.

Keywords: cloud computing, scalability, virtual machine, de-duplication.

I. INTRODUCTION

Clouds are large pools of easily usable and accessible resources. In a cloud, all resources are connected virtually to create a single system image. These resources can be dynamically reconfigured to adjust to a variable load (scale), allowing optimal resource utilization. Cloud storage refers to scalable and elastic storage capability delivered as a service using Internet technologies, with elastic provisioning and use-based pricing that does not penalize users for changing their storage consumption without notice. There are five basic characteristics of any cloud system:

- On-demand self-service
- Broad network access
- Resource pooling
- Measured service
- Rapid elasticity

Cloud computing comprises both hardware and applications provided to users as services over the Internet. With the rapid development of cloud computing, ever more cloud services have emerged, such as PaaS (platform as a service), SaaS (software as a service), and IaaS (infrastructure as a service).
Computing resources [19] are limited, and eventually any system that grows in data or usage will saturate the resources available to it. The resources in question may be, for example, processing capacity for computationally intensive systems or storage capacity for data-intensive systems. Network capacity is an important scalability concern in distributed systems. Structural scalability concerns the internal design of a system: it provides the system with methods to manipulate its data model, that is, to shrink or expand it, and can be understood in terms of how the system is deployed. Cloud storage is a paradigm in which online storage is networked and records are stored on several dedicated storage servers, which may be maintained by third parties. The concept of cloud storage is derived from cloud computing; it refers to storage accessed over the Internet via Web service application programming interfaces (APIs). For example, HDFS (Hadoop Distributed File System, hadoop.apache.org) is a distributed file system that runs on commodity hardware; it was introduced by Apache for managing huge data sets. Data de-duplication is also known as single instancing or intelligent compression. It essentially refers to the removal of replicated data: in the de-duplication process, duplicate data is deleted so that only one copy, a single instance, of the data is stored in the database. Data de-duplication is a term used to describe an algorithm or technique that eliminates duplicate copies of data from storage. It is commonly performed on secondary storage systems such as archival and backup storage.

II. DATA DE-DUPLICATION

The term data de-duplication refers to techniques that [1] save only one single instance of replicated data and provide links to that instance in place of storing additional copies of the data.
With the evolution of backup services from tape to disk, data de-duplication has become a key element of the backup process. It means that only one copy of the data is saved in the datacenter [10]; every user who wants to access that data is linked to the single stored instance. Data de-duplication therefore helps decrease the size of the datacenter: the number of replicas of data that would ordinarily be duplicated in the cloud is controlled and managed to shrink the physical storage space required for such replication. The basic steps of de-duplication are:

a) Files are divided into small segments.
b) New and existing data are checked for similarity by comparing fingerprints created with the SHA-1 algorithm (other methods are also applicable).
c) Metadata structures are updated.
d) Segments are compressed.

e) All duplicate data is deleted and a data integrity check is performed.

III. REQUIREMENTS FOR DATA DE-DUPLICATION

There is only one necessary condition for data de-duplication: it should be scalable [20], that is, elastic, without affecting the overall storage structure. To support scalable de-duplication, two methods have been proposed:

1. Sparse indexing: a method that solves the chunk-lookup bottleneck caused by disk access by using sampling and exploiting the inherent locality within backup streams. It picks a small slice of the chunks in the stream as samples; the sparse index then maps these samples to the existing segments in which they occur. Incoming streams are divided into relatively large segments, and each segment is de-duplicated against only a few of the most similar previous segments.

2. Bloom filters with caching: this approach uses a Summary Vector, a compact in-memory data structure (a Bloom filter) for discovering new segments; Stream-Informed Segment Layout, a data-layout method that improves on-disk locality for consecutively accessed segments; and Locality-Preserved Caching of segment fragments, which maintains the locality of the fingerprints of duplicated segments to achieve high cache hit ratios.

IV. TYPES OF DATA DE-DUPLICATION

There are two major categories [18] of data de-duplication:

Offline data de-duplication [7]: data is first written to the storage disk, and the de-duplication process takes place at a later time.

Online data de-duplication: duplicate data is deleted before being written to the storage disk.

Once the timing of data de-duplication has been decided, a number of existing techniques can be applied.
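The summary vector mentioned above is, at its core, a Bloom filter. The following is a minimal sketch, assuming nothing about the published data structure beyond the idea: the class name, bit-array size, and hash count are illustrative choices, not parameters from any real system.

```python
import hashlib

class BloomFilter:
    """A minimal "summary vector": a compact in-memory bit array that
    can answer 'definitely new' or 'possibly seen before' for a segment."""

    def __init__(self, num_bits=1 << 20, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, fingerprint: bytes):
        # Derive k bit positions from the fingerprint by re-hashing with a salt.
        for i in range(self.num_hashes):
            h = hashlib.sha1(bytes([i]) + fingerprint).digest()
            yield int.from_bytes(h[:8], "big") % self.num_bits

    def add(self, fingerprint: bytes):
        for p in self._positions(fingerprint):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, fingerprint: bytes) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(fingerprint))

# A segment reported "definitely new" skips the on-disk index lookup
# entirely; only "possibly seen" segments need a disk access.
bf = BloomFilter()
fp = hashlib.sha1(b"segment-data").digest()
assert not bf.might_contain(fp)   # never added: definitely new
bf.add(fp)
assert bf.might_contain(fp)       # added: reported as possibly seen
```

The asymmetry is the point of the design: a negative answer is always correct, while a positive answer may be a false positive that the on-disk index then resolves.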
The most used de-duplication approaches are whole file hashing (WFH), sub-file hashing (SFH), and delta encoding (DE).

Whole File Hashing: in whole file hashing (WFH), the entire file is fed to a hashing function, always a cryptographic hash such as MD5 or SHA-1. The cryptographic hash is used to find entirely duplicated files. This approach is fast, with low computation and low additional metadata overhead, and it works very well for complete system backups, where totally duplicated files are common. However, the coarse granularity of duplicate matching prevents it from matching two files that differ by only a single byte or bit of data.

Sub-File Hashing: [12] sub-file hashing (SFH) is appropriately named: the file is broken into a number of smaller sections before de-duplication. The number of sections depends on the type of SFH being used. The two most common types are fixed-size chunking and variable-length chunking. In fixed-size chunking, a file is divided into a number of fixed-size pieces called chunks. In variable-length chunking, a file is broken into chunks of variable length; techniques such as Rabin fingerprinting [28] are applied to determine chunk boundaries. Each section is passed to a cryptographic hash function (usually MD5 or SHA-1) to obtain a chunk identifier, which is used to locate duplicate data. Both SFH approaches find duplicate data at a finer granularity, but at a price.

Delta Encoding: the term delta encoding (DE) comes from the mathematical use of the delta symbol: in mathematics and science, delta denotes the change, or rate of change, in an object. Delta encoding is applied to express the difference between a source object and a target object. For example, if block A is the source and block B is the target, the DE of B is the difference between A and B that is unique to B.
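The contrast between WFH and fixed-size SFH can be sketched in a few lines of Python. This is illustrative only: the chunk size, helper names, and test data are arbitrary choices, not part of any particular system.

```python
import hashlib

def whole_file_hash(data: bytes) -> str:
    # WFH: one fingerprint for the entire file; catches only exact duplicates.
    return hashlib.sha1(data).hexdigest()

def fixed_size_chunks(data: bytes, chunk_size: int = 4096):
    # Fixed-size SFH: split the file into equal pieces and fingerprint each.
    for i in range(0, len(data), chunk_size):
        yield hashlib.sha1(data[i:i + chunk_size]).hexdigest()

# Two files that share their first 8 KB and differ only in a short tail.
a = bytes(range(256)) * 16 + b"y" * 4096 + b"tail-A"
b = bytes(range(256)) * 16 + b"y" * 4096 + b"tail-B"

# WFH sees two entirely different files...
print(whole_file_hash(a) == whole_file_hash(b))        # False

# ...while SFH still matches the two identical leading 4 KB chunks.
shared = set(fixed_size_chunks(a)) & set(fixed_size_chunks(b))
print(len(shared))                                      # 2
```

This is exactly the granularity trade-off described above: WFH misses the near-duplicate entirely, while SFH recovers most of the shared data at the cost of more fingerprints to index.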
How the difference is expressed and stored depends on how delta encoding is applied. Normally it is used when SFH does not produce matches but there is strong enough similarity between two items/blocks/chunks that storing the difference would take less space than storing the non-duplicate block.

V. CATEGORIES OF DATA DE-DUPLICATION STRATEGIES

Data de-duplication strategies can be categorized according to their operational area. In this respect there are two main strategies:

(1) File-level de-duplication [3]: performed over whole files. Two or more files are identified as duplicates if they have the same hash value.

(2) Block-level de-duplication [3]: performed over blocks. It first divides files into blocks and stores only a single copy of each block; it can use either fixed-size blocks or variable-size chunks.

These strategies can be further divided on the basis of where they operate [3]:

I. Target-based de-duplication: performed at the target data storage center. The client is unmodified and unaware of any de-duplication. This technology improves storage utilization but does not save bandwidth.

II. Source-based de-duplication: performed on the data at the source before it is transferred. A de-duplication-aware backup agent installed on the client backs up only unique data. The result is increased bandwidth and storage efficiency, but this places an extra computational load on the backup client. Duplicates are replaced by pointers, and the actual duplicate data is never sent over the network.

VI. KEY FACTORS

When data de-duplication is considered, two questions arise:

1. How does the system find the duplicated data?
2. How does the system maintain and manipulate the data to reduce the repetitions, in other words, to de-duplicate them?

For the first question, the system can use the MD5 or SHA-1 [2] algorithms to compute a unique fingerprint for each file or data block, and set up a fast fingerprint index to identify the duplicates. To delete duplicates, the first step is to discover them. The two usual ways to discover duplicates are:

1. Comparing data blocks or files bit by bit: this guarantees accuracy, but at the cost of additional time.
2. Comparing data blocks or files by hash values: comparing by hash value is more efficient, but accidental collisions become possible. The chance of an accidental collision depends on the hash algorithm; however, the chances are very small, and using a combination of hash values to discover duplicates greatly reduces the collision probability. It is therefore acceptable to use a hash function to discover duplicates.

For the second question, the system sets up a distributed file system to store the data and develops link files to manage files within it.
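The two discovery methods above can be combined: compare by hash first (fast), then confirm bit by bit so that an accidental collision can never cause data loss. A minimal sketch, where the function name and index layout are illustrative:

```python
import hashlib

def is_duplicate(candidate: bytes, stored_index: dict) -> bool:
    """Hash lookup prunes the search; a byte-wise comparison then
    rules out the (tiny but non-zero) chance of a hash collision."""
    digest = hashlib.sha1(candidate).hexdigest()
    stored = stored_index.get(digest)
    if stored is None:
        return False            # no block with this fingerprint: new data
    return stored == candidate  # bit-by-bit check guards against collisions

index = {hashlib.sha1(b"hello").hexdigest(): b"hello"}
print(is_duplicate(b"hello", index))   # True
print(is_duplicate(b"world", index))   # False
```

In practice many systems skip the final byte-wise comparison and trust the cryptographic hash alone; the sketch keeps it to show where the accuracy/speed trade-off sits.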
The de-duplication process may be very difficult, if not impossible, to perform manually, since real databases may contain thousands of millions of records.

VII. HOW IT WORKS

The data de-duplication process is systematic and controlled. A simple de-duplication process is explained here with an example [5]: suppose there are some files to be written to the database.

Fig.1: Sample file data and segments

In Fig.1 there are three files, named Andy.txt, Andrew.txt and Sim.txt, to be written to the database. When the first file, Andy.txt, is stored, the de-duplication system breaks it into four segments named A, B, C and D (there can be more segments, depending on the de-duplication approach). After segmentation, the system attaches a hash identifier to each segment for reconstruction [11], and all segments are stored in the database separately. When another file, Andrew.txt, is written, it is again divided into four segments named A, B, C and D. These are identical to Andy.txt's segments, so the de-duplication system does not store them again: it discards the copy and provides a link to the previously stored segments. When a third file, Sim.txt, is written, the system again breaks it into four segments, here named E, B, C and D. Only segment E is new; the other segments are already in the database, so the system stores only E and links to the rest. In the end, only five segments of data are stored in place of the 12 blocks shown in Fig.1, clearly reducing the storage space. If each segment's size is 1M and de-duplication is not applied, the total space needed is 12M; the de-duplication system stores only the 5 unique segments and links to the others, saving 7M.

VIII.
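The Fig.1 walk-through can be re-created in a few lines. The segment labels come from the example; the data structures are an illustrative simulation, not a real storage engine.

```python
# Three files of four 1M segments each, as in Fig.1.
files = {
    "Andy.txt":   ["A", "B", "C", "D"],
    "Andrew.txt": ["A", "B", "C", "D"],   # exact duplicate of Andy.txt
    "Sim.txt":    ["E", "B", "C", "D"],   # only segment E is new
}

stored = set()    # segments physically kept (single instance each)
links = {}        # file name -> ordered segment ids (pointers for rebuild)
for name, segments in files.items():
    for seg in segments:
        stored.add(seg)        # a repeated segment is simply not written again
    links[name] = segments

total_segments = sum(len(s) for s in files.values())
saved = total_segments - len(stored)
print(total_segments, len(stored), saved)   # 12 5 7 -> 7M saved at 1M/segment
```

Every file keeps its full segment list in `links`, so any file can be reconstructed even though shared segments exist only once.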
EXAMPLES OF DE-DUPLICATION STORAGE SYSTEMS

Several de-duplication storage systems have been designed, serving different storage purposes; what they share is that all are de-duplication-based storage systems. Some of them are:

Venti: a network storage system. It uses identical hash values to identify block contents, reducing the data occupancy of the storage area. Venti generates blocks for large storage applications and enforces a write-once policy to avoid collisions in the data. This system emerged in the early stages of network storage, so it is not suited to handling vast amounts of data, and it is not scalable.

HYDRAstor [4]: a scalable secondary-storage solution comprising a back-end built from a grid of storage nodes with a decentralized hash index, and a traditional file system interface as a front-end [15]. The back-end of HYDRAstor is based on a directed acyclic graph that organizes large-scale, variable-size, content-addressed, immutable, and highly resilient data blocks. HYDRAstor detects duplicates using the hash table. The ultimate target of this approach is to form a backup system; it does not consider the situation where multiple users need to share files.

Extreme Binning: a scalable, parallel de-duplication approach aimed at a non-traditional backup workload composed of low-locality individual files. Extreme Binning exploits file similarity instead of locality and requires only one disk access for chunk lookup per file. It groups similar files into bins and removes duplicated chunks inside each bin; duplicates may still exist across different bins. Extreme Binning keeps only the primary index in memory in order to reduce RAM occupancy. It is not a strict (exact) de-duplication method, because duplicates can remain among bins.

MAD2: an accurate de-duplication network backup service that works at both the file level and the block level. It uses four techniques, a hash bucket matrix, a Bloom filter array, a dual cache, and DHT (Distributed Hash Table) based load balancing, to achieve high performance. This approach is designed for a backup service, not for a pure storage system.
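The core idea behind Extreme Binning's one-disk-access-per-file property can be sketched as follows: the file's minimum chunk hash serves as a representative ID that selects a single bin, and de-duplication happens only inside that bin. Chunk sizes, names, and data here are illustrative, not the published algorithm's parameters.

```python
import hashlib
from collections import defaultdict

def chunk_ids(data: bytes, size: int = 8):
    # Toy fixed-size chunking; real systems use content-defined chunks.
    return [hashlib.sha1(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]

bins = defaultdict(dict)   # representative chunk ID -> {chunk ID: chunk bytes}

def store_file(data: bytes) -> str:
    """Binning-style placement (a sketch): the minimum chunk hash is the
    file's representative ID; duplicates are removed only inside its bin."""
    ids = chunk_ids(data)
    rep = min(ids)                  # representative ID: the smallest chunk hash
    chosen_bin = bins[rep]          # only this one bin is consulted on disk
    for cid, offset in zip(ids, range(0, len(data), 8)):
        chosen_bin.setdefault(cid, data[offset:offset + 8])
    return rep

store_file(b"abcdefghABCDEFGH")    # two distinct 8-byte chunks
store_file(b"abcdefghABCDEFGH")    # exact duplicate: nothing new stored
total_chunks = sum(len(chunks) for chunks in bins.values())
print(total_chunks)                # 2 unique chunks despite two stores
```

Similar files tend to share their minimum chunk hash, so they land in the same bin; dissimilar files land in different bins, which is exactly why duplicates can survive across bins.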
Duplicate Data Elimination (DDE): DDE applies a combination of content hashing, copy-on-write, and lazy updates to discover and coalesce identical data blocks in a storage-area-network file system. It always works in the background. What sets DeDu decisively apart from all of these approaches is that DeDu de-duplicates and calculates hash values at the client side, right before data transmission, entirely at the file level.

IX. PERFORMANCE METRICS

Storage Space: when de-duplicated segments are saved to cloud storage, storage space is reduced. Two test runs were performed: the storage savings were measured by sending two files, an original file and a duplicate. The same process was repeated for different file sizes, which were saved in different data bins, and the storage space used in each test was recorded. The test outcomes are as follows.

Example 1:

Before de-duplication: file name dic.jpg, compressed file size = 33.1 Kb.
After de-duplication: file name dic.jpg, number of segments = 1, compressed file size = 33.1 Kb.

Now consider another file, diccas.jpg, which is a copy of dic.jpg.

Before de-duplication: file name diccas.jpg, compressed file size = 33.1 Kb.
After de-duplication: file name diccas.jpg, number of segments created = 1, compressed file size = 33.1 Kb.

After de-duplication is performed, only one of the files (either dic.jpg or diccas.jpg) is stored; the other is removed. Space saved = 33.1 Kb, the size of the eliminated duplicate copy.

The performance chart below shows clearly how data de-duplication reduces the overall size of the storage system.
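The storage-space calculation from the test runs generalizes to any set of files: content that hashes identically is charged only once. A small sketch (the function, file names, and stand-in content are illustrative):

```python
import hashlib

def space_saved(files):
    """files: {name: content bytes}. Returns (before, after) storage in
    bytes when exact-duplicate files are kept as a single instance."""
    before = sum(len(data) for data in files.values())
    # One entry per distinct content hash: duplicates collapse together.
    unique = {hashlib.sha1(data).hexdigest(): len(data)
              for data in files.values()}
    after = sum(unique.values())
    return before, after

content = b"J" * 1000   # hypothetical stand-in for dic.jpg's bytes
before, after = space_saved({"dic.jpg": content, "diccas.jpg": content})
print(before, after, before - after)   # 2000 1000 1000: one copy saved
```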

Fig.2: Performance chart

In this example the file sizes were in KB, which makes the savings look minor, but consider the entire cloud: there the volume of data can be very large, so applying the same approach across the whole cloud would save a great deal, as the table below makes clear. The table considers only 2 copies of each file; in practice there may be many copies, which makes the savings even more important.

File Name | Copies | Before de-dup: size / after comp. | After de-dup: size / segments / after comp. | Space saved
Dic.jpg   | 2 | 2M / 1.5M  | 2M / 1 / 1.5M   | 1M
Ryno.txt  | 2 | 26K / 20K  | 26K / 7 / 20K   | 13K
Sh.mp     | 2 | 20M / 18M  | 20M / 23 / 18M  | 10M
Dam.avi   | 2 | 2G / 1.8G  | 2G / - / 1.8G   | 1G

X. SECURITY ISSUES

Although de-duplication is a secure method within the cloud system [7], the process also contains some security holes: a basic property of data de-duplication creates a problem for itself. For example, when a query goes to the server, the server responds to it. Suppose a file is uploaded; the question then arises: did anyone previously store a copy of this particular file? [8][9] An attacker can answer this question by requesting to upload a copy of the file and noticing whether de-duplication occurs. Note that this is a restricted query. First, the answer is a true/false answer that does not reveal who stored the file. Furthermore, in the basic form of the attack, the attacker can pose this query only once per file, since the query is posed by uploading the file; afterwards the file is saved at the upload service, and hence the answer to the query will always be positive [7]. The latter limitation can be removed by the following strategy: the attacker starts uploading a file and notices whether de-duplication occurs.
If de-duplication does not happen and a full upload starts, the attacker shuts down the communication channel and terminates the upload. As a result, the attacker's copy of the file is not saved at the server. This lets the attacker repeat the same test at a later time and check again whether the file has been uploaded. Moreover, by applying this procedure at regular intervals, the attacker can determine the time window in which the file was uploaded.

Three attacks [16] on online storage services are possible due to de-duplication. The first two enable an attacker to learn about the contents of other users' files, whereas the third describes a new covert channel.

Attack I: Discovering files
Attack II: Learning the contents of files
Attack III: A covert channel

XI. BENEFITS

Data de-duplication offers a number of benefits in a cloud network. Because the cloud network is interconnected, when one component gains an advantage, the throughput of the whole system automatically increases. The three main benefits of data de-duplication are:

A. Storage-based de-duplication [13] decreases the amount of storage space needed for a given set of data files. It is most effective in applications where many replicas of very similar, or even identical, data are stored on a single shared disk, a surprisingly common scenario. In the case of data backups, which are regularly performed to guard against data loss, most data in a given backup is unchanged from the previous backup. Common backup systems try to exploit this by omitting (or hard-linking) files that haven't changed, or by keeping differences between files. Neither approach captures all redundancy, however. Hard linking does not help with large files that have changed only in minor ways, such as a database; differencing only finds redundancy between successive versions of a single file (consider a section that was deleted and later re-added, or a logo image included in many documents).
B. Network data de-duplication [17] is used to cut the number of data bytes that must be transferred between endpoints, which can reduce the amount of bandwidth required.

C. Virtual servers benefit from de-duplication because it allows the nominally distinct system files of each virtual server to be merged into a single shared store. At the same time, if a given server alters a

file, de-duplication will not alter the files on the other servers.

XII. TRADE-OFFS

Whenever data is transformed, concerns arise [6] about potential loss of data. By definition, a de-duplication system does not save data exactly as it is written; it saves the data in a different way, so users worry about the integrity of their data. While data de-duplication can increase storage efficiency, the benefit comes at a cost. After de-duplication is performed, a file that was written to disk serially may end up laid out non-sequentially, so a read operation can sometimes take longer.

One method for de-duplicating data relies on cryptographic hash functions to recognize matching segments of data [14]. If two different segments of data produce the same hash value, this is known as a collision. The probability of a collision depends on the hash function used, and although the chances are very small, they are always non-zero.

The computational intensity of the procedure can be a challenge for data de-duplication. This is seldom a concern for stand-alone devices or machines, as the computation is entirely offloaded from other systems, but it can be a concern when de-duplication is embedded in devices providing other services. To increase performance, many systems employ both strong and weak hash functions: weak hashes are fast and easy to calculate, but there is a greater risk of a hash collision. The reconstruction of files does not need this processing, and any incremental performance penalty associated with re-assembling data chunks is unlikely to affect application performance.

Another issue with de-duplication concerns its effect on backups, snapshots, and archival, especially where de-duplication is applied to primary storage (for example, a NAS filer).
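The strong-plus-weak hash strategy can be sketched as follows. Adler-32 stands in as the weak hash and SHA-256 as the strong one; the function and its index layout are illustrative, not taken from any particular system.

```python
import hashlib
import zlib

def find_duplicates(chunks):
    """Two-tier lookup (a sketch): a cheap weak hash (Adler-32) prunes
    most comparisons; a strong hash (SHA-256) confirms real duplicates."""
    weak_index = {}     # adler32 -> list of (strong hash, chunk)
    stored = {}         # strong hash -> chunk kept once
    duplicates = 0
    for chunk in chunks:
        weak = zlib.adler32(chunk)
        strong = None
        for cand_strong, _ in weak_index.get(weak, []):
            # Compute the expensive strong hash only on a weak-hash hit.
            strong = strong or hashlib.sha256(chunk).hexdigest()
            if cand_strong == strong:       # confirmed duplicate
                duplicates += 1
                break
        else:
            strong = strong or hashlib.sha256(chunk).hexdigest()
            weak_index.setdefault(weak, []).append((strong, chunk))
            stored[strong] = chunk
    return duplicates, len(stored)

dups, unique = find_duplicates([b"aaa", b"bbb", b"aaa", b"ccc", b"bbb"])
print(dups, unique)   # 2 3
```

A weak-hash collision between genuinely different chunks falls through to the strong-hash comparison and is stored as new data, so the fast path never sacrifices correctness.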
Reading data out of such a storage device causes full reconstruction of the files, so any secondary replica of the data set is likely to be bigger than the primary copy. As for snapshots, if a file is snapshotted prior to de-duplication, the post-de-duplication snapshot will preserve the whole original file; so while the storage volume for primary file copies shrinks, the capacity required for snapshots may grow dramatically.

Another issue is the interaction with encryption and compression. While de-duplication is a form of compression, it works in tension with traditional compression techniques: de-duplication achieves better efficiency with smaller chunks, whereas compression achieves better efficiency with bigger chunks. The aim of encryption is to remove any discernible patterns in the data; thus encrypted data cannot be de-duplicated, even though the underlying data may be redundant.

De-duplication ultimately reduces redundancy. If this was not expected and planned for, it may undermine the underlying reliability of the system. Scaling (elasticity) has also been an issue for de-duplication systems. Data de-duplication is very profitable for space saving, but it creates challenges for reliability and performance.

XIII. FUTURE WORK AND ENHANCEMENTS

De-duplication is a technique that saves storage space and bandwidth. The technique was implemented on the Amazon cloud platform; the Eucalyptus platform is another example. A small subset of the functionality was successfully implemented. Some of the enhancements desirable for the present application are:

I. De-duplication of files is done per user bucket; we would like to extend it to the entire cloud, keeping each user bucket as a physical virtualization only for user images and not for the data.
II. Concurrent uploads from many node controllers.
III. Metadata structures are maintained as files in this version of the application.
Going forward, metadata can be stored in a database for easy querying.

XIV. CONCLUSION

In this paper, data de-duplication has been discussed in detail, along with the methods used to achieve it. We have covered a wide range of topics and areas of future research in the field of data de-duplication. The system to which de-duplication is applied is not necessarily homogeneous, which poses a challenge; these challenges can be greater for unstructured files, and they also create challenges for future file system designs. Nevertheless, data de-duplication is an essential element of the cloud system, and it will lead to improved cloud performance from both the user and the business perspective.

REFERENCES

[1] H. Huang, W. Hung, and K. G. Shin. FS2: dynamic data replication in free disk space for improving disk performance and energy consumption. In Proc. 20th ACM Symposium on Operating Systems Principles, 2005.
[2] P. Kulkarni, J. LaVoie, F. Douglis, and J. Tracey. Redundancy elimination within large collections of files. In Proc. USENIX 2004 Annual Technical Conference, 2004.
[3] K. Jin and E. Miller. The effectiveness of deduplication on virtual machine disk images. In Proc. SYSTOR 2009: The Israeli Experimental Systems Conference.
[4] B. Atkin, C. Ungureanu, C. Dubnicki, A. Aranya, S. Rago, S. Gokhale, G. Calkowski, and A. Bohra. HydraFS: a high-throughput file system for the HYDRAstor content-addressable storage system. In Proc. 8th USENIX Conference on File and Storage Technologies.
[5] E. Kruus and C. Ungureanu. Bimodal content defined chunking for backup streams. In Proc. 8th USENIX Conference on File and Storage Technologies, 2010.
[6] B. Zhu, H. Patterson, and K. Li. Avoiding the disk bottleneck in the Data Domain deduplication file system. In Proc.
6th USENIX Conference on File and Storage Technologies.
[7] Qian Wang, Cong Wang, Jin Li, Kui Ren, and Wenjing Lou. Enabling public verifiability and data dynamics for storage security in cloud computing. ESORICS 2009.

[8] Ari Juels and Burton S. Kaliski Jr. PORs: proofs of retrievability for large files. ACM Conference on Computer and Communications Security, 2007.
[9] Ralph C. Merkle. A certified digital signature. CRYPTO 1989.
[10] Dave Russell. Data De-duplication Will Be Even Bigger in 2010. Gartner, 8 February 2010.
[11] Hovav Shacham and Brent Waters. Compact proofs of retrievability. ASIACRYPT 2008.
[12] Dutch T. Meyer and William J. Bolosky. A study of practical deduplication. Microsoft Research and The University of British Columbia.
[13] John K. Ousterhout and Mendel Rosenblum. The design and implementation of a log-structured file system. ACM Transactions on Computer Systems, 10(1):26-52, February 1992.
[14] Mark W. Storer, Kevin M. Greenan, Darrell D. E. Long, and Ethan L. Miller. Secure data deduplication. In Proceedings of the 2008 ACM Workshop on Storage Security and Survivability, October 2008.
[15] Cristian Ungureanu, Benjamin Atkin, Akshat Aranya, Salil Gokhale, Stephen Rago, Grzegorz Calkowski, Cezary Dubnicki, and Aniruddha Bohra. HydraFS: a high-throughput file system for the HYDRAstor content-addressable storage system. In Proceedings of the 8th USENIX Conference on File and Storage Technologies (FAST), San Jose, CA, February 2010.
[16] E. L. Miller, D. D. E. Long, W. E. Freeman, and B. C. Reed. Strong security for network-attached storage. In Proceedings of the 2002 Conference on File and Storage Technologies (FAST), pages 1-13, Monterey, CA, January 2002.
[17] A. Muthitacharoen, B. Chen, and D. Mazières. A low-bandwidth network file system. In Proceedings of the 18th ACM Symposium on Operating Systems Principles (SOSP '01), October 2001.
[18] Atmos: multi-tenant, distributed cloud storage for unstructured content. [Online]. Available: atmos.htm.
[19] Fujitsu's storage systems and related technologies supporting cloud computing. [Online].


More information

Data Deduplication and Tivoli Storage Manager

Data Deduplication and Tivoli Storage Manager Data Deduplication and Tivoli Storage Manager Dave Cannon Tivoli Storage Manager rchitect Oxford University TSM Symposium September 2007 Disclaimer This presentation describes potential future enhancements

More information

Cumulus: filesystem backup to the Cloud

Cumulus: filesystem backup to the Cloud Michael Vrable, Stefan Savage, a n d G e o f f r e y M. V o e l k e r Cumulus: filesystem backup to the Cloud Michael Vrable is pursuing a Ph.D. in computer science at the University of California, San

More information

Multi-level Metadata Management Scheme for Cloud Storage System

Multi-level Metadata Management Scheme for Cloud Storage System , pp.231-240 http://dx.doi.org/10.14257/ijmue.2014.9.1.22 Multi-level Metadata Management Scheme for Cloud Storage System Jin San Kong 1, Min Ja Kim 2, Wan Yeon Lee 3, Chuck Yoo 2 and Young Woong Ko 1

More information

A Deduplication-based Data Archiving System

A Deduplication-based Data Archiving System 2012 International Conference on Image, Vision and Computing (ICIVC 2012) IPCSIT vol. 50 (2012) (2012) IACSIT Press, Singapore DOI: 10.7763/IPCSIT.2012.V50.20 A Deduplication-based Data Archiving System

More information

An Efficient Deduplication File System for Virtual Machine in Cloud

An Efficient Deduplication File System for Virtual Machine in Cloud An Efficient Deduplication File System for Virtual Machine in Cloud Bhuvaneshwari D M.E. computer science and engineering IndraGanesan college of Engineering,Trichy. Abstract Virtualization is widely deployed

More information

DATA SECURITY IN CLOUD USING ADVANCED SECURE DE-DUPLICATION

DATA SECURITY IN CLOUD USING ADVANCED SECURE DE-DUPLICATION DATA SECURITY IN CLOUD USING ADVANCED SECURE DE-DUPLICATION Hasna.R 1, S.Sangeetha 2 1 PG Scholar, Dhanalakshmi Srinivasan College of Engineering, Coimbatore. 2 Assistant Professor, Dhanalakshmi Srinivasan

More information

Verifying Correctness of Trusted data in Clouds

Verifying Correctness of Trusted data in Clouds Volume-3, Issue-6, December-2013, ISSN No.: 2250-0758 International Journal of Engineering and Management Research Available at: www.ijemr.net Page Number: 21-25 Verifying Correctness of Trusted data in

More information

A Deduplication File System & Course Review

A Deduplication File System & Course Review A Deduplication File System & Course Review Kai Li 12/13/12 Topics A Deduplication File System Review 12/13/12 2 Traditional Data Center Storage Hierarchy Clients Network Server SAN Storage Remote mirror

More information

Deploying De-Duplication on Ext4 File System

Deploying De-Duplication on Ext4 File System Deploying De-Duplication on Ext4 File System Usha A. Joglekar 1, Bhushan M. Jagtap 2, Koninika B. Patil 3, 1. Asst. Prof., 2, 3 Students Department of Computer Engineering Smt. Kashibai Navale College

More information

A Survey on Data Deduplication in Cloud Storage Environment

A Survey on Data Deduplication in Cloud Storage Environment 385 A Survey on Data Deduplication in Cloud Storage Environment Manikantan U.V. 1, Prof.Mahesh G. 2 1 (Department of Information Science and Engineering, Acharya Institute of Technology, Bangalore) 2 (Department

More information

Inline Deduplication

Inline Deduplication Inline Deduplication [email protected] 1.1 Inline Vs Post-process Deduplication In target based deduplication, the deduplication engine can either process data for duplicates in real time (i.e.

More information

Theoretical Aspects of Storage Systems Autumn 2009

Theoretical Aspects of Storage Systems Autumn 2009 Theoretical Aspects of Storage Systems Autumn 2009 Chapter 3: Data Deduplication André Brinkmann News Outline Data Deduplication Compare-by-hash strategies Delta-encoding based strategies Measurements

More information

Byte-index Chunking Algorithm for Data Deduplication System

Byte-index Chunking Algorithm for Data Deduplication System , pp.415-424 http://dx.doi.org/10.14257/ijsia.2013.7.5.38 Byte-index Chunking Algorithm for Data Deduplication System Ider Lkhagvasuren 1, Jung Min So 1, Jeong Gun Lee 1, Chuck Yoo 2 and Young Woong Ko

More information

A Middleware Strategy to Survive Compute Peak Loads in Cloud

A Middleware Strategy to Survive Compute Peak Loads in Cloud A Middleware Strategy to Survive Compute Peak Loads in Cloud Sasko Ristov Ss. Cyril and Methodius University Faculty of Information Sciences and Computer Engineering Skopje, Macedonia Email: [email protected]

More information

N TH THIRD PARTY AUDITING FOR DATA INTEGRITY IN CLOUD. R.K.Ramesh 1, P.Vinoth Kumar 2 and R.Jegadeesan 3 ABSTRACT

N TH THIRD PARTY AUDITING FOR DATA INTEGRITY IN CLOUD. R.K.Ramesh 1, P.Vinoth Kumar 2 and R.Jegadeesan 3 ABSTRACT N TH THIRD PARTY AUDITING FOR DATA INTEGRITY IN CLOUD R.K.Ramesh 1, P.Vinoth Kumar 2 and R.Jegadeesan 3 1 M.Tech Student, Department of Computer Science and Engineering, S.R.M. University Chennai 2 Asst.Professor,

More information

Low-Cost Data Deduplication for Virtual Machine Backup in Cloud Storage

Low-Cost Data Deduplication for Virtual Machine Backup in Cloud Storage Low-Cost Data Deduplication for Virtual Machine Backup in Cloud Storage Wei Zhang, Tao Yang, Gautham Narayanasamy, and Hong Tang University of California at Santa Barbara, Alibaba Inc. Abstract In a virtualized

More information

ISSN 2278-3091. Index Terms Cloud computing, outsourcing data, cloud storage security, public auditability

ISSN 2278-3091. Index Terms Cloud computing, outsourcing data, cloud storage security, public auditability Outsourcing and Discovering Storage Inconsistencies in Cloud Through TPA Sumathi Karanam 1, GL Varaprasad 2 Student, Department of CSE, QIS College of Engineering and Technology, Ongole, AndhraPradesh,India

More information

Data Deduplication and Tivoli Storage Manager

Data Deduplication and Tivoli Storage Manager Data Deduplication and Tivoli Storage Manager Dave annon Tivoli Storage Manager rchitect March 2009 Topics Tivoli Storage, IM Software Group Deduplication technology Data reduction and deduplication in

More information

SECURE CLOUD STORAGE PRIVACY-PRESERVING PUBLIC AUDITING FOR DATA STORAGE SECURITY IN CLOUD

SECURE CLOUD STORAGE PRIVACY-PRESERVING PUBLIC AUDITING FOR DATA STORAGE SECURITY IN CLOUD Volume 1, Issue 7, PP:, JAN JUL 2015. SECURE CLOUD STORAGE PRIVACY-PRESERVING PUBLIC AUDITING FOR DATA STORAGE SECURITY IN CLOUD B ANNAPURNA 1*, G RAVI 2*, 1. II-M.Tech Student, MRCET 2. Assoc. Prof, Dept.

More information

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB

BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB BENCHMARKING CLOUD DATABASES CASE STUDY on HBASE, HADOOP and CASSANDRA USING YCSB Planet Size Data!? Gartner s 10 key IT trends for 2012 unstructured data will grow some 80% over the course of the next

More information

Distributed File Systems

Distributed File Systems Distributed File Systems Paul Krzyzanowski Rutgers University October 28, 2012 1 Introduction The classic network file systems we examined, NFS, CIFS, AFS, Coda, were designed as client-server applications.

More information

SHARED DATA & INDENTITY PRIVACY PRESERVING IN CLOUD AND PUBLIC AUDITING

SHARED DATA & INDENTITY PRIVACY PRESERVING IN CLOUD AND PUBLIC AUDITING SHARED DATA & INDENTITY PRIVACY PRESERVING IN CLOUD AND PUBLIC AUDITING Er. Kavin M 1, Mr.J.Jayavel 2 1 PG Scholar, 2 Teaching Assistant, Department of Information Technology, Anna University Regional

More information

Data Deduplication: An Essential Component of your Data Protection Strategy

Data Deduplication: An Essential Component of your Data Protection Strategy WHITE PAPER: THE EVOLUTION OF DATA DEDUPLICATION Data Deduplication: An Essential Component of your Data Protection Strategy JULY 2010 Andy Brewerton CA TECHNOLOGIES RECOVERY MANAGEMENT AND DATA MODELLING

More information

Quanqing XU [email protected]. YuruBackup: A Highly Scalable and Space-Efficient Incremental Backup System in the Cloud

Quanqing XU Quanqing.Xu@nicta.com.au. YuruBackup: A Highly Scalable and Space-Efficient Incremental Backup System in the Cloud Quanqing XU [email protected] YuruBackup: A Highly Scalable and Space-Efficient Incremental Backup System in the Cloud Outline Motivation YuruBackup s Architecture Backup Client File Scan, Data

More information

Trends in Enterprise Backup Deduplication

Trends in Enterprise Backup Deduplication Trends in Enterprise Backup Deduplication Shankar Balasubramanian Architect, EMC 1 Outline Protection Storage Deduplication Basics CPU-centric Deduplication: SISL (Stream-Informed Segment Layout) Data

More information

A block based storage model for remote online backups in a trust no one environment

A block based storage model for remote online backups in a trust no one environment A block based storage model for remote online backups in a trust no one environment http://www.duplicati.com/ Kenneth Skovhede (author, [email protected]) René Stach (editor, [email protected]) Abstract

More information

A Secure & Efficient Data Integrity Model to establish trust in cloud computing using TPA

A Secure & Efficient Data Integrity Model to establish trust in cloud computing using TPA A Secure & Efficient Data Integrity Model to establish trust in cloud computing using TPA Mr.Mahesh S.Giri Department of Computer Science & Engineering Technocrats Institute of Technology Bhopal, India

More information

TECHNICAL WHITE PAPER: ELASTIC CLOUD STORAGE SOFTWARE ARCHITECTURE

TECHNICAL WHITE PAPER: ELASTIC CLOUD STORAGE SOFTWARE ARCHITECTURE TECHNICAL WHITE PAPER: ELASTIC CLOUD STORAGE SOFTWARE ARCHITECTURE Deploy a modern hyperscale storage platform on commodity infrastructure ABSTRACT This document provides a detailed overview of the EMC

More information

Speeding Up Cloud/Server Applications Using Flash Memory

Speeding Up Cloud/Server Applications Using Flash Memory Speeding Up Cloud/Server Applications Using Flash Memory Sudipta Sengupta Microsoft Research, Redmond, WA, USA Contains work that is joint with B. Debnath (Univ. of Minnesota) and J. Li (Microsoft Research,

More information

15-2394-3696 RIGOROUS PUBLIC AUDITING SUPPORT ON SHARED DATA STORED IN THE CLOUD BY PRIVACY-PRESERVING MECHANISM

15-2394-3696 RIGOROUS PUBLIC AUDITING SUPPORT ON SHARED DATA STORED IN THE CLOUD BY PRIVACY-PRESERVING MECHANISM RIGOROUS PUBLIC AUDITING SUPPORT ON SHARED DATA STORED IN THE CLOUD BY PRIVACY-PRESERVING MECHANISM Dhanashri Bamane Vinayak Pottigar Subhash Pingale Department of Computer Science and Engineering SKN

More information

Data Storage Security in Cloud Computing for Ensuring Effective and Flexible Distributed System

Data Storage Security in Cloud Computing for Ensuring Effective and Flexible Distributed System Data Storage Security in Cloud Computing for Ensuring Effective and Flexible Distributed System 1 K.Valli Madhavi A.P [email protected] Mobile: 9866034900 2 R.Tamilkodi A.P [email protected] Mobile:

More information

[email protected]

sulbhaghadling@gmail.com www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 4 Issue 3 March 2015, Page No. 10715-10720 Data DeDuplication Using Optimized Fingerprint Lookup Method for

More information

Keywords Cloud Storage, Error Identification, Partitioning, Cloud Storage Integrity Checking, Digital Signature Extraction, Encryption, Decryption

Keywords Cloud Storage, Error Identification, Partitioning, Cloud Storage Integrity Checking, Digital Signature Extraction, Encryption, Decryption Partitioning Data and Domain Integrity Checking for Storage - Improving Cloud Storage Security Using Data Partitioning Technique Santosh Jogade *, Ravi Sharma, Prof. Rajani Kadam Department Of Computer

More information

Top Ten Questions. to Ask Your Primary Storage Provider About Their Data Efficiency. May 2014. Copyright 2014 Permabit Technology Corporation

Top Ten Questions. to Ask Your Primary Storage Provider About Their Data Efficiency. May 2014. Copyright 2014 Permabit Technology Corporation Top Ten Questions to Ask Your Primary Storage Provider About Their Data Efficiency May 2014 Copyright 2014 Permabit Technology Corporation Introduction The value of data efficiency technologies, namely

More information

Enhanced Intensive Indexing (I2D) De-Duplication for Space Optimization in Private Cloud Storage Backup

Enhanced Intensive Indexing (I2D) De-Duplication for Space Optimization in Private Cloud Storage Backup Enhanced Intensive Indexing (I2D) De-Duplication for Space Optimization in Private Cloud Storage Backup M. Shyamala Devi and Steven S. Fernandez Abstract Cloud Storage provide users with abundant storage

More information

3Gen Data Deduplication Technical

3Gen Data Deduplication Technical 3Gen Data Deduplication Technical Discussion NOTICE: This White Paper may contain proprietary information protected by copyright. Information in this White Paper is subject to change without notice and

More information

Enhanced Dynamic Whole File De-Duplication (DWFD) for Space Optimization in Private Cloud Storage Backup

Enhanced Dynamic Whole File De-Duplication (DWFD) for Space Optimization in Private Cloud Storage Backup International Journal of Machine Learning and Computing, Vol. 4, No. 4, August 2014 Enhanced Dynamic Whole File De-Duplication (DWFD) for Space Optimization in Private Cloud Storage Backup M. Shyamala

More information

FAST 11. Yongseok Oh <[email protected]> University of Seoul. Mobile Embedded System Laboratory

FAST 11. Yongseok Oh <ysoh@uos.ac.kr> University of Seoul. Mobile Embedded System Laboratory CAFTL: A Content-Aware Flash Translation Layer Enhancing the Lifespan of flash Memory based Solid State Drives FAST 11 Yongseok Oh University of Seoul Mobile Embedded System Laboratory

More information

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms

Distributed File System. MCSN N. Tonellotto Complements of Distributed Enabling Platforms Distributed File System 1 How do we get data to the workers? NAS Compute Nodes SAN 2 Distributed File System Don t move data to workers move workers to the data! Store data on the local disks of nodes

More information

Enterprise Backup and Restore technology and solutions

Enterprise Backup and Restore technology and solutions Enterprise Backup and Restore technology and solutions LESSON VII Veselin Petrunov Backup and Restore team / Deep Technical Support HP Bulgaria Global Delivery Hub Global Operations Center November, 2013

More information

STORAGE. Buying Guide: TARGET DATA DEDUPLICATION BACKUP SYSTEMS. inside

STORAGE. Buying Guide: TARGET DATA DEDUPLICATION BACKUP SYSTEMS. inside Managing the information that drives the enterprise STORAGE Buying Guide: DEDUPLICATION inside What you need to know about target data deduplication Special factors to consider One key difference among

More information

A Brief Analysis on Architecture and Reliability of Cloud Based Data Storage

A Brief Analysis on Architecture and Reliability of Cloud Based Data Storage Volume 2, No.4, July August 2013 International Journal of Information Systems and Computer Sciences ISSN 2319 7595 Tejaswini S L Jayanthy et al., Available International Online Journal at http://warse.org/pdfs/ijiscs03242013.pdf

More information

The assignment of chunk size according to the target data characteristics in deduplication backup system

The assignment of chunk size according to the target data characteristics in deduplication backup system The assignment of chunk size according to the target data characteristics in deduplication backup system Mikito Ogata Norihisa Komoda Hitachi Information and Telecommunication Engineering, Ltd. 781 Sakai,

More information

Secure Hybrid Cloud Architecture for cloud computing

Secure Hybrid Cloud Architecture for cloud computing Secure Hybrid Cloud Architecture for cloud computing Amaresh K Sagar Student, Dept of Computer science and Eng LAEC Bidar Email Id: [email protected] Sumangala Patil Associate prof and HOD Dept of

More information

Deduplication Demystified: How to determine the right approach for your business

Deduplication Demystified: How to determine the right approach for your business Deduplication Demystified: How to determine the right approach for your business Presented by Charles Keiper Senior Product Manager, Data Protection Quest Software Session Objective: To answer burning

More information

UNDERSTANDING DATA DEDUPLICATION. Thomas Rivera SEPATON

UNDERSTANDING DATA DEDUPLICATION. Thomas Rivera SEPATON UNDERSTANDING DATA DEDUPLICATION Thomas Rivera SEPATON SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individual members may use this material

More information

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) PERCEIVING AND RECOVERING DEGRADED DATA ON SECURE CLOUD

INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) PERCEIVING AND RECOVERING DEGRADED DATA ON SECURE CLOUD INTERNATIONAL JOURNAL OF COMPUTER ENGINEERING & TECHNOLOGY (IJCET) International Journal of Computer Engineering and Technology (IJCET), ISSN 0976- ISSN 0976 6367(Print) ISSN 0976 6375(Online) Volume 4,

More information

RevDedup: A Reverse Deduplication Storage System Optimized for Reads to Latest Backups

RevDedup: A Reverse Deduplication Storage System Optimized for Reads to Latest Backups RevDedup: A Reverse Deduplication Storage System Optimized for Reads to Latest Backups Chun-Ho Ng and Patrick P. C. Lee Department of Computer Science and Engineering The Chinese University of Hong Kong,

More information

Avoiding the Disk Bottleneck in the Data Domain Deduplication File System

Avoiding the Disk Bottleneck in the Data Domain Deduplication File System Avoiding the Disk Bottleneck in the Data Domain Deduplication File System Benjamin Zhu Data Domain, Inc. Kai Li Data Domain, Inc. and Princeton University Hugo Patterson Data Domain, Inc. Abstract Disk-based

More information

Data Deduplication Scheme for Cloud Storage

Data Deduplication Scheme for Cloud Storage 26 Data Deduplication Scheme for Cloud Storage 1 Iuon-Chang Lin and 2 Po-Ching Chien Abstract Nowadays, the utilization of storage capacity becomes an important issue in cloud storage. In this paper, we

More information

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control

Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control Developing Scalable Smart Grid Infrastructure to Enable Secure Transmission System Control EP/K006487/1 UK PI: Prof Gareth Taylor (BU) China PI: Prof Yong-Hua Song (THU) Consortium UK Members: Brunel University

More information

Index Terms : Load rebalance, distributed file systems, clouds, movement cost, load imbalance, chunk.

Index Terms : Load rebalance, distributed file systems, clouds, movement cost, load imbalance, chunk. Load Rebalancing for Distributed File Systems in Clouds. Smita Salunkhe, S. S. Sannakki Department of Computer Science and Engineering KLS Gogte Institute of Technology, Belgaum, Karnataka, India Affiliated

More information

Distributed File Systems

Distributed File Systems Distributed File Systems Mauro Fruet University of Trento - Italy 2011/12/19 Mauro Fruet (UniTN) Distributed File Systems 2011/12/19 1 / 39 Outline 1 Distributed File Systems 2 The Google File System (GFS)

More information

EMPOWER DATA PROTECTION AND DATA STORAGE IN CLOUD COMPUTING USING SECURE HASH ALGORITHM (SHA1)

EMPOWER DATA PROTECTION AND DATA STORAGE IN CLOUD COMPUTING USING SECURE HASH ALGORITHM (SHA1) EMPOWER DATA PROTECTION AND DATA STORAGE IN CLOUD COMPUTING USING SECURE HASH ALGORITHM (SHA1) A.William Walls Research Scholar Department of Computer Science SSM College of Arts and Science Komarapalayam,

More information

An Authorized Duplicate Check Scheme for Removing Duplicate Copies of Repeating Data in The Cloud Environment to Reduce Amount of Storage Space

An Authorized Duplicate Check Scheme for Removing Duplicate Copies of Repeating Data in The Cloud Environment to Reduce Amount of Storage Space An Authorized Duplicate Check Scheme for Removing Duplicate Copies of Repeating Data in The Cloud Environment to Reduce Amount of Storage Space Jannu.Prasanna Krishna M.Tech Student, Department of CSE,

More information

Near Sheltered and Loyal storage Space Navigating in Cloud

Near Sheltered and Loyal storage Space Navigating in Cloud IOSR Journal of Engineering (IOSRJEN) e-issn: 2250-3021, p-issn: 2278-8719 Vol. 3, Issue 8 (August. 2013), V2 PP 01-05 Near Sheltered and Loyal storage Space Navigating in Cloud N.Venkata Krishna, M.Venkata

More information

UNDERSTANDING DATA DEDUPLICATION. Jiří Král, ředitel pro technický rozvoj STORYFLEX a.s.

UNDERSTANDING DATA DEDUPLICATION. Jiří Král, ředitel pro technický rozvoj STORYFLEX a.s. UNDERSTANDING DATA DEDUPLICATION Jiří Král, ředitel pro technický rozvoj STORYFLEX a.s. SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individual

More information

UniFS A True Global File System

UniFS A True Global File System UniFS A True Global File System Introduction The traditional means to protect file data by making copies, combined with the need to provide access to shared data from multiple locations, has created an

More information

Proof of Retrivability: A Third Party Auditor Using Cloud Computing

Proof of Retrivability: A Third Party Auditor Using Cloud Computing Proof of Retrivability: A Third Party Auditor Using Cloud Computing Vijayaraghavan U 1, Madonna Arieth R 2, Geethanjali K 3 1,2 Asst. Professor, Dept of CSE, RVS College of Engineering& Technology, Pondicherry

More information

EMC DATA DOMAIN OPERATING SYSTEM

EMC DATA DOMAIN OPERATING SYSTEM EMC DATA DOMAIN OPERATING SYSTEM Powering EMC Protection Storage ESSENTIALS High-Speed, Scalable Deduplication Up to 58.7 TB/hr performance Reduces requirements for backup storage by 10 to 30x and archive

More information

Hypertable Architecture Overview

Hypertable Architecture Overview WHITE PAPER - MARCH 2012 Hypertable Architecture Overview Hypertable is an open source, scalable NoSQL database modeled after Bigtable, Google s proprietary scalable database. It is written in C++ for

More information

DATA BACKUP & RESTORE

DATA BACKUP & RESTORE DATA BACKUP & RESTORE Verizon Terremark s Data Backup & Restore provides secure, streamlined online-offsite data storage and retrieval that is highly scalable and easily customizable. Offsite backup is

More information

WINDOWS AZURE DATA MANAGEMENT

WINDOWS AZURE DATA MANAGEMENT David Chappell October 2012 WINDOWS AZURE DATA MANAGEMENT CHOOSING THE RIGHT TECHNOLOGY Sponsored by Microsoft Corporation Copyright 2012 Chappell & Associates Contents Windows Azure Data Management: A

More information

Snapshots in Hadoop Distributed File System

Snapshots in Hadoop Distributed File System Snapshots in Hadoop Distributed File System Sameer Agarwal UC Berkeley Dhruba Borthakur Facebook Inc. Ion Stoica UC Berkeley Abstract The ability to take snapshots is an essential functionality of any

More information

UNDERSTANDING DATA DEDUPLICATION. Tom Sas Hewlett-Packard

UNDERSTANDING DATA DEDUPLICATION. Tom Sas Hewlett-Packard UNDERSTANDING DATA DEDUPLICATION Tom Sas Hewlett-Packard SNIA Legal Notice The material contained in this tutorial is copyrighted by the SNIA. Member companies and individual members may use this material

More information

StorReduce Technical White Paper Cloud-based Data Deduplication

StorReduce Technical White Paper Cloud-based Data Deduplication StorReduce Technical White Paper Cloud-based Data Deduplication See also at storreduce.com/docs StorReduce Quick Start Guide StorReduce FAQ StorReduce Solution Brief, and StorReduce Blog at storreduce.com/blog

More information

Deduplication and Beyond: Optimizing Performance for Backup and Recovery

Deduplication and Beyond: Optimizing Performance for Backup and Recovery Beyond: Optimizing Gartner clients using deduplication for backups typically report seven times to 25 times the reductions (7:1 to 25:1) in the size of their data, and sometimes higher than 100:1 for file

More information

Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344

Where We Are. References. Cloud Computing. Levels of Service. Cloud Computing History. Introduction to Data Management CSE 344 Where We Are Introduction to Data Management CSE 344 Lecture 25: DBMS-as-a-service and NoSQL We learned quite a bit about data management see course calendar Three topics left: DBMS-as-a-service and NoSQL

More information

A Survey on Secure Auditing and Deduplicating Data in Cloud

A Survey on Secure Auditing and Deduplicating Data in Cloud A Survey on Secure Auditing and Deduplicating Data in Cloud Tejaswini Jaybhaye 1 ; D. H. Kulkarni 2 PG Student, Dept. of Computer Engineering, SKNCOE, Pune, India 1 Assistant Professor, Dept. of Computer

More information

Availability Digest. www.availabilitydigest.com. Data Deduplication February 2011

Availability Digest. www.availabilitydigest.com. Data Deduplication February 2011 the Availability Digest Data Deduplication February 2011 What is Data Deduplication? Data deduplication is a technology that can reduce disk storage-capacity requirements and replication bandwidth requirements

More information

[Sudhagar*, 5(5): May, 2016] ISSN: 2277-9655 Impact Factor: 3.785

[Sudhagar*, 5(5): May, 2016] ISSN: 2277-9655 Impact Factor: 3.785 IJESRT INTERNATIONAL JOURNAL OF ENGINEERING SCIENCES & RESEARCH TECHNOLOGY AVOID DATA MINING BASED ATTACKS IN RAIN-CLOUD D.Sudhagar * * Assistant Professor, Department of Information Technology, Jerusalem

More information

A Secure Strategy using Weighted Active Monitoring Load Balancing Algorithm for Maintaining Privacy in Multi-Cloud Environments

A Secure Strategy using Weighted Active Monitoring Load Balancing Algorithm for Maintaining Privacy in Multi-Cloud Environments IJSTE - International Journal of Science Technology & Engineering Volume 1 Issue 10 April 2015 ISSN (online): 2349-784X A Secure Strategy using Weighted Active Monitoring Load Balancing Algorithm for Maintaining

More information

Turnkey Deduplication Solution for the Enterprise

Turnkey Deduplication Solution for the Enterprise Symantec NetBackup 5000 Appliance Turnkey Deduplication Solution for the Enterprise Mayur Dewaikar Sr. Product Manager, Information Management Group White Paper: A Deduplication Appliance Solution for

More information

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM

A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, [email protected] Assistant Professor, Information

More information

How To Encrypt Data With A Power Of N On A K Disk

How To Encrypt Data With A Power Of N On A K Disk Towards High Security and Fault Tolerant Dispersed Storage System with Optimized Information Dispersal Algorithm I Hrishikesh Lahkar, II Manjunath C R I,II Jain University, School of Engineering and Technology,

More information

PERFORMANCE ANALYSIS OF PaaS CLOUD COMPUTING SYSTEM

PERFORMANCE ANALYSIS OF PaaS CLOUD COMPUTING SYSTEM PERFORMANCE ANALYSIS OF PaaS CLOUD COMPUTING SYSTEM Akmal Basha 1 Krishna Sagar 2 1 PG Student,Department of Computer Science and Engineering, Madanapalle Institute of Technology & Science, India. 2 Associate

More information

Load Rebalancing for File System in Public Cloud Roopa R.L 1, Jyothi Patil 2

Load Rebalancing for File System in Public Cloud Roopa R.L 1, Jyothi Patil 2 Load Rebalancing for File System in Public Cloud Roopa R.L 1, Jyothi Patil 2 1 PDA College of Engineering, Gulbarga, Karnataka, India [email protected] 2 PDA College of Engineering, Gulbarga, Karnataka,

More information

Identifying Data Integrity in the Cloud Storage

Identifying Data Integrity in the Cloud Storage www.ijcsi.org 403 Identifying Data Integrity in the Cloud Storage Saranya Eswaran 1 and Dr.Sunitha Abburu 2 1 Adhiyamaan College of Engineering, Department of Computer Application, Hosur. 2 Professor and

More information

Metadata Feedback and Utilization for Data Deduplication Across WAN

Metadata Feedback and Utilization for Data Deduplication Across WAN Zhou B, Wen JT. Metadata feedback and utilization for data deduplication across WAN. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 31(3): 604 623 May 2016. DOI 10.1007/s11390-016-1650-6 Metadata Feedback

More information