SURVEY ON DISTRIBUTED DEDUPLICATION SYSTEM WITH AUDITING AND IMPROVED RELIABILITY IN CLOUD

Rekha R 1, ChandanRaj BR 2
1 MTech, 4th Sem, Dept. of Computer Science and Engineering, EWIT, Bengaluru-91
2 Assistant Professor, Dept. of Computer Science and Engineering, EWIT, Bengaluru-91

ABSTRACT: Data deduplication is a strategy for removing duplicate copies of data, and it has been widely utilized in cloud storage to decrease storage space and transfer bandwidth. On the other hand, only a single copy of each file is stored in the cloud, even when that file is owned by a huge number of users. Thus, a deduplication system enhances storage utilization while reducing reliability. In addition, concerns about the privacy of sensitive user data also arise when it is outsourced to the cloud. Aiming to address the above security challenges, this paper surveys the first effort to formalize the notion of a distributed reliable deduplication system. It reviews a new distributed deduplication system with improved reliability in which the data chunks are distributed across multiple cloud servers. The security requirements of data confidentiality and tag consistency are also achieved by introducing a deterministic secret sharing scheme in distributed storage systems, instead of using convergent encryption as in previous deduplication systems. An auditing mechanism is implemented in order to track user activities on the cloud.

Keywords: Deduplication, secret sharing, distributed storage system, reliability, data integrity, auditing

[1] INTRODUCTION

With the explosive growth of digital information, deduplication techniques are extensively employed in backup systems to reduce network and storage overhead by identifying and eliminating redundancy among data. Instead of keeping multiple data copies with the same content, deduplication removes redundant data by storing only a single copy and referring other redundant data to that copy. The mechanism of deduplication is shown in Figure 1.
Figure: 1. Deduplication mechanism

Deduplication has attracted much attention from both academia and industry because it can greatly improve storage utilization and save storage space, particularly for applications with a high deduplication ratio, such as archival storage systems. A number of deduplication systems have been proposed based on various deduplication strategies, such as client-side or server-side deduplication and file-level or block-level deduplication. In particular, with the emergence of cloud storage, data deduplication becomes even more critical for managing the ever-growing volume of data in cloud storage services, which motivates users to outsource data storage to third-party cloud providers. To take a few examples, today's cloud storage services such as Google Drive and Dropbox apply client-side deduplication to save network bandwidth and storage cost.

Although deduplication saves storage space, it decreases the reliability of the system. Data reliability is a truly critical issue because there is only one copy of each file kept in the server and shared by all the data owners. If the value of a chunk were measured in terms of the amount of file data that would be lost if that single chunk went missing, then the amount of user data lost when a chunk in the storage system is corrupted grows with the number of files that share the chunk. Thus, how to guarantee high data reliability in a deduplication system is a crucial problem.
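To make the basic mechanism concrete, the following minimal Python sketch (our own illustration; the class and method names are hypothetical and do not come from the surveyed systems) shows hash-based, file-level deduplication with reference counting:

import hashlib

class DedupStore:
    """Minimal sketch of file-level deduplication: one stored copy
    per unique content, with reference counting (illustrative only)."""

    def __init__(self):
        self.blobs = {}   # tag -> content (the single stored copy)
        self.refs = {}    # tag -> number of owners referencing the copy

    def put(self, data: bytes) -> str:
        # The tag (fingerprint) is a hash of the content.
        tag = hashlib.sha256(data).hexdigest()
        if tag in self.blobs:
            self.refs[tag] += 1          # duplicate: only add a reference
        else:
            self.blobs[tag] = data       # first copy: actually store it
            self.refs[tag] = 1
        return tag

    def get(self, tag: str) -> bytes:
        return self.blobs[tag]

store = DedupStore()
t1 = store.put(b"backup image")
t2 = store.put(b"backup image")   # deduplicated: same tag, no new copy
assert t1 == t2 and store.refs[t1] == 2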
[2] LITERATURE SURVEY

Conventional encryption mechanisms, including public-key encryption and symmetric-key encryption, require different users to encrypt their data with their own keys. As a result, identical data copies of different users lead to distinct ciphertexts, which makes deduplication impossible; hence, commercial storage service providers are hesitant to encrypt outsourced data. To address the conflicting goals of confidentiality and deduplication, the notion of convergent encryption has been proposed and widely adopted to enforce data confidentiality while realizing deduplication. However, such systems achieve confidentiality of outsourced data at the cost of decreased error resilience. Therefore, supporting both confidentiality and reliability while achieving deduplication in a cloud storage system is still a challenge. Section 2.1 explains convergent encryption, Section 2.2 discusses secure auditing and deduplication in the cloud, and Section 2.3 covers distributed deduplication with improved reliability.

[2.1] CONVERGENT ENCRYPTION

Convergent encryption [1] guarantees data privacy in deduplication. Bellare et al. [7] formalized this primitive as message-locked encryption and investigated its application in space-efficient secure outsourced storage.

Figure: 2. Architecture of Convergent Encryption based Authorized Deduplication

Convergent encryption provides data confidentiality in deduplication. From each unique data copy, the user derives a convergent key and encrypts the copy with that key. The user also derives a tag for each unique data copy, which is used to detect duplicate copies. The architecture of the convergent encryption scheme is shown in [Figure-2]. To discover duplicates, the user first sends the tag to the server to check whether an identical copy already exists. The convergent key and the tag are derived independently, so the tag cannot be used to deduce the convergent key and compromise data confidentiality. The server stores the encrypted data copy and the corresponding tag. A convergent encryption system can be defined by four basic functions:

KeyGenCE(M) -> K: the key generation algorithm that maps a data copy M to a convergent key K.
EncCE(K, M) -> C: the symmetric encryption algorithm that takes both the data copy M and the convergent key K as input and outputs a ciphertext C.
DecCE(K, C) -> M: the decryption algorithm that takes the convergent key K and the ciphertext C as input and outputs the original data copy M.
TagGen(M) -> T(M): the tag generation algorithm that maps the original data copy M to a tag T(M).
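A minimal Python sketch of these four functions follows (our own illustration, not the construction used by the surveyed systems; in particular, the SHA-256-based keystream merely stands in for a proper deterministic block cipher such as AES):

import hashlib

def _keystream(key: bytes, length: int) -> bytes:
    # Illustrative keystream from SHA-256(key || counter); a real system
    # would use a standard block cipher in a deterministic mode.
    out, counter = b"", 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]

def keygen_ce(m: bytes) -> bytes:
    # KeyGenCE(M) -> K: the key is derived from the content itself, so
    # identical copies yield identical keys (and identical ciphertexts).
    return hashlib.sha256(m).digest()

def enc_ce(k: bytes, m: bytes) -> bytes:
    # EncCE(K, M) -> C: deterministic symmetric encryption.
    return bytes(a ^ b for a, b in zip(m, _keystream(k, len(m))))

def dec_ce(k: bytes, c: bytes) -> bytes:
    # DecCE(K, C) -> M: an XOR keystream cipher is its own inverse.
    return enc_ce(k, c)

def tag_gen(m: bytes) -> str:
    # TagGen(M) -> T(M): tag for duplicate checks, derived independently
    # of the convergent key (here, a hash of the ciphertext).
    return hashlib.sha256(enc_ce(keygen_ce(m), m)).hexdigest()

m = b"the same file contents"
k = keygen_ce(m)
assert dec_ce(k, enc_ce(k, m)) == m
assert tag_gen(m) == tag_gen(b"the same file contents")  # duplicates share a tag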
[2.2] SECURE AUDITING AND DEDUPLICATING DATA IN CLOUD

Although cloud storage systems have been broadly adopted, they fail to accommodate some important emerging needs, such as the ability of cloud clients to audit the integrity of their cloud files and the ability of cloud servers to recognize duplicated files. The first problem is integrity auditing. The cloud server is able to relieve clients of the heavy burden of storage management and maintenance. However, the data is no longer under the clients' control, which inevitably raises the clients' concerns about the integrity of their data. The second problem is secure deduplication, since the rapid adoption of cloud services is accompanied by growing volumes of data stored at remote cloud servers.

Figure: 3. SecCloud Architecture

Aiming at accomplishing both data integrity and deduplication in the cloud, two secure systems, SecCloud and SecCloud+, are proposed [8]. As depicted in the architecture diagram in [Figure-3], SecCloud introduces an auditing entity with the support of a MapReduce cloud, which helps clients generate data tags before uploading and audits the integrity of data already stored in the cloud. This design fixes the issue in previous work that the computational load at the user or auditor is too large for tag generation. For fine-grained completeness, the auditing functionality of SecCloud is supported at both the block level and the sector level. Furthermore, SecCloud also enables secure deduplication. Note that a key security goal of SecCloud is the prevention of leakage of side-channel information. To prevent such leakage, following the convention of prior work, a proof of ownership protocol is designed between clients and cloud servers, which permits clients to demonstrate to cloud servers that they indeed own the target data. Motivated by the fact that customers usually want to encrypt their data before uploading, for reasons ranging from personal privacy to corporate policy, a key server is introduced into SecCloud, yielding the SecCloud+ scheme. Besides supporting integrity auditing and secure deduplication, SecCloud+ provides assurance of file confidentiality.
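The proof of ownership idea can be illustrated with the following simplified Python sketch (our own construction, not the exact protocol of SecCloud; the block size and function names are assumptions): the server challenges randomly chosen block indices with a fresh nonce, and only a client that actually holds the file, rather than just its short tag, can answer.

import hashlib, os, random

BLOCK = 4096  # hypothetical block size

def blocks(data: bytes):
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)] or [b""]

# -- Server side: holds the full file from the first uploader -------------
def make_challenge(num_blocks: int, k: int = 3):
    # Fresh nonce plus random block indices, so answers cannot be replayed.
    nonce = os.urandom(16)
    idx = random.sample(range(num_blocks), min(k, num_blocks))
    return nonce, idx

def expected_response(data: bytes, nonce: bytes, idx):
    bs = blocks(data)
    return [hashlib.sha256(nonce + bs[i]).hexdigest() for i in idx]

# -- Client side: must actually possess the file to answer ----------------
def prove(data: bytes, nonce: bytes, idx):
    bs = blocks(data)
    return [hashlib.sha256(nonce + bs[i]).hexdigest() for i in idx]

stored = b"x" * 3 * BLOCK + b"tail"          # file already held by the server
nonce, idx = make_challenge(len(blocks(stored)))
assert prove(stored, nonce, idx) == expected_response(stored, nonce, idx)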
[2.3] DISTRIBUTED DEDUPLICATION SYSTEM WITH IMPROVED RELIABILITY

The design of secure deduplication systems with higher reliability in cloud computing is presented in [9]. Distributed cloud storage servers are introduced into the deduplication system to provide better fault tolerance. To further protect data confidentiality, the secret sharing technique is utilized, which is also compatible with distributed storage systems. In more detail, a file is first split and encoded into fragments by using the technique of secret sharing instead of an encryption mechanism. These shares are then distributed across multiple independent storage servers. Furthermore, to support deduplication, a short cryptographic hash of the content is computed and sent to each storage server as the fingerprint of the fragment stored at that server. Only the data owner who uploads the data first is required to compute and distribute the secret shares; subsequent users who own the same data copy do not need to compute and store these shares again. To recover a data copy, a user must access a minimum number of storage servers through authentication and obtain the secret shares to reconstruct the data. In other words, the secret shares of the data are accessible only to the authorized users who own the corresponding data copy.

Another notable feature of this proposal is that data integrity, including tag consistency, can be accomplished. Conventional deduplication methods cannot be straightforwardly extended to distributed multi-server systems: if the short fingerprint alone served as the proof of ownership, any of the servers could present it to the other servers and acquire the shares of the data stored there. Moreover, tag consistency, which prevents the duplicate/ciphertext replacement attack, is considered in this protocol. In more detail, it prevents a user from uploading a maliciously generated ciphertext whose tag collides with that of another, honestly generated ciphertext. To accomplish this, a deterministic secret sharing scheme has been formalized and utilized. To our knowledge, no existing work on secure deduplication properly addresses the reliability and tag consistency issues in distributed storage systems. The system architecture diagram is shown in [Figure-4].

Figure: 4. Architecture Diagram of Distributed Deduplication System
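As a concrete illustration of the Share/Recover pair used below, here is a minimal k-out-of-n Shamir-style secret sharing sketch in Python (our own simplification; the surveyed system actually uses the more general ramp scheme, RSSS, which this code does not capture):

import random

P = 2 ** 127 - 1  # a Mersenne prime; the field is an illustrative choice

def share(secret: int, k: int, n: int):
    # Share: embed the secret (assumed < P) as f(0) of a random
    # degree-(k-1) polynomial f(x) = secret + a_1 x + ... + a_{k-1} x^{k-1}
    # (mod P), and hand out the n points (i, f(i)) as shares.
    coeffs = [secret] + [random.randrange(P) for _ in range(k - 1)]
    def f(x):
        return sum(a * pow(x, j, P) for j, a in enumerate(coeffs)) % P
    return [(i, f(i)) for i in range(1, n + 1)]

def recover(shares):
    # Recover: Lagrange interpolation at x = 0 from any k shares.
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = num * (-xj) % P
                den = den * (xi - xj) % P
        secret = (secret + yi * num * pow(den, P - 2, P)) % P  # Fermat inverse
    return secret

s = 123456789
pieces = share(s, k=3, n=5)
assert recover(pieces[:3]) == s        # any 3 of the 5 shares suffice
assert recover(pieces[2:5]) == s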
File-level and Block-level Distributed Deduplication: To support efficient duplicate checks, tags for each file or block are computed and sent to the S-CSPs.

File Upload: To achieve deduplication, the user interacts with the S-CSPs to upload a file F. The user first computes and sends the file tag ϕF = TagGen(F) to the S-CSPs for the file-level duplicate check. If a duplicate is found, the user is provided a pointer to the shares already stored at the servers. If no duplicate is found, the user runs the secret sharing algorithm SS and distributes the resulting shares. TagGen is the tag generation algorithm that takes the original data copy F and outputs a tag T(F); this tag is produced by the user and used to perform the duplicate check with the server. An alternative tag generation algorithm takes as input a file F and an index j and outputs a tag; this tag, also generated by the user, is used for the proof of ownership.

File Download: To download a file F, the user first downloads the secret shares {cj} of the file from k out of n storage servers. Precisely, the user sends the pointer of F to k out of n S-CSPs. After gathering enough shares, the user reconstructs F with the algorithm Recover({cj}), which interpolates the sharing polynomial

f(x) = a_0 + a_1 x + a_2 x^2 + ... + a_{k-1} x^{k-1}

using Lagrange's formula. This technique provides fault tolerance and keeps the data accessible to the user even if a limited subset of the storage servers fails. The same applies at the block level.

Ramp Secret Sharing Scheme: The two algorithms in a secret sharing scheme are Share and Recover. The secret is divided and distributed using Share; with enough shares, the secret can be extracted and regenerated using Recover. Here, the ramp secret sharing scheme (RSSS) [4], [5] is assumed to secretly split a secret into shards.

Auditing Module: This component is used for tracking user activities at the cloud service provider. Any additions, modifications, or deletions of the data are recorded along with the user details and the time of the operation. The data owner can later view the auditing report.
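A minimal sketch of such an auditing module in Python (the record fields and names are our own assumptions, not the surveyed implementation):

import time

class AuditLog:
    """Minimal sketch of the auditing module: records who did what, and
    when, so the data owner can later review activity on their files."""

    def __init__(self):
        self.records = []

    def log(self, user: str, action: str, target: str):
        # action is one of "add", "modify", "delete" in this sketch
        self.records.append({
            "user": user,
            "action": action,
            "target": target,
            "time": time.strftime("%Y-%m-%d %H:%M:%S"),
        })

    def report(self, owner_files):
        # The data owner views only the records touching their own files.
        return [r for r in self.records if r["target"] in owner_files]

log = AuditLog()
log.log("alice", "add", "file-F")
log.log("bob", "modify", "file-F")
print(log.report({"file-F"}))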
[3] CONCLUSION

The deduplication systems discussed here increase the reliability of data. The distributed deduplication system with the ramp secret sharing scheme improves the reliability of data the most, while achieving confidentiality of the users' outsourced data without a conventional encryption mechanism. Tag consistency and integrity are attained as well.

REFERENCES

[1] J. R. Douceur, A. Adya, W. J. Bolosky, D. Simon, and M. Theimer, "Reclaiming space from duplicate files in a serverless distributed file system," in ICDCS, 2002, pp. 617-624.
[2] M. Bellare, S. Keelveedhi, and T. Ristenpart, "DupLESS: Server-aided encryption for deduplicated storage," in USENIX Security Symposium, 2013.
[3]
[4] G. R. Blakley and C. Meadows, "Security of ramp schemes," in Advances in Cryptology: Proceedings of CRYPTO '84, ser. Lecture Notes in Computer Science, vol. 196, G. R. Blakley and D. Chaum, Eds. Springer-Verlag, 1985, pp. 242-268.
[5] A. De Santis and B. Masucci, "Multiple ramp schemes," IEEE Transactions on Information Theory, vol. 45, no. 5, pp. 1720-1728, Jul. 1999.
[6] J. Li, X. Chen, M. Li, J. Li, P. Lee, and W. Lou, "Secure deduplication with efficient and reliable convergent key management," IEEE Transactions on Parallel and Distributed Systems, vol. 25, no. 6, pp. 1615-1625, 2014.
[7] M. Bellare, S. Keelveedhi, and T. Ristenpart, "Message-locked encryption and secure deduplication," in Advances in Cryptology - EUROCRYPT 2013, ser. Lecture Notes in Computer Science, vol. 7881, T. Johansson and P. Nguyen, Eds. Springer, 2013, pp. 296-312.
[8] J. Li, J. Li, D. Xie, and Z. Cai, "Secure auditing and deduplicating data in cloud," IEEE Transactions on Computers.
[9] J. Li, X. Chen, X. Huang, S. Tang, Y. Xiang, M. M. Hassan, and A. Alelaiwi, "Secure distributed deduplication systems with improved reliability," IEEE Transactions on Computers, vol. PP, 2015.

Authors' Brief Introduction

Rekha R is currently pursuing her MTech (4th semester) in Computer Science and Engineering at East West Institute of Technology, Bengaluru-91.

ChandanRaj BR is working as an Assistant Professor in the Department of Computer Science, East West Institute of Technology, Bengaluru-91. His areas of specialization are mobile computing, 4G network management, sensor networks, and network security.
Corresponding Address

Rekha R
#33, Nisarga, 2nd Cross, Maruthinagar, Madeshwaranagr 2nd Stage
Bengaluru-560091
Mobile: 9900860466

ChandanRaj BR
Assistant Professor, Dept. of Computer Science and Engineering
East West Institute of Technology
Off Magadi Road, Vishwaneedam Post
Bengaluru-560091
Mobile: 9342954123