ISSN 2319-8885 Vol.04, Issue.19, June-2015, Pages: 3633-3638 www.ijsetr.com

Refining Efficiency of Cloud Storage Services using De-Duplication

SARA BEGUM 1, SHAISTA NOUSHEEN 2, KHADERBI SHAIK 3
1 PG Scholar, Dept of CS, Shadan Women's College of Engineering and Technology, Hyderabad, India, E-mail: saraali5560@gmail.com.
2 Asst Prof, Dept of CSE, Shadan Women's College of Engineering and Technology, Hyderabad, India, E-mail: shaista09517@gmail.com.
3 HOD, Dept of CS, Shadan Women's College of Engineering and Technology, Hyderabad, TS, India, E-mail: khadarbi@gmail.com.

Abstract: In cloud storage, file storage is handled by third parties. Files can be integrated so that users access them through centralized management. Because of the great number of users and devices in a cloud network, however, managers cannot effectively control the efficiency of every storage node; hardware is therefore wasted and the complexity of managing the files increases. To reduce the workload caused by duplicate files, we propose the Index Name Server (INS), which supports file storage, data de-duplication, optimized node selection, server load balancing, file compression, chunk matching, real-time feedback control, IP information, and busy-level index monitoring, and thereby improves performance. With INS, files can also be distributed reasonably and the workload decreased. The research gap we identified is the storage of large numbers of duplicate files on cloud systems by different users. To decrease the workload caused by duplicated files, this paper therefore proposes a new data index technique that integrates data de-duplication for storage optimization on cloud systems.

Keywords: Cloud Storage, De-Duplication, Hash Code, Load Balancing.

I. INTRODUCTION
Cloud computing, the technique most widely used today, performs computing over a large communication network such as the Internet.
It is an important low-cost solution for business storage. Cloud computing provides vast storage for all sectors, government and enterprise alike, as well as for storing our personal data on the cloud. Without knowing the underlying implementation details, platform users can access and share different resources on the cloud. The most important problems in cloud computing are the large amount of storage space required and the associated security issues. One critical challenge of cloud storage is managing the ever-increasing volume of data; to improve scalability and ease the storage problem, data de-duplication is a key technique that has attracted much attention recently. It is an important technique for data compression: it simply avoids keeping duplicate copies of data and stores a single copy. Data de-duplication takes place at either the block level or the file level; the file-level approach also economizes on network resources.

II. LITERATURE SURVEY
Title: Cloud Storage Architecture
Authors: Gurudatt Kulkarni, Rani Waghmare, Rajnikant Palwe
Year: 2012
Description: Designing storage architectures for emerging data-intensive applications presents several challenges and opportunities. Tackling these problems requires a combination of architectural optimizations to the storage devices and the layers of the memory/storage hierarchy, as well as hardware/software techniques to manage the flow of data between the cores and storage. As we move deeper into an era in which data is a first-class citizen in architecture design, optimizing the storage architecture will become more important. Cloud storage architecture is a major topic nowadays because data usage and storage capacity double year by year, so several major companies concentrate on on-demand storage options such as cloud storage. The existing cloud storage providers concentrate mainly on performance, cost issues, and multiple storage options.
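The file-level de-duplication described in the Introduction can be illustrated with a short sketch. The DedupStore class below is our own illustrative construction, not part of the proposed system: it fingerprints each file's content with MD5 and stores identical content only once, so a duplicate upload adds a reference rather than a second copy.

```python
import hashlib

class DedupStore:
    """Toy content-addressed store: identical file content is kept only once."""
    def __init__(self):
        self.blobs = {}  # fingerprint -> file content
        self.index = {}  # filename -> fingerprint

    def put(self, name: str, data: bytes) -> bool:
        """Store a file; return True only if new content was actually written."""
        fp = hashlib.md5(data).hexdigest()
        self.index[name] = fp
        if fp in self.blobs:     # duplicate content: just add a reference
            return False
        self.blobs[fp] = data    # unique content: store the single copy
        return True

    def get(self, name: str) -> bytes:
        return self.blobs[self.index[name]]

store = DedupStore()
store.put("a.txt", b"same payload")
store.put("b.txt", b"same payload")  # duplicate: only one copy is kept
```

Block-level de-duplication works the same way, except that the unit being fingerprinted is a fixed- or variable-size chunk rather than the whole file.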
Title: Cloud Storage Performance Enhancement by Real-time Feedback Control and De-duplication
Authors: Tin-Yu Wu, Wei-Tsong Lee, Chia-Fan Lin
Year: 2011
Description: In a cloud storage environment, file distribution and storage are handled by storage providers or physical storage devices rented from third-party companies. Through centralized management and virtualization, files are integrated into available resources for users to access. However, because of the increasing number of files, the manager cannot guarantee the optimal status of each storage node. The great number of files not only wastes hardware resources but also worsens the control complexity of the data center, which further degrades the performance of the cloud storage system. For this reason, to decrease the workload caused by duplicated files, that paper proposes a new data management structure, the Index Name Server (INS), which integrates data de-duplication with node optimization mechanisms for cloud storage performance enhancement. INS can manage and optimize the nodes according to the client-side transmission conditions. With INS, each node can be kept working in its best status and matched to suitable clients as far as possible. In this manner, we can improve the performance of the cloud
storage system efficiently and distribute the files reasonably to reduce the load of each storage node.

III. INDEX NAME SERVER (INS)
The Index Name Server (INS), an index server similar to the Domain Name System (DNS) in structure, manages cloud data through a complex P2P-like architecture. Although INS resembles DNS in structure and function, it mainly processes the one-to-many matches between the storage nodes' IP addresses and hash codes. In general, INS has three chief functions: switching between the fingerprints and their corresponding storage nodes; affirming and balancing the storage nodes' load; and satisfying client demands for transmission as far as possible.

Fig.1. Hierarchical INS architecture.

A. INS Architecture
Based on the stack structure of DNS, INS manages the storage nodes in its domain and handles client file-access requirements according to its database. Though similar to DNS in architecture and function, INS not only manages and queries the mapping between fingerprints and storage nodes, but also coordinates transmissions by feedback control between storage nodes and clients. As shown in Fig.1, the INSs, the central managers of the nodes, have server-client relationships with one another in a hierarchical architecture and record the fingerprint and storage-node information of each data chunk. It is worth noticing that the INSs record only the locations of the fingerprints and manage the storage nodes; other information is not taken down. Each storage node provides its own status and information for the INSs to record, while clients request related information from the INSs during transmissions. Whenever a new INS is established, the storage node with the maximum throughput in the domain is selected for backup. Since the tasks of an INS center on data computation and transmission, we emphasize the performance of the database and the throughput of data transmission rather than storage space. For file transmission optimization, every INS has exclusive databases for its own domain, which include the fingerprints and their corresponding storage nodes. However, in a WAN cloud network environment, managing the file system with only a few INSs would place a great burden on them. Therefore, based on the existing DNS structure, we propose to divide the INSs according to domain and loading capacity, and to adopt a hierarchical management architecture to reduce the workload of the INSs.

Fig.2. INS Control Diagram.

B. De-duplication
Owing to the large-scale architecture of a WAN cloud network system and the similarity of user habits and user groups, the data duplication rate increases greatly, as shown in Fig.2. Thus, a de-duplication technique is adopted in our scheme to scatter and remix the data at local hosts, divide each file into several chunks for uploading, and designate a unique fingerprint to each chunk by MD5. Because of its uniqueness, every fingerprint is regarded as the identification of a data chunk. After checking a requested fingerprint, the INSs determine whether a file chunk with the same fingerprint already exists in the storage space. If not, the system continues the uploading procedure and assigns the task to a storage node.

IV. EXISTING SYSTEM
Generally speaking, the commonly seen storage systems use techniques such as run-length encoding (RLE), dictionary coding, calculation of digital fingerprints for data chunks, the distributed hash table (DHT), and the Bloom filter, and there have been several investigations into load balancing in cloud computing systems. Nevertheless, due to the great number of users and devices in the cloud network, the managers often cannot effectively manage the efficiency of the various storage nodes. The existing system uses the DHT algorithm.
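The upload flow described for de-duplication, dividing a file into chunks, fingerprinting each chunk with MD5, and asking the index which fingerprints are new before sending any data, can be sketched as follows. This is a minimal illustrative sketch: the INSIndex class, the 4096-byte chunk size, and the node names are our own assumptions, not the paper's implementation.

```python
import hashlib

CHUNK = 4096  # illustrative fixed chunk size (an assumption)

class INSIndex:
    """Minimal index mapping chunk fingerprints to the node holding them."""
    def __init__(self):
        self.where = {}  # fingerprint -> storage node id

    def missing(self, fingerprints):
        """Return only the fingerprints not yet stored anywhere."""
        return [fp for fp in fingerprints if fp not in self.where]

    def record(self, fp, node):
        self.where[fp] = node

def upload(ins, node, data):
    """Chunk the data, MD5-fingerprint each chunk, send only new chunks."""
    chunks = [data[i:i + CHUNK] for i in range(0, len(data), CHUNK)]
    fps = [hashlib.md5(c).hexdigest() for c in chunks]
    new = ins.missing(fps)           # duplicate chunks are skipped entirely
    for fp in new:
        ins.record(fp, node)         # in reality the chunk bytes go to `node`
    return len(new)

ins = INSIndex()
first = upload(ins, "node-1", b"a" * CHUNK + b"b" * CHUNK)   # both chunks new
second = upload(ins, "node-2", b"a" * CHUNK + b"b" * CHUNK)  # all duplicates
```

In a deployment, only the chunks reported missing would cross the network; a duplicate upload costs one round of fingerprint exchange instead of a full transfer.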
V. PROPOSED SYSTEM
In our system we implement a project that includes the public cloud, the private cloud, and the hybrid cloud, which is a combination of the two. In general, if we use only the public cloud we cannot secure our private data, and that data may be lost; therefore we also make use of a private cloud, which provides greater security. The system also provides data de-duplication, which is used to avoid duplicate copies of data. Users can upload
and download files from the public cloud, while the private cloud provides security for that data. That means only an authorized person can upload and download files from the public cloud: the user generates a key and stores it on the private cloud, and at download time the user requests the key from the private cloud and then accesses the particular file. We use Index Name Servers (INS) to dynamically monitor IP information and the busy-level index in order to avoid network congestion or long waiting times during transmissions.

VI. ALGORITHMS
A. Load Balancing
The load of each chunk server is proportional to the number of chunks hosted by that server. In a load-balanced cloud, resources can be well utilized while maximizing the performance of applications, so the load-balancing algorithm reduces the workload. A weighted algorithm uses weights to help determine which server receives the next request.

B. Distributed Hash Table (DHT) Algorithm
A DHT node does not maintain all the information in the network; it stores only its own data. Using a DHT, we can develop more complex distributed network architectures. DHT features include decentralization, scalability, and fault tolerance. A hash table is a data structure that maps keys to values and is an essential building block in software systems; a Distributed Hash Table (DHT) is similar, but spread across many hosts, with the interface insert(key, value) and lookup(key). The DHT algorithm stores the information of each user's uploaded data at a certain index, and based on that index it retrieves the user's information.

VII. RESULTS AND EVALUATION
The results and evaluation of this paper are shown in Figs.3 to 17.

Fig.3. Home Page.

Fig.4. Login Page.

Fig.5. Register.

Fig.6. Key Maker Generate. Fig.6 shows the key maker generating key details such as type and size.
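The insert(key, value) / lookup(key) interface described for the DHT in the Algorithms section can be sketched as below. TinyDHT, the host names, and the choice of SHA-1 consistent hashing to spread keys across hosts are illustrative assumptions, not the system's actual implementation.

```python
import hashlib
from bisect import bisect_right

class TinyDHT:
    """Sketch of a DHT: keys are spread across hosts on a hash ring,
    and each host stores only its own share of the data."""
    def __init__(self, hosts):
        self.ring = sorted((self._h(h), h) for h in hosts)
        self.points = [p for p, _ in self.ring]
        self.store = {h: {} for h in hosts}

    @staticmethod
    def _h(s):
        return int(hashlib.sha1(s.encode()).hexdigest(), 16)

    def _owner(self, key):
        """First host clockwise from the key's position on the ring."""
        i = bisect_right(self.points, self._h(key))
        return self.ring[i % len(self.ring)][1]

    def insert(self, key, value):
        self.store[self._owner(key)][key] = value

    def lookup(self, key):
        return self.store[self._owner(key)].get(key)

dht = TinyDHT(["host-a", "host-b", "host-c"])
dht.insert("fingerprint-123", "node-7")  # e.g. fingerprint -> storage node
```

Because any participant can recompute a key's owner locally, lookups need no central coordinator, which is what gives the DHT its decentralization and scalability.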
Fig.7. Key Generation. Fig.7 shows the key maker downloading a public and a private key.

Fig.8. File Upload. Fig.8 shows the user choosing a file and uploading it together with the public key.

Fig.9. File Upload. Fig.9 shows the Index Name Server (INS) checking the file: if the file is not already present, it goes directly to cloud storage.

Fig.10. Download File. Fig.10 shows the user choosing a file to download.

Fig.11. Download File. Fig.11 shows the key maker downloading the private key and decrypting the file.

Fig.12. View File Details.
Fig.12 shows the user's file details.

Fig.13. Check Duplicate File. Fig.13 shows the Index Name Server checking for a duplicated file.

Fig.14. Check Duplicate File. Fig.14 shows that a file not already present is stored in the cloud, while a file identified as a duplicate triggers an acknowledgement to the data owner.

Fig.15. Sending Feedback. Fig.15 shows the Index Name Server identifying a duplicate file and sending an acknowledgement to the data owner.

Fig.16. Cloud Storage Details. Fig.16 shows the cloud storage maintaining the file details.

Fig.17. Data User View Details. Fig.17 shows the Index Name Server sending an acknowledgement to the data user, who can then view the details.

VIII. CONCLUSION
In this paper, the idea of authorized data de-duplication was proposed to protect data security by including differential authority of users in the duplicate check. Cloud computing has reached a maturity that leads it into a productive phase: most of the main issues with cloud computing have been addressed to a degree that clouds have become interesting for full commercial exploitation. We proposed SHA-1 to process not only file compression, chunk matching, data de-duplication, real-time feedback control, IP information, and busy-level index monitoring, but also file storage, optimized node selection, and server load
balancing. Based on several SHA parameters that monitor the IP information and the busy-level index of each node, our proposed scheme can determine the location of maximum loading and trace back to the source of demands to determine the optimal backup node. According to the transmission states of storage nodes and clients, SHA-1 receives feedback from previous transmissions and adjusts the transmission parameters to attain optimal performance for the storage nodes, compressing and partitioning the files according to the chunk size of the cloud file system.

IX. REFERENCES
[1] T.-Y. Wu, J.-S. Pan, and C.-F. Lin, "Improving accessing efficiency of cloud storage using de-duplication and feedback schemes," IEEE Systems Journal, vol. 8, no. 1, Mar. 2014.
[2] Y.-M. Huo, H.-Y. Wang, L.-A. Hu, and H.-G. Yang, "A cloud storage architecture model for data-intensive applications," in Proc. Int. Conf. Comput. Manage., May 2011, pp. 1-4.
[3] L. B. Costa and M. Ripeanu, "Towards automating the configuration of a distributed storage system," in Proc. 11th IEEE/ACM Int. Conf. Grid Comput., Oct. 2010, pp. 201-208.
[4] H. Ohsaki, S. Watanabe, and M. Imase, "On dynamic resource management mechanism using control theoretic approach for wide-area grid computing," in Proc. IEEE Conf. Control Appl., Aug. 2005, pp. 891-897.
[5] H. Dezhi and F. Fu, "Research on self-adaptive distributed storage system," in Proc. 4th Int. Conf. Wireless Commun. Netw. Mobile Comput., Oct. 2008, pp. 1-4.
[6] J. Wang, P. Varman, and C.-S. Xie, "Avoiding performance fluctuation in cloud storage," in Proc. Int. Conf. High Performance Comput., Dec. 2008, pp. 1-9.
[7] C.-Y. Chen, K.-D. Chang, and H.-C. Chao, "Transaction pattern based anomaly detection algorithm for IP multimedia subsystem," IEEE Trans. Inform. Forensics Security, vol. 6, no. 1, pp. 152-161, Mar. 2011.
[8] G. Urdaneta, G. Pierre, and M. Van Steen, "A survey of DHT security techniques," ACM Comput.
Surveys (CSUR), vol. 43, no. 2, pp. 8:1-8:49, Jan. 2011.
[9] X. Sun, K. Li, and Y. Liu, "An efficient replica location method in hierarchical P2P networks," in Proc. 8th IEEE/ACIS Int. Conf. Comput. Inform. Sci., Jun. 2009, pp. 769-774.
[10] T.-Y. Wu, W.-T. Lee, and C. F. Lin, "Cloud storage performance enhancement by real-time feedback control and de-duplication," in Proc. Wireless Telecommun. Symp., Apr. 2012, pp. 1-5.
[11] H. He and L. Wang, "P&P: A combined push-pull model for resource monitoring in cloud computing environment," in Proc. IEEE 3rd Int. Conf. Cloud Comput., Jul. 2010, pp. 260-267.
[12] W. Li and H. Shi, "Dynamic load balancing algorithm based on FCFS," in Proc. 4th Int. Conf. Innovative Comput. Inform. Control, Dec. 2009, pp. 1528-1531.

Author's Profile:
Ms. Sara Begum completed her B.Tech in the CSIT Department at Shadan Women's College of Engineering and Technology, JNTU University, Hyderabad. Presently, she is pursuing her Masters in Computer Science at Shadan Women's College of Engineering and Technology, Khairatabad, Hyderabad, India.
Ms. Shaista Nousheen completed her B.Tech (CSE) and M.Tech (Software Engineering) at JNTU University. She has 6 years of teaching experience. Currently, she is working as an Assistant Professor in the CSE Department of Shadan Women's College of Engineering and Technology, Hyderabad, T.S, India.
Ms. Khadarbi Shaik completed her B.Tech (CSE) at Andhra University and M.Tech (CSE) at SRM University. She has 6 years of teaching experience. Currently, she is working as an Associate Professor in the CSE Department of Shadan Women's College of Engineering and Technology, Hyderabad, T.S, India.