A SURVEY ON DEDUPLICATION METHODS
|
|
|
- Amy Stevens
- 10 years ago
- Views:
Transcription
1 A SURVEY ON DEDUPLICATION METHODS A.FARITHA BANU *1, C. CHANDRASEKAR #2 *1 Research Scholar in Computer Science #2 Assistant professor, Dept. of Computer Applications Sree Narayana Guru College Coimbatore , TamilNadu, India. ABSTRACT: There is an increasing demand for systems that can provide secure data storage in a cost-effective manner. Having duplicate records occupies more space and even increases the access time. Thus there is a need to eliminate duplicate records. This sounds to be simple but requires an tedious work since duplicate records do not share a common key and/or they contain errors that make duplicate matching a difficult task. Errors are also introduced as the result of transcription errors, incomplete information, lack of standard formats, or any combination of these factors. Several approaches are proposed to eliminate duplicate data first at the file level and then at the chunk level to reduce the duplicatelookup complexity. In this paper,few of the methods are discussed with its advantages and disadvantages. And also a better solution is proposed. Key words: data storage, managing data, deduplication I. INTRODUCTION The quick expansion of information sources has necessitated the users to make use of automated tools to locate desired information resources and to follow and asses their usage patterns. Thus the need for building server side and client side intelligent systems that can mine for knowledge in a successful manner emerged. Deduplication is one such tasks which fastens the search and help producing efficient results. Deduplication is a task of identifying record replicas in a data repository that refer to the same real world entity or object and systematically substitutes the reference pointers for the redundant blocks; also known as storage capacity optimization. As the volume of data in an organization grows, the amount of repeated data takes a toll on storage availability. Using deduplication has two big advantages over a normal file system: Reduced Storage Allocation - Deduplication can reduce storage needs by up to 90%-95% for files such VMDKs and backups. Basically situations where you are storing a lot of redundant data can see huge benefits. Efficient Volume Replication - Since only unique data is written disk, only those blocks need to be replicated. This can reduce traffic for replicating data by 90%-95% depending on the application. Generally deduplication is done in three ways. They are 1) Chunking: Between commercial deduplication implementations, technology varies primarily in chunking method and in architecture. In some systems, chunks are defined by physical layer constraints or in few systems only complete files are compared, which is called Single Instance Storage or SIS. The most intelligent method to chunk is generally sliding-block in which a sliding block, a window is passed along the file stream to seek out more naturally occurring internal file boundaries. 2) Client backup deduplication: This is the process where the deduplication hash calculations are initially created on the source machines and files that have identical hashes to files already in the target device are not sent. Thus the target device just creates appropriate internal links to reference the duplicated data. The benefit of this is that it avoids data being unnecessarily sent across the network thereby reducing traffic load. ISSN: Page 364
2 3) Primary storage and secondary storage: Primary storage systems are designed for optimal performance, rather than lowest possible cost. The design criteria for these systems are to increase performance, at the expense of other considerations. Moreover, primary storage systems are much less tolerant of any operation that can negatively impact performance. Also secondary storage systems contain primarily duplicate or secondary copies of data. These copies of data are typically not used for actual production operations and thus they are more tolerant of some performance degradation, in exchange for increased efficiency. Whenever data is transformed, concerns arise about potential loss of data. By definition, data deduplication systems store data differently from how it was written. As a result, users are concerned with the integrity of their data. The various methods of deduplicating data all employ slightly different techniques. However, the integrity of the data will ultimately depend upon the design of the deduplicating system, and the quality used to implement the algorithms. This paper shares a few of the concepts briefly. A. Encrypting data: II.STUDIES ON DEDUPLICATION Deduplication takes advantage of data similarity in order to achieve a reduction in storage space. Meanwhile the goal of cryptography is to make cipher text indistinguishable from theoretically random data. Thus, the goal of a secure deduplication system is to provide data security, against both nside and outside adversaries, without compromising the space efficiency[4]. First step is to find out the chunks in the documents. After identifying chunks of data both within and between file, they are encrypted using keys. Then deduplication exploits identical content by matching encrypted data. The resulting deduplication can yield cost savings by increasing the utility of a given amount of storage. The drawback is that if the same content is encrypted with two different keys it will results in very different cipher text. Thus it does not sound to be better approach. Sender Clear text Hello Encryption Recipient s Public key Encrypted text %fd$jh Fig 1: Concept of Encryption B. Fast Dual-level Fingerprinting: Recipient Clear Text Hello Decryption Recipient s Private key In this scheme [3], fingerprint datasets both at the file level and at the chunk level is obtained in a single scan of the contents. FDF breaks the dual-level fingerprinting process into task segments and three techniques to optimize the performance are employed. FDF utilizes Rabin s fingerprinting algorithm [5] to divide files into variable-sized chunks while capturing and eliminating hot zero-chunks simultaneously by judiciously selecting chunk boundaries. FDF employs SHA-1 algorithm to generate fingerprints for files and chunks and further defines hash context to preserve the intermediate result of the hash algorithm, so that the file-level hashing and the chunk-level hashing can be performed in parallel by sharing the same data cache. FDF resolves cache conflicts between different task segments and further pipeline the fingerprinting process by leveraging the computing resources of modern multi-core CPUs, thus the time overhead can be greatly reduced. The fingerprints are compared to obtain the duplicate data records. Even though it reduces time, computation is more 365
3 since different algorithms are used at different steps and the result is to be stored which occupies more space. Local/remote files Read/receive hdfhdgdfread/re Switch cache Data Cache Group First, each field s weight is set according to its relative distance, i.e., dissimilarity, among records from the approximated negative training set. Then, the first classifier utilizes the weights set to match records from different data sources. Next, with the matched records being a positive set and the nonduplicate records in the negative set, the second classifier further identifies new duplicates. Finally, all the identified duplicates and non-duplicates are used to adjust the field weights set in the first step and a new iteration begins by again employing the first classifier to identify new duplicates. The iteration stops when no new duplicates can be identified. This method is well suited for only web based data but still it requires an initial approximated training set to assign weight. Algorithm Data Cache Group Switch cache Data Cache Group Hash f File fingerprints <fingerprint1> <fingerprint2>.. Switch cache Chunk Boundary Cache Group Boundary cache group Hash c Chunk fingerprints <fingerprint1> <fingerprint2> 1. D=0 2. Set the parameters W of c 1 according to N 3. Use C 1 to get a set of duplicate vector pairs d 1 from P 4. Use C 1 to get a set of duplicate vector pairs f from N 5. P=P-d 1 6. While d N =N-f 8. D=D + d 1 +f 9. Train c 2 using D and N 10. Classify P using c 2 and get a set of newly identified duplicate vector pairs d P=P-d D=D+d Adjust the parameters W of c 1 according to N and D 14. Use C 1 to get a set of duplicate vector pairs d 1 from P 15. Use C 1 to get a set of duplicate vector pairs f from N 16. N=N 17. Return D Fig 3: Algorithm for UDD Duplicate Detection Fig 2: Parallelism of Fingerprinting Task Segments C. Unsupervised Duplicate Detection: UDD can effectively identify duplicates from the query result records of multiple Web databases for a given query it uses two classifiers. The WCSS classifier act as the weak classifier which is used to identify strong positive examples and an SVM classifier acts as the second classifier. Data Generation Technique: Publicly available test data [7] with known deduplication or linkage status is used to check the new linkage algorithms and techniques which are latter evaluated and compared. However, publication of data containing personal information is impossible due to privacy and confidentiality issues. So an alternative is to use artificially created data. In these data, content and error rates can be controlled, and the deduplication or linkage status is known. Artificially 366
4 generated data seems to be an attractive alternative. It models the content and statistical properties of comparable real world data sets, including the frequency distributions of attribute values, error types and distributions, and error positions within these values. The first step is to develop a data generator. A data generator creates data sets containing names and addresses, dates, telephone and identifier numbers. Then the original records are created by collecting details from various data sources. These original records are used as reference to identify the duplicate copies. Clearly this technique requires a prior work to develop original records which increases the overhead. D. Interactive Deduplication Using Active Learning: The main challenge in deduplication task is to find a function that can resolve when two records refer to the same entity in spite of errors and inconsistencies in the data. A learning based deduplication system ALIAS2 allows automatic construction of the deduplication function by interactively discovering challenging training pairs. The idea is to simultaneously build several redundant functions and the disagreement amongst them to discover new kinds inconsistencies amongst duplicates in the dataset are exploited. An active learner actively picks subsets of instances which when labeled will provide the highest information gain to the learner. The difficult task of bringing together the potentially confusing record pairs is automated by the learner. The user has to only perform the task of labeling the selected pairs of records as duplicate or not. Architecture: Fig 4: Architecture of Interactive Deduplication Database of records (D): The original set D of records in which duplicates need to be detected. The data has d attributes a 1, a d, each of which could be textual or numeric. The goal of the system is to obtain the subset of pairs in the cross-product D D that can be labeled as duplicates. Initial training pairs (L): An optional small seed L of training records arranged in pairs of duplicates or non-duplicates. Similarity functions (F): A set F of n f functions each of which computes a similarity match between two records r 1, r 2 based on any subset of d attributes. Steps: At first map the initial training records in L into a pair format through a mapper module. The mapper module takes as input a pair of records r 1, r 2, computes the n f similarity functions F and returns the result as a new record with n f attributes. For each duplicate pair, assign a class-label of 1" and for all the other pairs in L L, assign a class label of 0". At the end of this step a mapped training dataset L p ia obtained. These L p instances are used to initialize the learning component of the system. Initial training records( L) Similarity Functions(F) unlabeled input records Mapper Mapper Maped labeled instances Pool of Mapped unlabeled instances Clasifier Training data T Train Select n instances for labeling use r 1. Input: L, D and F. 2. Create Pairs Lp from the labeled data L and F. 3. Create pairs Dp from unlabeled data D and F. 4. Initial training set T = Lp. 5. Loop until user satisfaction Train classifier C using T. Use C to select a S of n instances from Dp for labeling. If S is empty, exit Loop. Collect user feedback on the labels of S. Add S to T and remove S from Dp. 6. Output classifier C. Similarity indices Predicate for uncertain region 367
5 Fig 5: Steps Used In Interactive Deduplication E. Genetic programming: To deal with the deduplication problem, an approach based on Genetic programming [6] is used. This approach combines several different pieces of evidence extracted from the data content to produce a deduplication function that is able to identify whether two or more entries in a repository are replicas or not. Record deduplication is a time consuming process hence make out duplication function for small repository so that the resulting function can be applied to other areas. The resulting function should be able to efficiently maximize the identification of record replicas. The steps of Genetic algorithm are the following: 1. Initialize the population (with random or user provided individuals). 2. Evaluate all individuals in the present population, assigning a numeric rating or fitness value to each one. 3. If the termination criterion is fulfilled, then execute 4. The last step. Otherwise continue. 5. Reproduce the best n individuals into the next generation population. 6. Select m individuals that will compose the next generation with the best parents. 7. Apply the genetic operations to all individuals selected. Their offspring will compose the next population. Replace the existing generation by the generated population and go back to Step Present the best individual(s) in the population as the output of the evolutionary process. The deduplication function obtained using genetic approach is implemented in servers or client side system so that data which is to be stored are verified before they reach the database. The function is automatically generated which reduces the human effort. When a new document is to be stored, function is derived from the evidences collected from it and it is used to check for duplicate availability in data storage. If found a match, the duplicate copy is prevented from storing again. This technique is efficient and produces a outcome. III.CONCLUSION Duplicate detection and removal is an important problem in data cleaning, and an adaptive approach that learns to identify duplicate records and removing them has clear advantages. In this paper, surveys of various techniques for deduplication which are previously defined are discussed. The performance of the outcome can be still improved by using PSO algorithm for generating deduplication function. PSO appears similar to GA in term of its selecting strategy of the best child (or the best swarm), but it is really different. It utilizes the intercommunication between each individual swarm with the best one to update its position and velocity. This algorithm operates with randomly created population of potential solutions and searches for the optimum value by creating the successive population of solutions PSO reduces complexity by using simpler terms of computations because its crossover and mutation operation are done simultaneously. Thus a better solution to the deduplication problem will be provided by using PSO generated deduplication function. IV. REFERENCES [1]R.A. Baeza-Yates and B.A. Ribeiro-Neto, Modern Information Retrieval. ACM Press/Addison-Wesley, [2]R. Bell and F. Dravis, Is You Data Dirty? and Does that Matter?, Accenture Whiter Paper, [3]Jiansheng Wei,1Ke Zhou, 2Lei Tian, 1Hua Wang, Dan Feng, A Fast Dual-level Fingerprinting Scheme for Data Deduplication [4]Mark W. Storer Kevin Greenan Darrell D. E. Long Ethan L. Miller, Secure Data Deduplication [5]Michael O. Rabin, "Fingerprinting by random polynomials", Technical Report, No. TR-15-81, Center for Research in Computing Technology, Harvard University, Cambridge, MA, USA, [6]Moise s G. de Carvalho, Alberto H.F. Laender, Marcos Andre Gonc alves, and Altigran S. da Silva. A Genetic Programming Approach to Record Deduplication [7]Peter Christen. Probabilistic Data Generation for Deduplication and Data Linkage, [8]Weifeng Su, Jiying Wang, and Frederick H. Lochovsky, Record Matching over Query Results from Multiple Web Databases, IEEE Transactions On Knowledge And Data Engineering, VOL. 22, NO. 4, APRIL
6 369
An Improving Genetic Programming Approach Based Deduplication Using KFINDMR
An Improving Genetic Programming Approach Based Deduplication Using KFINDMR P.Shanmugavadivu #1, N.Baskar *2 # Department of Computer Engineering, Bharathiar University Sri Ramakrishna Polytechnic College,
Deploying De-Duplication on Ext4 File System
Deploying De-Duplication on Ext4 File System Usha A. Joglekar 1, Bhushan M. Jagtap 2, Koninika B. Patil 3, 1. Asst. Prof., 2, 3 Students Department of Computer Engineering Smt. Kashibai Navale College
A Genetic Programming Approach for Record Deduplication
A Genetic Programming Approach for Record Deduplication L.Chitra Devi 1, S.M.Hansa 2, Dr.G.N.K.Suresh Babu 3 Research Scholars, Dept. of Computer Applications, GKM College of Engineering and Technology,
How To Make A Backup Service More Efficient
AENSI Journals Advances in Natural and Applied Sciences ISSN:1995-0772 EISSN: 1998-1090 Journal home page: www.aensiweb.com/anas Deduplication In Cloud Backup Services Using Geneic Algorithm For Personal
An Authorized Duplicate Check Scheme for Removing Duplicate Copies of Repeating Data in The Cloud Environment to Reduce Amount of Storage Space
An Authorized Duplicate Check Scheme for Removing Duplicate Copies of Repeating Data in The Cloud Environment to Reduce Amount of Storage Space Jannu.Prasanna Krishna M.Tech Student, Department of CSE,
A Survey on Aware of Local-Global Cloud Backup Storage for Personal Purpose
A Survey on Aware of Local-Global Cloud Backup Storage for Personal Purpose Abhirupa Chatterjee 1, Divya. R. Krishnan 2, P. Kalamani 3 1,2 UG Scholar, Sri Sairam College Of Engineering, Bangalore. India
DATA SECURITY IN CLOUD USING ADVANCED SECURE DE-DUPLICATION
DATA SECURITY IN CLOUD USING ADVANCED SECURE DE-DUPLICATION Hasna.R 1, S.Sangeetha 2 1 PG Scholar, Dhanalakshmi Srinivasan College of Engineering, Coimbatore. 2 Assistant Professor, Dhanalakshmi Srinivasan
A Survey on Deduplication Strategies and Storage Systems
A Survey on Deduplication Strategies and Storage Systems Guljar Shaikh ((Information Technology,B.V.C.O.E.P/ B.V.C.O.E.P, INDIA) Abstract : Now a day there is raising demands for systems which provide
3Gen Data Deduplication Technical
3Gen Data Deduplication Technical Discussion NOTICE: This White Paper may contain proprietary information protected by copyright. Information in this White Paper is subject to change without notice and
IMPLEMENTATION OF SOURCE DEDUPLICATION FOR CLOUD BACKUP SERVICES BY EXPLOITING APPLICATION AWARENESS
IMPLEMENTATION OF SOURCE DEDUPLICATION FOR CLOUD BACKUP SERVICES BY EXPLOITING APPLICATION AWARENESS Nehal Markandeya 1, Sandip Khillare 2, Rekha Bagate 3, Sayali Badave 4 Vaishali Barkade 5 12 3 4 5 (Department
A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique
A Novel Way of Deduplication Approach for Cloud Backup Services Using Block Index Caching Technique Jyoti Malhotra 1,Priya Ghyare 2 Associate Professor, Dept. of Information Technology, MIT College of
A Survey on Intrusion Detection System with Data Mining Techniques
A Survey on Intrusion Detection System with Data Mining Techniques Ms. Ruth D 1, Mrs. Lovelin Ponn Felciah M 2 1 M.Phil Scholar, Department of Computer Science, Bishop Heber College (Autonomous), Trichirappalli,
The assignment of chunk size according to the target data characteristics in deduplication backup system
The assignment of chunk size according to the target data characteristics in deduplication backup system Mikito Ogata Norihisa Komoda Hitachi Information and Telecommunication Engineering, Ltd. 781 Sakai,
MAD2: A Scalable High-Throughput Exact Deduplication Approach for Network Backup Services
MAD2: A Scalable High-Throughput Exact Deduplication Approach for Network Backup Services Jiansheng Wei, Hong Jiang, Ke Zhou, Dan Feng School of Computer, Huazhong University of Science and Technology,
Improving data integrity on cloud storage services
International Journal of Engineering Science Invention ISSN (Online): 2319 6734, ISSN (Print): 2319 6726 Volume 2 Issue 2 ǁ February. 2013 ǁ PP.49-55 Improving data integrity on cloud storage services
Client Perspective Based Documentation Related Over Query Outcomes from Numerous Web Databases
Beyond Limits...Volume: 2 Issue: 2 International Journal Of Advance Innovations, Thoughts & Ideas Client Perspective Based Documentation Related Over Query Outcomes from Numerous Web Databases B. Santhosh
[email protected]
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 4 Issue 3 March 2015, Page No. 10715-10720 Data DeDuplication Using Optimized Fingerprint Lookup Method for
A Deduplication-based Data Archiving System
2012 International Conference on Image, Vision and Computing (ICIVC 2012) IPCSIT vol. 50 (2012) (2012) IACSIT Press, Singapore DOI: 10.7763/IPCSIT.2012.V50.20 A Deduplication-based Data Archiving System
Multi-level Metadata Management Scheme for Cloud Storage System
, pp.231-240 http://dx.doi.org/10.14257/ijmue.2014.9.1.22 Multi-level Metadata Management Scheme for Cloud Storage System Jin San Kong 1, Min Ja Kim 2, Wan Yeon Lee 3, Chuck Yoo 2 and Young Woong Ko 1
Efficient and Effective Duplicate Detection Evaluating Multiple Data using Genetic Algorithm
Efficient and Effective Duplicate Detection Evaluating Multiple Data using Genetic Algorithm Dr.M.Mayilvaganan, M.Saipriyanka Associate Professor, Dept. of Computer Science, PSG College of Arts and Science,
Dynamic Query Updation for User Authentication in cloud Environment
Dynamic Query Updation for User Authentication in cloud Environment Gaurav Shrivastava 1, Dr. S. Prabakaran 2 1 Research Scholar, Department of Computer Science, SRM University, Kattankulathur, Tamilnadu,
Peer-to-peer Cooperative Backup System
Peer-to-peer Cooperative Backup System Sameh Elnikety Mark Lillibridge Mike Burrows Rice University Compaq SRC Microsoft Research Abstract This paper presents the design and implementation of a novel backup
International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014
RESEARCH ARTICLE OPEN ACCESS A Survey of Data Mining: Concepts with Applications and its Future Scope Dr. Zubair Khan 1, Ashish Kumar 2, Sunny Kumar 3 M.Tech Research Scholar 2. Department of Computer
Distributed Computing and Big Data: Hadoop and MapReduce
Distributed Computing and Big Data: Hadoop and MapReduce Bill Keenan, Director Terry Heinze, Architect Thomson Reuters Research & Development Agenda R&D Overview Hadoop and MapReduce Overview Use Case:
Efficient and Secure Dynamic Auditing Protocol for Integrity Verification In Cloud Storage
Efficient and Secure Dynamic Auditing Protocol for Integrity Verification In Cloud Storage Priyanga.R 1, Maheswari.B 2, Karthik.S 3 PG Scholar, Department of CSE, SNS College of technology, Coimbatore-35,
DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM
INTERNATIONAL JOURNAL OF RESEARCH IN COMPUTER APPLICATIONS AND ROBOTICS ISSN 2320-7345 DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM M. Mayilvaganan 1, S. Aparna 2 1 Associate
Large-Scale Data Sets Clustering Based on MapReduce and Hadoop
Journal of Computational Information Systems 7: 16 (2011) 5956-5963 Available at http://www.jofcis.com Large-Scale Data Sets Clustering Based on MapReduce and Hadoop Ping ZHOU, Jingsheng LEI, Wenjun YE
International Journal of Engineering Research ISSN: 2348-4039 & Management Technology November-2015 Volume 2, Issue-6
International Journal of Engineering Research ISSN: 2348-4039 & Management Technology Email: [email protected] November-2015 Volume 2, Issue-6 www.ijermt.org Modeling Big Data Characteristics for Discovering
A Review of Anomaly Detection Techniques in Network Intrusion Detection System
A Review of Anomaly Detection Techniques in Network Intrusion Detection System Dr.D.V.S.S.Subrahmanyam Professor, Dept. of CSE, Sreyas Institute of Engineering & Technology, Hyderabad, India ABSTRACT:In
Record Deduplication By Evolutionary Means
Record Deduplication By Evolutionary Means Marco Modesto, Moisés G. de Carvalho, Walter dos Santos Departamento de Ciência da Computação Universidade Federal de Minas Gerais 31270-901 - Belo Horizonte
Big Data With Hadoop
With Saurabh Singh [email protected] The Ohio State University February 11, 2016 Overview 1 2 3 Requirements Ecosystem Resilient Distributed Datasets (RDDs) Example Code vs Mapreduce 4 5 Source: [Tutorials
Data Deduplication and Tivoli Storage Manager
Data Deduplication and Tivoli Storage Manager Dave Cannon Tivoli Storage Manager rchitect Oxford University TSM Symposium September 2007 Disclaimer This presentation describes potential future enhancements
CLOUD COMPUTING SECURITY ARCHITECTURE - IMPLEMENTING DES ALGORITHM IN CLOUD FOR DATA SECURITY
CLOUD COMPUTING SECURITY ARCHITECTURE - IMPLEMENTING DES ALGORITHM IN CLOUD FOR DATA SECURITY Varun Gandhi 1 Department of Computer Science and Engineering, Dronacharya College of Engineering, Khentawas,
Big Data with Rough Set Using Map- Reduce
Big Data with Rough Set Using Map- Reduce Mr.G.Lenin 1, Mr. A. Raj Ganesh 2, Mr. S. Vanarasan 3 Assistant Professor, Department of CSE, Podhigai College of Engineering & Technology, Tirupattur, Tamilnadu,
Use of Data Mining Techniques to Improve the Effectiveness of Sales and Marketing
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 4, April 2015,
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
Keywords Cloud Storage, Error Identification, Partitioning, Cloud Storage Integrity Checking, Digital Signature Extraction, Encryption, Decryption
Partitioning Data and Domain Integrity Checking for Storage - Improving Cloud Storage Security Using Data Partitioning Technique Santosh Jogade *, Ravi Sharma, Prof. Rajani Kadam Department Of Computer
Data Refinery with Big Data Aspects
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 3, Number 7 (2013), pp. 655-662 International Research Publications House http://www. irphouse.com /ijict.htm Data
Data Integrity by Aes Algorithm ISSN 2319-9725
Data Integrity by Aes Algorithm ISSN 2319-9725 Alpha Vijayan Nidhiya Krishna Sreelakshmi T N Jyotsna Shukla Abstract: In the cloud computing, data is moved to a remotely located cloud server. Cloud will
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES
DECISION TREE INDUCTION FOR FINANCIAL FRAUD DETECTION USING ENSEMBLE LEARNING TECHNIQUES Vijayalakshmi Mahanra Rao 1, Yashwant Prasad Singh 2 Multimedia University, Cyberjaya, MALAYSIA 1 [email protected]
WHITE PAPER Improving Storage Efficiencies with Data Deduplication and Compression
WHITE PAPER Improving Storage Efficiencies with Data Deduplication and Compression Sponsored by: Oracle Steven Scully May 2010 Benjamin Woo IDC OPINION Global Headquarters: 5 Speen Street Framingham, MA
Arnab Roy Fujitsu Laboratories of America and CSA Big Data WG
Arnab Roy Fujitsu Laboratories of America and CSA Big Data WG 1 Security Analytics Crypto and Privacy Technologies Infrastructure Security 60+ members Framework and Taxonomy Chair - Sree Rajan, Fujitsu
IDENTIFYING AND OPTIMIZING DATA DUPLICATION BY EFFICIENT MEMORY ALLOCATION IN REPOSITORY BY SINGLE INSTANCE STORAGE
IDENTIFYING AND OPTIMIZING DATA DUPLICATION BY EFFICIENT MEMORY ALLOCATION IN REPOSITORY BY SINGLE INSTANCE STORAGE 1 M.PRADEEP RAJA, 2 R.C SANTHOSH KUMAR, 3 P.KIRUTHIGA, 4 V. LOGESHWARI 1,2,3 Student,
ADVANCE SECURITY TO CLOUD DATA STORAGE
Journal homepage: www.mjret.in ADVANCE SECURITY TO CLOUD DATA STORAGE ISSN:2348-6953 Yogesh Bhapkar, Mitali Patil, Kishor Kale,Rakesh Gaikwad ISB&M, SOT, Pune, India Abstract: Cloud Computing is the next
Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier
Feature Selection using Integer and Binary coded Genetic Algorithm to improve the performance of SVM Classifier D.Nithya a, *, V.Suganya b,1, R.Saranya Irudaya Mary c,1 Abstract - This paper presents,
A UPS Framework for Providing Privacy Protection in Personalized Web Search
A UPS Framework for Providing Privacy Protection in Personalized Web Search V. Sai kumar 1, P.N.V.S. Pavan Kumar 2 PG Scholar, Dept. of CSE, G Pulla Reddy Engineering College, Kurnool, Andhra Pradesh,
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM
A REVIEW PAPER ON THE HADOOP DISTRIBUTED FILE SYSTEM Sneha D.Borkar 1, Prof.Chaitali S.Surtakar 2 Student of B.E., Information Technology, J.D.I.E.T, [email protected] Assistant Professor, Information
Healthcare Measurement Analysis Using Data mining Techniques
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik
Keywords Cloud Computing, CRC, RC4, RSA, Windows Microsoft Azure
Volume 3, Issue 11, November 2013 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Cloud Computing
Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior
Sustaining Privacy Protection in Personalized Web Search with Temporal Behavior N.Jagatheshwaran 1 R.Menaka 2 1 Final B.Tech (IT), [email protected], Velalar College of Engineering and Technology,
Project Proposal. Data Storage / Retrieval with Access Control, Security and Pre-Fetching
1 Project Proposal Data Storage / Retrieval with Access Control, Security and Pre- Presented By: Shashank Newadkar Aditya Dev Sarvesh Sharma Advisor: Prof. Ming-Hwa Wang COEN 241 - Cloud Computing Page
A survey on Data Mining based Intrusion Detection Systems
International Journal of Computer Networks and Communications Security VOL. 2, NO. 12, DECEMBER 2014, 485 490 Available online at: www.ijcncs.org ISSN 2308-9830 A survey on Data Mining based Intrusion
Cloud SQL Security. Swati Srivastava 1 and Meenu 2. Engineering College., Gorakhpur, U.P. Gorakhpur, U.P. Abstract
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 4, Number 5 (2014), pp. 479-484 International Research Publications House http://www. irphouse.com /ijict.htm Cloud
How To Make A Secure Storage On A Mobile Device Secure
Outsourcing with secure accessibility in mobile cloud computing Monika Waghmare 1, Prof T.A.Chavan 2 Department of Information technology, Smt.Kashibai Navale College of Engineering, Pune, India. Abstract
Memory Allocation Technique for Segregated Free List Based on Genetic Algorithm
Journal of Al-Nahrain University Vol.15 (2), June, 2012, pp.161-168 Science Memory Allocation Technique for Segregated Free List Based on Genetic Algorithm Manal F. Younis Computer Department, College
A BRIEF REVIEW ALONG WITH A NEW PROPOSED APPROACH OF DATA DE DUPLICATION
A BRIEF REVIEW ALONG WITH A NEW PROPOSED APPROACH OF DATA DE DUPLICATION Suprativ Saha 1 and Avik Samanta 2 1 Department of Computer Science and Engineering, Global Institute of Management and Technology,
APPLICATION OF MULTI-AGENT SYSTEMS FOR NETWORK AND INFORMATION PROTECTION
18-19 September 2014, BULGARIA 137 Proceedings of the International Conference on Information Technologies (InfoTech-2014) 18-19 September 2014, Bulgaria APPLICATION OF MULTI-AGENT SYSTEMS FOR NETWORK
Deduplication Demystified: How to determine the right approach for your business
Deduplication Demystified: How to determine the right approach for your business Presented by Charles Keiper Senior Product Manager, Data Protection Quest Software Session Objective: To answer burning
A Layered Signcryption Model for Secure Cloud System Communication
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology IJCSMC, Vol. 4, Issue. 6, June 2015, pg.1086
Data Deduplication and Tivoli Storage Manager
Data Deduplication and Tivoli Storage Manager Dave annon Tivoli Storage Manager rchitect March 2009 Topics Tivoli Storage, IM Software Group Deduplication technology Data reduction and deduplication in
Optimized And Secure Data Backup Solution For Cloud Using Data Deduplication
RESEARCH ARTICLE OPEN ACCESS Optimized And Secure Data Backup Solution For Cloud Using Data Deduplication Siva Ramakrishnan S( M.Tech ) 1,Vinoth Kumar P (M.E) 2 1 ( Department Of Computer Science Engineering,
Journal of Electronic Banking Systems
Journal of Electronic Banking Systems Vol. 2015 (2015), Article ID 614386, 44 minipages. DOI:10.5171/2015.614386 www.ibimapublishing.com Copyright 2015. Khaled Ahmed Nagaty. Distributed under Creative
Secure Collaborative Privacy In Cloud Data With Advanced Symmetric Key Block Algorithm
Secure Collaborative Privacy In Cloud Data With Advanced Symmetric Key Block Algorithm Twinkle Graf.F 1, Mrs.Prema.P 2 1 (M.E- CSE, Dhanalakshmi College of Engineering, Chennai, India) 2 (Asst. Professor
Efficient File Sharing Scheme in Mobile Adhoc Network
Efficient File Sharing Scheme in Mobile Adhoc Network 1 Y. Santhi, 2 Mrs. M. Maria Sheeba 1 2ndMECSE, Ponjesly College of engineering, Nagercoil 2 Assistant professor, Department of CSE, Nagercoil Abstract:
International Journal of Emerging Technology & Research
International Journal of Emerging Technology & Research An Implementation Scheme For Software Project Management With Event-Based Scheduler Using Ant Colony Optimization Roshni Jain 1, Monali Kankariya
A block based storage model for remote online backups in a trust no one environment
A block based storage model for remote online backups in a trust no one environment http://www.duplicati.com/ Kenneth Skovhede (author, [email protected]) René Stach (editor, [email protected]) Abstract
Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques
Comparision of k-means and k-medoids Clustering Algorithms for Big Data Using MapReduce Techniques Subhashree K 1, Prakash P S 2 1 Student, Kongu Engineering College, Perundurai, Erode 2 Assistant Professor,
Email Spam Detection Using Customized SimHash Function
International Journal of Research Studies in Computer Science and Engineering (IJRSCSE) Volume 1, Issue 8, December 2014, PP 35-40 ISSN 2349-4840 (Print) & ISSN 2349-4859 (Online) www.arcjournals.org Email
Cloud Database Storage Model by Using Key-as-a-Service (KaaS)
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 4 Issue 7 July 2015, Page No. 13284-13288 Cloud Database Storage Model by Using Key-as-a-Service (KaaS) J.Sivaiah
Secure Group Oriented Data Access Model with Keyword Search Property in Cloud Computing Environment
Secure Group Oriented Data Access Model with Keyword Search Property in Cloud Computing Environment Chih Hung Wang Computer Science and Information Engineering National Chiayi University Chiayi City 60004,
Public Auditing & Automatic Protocol Blocking with 3-D Password Authentication for Secure Cloud Storage
Public Auditing & Automatic Protocol Blocking with 3-D Password Authentication for Secure Cloud Storage P. Selvigrija, Assistant Professor, Department of Computer Science & Engineering, Christ College
An Agent-Based Concept for Problem Management Systems to Enhance Reliability
An Agent-Based Concept for Problem Management Systems to Enhance Reliability H. Wang, N. Jazdi, P. Goehner A defective component in an industrial automation system affects only a limited number of sub
ISSN: 2321-7782 (Online) Volume 2, Issue 1, January 2014 International Journal of Advance Research in Computer Science and Management Studies
ISSN: 2321-7782 (Online) Volume 2, Issue 1, January 2014 International Journal of Advance Research in Computer Science and Management Studies Research Paper Available online at: www.ijarcsms.com New Challenges
Information Security in Big Data using Encryption and Decryption
International Research Journal of Computer Science (IRJCS) ISSN: 2393-9842 Information Security in Big Data using Encryption and Decryption SHASHANK -PG Student II year MCA S.K.Saravanan, Assistant Professor
A Content based Spam Filtering Using Optical Back Propagation Technique
A Content based Spam Filtering Using Optical Back Propagation Technique Sarab M. Hameed 1, Noor Alhuda J. Mohammed 2 Department of Computer Science, College of Science, University of Baghdad - Iraq ABSTRACT
MINIMIZING STORAGE COST IN CLOUD COMPUTING ENVIRONMENT
MINIMIZING STORAGE COST IN CLOUD COMPUTING ENVIRONMENT 1 SARIKA K B, 2 S SUBASREE 1 Department of Computer Science, Nehru College of Engineering and Research Centre, Thrissur, Kerala 2 Professor and Head,
Ranked Keyword Search Using RSE over Outsourced Cloud Data
Ranked Keyword Search Using RSE over Outsourced Cloud Data Payal Akriti 1, Ms. Preetha Mary Ann 2, D.Sarvanan 3 1 Final Year MCA, Sathyabama University, Tamilnadu, India 2&3 Assistant Professor, Sathyabama
EMC PERSPECTIVE. An EMC Perspective on Data De-Duplication for Backup
EMC PERSPECTIVE An EMC Perspective on Data De-Duplication for Backup Abstract This paper explores the factors that are driving the need for de-duplication and the benefits of data de-duplication as a feature
LOAD BALANCING IN CLOUD COMPUTING
LOAD BALANCING IN CLOUD COMPUTING Neethu M.S 1 PG Student, Dept. of Computer Science and Engineering, LBSITW (India) ABSTRACT Cloud computing is emerging as a new paradigm for manipulating, configuring,
A Genetic Algorithm-Evolved 3D Point Cloud Descriptor
A Genetic Algorithm-Evolved 3D Point Cloud Descriptor Dominik Wȩgrzyn and Luís A. Alexandre IT - Instituto de Telecomunicações Dept. of Computer Science, Univ. Beira Interior, 6200-001 Covilhã, Portugal
AN OVERVIEW OF QUALITY OF SERVICE COMPUTER NETWORK
Abstract AN OVERVIEW OF QUALITY OF SERVICE COMPUTER NETWORK Mrs. Amandeep Kaur, Assistant Professor, Department of Computer Application, Apeejay Institute of Management, Ramamandi, Jalandhar-144001, Punjab,
A QoS-Aware Web Service Selection Based on Clustering
International Journal of Scientific and Research Publications, Volume 4, Issue 2, February 2014 1 A QoS-Aware Web Service Selection Based on Clustering R.Karthiban PG scholar, Computer Science and Engineering,
SECURE AND TRUSTY STORAGE SERVICES IN CLOUD COMPUTING
SECURE AND TRUSTY STORAGE SERVICES IN CLOUD COMPUTING Saranya.V 1, Suganthi.J 2, R.G. Suresh Kumar 3 1,2 Master of Technology, Department of Computer Science and Engineering, Rajiv Gandhi College of Engineering
A Secure Model for Medical Data Sharing
International Journal of Database Theory and Application 45 A Secure Model for Medical Data Sharing Wong Kok Seng 1,1,Myung Ho Kim 1, Rosli Besar 2, Fazly Salleh 2 1 Department of Computer, Soongsil University,
How To Secure Cloud Computing, Public Auditing, Security, And Access Control In A Cloud Storage System
REVIEW ARTICAL A Novel Privacy-Preserving Public Auditing and Secure Searchable Data Cloud Storage Dumala Harisha 1, V.Gouthami 2 1 Student, Computer Science & Engineering-Department, JNTU Hyderabad India
PORs: Proofs of Retrievability for Large Files
PORs: Proofs of Retrievability for Large Files Ari Juels RSA Laboratories Burt Kaliski EMC Innovation Network ACM CCS Conference, 2007 presented by Serkan Uzunbaz Source: www.rsa.com/rsalabs/node.asp?id=3357
Introduction to Machine Learning and Data Mining. Prof. Dr. Igor Trajkovski [email protected]
Introduction to Machine Learning and Data Mining Prof. Dr. Igor Trajkovski [email protected] Ensembles 2 Learning Ensembles Learn multiple alternative definitions of a concept using different training
DELEGATING LOG MANAGEMENT TO THE CLOUD USING SECURE LOGGING
Available Online at www.ijcsmc.com International Journal of Computer Science and Mobile Computing A Monthly Journal of Computer Science and Information Technology ISSN 2320 088X IJCSMC, Vol. 3, Issue.
Data Backup and Archiving with Enterprise Storage Systems
Data Backup and Archiving with Enterprise Storage Systems Slavjan Ivanov 1, Igor Mishkovski 1 1 Faculty of Computer Science and Engineering Ss. Cyril and Methodius University Skopje, Macedonia [email protected],
REMOTE BACKUP-WHY SO VITAL?
REMOTE BACKUP-WHY SO VITAL? Any time your company s data or applications become unavailable due to system failure or other disaster, this can quickly translate into lost revenue for your business. Remote
