AN IMPROVED PRIVACY PRESERVING ALGORITHM USING ASSOCIATION RULE MINING (27-32)
Ravindra Kumar Tiwari
Ph.D. Scholar, Computer Science, AISECT University, Bhopal

Abstract - Recent advances in data mining technology for analyzing vast amounts of data have played an important role in several areas of business processing. However, if it is not done or used properly, data mining also opens new threats to privacy and information security. The main problem is that, from non-sensitive data, one may be able to infer sensitive information, including personal information, facts, or even patterns generated by data mining algorithms. Focusing on privacy preserving association rule mining, this paper presents a simple solution to the privacy problem. The approach is to survey the aspects discussed in several research papers and, after analyzing them, to derive a new solution with better efficiency and performance. Before analyzing the algorithms, the data structure of the database and the set of sensitive association rules are analyzed in order to build a more effective model.

Keywords - Data Mining, Association Rule Mining, Privacy Preserving

1. INTRODUCTION
Data mining services alone are not sufficient. Data mining services play an important role in the communication industry. Recent advances in data mining technology for analyzing vast amounts of data have played an important role in several areas of business processing. However, if it is not done or used properly, data mining also opens new threats to privacy and information security. The main problem is how to hide sensitive information, including personal information and even patterns generated by data mining algorithms. This paper focuses on privacy preserving association rule mining. In sequential pattern mining, the statistical significance of a pattern (called its support) is measured as the percentage of data sequences containing the pattern.
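As a small illustration of this support measure, the sketch below computes the support of a sequential pattern as the percentage of data sequences that contain it. It assumes that a data sequence is a time-ordered list of itemsets and that a pattern is contained in a sequence when its elements are subsets of strictly later (not necessarily adjacent) elements of that sequence, in order; the names `contains` and `support` and the sample data are hypothetical.

```python
from typing import List, Set

Sequence = List[Set[str]]  # time-ordered list of itemsets (one per transaction)


def contains(sequence: Sequence, pattern: Sequence) -> bool:
    """True if each pattern element is a subset of a later sequence element, in order."""
    pos = 0
    for element in pattern:
        # Advance to the first remaining sequence element that contains this pattern element.
        while pos < len(sequence) and not element <= sequence[pos]:
            pos += 1
        if pos == len(sequence):
            return False
        pos += 1  # the next pattern element must match a strictly later element
    return True


def support(sequences: List[Sequence], pattern: Sequence) -> float:
    """Support = percentage of data sequences that contain the pattern."""
    if not sequences:
        return 0.0
    hits = sum(1 for s in sequences if contains(s, pattern))
    return 100.0 * hits / len(sequences)


if __name__ == "__main__":
    data = [
        [{"a"}, {"b", "c"}, {"d"}],
        [{"a", "b"}, {"d"}],
        [{"b"}, {"c"}],
        [{"a"}, {"c"}, {"b", "d"}],
    ]
    print(support(data, [{"a"}, {"d"}]))  # 75.0
```

Matching each pattern element to the earliest possible sequence element is safe here, because an earlier match never leaves fewer options for the remaining pattern elements.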
Other work generalized the problem by adding a taxonomy (is-a hierarchy) on items and time constraints, such as the minimum and maximum gap between adjacent elements of a pattern. A related formulation mines patterns called episodes, which can have different types of ordering: full (serial episodes), none (parallel episodes), or partial, and which have to appear within a user-defined time window. The episodes are mined over a single event sequence, and their statistical significance is measured either as the percentage of windows containing the episode (frequency) or as the number of occurrences. Efficient algorithms have been presented for serial and parallel episodes. The model was later extended to handle events described by a set of attributes; episodes mined in sequences of such events are built from a set of unary and binary predicates on event attributes. To make the discovery of such complex episodes feasible, the user is assumed to specify a class of interesting patterns by providing a template. A language capable of specifying episodes of interest based on logical predicates has also been presented, together with a few further extensions to the model.

1.1 Hiding Purposes
According to the purpose of hiding, privacy preserving data mining (PPDM) algorithms [4] are classified into two types: data hiding and rule hiding. Data hiding refers to cases where sensitive data from the original database, such as identity, name, and address, that can be linked directly or indirectly to an individual person are hidden. In rule hiding, by contrast, the sensitive knowledge (rules) derived from the original database by applying data mining is hidden. The majority of PPDM algorithms use data hiding techniques, and most PPDM algorithms hide sensitive patterns by modifying data. Currently, PPDM algorithms are mainly applied to the tasks of classification, association rule mining, and clustering. Association analysis involves the discovery of association rules, showing attribute-value conditions that occur frequently together in a given set of data.
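The support and confidence of an association rule X => Y, on which the later discussion of rule hiding relies, can be computed as in this minimal sketch. It assumes transactions are Python sets, defines support as the fraction of transactions containing X and Y together, and confidence as support(X and Y) / support(X); the item names are made-up examples, not data from this paper.

```python
from typing import List, Set


def support(transactions: List[Set[str]], itemset: Set[str]) -> float:
    """Fraction of transactions that contain every item of `itemset`."""
    if not transactions:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(transactions: List[Set[str]], lhs: Set[str], rhs: Set[str]) -> float:
    """Confidence of lhs => rhs: support(lhs together with rhs) / support(lhs)."""
    lhs_support = support(transactions, lhs)
    if lhs_support == 0:
        return 0.0
    return support(transactions, lhs | rhs) / lhs_support


if __name__ == "__main__":
    db = [
        {"bread", "milk"},
        {"bread", "butter"},
        {"bread", "milk", "butter"},
        {"milk"},
    ]
    print(support(db, {"bread", "milk"}))       # 0.5
    print(confidence(db, {"bread"}, {"milk"}))  # about 0.667: 2 of the 3 bread transactions
```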
Classification is the process of finding a set of models that describe and distinguish data classes or concepts, so that the model can be used to predict the class of objects whose class label is unknown. Cluster analysis concerns the problem of decomposing or partitioning a (usually multivariate) data set into groups so that the points in one group are similar to each other and as different as possible from the points in other groups.

1.2 Goal of Privacy Preservation
The goal of privacy preservation [5] is to mine the raw data without leaking private information. Current technology achieves this in two ways:
1) Sensitive raw data in the database, such as names, certificate numbers, addresses, and hobbies, can be modified or removed to avoid leaking personal private information. That is, correct results can be obtained by data mining algorithms without accessing the private data.
2) Sensitive rules included in data mining results can be eliminated by rule hiding algorithms. That is, potentially sensitive rules are protected during the mining process so that they cannot be obtained by a party with ill intentions who would reason about them maliciously.

Vol. 1(1), January 2014 (ISSN: )

1.3 Privacy Preservation Techniques
Several privacy-preserving techniques [13] for association rule mining have been proposed in the past few years. Some proposals and algorithms have been developed for centralized data, while others address a distributed data scenario. Distributed data scenarios can be further classified into horizontal and vertical data distribution. The purpose of privacy preservation [13] is to discover accurate patterns without precise access to the original data. Association rule mining finds the rules that satisfy a given minimum support and minimum confidence. Therefore, the most direct way to hide an association rule is to reduce its support or confidence below the minimum support or minimum confidence.

Many implementations [2] protecting the confidentiality of data and knowledge are applied in the association rule mining process. According to the privacy protection technology used, current privacy preserving association rule mining algorithms can commonly be divided into three categories:
i) heuristic-based techniques
ii) reconstruction-based techniques
iii) cryptography-based techniques
Heuristic-based techniques are used for centralized data sets, while cryptography-based techniques are designed to protect privacy in distributed data sets by using encryption. Heuristic-based techniques [2] address how to select the appropriate data for modification. Since optimal selective data modification, or sanitization, is an NP-hard problem, heuristics are used to address the complexity. Heuristic-based modification methods include perturbation, which alters an attribute value to a new value (e.g., changing a 1-value to a 0-value, or adding noise), and blocking, which replaces an existing attribute value with a "?".
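The most direct hiding strategy mentioned above, pushing a sensitive rule's support or confidence below the thresholds by changing 1-values to 0-values, can be sketched as follows. This is a generic distortion heuristic for illustration, not the IPPM algorithm: it assumes the two sides of the rule are disjoint, deletes one consequent item from supporting transactions until the rule is no longer mined, and, as one possible way to limit changes to the original database, modifies the shortest supporting transactions first. The names `hide_rule` and `is_hidden` are hypothetical.

```python
from typing import List, Set


def support_count(db: List[Set[str]], itemset: Set[str]) -> int:
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in db if itemset <= t)


def is_hidden(db: List[Set[str]], lhs: Set[str], rhs: Set[str],
              min_sup: float, min_conf: float) -> bool:
    """A rule is hidden once its support OR its confidence falls below the threshold."""
    both = support_count(db, lhs | rhs)
    lhs_count = support_count(db, lhs)
    sup = both / len(db) if db else 0.0
    conf = both / lhs_count if lhs_count else 0.0
    return sup < min_sup or conf < min_conf


def hide_rule(db: List[Set[str]], lhs: Set[str], rhs: Set[str],
              min_sup: float, min_conf: float) -> List[Set[str]]:
    """Return a sanitized copy of `db` in which lhs => rhs is no longer mined."""
    sanitized = [set(t) for t in db]  # copy: the original database is left untouched
    victim = min(rhs)                 # consequent item to delete (a 1-value becomes 0)
    # Heuristic: modify the shortest supporting transactions first.
    supporting = sorted((i for i, t in enumerate(sanitized) if (lhs | rhs) <= t),
                        key=lambda i: len(sanitized[i]))
    for i in supporting:
        if is_hidden(sanitized, lhs, rhs, min_sup, min_conf):
            break
        sanitized[i].discard(victim)
    return sanitized


if __name__ == "__main__":
    db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"a", "b", "c", "d"}, {"b"}]
    sanitized = hide_rule(db, {"a"}, {"b"}, min_sup=0.4, min_conf=0.6)
    print(sanitized[1])  # {'a'}
```

In this example the rule a => b starts at support 0.6 and confidence 0.75; removing b from the two-item transaction {a, b} lowers its confidence to 0.5, below the 0.6 threshold, so only one transaction is modified.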
A basic principle in choosing the transactions, or the items within itemsets, to be modified is to minimize the influence on the original database as far as possible.

2. MOTIVATION
Successful applications of data mining techniques have been demonstrated in many areas that benefit commercial, social, and human activities. Along with this success, these techniques pose a threat to privacy: one can easily disclose other people's sensitive information or knowledge by using them. Therefore, before a database is released, sensitive information or knowledge must be hidden from unauthorized access. Because of this privacy problem, PPDM has become a hot topic in the data mining and database security fields. Focusing on privacy preserving association rule mining, this paper presents a simple solution to the privacy problem. To overcome these problems, the Improved Privacy Preserving Algorithm Using Association Rule Mining is proposed; it is based on the random perturbation technique and gives the best results in terms of efficiency and performance. The proposed algorithm is a good way to apply data mining techniques securely while hiding logical instances from others.

Data mining is an interactive and iterative process. A user formulates a data mining task as a KDD query in a high-level language. The query is sent to the knowledge discovery management system, which retrieves the data from the database, chooses the right data mining algorithm, and returns the result to the user in the form of frequent patterns, association rules, and pruning results. The system should provide a mechanism for storing discovered knowledge in a database for further selective analysis. An SQL-like language has been proposed for specifying all tasks concerning the discovery of frequent patterns, association rules, and pruning of the resulting databases. The language is MineSQL, an extension of SQL proposed to handle association rule queries.
This approach seems reasonable because association rules and sequential patterns are very often mined in the same datasets. MineSQL is designed as a query language for advanced users, but it can also serve as an application programming interface (API) for building business applications dealing with knowledge discovery. MineSQL provides mechanisms for storing patterns in relational tables by offering new complex data types, and it allows a user to specify various constraints defining the requested class of patterns. Current algorithms either do not handle item constraints at all or require too detailed information on the structure of patterns. In this dissertation, an algorithm that uses item constraints in the mining process will be presented, with special emphasis on the fact that the source data is likely to be stored in relational tables.

3. PRIVACY PROTECTION TECHNIQUE
There are various privacy protection techniques [7] that apply to centralized data, such as the reconstruction technique, the random response technique, the random perturbation technique, heuristic technology, and isometric transformation technology. Other privacy protection techniques apply to distributed data, such as the switching encryption technique and secure multiparty computation. Among them, the random perturbation technique randomly converts the raw data according to a set of probabilities, which gives it a great advantage in privacy preserving data mining.

3.1 Data Distribution
Based on the distribution of data, PPDM algorithms [13] can first be divided into two major categories: centralized and distributed. In a centralized database environment, all data are stored in a single database, while in a distributed database environment, data are stored in different databases. Distributed data scenarios can be further classified into horizontal and vertical data distributions. Horizontal distribution refers to cases where different records with the same data attributes reside in different places, while in a vertical data distribution, different attributes of the same record reside in different places. Earlier research has predominantly focused on privacy preservation in centralized databases. The difficulty of applying PPDM algorithms to distributed databases can be attributed to two factors: first, data owners have privacy concerns, so they may not be willing to release their own data to others; second, even if they are willing to share data, the communication cost between the sites is too high.

3.2 Randomization method
The randomization method [6] provides an effective yet simple way of preventing the user from learning sensitive data. It can easily be implemented at the data collection phase of privacy preserving data mining, because the noise added to a given record is independent of the behavior of other data records.
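The idea of randomly converting raw values according to a set of probabilities, with noise that is independent for each record, can be sketched for a single categorical attribute. This is a generic randomized-response-style scheme chosen for illustration, not necessarily the exact perturbation used in [6] or [7]: each provider keeps its true value with an assumed probability p and otherwise reports a value drawn uniformly from the attribute domain, and the receiver inverts this known process to estimate the original value distribution. The names `perturb` and `reconstruct` and the sample data are hypothetical.

```python
import random
from collections import Counter
from typing import Dict, List, Sequence


def perturb(values: Sequence[str], domain: Sequence[str], p: float,
            rng: random.Random) -> List[str]:
    """Provider side: keep each value with probability p, otherwise report a
    value drawn uniformly from the domain (independently for every record)."""
    return [v if rng.random() < p else rng.choice(domain) for v in values]


def reconstruct(reported: Sequence[str], domain: Sequence[str], p: float) -> Dict[str, float]:
    """Receiver side: estimate the original distribution by inverting
    P(report = v) = p * P(true = v) + (1 - p) / k, where k = len(domain)."""
    if not 0 < p <= 1 or not reported:
        raise ValueError("need 0 < p <= 1 and at least one reported value")
    n, k = len(reported), len(domain)
    counts = Counter(reported)
    # Clip at 0 because sampling noise can push small estimates below zero.
    return {v: max(0.0, (counts[v] / n - (1 - p) / k) / p) for v in domain}


if __name__ == "__main__":
    rng = random.Random(42)
    domain = ["Healthy", "Sick"]
    original = ["Healthy"] * 7000 + ["Sick"] * 3000
    reported = perturb(original, domain, p=0.6, rng=rng)
    print(reconstruct(reported, domain, p=0.6))  # close to Healthy 0.7, Sick 0.3
```

Because each reported value is randomized on its own, a released value cannot be taken as the provider's true value with certainty, while the aggregate distribution remains estimable; a smaller p gives more privacy but noisier estimates.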
When the randomization method is carried out, the data collection process consists of two steps. In the first step, the data providers randomize their data and transmit the randomized data to the data receiver. In the second step, the data receiver estimates the original distribution of the data by employing a distribution reconstruction algorithm. The model of randomization is shown in Figure 3.2.

[Figure 3.2: The Model of Randomization]

3.3 Random Perturbation Technique
This method [7] can handle discrete data of character, Boolean, and numeric types. To facilitate conversion of the data set, the original data set must first be preprocessed. Data preprocessing is divided into three parts: data discretization, attribute coding, and data set encoding. For discretization, the interval length is

    length = (A(max) - A(min)) / n

where A is a continuous attribute, n is the number of discrete intervals, and length is the length of each discrete interval. When the interval length is not an integer, it is rounded to the nearest integer; the first interval begins at A(min) and the last interval ends at A(max). In this paper, numeric attributes are treated as continuous attributes. Taking Table I as an example, the continuous attributes are age, resting blood pressure, and maximum heart rate.

TABLE I: CARDIOLOGY DATA SET
Age | Sex    | Blood pressure | ECG    | Maximum heart rate | Result
39  | Male   | 128            | Hyp    | 130                | Healthy
60  | Male   | 135            | Hyp    | 170                | Sick
58  | Female | 137            | Hyp    | 147                | Healthy
45  | Female | 142            | Normal | 163                | Sick
62  | Male   | 140            | Normal | 151                | Sick
70  | Male   | 146            | Normal | 148                | Healthy

When n is 5, the discretized data set is shown in Table II. Attribute coding finds the distinct values of each attribute domain by querying the discrete data set, and then encodes these values with natural numbers to generate the attribute coding sheet (as shown in Tables III and IV).

[TABLE II: DISCRETE DATA SET]
[TABLE III: ATTRIBUTE DOMAIN CODE]
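The discretization rule and the attribute coding step can be illustrated on the Table I data. The sketch fills in details the text leaves open, all of which are assumptions: a value v is placed in interval floor((v - A(min)) / length) counted from 0, A(max) is placed in the last interval, a length ending in .5 is rounded up, and codes are natural numbers assigned to each attribute's distinct values in sorted order starting from 1. Its output therefore need not match the paper's Tables II to IV; the helper names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Row = Tuple[object, ...]

# Table I (cardiology data set): age, sex, blood pressure, ECG, max heart rate, result
TABLE_I: List[Row] = [
    (39, "Male", 128, "Hyp", 130, "Healthy"),
    (60, "Male", 135, "Hyp", 170, "Sick"),
    (58, "Female", 137, "Hyp", 147, "Healthy"),
    (45, "Female", 142, "Normal", 163, "Sick"),
    (62, "Male", 140, "Normal", 151, "Sick"),
    (70, "Male", 146, "Normal", 148, "Healthy"),
]
CONTINUOUS = (0, 2, 4)  # column indexes of age, blood pressure, max heart rate


def interval_length(values: List[int], n: int) -> int:
    """length = (A(max) - A(min)) / n, rounded half-up to the nearest integer."""
    raw = (max(values) - min(values)) / n
    return max(1, math.floor(raw + 0.5))  # guard against a zero-length interval


def discretize(values: List[int], n: int) -> List[int]:
    """Map each value to an interval index 0..n-1; intervals start at A(min)."""
    lo, length = min(values), interval_length(values, n)
    return [min((v - lo) // length, n - 1) for v in values]


def discretize_table(rows: List[Row], n: int) -> List[Row]:
    """Discretize the continuous columns and leave the categorical ones unchanged."""
    columns = [list(col) for col in zip(*rows)]
    for c in CONTINUOUS:
        columns[c] = discretize(columns[c], n)
    return [tuple(r) for r in zip(*columns)]


def attribute_codes(rows: List[Row]) -> List[Dict[object, int]]:
    """Attribute coding sheet: distinct values of each column -> 1, 2, 3, ..."""
    return [{v: i for i, v in enumerate(sorted(set(col)), start=1)} for col in zip(*rows)]


def encode(rows: List[Row], codes: List[Dict[object, int]]) -> List[Row]:
    """Replace every attribute value by its code to form the encoded data set."""
    return [tuple(codes[c][v] for c, v in enumerate(r)) for r in rows]


if __name__ == "__main__":
    discrete = discretize_table(TABLE_I, n=5)
    codes = attribute_codes(discrete)
    for row in encode(discrete, codes):
        print(row)
```

With n = 5, for example, the age interval length is (70 - 39) / 5 = 6.2, which rounds to 6, so ages 39, 45 and 70 fall into intervals 0, 1 and 4 respectively.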
[TABLE IV: ATTRIBUTE DOMAIN CODE]
[TABLE V: PERTURBATION DATA SET]

Data set encoding replaces the attribute values of the discrete data set with the corresponding codes from the attribute coding table, forming the encoded data set (as shown in Table V).

The Apriori algorithm is a two-step process:
Step 1 (Join step): To find L_k, a set of candidate k-itemsets is generated by joining L_{k-1} with itself. This set of candidates is denoted C_k.
Step 2 (Prune step): C_k is a superset of L_k; that is, its members may or may not be frequent, but all frequent k-itemsets are included in C_k. A scan of the database to determine the count of each candidate in C_k yields L_k (i.e., all candidates whose count is no less than the minimum support count are frequent by definition and therefore belong to L_k).

4. PROPOSED WORK
This paper proposes an algorithm named Improved Privacy Preserving Mining (IPPM). The proposed algorithm is a good way to apply data mining techniques securely, hiding our logical instances from others. The entire system architecture consists of five phases:
1) Check for authentication.
2) Reading.
3) Association rule mining.
4) Encoding and decoding of the data using the random perturbation technique.
5) Pruning.

Data mining techniques [4] are used to discover user behavior patterns with several algorithms. Data mining can find interesting and valuable patterns or relationships describing the data, and can predict or classify behavior with a model built from the available data.
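The join, prune, and database-scan steps of the Apriori algorithm described at the end of Section 3 can be sketched as a short level-wise implementation. This is the textbook Apriori procedure, not the IPPM algorithm. It assumes transactions are Python sets and an absolute minimum support count, and, as in the standard formulation, it also discards any candidate that has an infrequent (k-1)-subset before scanning the database; the function name `apriori` and the sample transactions are illustrative.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def apriori(transactions: List[Set[str]], min_count: int) -> Dict[FrozenSet[str], int]:
    """Return every frequent itemset with its support count."""
    # L_1: frequent 1-itemsets from a first database scan.
    counts: Dict[FrozenSet[str], int] = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    level = {s: c for s, c in counts.items() if c >= min_count}
    frequent = dict(level)
    k = 2
    while level:
        prev = list(level)
        # Join step: C_k from unions of two frequent (k-1)-itemsets that have k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune (Apriori property): every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        # Scan: count each remaining candidate; those reaching min_count form L_k.
        level = {}
        for c in candidates:
            count = sum(1 for t in transactions if c <= t)
            if count >= min_count:
                level[c] = count
        frequent.update(level)
        k += 1
    return frequent


if __name__ == "__main__":
    db = [{"I1", "I2", "I5"}, {"I2", "I4"}, {"I2", "I3"}, {"I1", "I2", "I4"},
          {"I1", "I3"}, {"I2", "I3"}, {"I1", "I3"}, {"I1", "I2", "I3", "I5"},
          {"I1", "I2", "I3"}]
    result = apriori(db, min_count=2)
    for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
        print(sorted(itemset), count)
```

With a minimum support count of 2 on these nine transactions, {I1, I2, I3} and {I1, I2, I5} are the largest frequent itemsets; a candidate such as {I1, I2, I4} is pruned without a scan because its subset {I1, I4} occurs only once.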
Data mining uses automated tools that employ several methodologies and algorithms to discover hidden patterns, associations, and frequent structures in large amounts of data stored in data warehouses or other information repositories, and to filter the necessary information from these large datasets. The telecommunications industry is a typical data-intensive industry, and competition in it is becoming increasingly fierce. Compared with other industries, the telecommunications industry holds more crucial personal user data, which can help analysts analyze the data accurately and obtain useful knowledge. To maintain their position and win the competition, companies should find more interactive business opportunities and provide users with better service in a short time. As a result, data warehousing and data mining have important value in the telecommunications industry. In this paper, we propose an efficient data mining algorithm named Improved Privacy Preserving Mining (IPPM).

4.1 Proposed Method: IPPM
The following terminology is important for understanding the proposed technique:
1) Frequent pattern: a frequent pattern is an itemset that is used by customers frequently. For example, if item I1 is purchased by 10 customers and item I2 is purchased by 5 customers, then I1 is used more frequently, so the owner should concentrate on I1 because more customers visit it.
2) Minimum support: for an item to be a frequent member, a minimum support count is chosen that determines whether the item belongs to the list of frequent patterns. For example, if the minimum support is 2, then any item whose count (number of customer visits) is greater than or equal to 2 is frequent and is considered in the pruning phase.
3) Data pruning: the act of removing itemsets that are not necessary is called data pruning.
4) Encryption/decryption: encryption and decryption are provided at four levels: transaction, frequent item, association rule, and pruning result.

Working Procedure
Our module is divided into two parts: a user can log in either as a normal user or as the admin. A normal user works through the five phases of Improved Privacy Preserving Mining (IPPM):
1) Check for authentication.
2) Reading.
3) Association rule mining.
4) Encoding and decoding of the data using the random perturbation technique.
5) Pruning.

5. RESULT ANALYSIS
The result analysis compares IPPM with the SPADE algorithm in terms of memory and computation time. The graphs show that the new method takes less time than older methods such as SPADE, so it is more efficient.

A program may take a very long time on large inputs before it completes its work and gives a sign of life again, so it sometimes makes sense to estimate the running time before starting it. Obviously, the running time depends on the size n of the input.

SPADE (Sequential Pattern Discovery using Equivalence classes) is an algorithm for discovering the set of all frequent sequences. Its key features are:
1. It uses a vertical id-list database format, in which each sequence is associated with a list of objects in which it occurs, along with the time-stamps. Its authors show that all frequent sequences can be enumerated via simple temporal joins (or intersections) on id-lists.
2. It uses a lattice-theoretic approach to decompose the original search space (lattice) into smaller pieces (sublattices) that can be processed independently in main memory.
3. It usually requires three database scans, or only a single scan with some pre-processed information, thus minimizing I/O costs compared with the Generalized Sequential Pattern (GSP) algorithm.
SPADE not only minimizes I/O costs by reducing database scans, but also minimizes computational costs by using efficient search schemes. The vertical id-list based approach is also insensitive to data skew. An extensive set of experiments reported for SPADE shows that it outperforms previous approaches by a factor of two, and by an order of magnitude if some additional off-line information is available. Furthermore, SPADE scales linearly in the database size and in a number of other database parameters. The main steps of SPADE are the computation of the frequent 1-sequences and 2-sequences, the decomposition into prefix-based parent equivalence classes, and the enumeration of all other frequent sequences via BFS or DFS search within each class.

The proposed algorithm computes only the pre subset, so it includes only one side of the subset rather than the whole, and it does not perform candidate generation. Time efficiency estimates depend on what is defined to be a step. For the analysis to correspond usefully to the actual execution time, the time required to perform a step must be guaranteed to be bounded above by a constant. One must be careful here: for instance, some analyses count an addition of two numbers as one step, an assumption that may not be warranted in certain contexts. The graphs show that the proposed method performs better than SPADE.

5.1 Memory Based graph
[Figure 5.1: Memory Based graph. x-axis: Min Support; y-axis: Memory (MB)]
Figure 5.1 shows that the proposed algorithm IPPM uses less memory than the SPADE algorithm. At a minimum support of 1, IPPM requires at most 500 MB of memory for storing frequent itemsets, while SPADE requires 1000 MB, because the proposed algorithm works on either a pre or a post basis, while SPADE works on both.

5.2 Time Based graph
[Figure 5.2: Time Based graph. x-axis: Min Support; y-axis: Time (ms)]
Figure 5.2 shows that the proposed algorithm IPPM takes less computation time than the SPADE algorithm. At a minimum support of 1, IPPM requires at most 500 ms of computation time, while SPADE requires 1000 ms, because the proposed algorithm works on either a pre or a post basis, while SPADE works on both.

6. CONCLUSION
Recent advances in data mining technology for analyzing vast amounts of data have played an important role in several areas of business processing. However, if it is not done or used properly, data mining also opens new threats to privacy and information security. The main problem is that, from non-sensitive data, one may be able to infer sensitive information, including personal information, facts, or even patterns generated by data mining algorithms. Focusing on privacy preserving association rule mining, this paper presents a simple solution that performs best in terms of efficiency and performance, because the proposed algorithm takes half the computation time and memory of the SPADE algorithm.

FUTURE WORK
In future work, simulation results will also be included to show that the proposed method is better than other traditional methods. Security can be further improved by requiring one additional key when highly confidential data is accessed.

REFERENCES
[1] R. Agrawal and R. Srikant, "Fast Algorithms for Mining Association Rules," 20th International Conference on Very Large Data Bases.
[2] Vassilios S. Verykios, Elisa Bertino, et al., "State-of-the-art in Privacy Preserving Data Mining," SIGMOD Record, Vol. 33, pp. 50-57, March.
[3] Alan F. Karr, Xiaodong Lin, Ashish P. Sanil and Jerome P. Reiter, "Privacy-Preserving Analysis of Vertically Partitioned Data Using Secure Matrix Products," Journal of Official Statistics, Vol. 25.
[4] J. Han and M. Kamber, Data Mining: Concepts and Techniques.
[5] Yanguang Shen, Junrui Han and Hui Shao, "Research on Privacy-Preserving Technology of Data Mining," IEEE Second International Conference on Intelligent Computation Technology and Automation, Vol. 2.
[6] Pingshui Wang, "Survey on Privacy Preserving Data Mining," International Journal of Digital Content Technology and its Applications, Vol. 4, No. 9, 2010.
[7] Brian C.S. Loh and Patrick H.H. Then, "Ontology-Enhanced Interactive Anonymization in Domain-Driven Data Mining Outsourcing," IEEE Second International Symposium on Data, Privacy, and E-Commerce, 2010.
[8] Chirag N. Modi, Udai Pratap Rao and Dhiren R. Patel, "Maintaining privacy and data quality in privacy preserving association rule mining," IEEE International Conference on Advances in Communication, Network, and Computing.
[9] Wang Yan, Le Jiajin and Huang Dongmei, "A Method for Privacy Preserving Mining of Association Rules Based on Web Usage Mining," IEEE International Conference on Web Information Systems and Mining, Vol. 1.
Social Media Mining. Data Mining Essentials
Introduction Data production rate has been increased dramatically (Big Data) and we are able store much more data than before E.g., purchase data, social media data, mobile phone data Businesses and customers
Chapter 6: Episode discovery process
Chapter 6: Episode discovery process Algorithmic Methods of Data Mining, Fall 2005, Chapter 6: Episode discovery process 1 6. Episode discovery process The knowledge discovery process KDD process of analyzing
Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Proceedings of the 2014 International Conference on Industrial Engineering and Operations Management Bali, Indonesia, January 7 9, 2014 Mobile Phone APP Software Browsing Behavior using Clustering Analysis
Formal Methods for Preserving Privacy for Big Data Extraction Software
Formal Methods for Preserving Privacy for Big Data Extraction Software M. Brian Blake and Iman Saleh Abstract University of Miami, Coral Gables, FL Given the inexpensive nature and increasing availability
DATA MINING TECHNIQUES AND APPLICATIONS
DATA MINING TECHNIQUES AND APPLICATIONS Mrs. Bharati M. Ramageri, Lecturer Modern Institute of Information Technology and Research, Department of Computer Application, Yamunanagar, Nigdi Pune, Maharashtra,
Classification and Prediction
Classification and Prediction Slides for Data Mining: Concepts and Techniques Chapter 7 Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser
PartJoin: An Efficient Storage and Query Execution for Data Warehouses
PartJoin: An Efficient Storage and Query Execution for Data Warehouses Ladjel Bellatreche 1, Michel Schneider 2, Mukesh Mohania 3, and Bharat Bhargava 4 1 IMERIR, Perpignan, FRANCE [email protected] 2
MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH
MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH M.Rajalakshmi 1, Dr.T.Purusothaman 2, Dr.R.Nedunchezhian 3 1 Assistant Professor (SG), Coimbatore Institute of Technology, India, [email protected]
Indian Journal of Science The International Journal for Science ISSN 2319 7730 EISSN 2319 7749 2016 Discovery Publication. All Rights Reserved
Indian Journal of Science The International Journal for Science ISSN 2319 7730 EISSN 2319 7749 2016 Discovery Publication. All Rights Reserved Perspective Big Data Framework for Healthcare using Hadoop
Selection of Optimal Discount of Retail Assortments with Data Mining Approach
Available online at www.interscience.in Selection of Optimal Discount of Retail Assortments with Data Mining Approach Padmalatha Eddla, Ravinder Reddy, Mamatha Computer Science Department,CBIT, Gandipet,Hyderabad,A.P,India.
Multi-table Association Rules Hiding
Multi-table Association Rules Hiding Shyue-Liang Wang 1 and Tzung-Pei Hong 2 1 Department of Information Management 2 Department of Computer Science and Information Engineering National University of Kaohsiung
EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH
EMPIRICAL STUDY ON SELECTION OF TEAM MEMBERS FOR SOFTWARE PROJECTS DATA MINING APPROACH SANGITA GUPTA 1, SUMA. V. 2 1 Jain University, Bangalore 2 Dayanada Sagar Institute, Bangalore, India Abstract- One
ORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM
ORGANIZATIONAL KNOWLEDGE MAPPING BASED ON LIBRARY INFORMATION SYSTEM IRANDOC CASE STUDY Ammar Jalalimanesh a,*, Elaheh Homayounvala a a Information engineering department, Iranian Research Institute for
Privacy-preserving Analysis Technique for Secure, Cloud-based Big Data Analytics
577 Hitachi Review Vol. 63 (2014),. 9 Featured Articles Privacy-preserving Analysis Technique for Secure, Cloud-based Big Data Analytics Ken Naganuma Masayuki Yoshino, Ph.D. Hisayoshi Sato, Ph.D. Yoshinori
Extend Table Lens for High-Dimensional Data Visualization and Classification Mining
Extend Table Lens for High-Dimensional Data Visualization and Classification Mining CPSC 533c, Information Visualization Course Project, Term 2 2003 Fengdong Du [email protected] University of British Columbia
Standardization and Its Effects on K-Means Clustering Algorithm
Research Journal of Applied Sciences, Engineering and Technology 6(7): 399-3303, 03 ISSN: 040-7459; e-issn: 040-7467 Maxwell Scientific Organization, 03 Submitted: January 3, 03 Accepted: February 5, 03
New Matrix Approach to Improve Apriori Algorithm
New Matrix Approach to Improve Apriori Algorithm A. Rehab H. Alwa, B. Anasuya V Patil Associate Prof., IT Faculty, Majan College-University College Muscat, Oman, [email protected] Associate
A Review of Anomaly Detection Techniques in Network Intrusion Detection System
A Review of Anomaly Detection Techniques in Network Intrusion Detection System Dr.D.V.S.S.Subrahmanyam Professor, Dept. of CSE, Sreyas Institute of Engineering & Technology, Hyderabad, India ABSTRACT:In
Improving Apriori Algorithm to get better performance with Cloud Computing
Improving Apriori Algorithm to get better performance with Cloud Computing Zeba Qureshi 1 ; Sanjay Bansal 2 Affiliation: A.I.T.R, RGPV, India 1, A.I.T.R, RGPV, India 2 ABSTRACT Cloud computing has become
Building A Smart Academic Advising System Using Association Rule Mining
Building A Smart Academic Advising System Using Association Rule Mining Raed Shatnawi +962795285056 [email protected] Qutaibah Althebyan +962796536277 [email protected] Baraq Ghalib & Mohammed
Enhance Preprocessing Technique Distinct User Identification using Web Log Usage data
Enhance Preprocessing Technique Distinct User Identification using Web Log Usage data Sheetal A. Raiyani 1, Shailendra Jain 2 Dept. of CSE(SS),TIT,Bhopal 1, Dept. of CSE,TIT,Bhopal 2 [email protected]
DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE
DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE SK MD OBAIDULLAH Department of Computer Science & Engineering, Aliah University, Saltlake, Sector-V, Kol-900091, West Bengal, India [email protected]
Chapter 20: Data Analysis
Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification
SPADE: An Efficient Algorithm for Mining Frequent Sequences
Machine Learning, 42, 31 60, 2001 c 2001 Kluwer Academic Publishers. Manufactured in The Netherlands. SPADE: An Efficient Algorithm for Mining Frequent Sequences MOHAMMED J. ZAKI Computer Science Department,
So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we
PREDICTIVE MODELING OF INTER-TRANSACTION ASSOCIATION RULES A BUSINESS PERSPECTIVE
International Journal of Computer Science and Applications, Vol. 5, No. 4, pp 57-69, 2008 Technomathematics Research Foundation PREDICTIVE MODELING OF INTER-TRANSACTION ASSOCIATION RULES A BUSINESS PERSPECTIVE
A Framework for Data Warehouse Using Data Mining and Knowledge Discovery for a Network of Hospitals in Pakistan
, pp.217-222 http://dx.doi.org/10.14257/ijbsbt.2015.7.3.23 A Framework for Data Warehouse Using Data Mining and Knowledge Discovery for a Network of Hospitals in Pakistan Muhammad Arif 1,2, Asad Khatak
DATA MINING - 1DL360
DATA MINING - 1DL360 Fall 2013" An introductory class in data mining http://www.it.uu.se/edu/course/homepage/infoutv/per1ht13 Kjell Orsborn Uppsala Database Laboratory Department of Information Technology,
COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction
COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised
Database and Data Mining Security
Database and Data Mining Security 1 Threats/Protections to the System 1. External procedures security clearance of personnel password protection controlling application programs Audit 2. Physical environment
Secure Collaborative Privacy In Cloud Data With Advanced Symmetric Key Block Algorithm
Secure Collaborative Privacy In Cloud Data With Advanced Symmetric Key Block Algorithm Twinkle Graf.F 1, Mrs.Prema.P 2 1 (M.E- CSE, Dhanalakshmi College of Engineering, Chennai, India) 2 (Asst. Professor
Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm
Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm R.Karthiyayini 1, J.Jayaprakash 2 Assistant Professor, Department of Computer Applications, Anna University (BIT Campus),
DATA MINING TECHNOLOGY. Keywords: data mining, data warehouse, knowledge discovery, OLAP, OLAM.
DATA MINING TECHNOLOGY Georgiana Marin 1 Abstract In terms of data processing, classical statistical models are restrictive; it requires hypotheses, the knowledge and experience of specialists, equations,
Chapter 23. Database Security. Security Issues. Database Security
Chapter 23 Database Security Security Issues Legal and ethical issues Policy issues System-related issues The need to identify multiple security levels 2 Database Security A DBMS typically includes a database
How To Solve The Kd Cup 2010 Challenge
A Lightweight Solution to the Educational Data Mining Challenge Kun Liu Yan Xing Faculty of Automation Guangdong University of Technology Guangzhou, 510090, China [email protected] [email protected]
Introduction. A. Bellaachia Page: 1
Introduction 1. Objectives... 3 2. What is Data Mining?... 4 3. Knowledge Discovery Process... 5 4. KD Process Example... 7 5. Typical Data Mining Architecture... 8 6. Database vs. Data Mining... 9 7.
Using LSI for Implementing Document Management Systems Turning unstructured data from a liability to an asset.
White Paper Using LSI for Implementing Document Management Systems Turning unstructured data from a liability to an asset. Using LSI for Implementing Document Management Systems By Mike Harrison, Director,
Healthcare Measurement Analysis Using Data mining Techniques
www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242 Volume 03 Issue 07 July, 2014 Page No. 7058-7064 Healthcare Measurement Analysis Using Data mining Techniques 1 Dr.A.Shaik
IMPROVED MASK ALGORITHM FOR MINING PRIVACY PRESERVING ASSOCIATION RULES IN BIG DATA
International Conference on Computer Science, Electronics & Electrical Engineering-0 IMPROVED MASK ALGORITHM FOR MINING PRIVACY PRESERVING ASSOCIATION RULES IN BIG DATA Pavan M N, Manjula G Dept Of ISE,
A SURVEY ON GENETIC ALGORITHM FOR INTRUSION DETECTION SYSTEM
A SURVEY ON GENETIC ALGORITHM FOR INTRUSION DETECTION SYSTEM MS. DIMPI K PATEL Department of Computer Science and Engineering, Hasmukh Goswami college of Engineering, Ahmedabad, Gujarat ABSTRACT The Internet
A Study of Data Perturbation Techniques For Privacy Preserving Data Mining
A Study of Data Perturbation Techniques For Privacy Preserving Data Mining Aniket Patel 1, HirvaDivecha 2 Assistant Professor Department of Computer Engineering U V Patel College of Engineering Kherva-Mehsana,
KNOWLEDGE DISCOVERY FOR SUPPLY CHAIN MANAGEMENT SYSTEMS: A SCHEMA COMPOSITION APPROACH
KNOWLEDGE DISCOVERY FOR SUPPLY CHAIN MANAGEMENT SYSTEMS: A SCHEMA COMPOSITION APPROACH Shi-Ming Huang and Tsuei-Chun Hu* Department of Accounting and Information Technology *Department of Information Management
Keywords: Mobility Prediction, Location Prediction, Data Mining etc
Volume 4, Issue 4, April 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Data Mining Approach
Database security. André Zúquete Security 1. Advantages of using databases. Shared access Many users use one common, centralized data set
Database security André Zúquete Security 1 Advantages of using databases Shared access Many users use one common, centralized data set Minimal redundancy Individual users do not have to collect and maintain
Database Marketing, Business Intelligence and Knowledge Discovery
Database Marketing, Business Intelligence and Knowledge Discovery Note: Using material from Tan / Steinbach / Kumar (2005) Introduction to Data Mining,, Addison Wesley; and Cios / Pedrycz / Swiniarski
Enhanced data mining analysis in higher educational system using rough set theory
African Journal of Mathematics and Computer Science Research Vol. 2(9), pp. 184-188, October, 2009 Available online at http://www.academicjournals.org/ajmcsr ISSN 2006-9731 2009 Academic Journals Review
An Overview of Knowledge Discovery Database and Data mining Techniques
An Overview of Knowledge Discovery Database and Data mining Techniques Priyadharsini.C 1, Dr. Antony Selvadoss Thanamani 2 M.Phil, Department of Computer Science, NGM College, Pollachi, Coimbatore, Tamilnadu,
COURSE RECOMMENDER SYSTEM IN E-LEARNING
International Journal of Computer Science and Communication Vol. 3, No. 1, January-June 2012, pp. 159-164 COURSE RECOMMENDER SYSTEM IN E-LEARNING Sunita B Aher 1, Lobo L.M.R.J. 2 1 M.E. (CSE)-II, Walchand
Implementation of Data Mining Techniques to Perform Market Analysis
Implementation of Data Mining Techniques to Perform Market Analysis B.Sabitha 1, N.G.Bhuvaneswari Amma 2, G.Annapoorani 3, P.Balasubramanian 4 PG Scholar, Indian Institute of Information Technology, Srirangam,
Pattern-Aided Regression Modelling and Prediction Model Analysis
San Jose State University SJSU ScholarWorks Master's Projects Master's Theses and Graduate Research Fall 2015 Pattern-Aided Regression Modelling and Prediction Model Analysis Naresh Avva Follow this and
