Distributed Data Mining Algorithm Parallelization
B.Tech Project Report

By: Rishi Kumar Singh (Y6389), Abhishek Ranjan (10030)
Project Guide: Prof. Satyadev Nandakumar
Department of Computer Science and Engineering, IIT Kanpur
Acknowledgements

We would like to thank our project guide, Prof. Satyadev Nandakumar, for guiding us throughout the project and for giving us the necessary advice and instructions. He has been truly supportive and helped us whenever we were in need. We thank him for sparing precious time from his busy schedule, without which it would have been very hard for us to proceed with the project.
Abstract

Data mining is the processing of large amounts of data to extract useful information from it. The goal of this project is to use Erlang to parallelize data mining algorithms and to simplify the work using distributed systems. To do so, we take an Association Rule Mining algorithm, Apriori, formulate its distributed version, and implement it in Erlang. We then check the code by running it in parallel on multiple machines, each set up with several nodes, using Erlang's message passing. The algorithm is easily coded in Erlang in very few lines of code. Seeing this, we are encouraged to use Erlang to parallelize other similar data mining algorithms.

Introduction

Data Mining and Association Rule Mining

Data Mining is the computational process of discovering useful patterns in large data sets. These patterns can then be transformed into an understandable form for analysis or further use. Association Rule Mining is a type of Data Mining in which specific relations between the items of large data sets are found and strong rules are formulated connecting one set of items to another. This method is widely used in supermarkets, which have a large number of items and large records of the transactions made by many customers. Mining algorithms can be used in this case to find which item sets are likely to be bought together with other sets of items.

Erlang and Parallelization

Data Mining algorithms are widely used today. The main problem associated with them is the size of the data sets on which they must be run. One solution is to parallelize the algorithms, which can substantially reduce the running time and the memory needed on each machine. We can use Erlang, a dynamically typed functional language, to parallelize these algorithms. Erlang supports lightweight processes and a message passing model of concurrency, which makes it suitable for the purpose.
Erlang Programming Language

Efficiency in Erlang is achieved in two ways: concurrent programming and distributed programming. Concurrent programming makes the algorithm time efficient, and distributed programming lets the algorithm use the resources of all the connected systems. In Erlang, concurrency is achieved through processes and message passing, and distribution is achieved by creating Erlang nodes.

Erlang Processes

Concurrency in Erlang is based on processes that communicate by message passing. Erlang processes are lightweight: they are scheduled by the Erlang runtime rather than by the operating system. The built-in function spawn creates a process:

    spawn(Module_Name, Function_Name, Argument_List)

A process can be created on another node from the current node by passing one extra argument to spawn:

    spawn(Node_Name, Module_Name, Function_Name, Argument_List)

spawn returns the process identifier (pid) of the created process. Pids are used for communication with a process.

Message Passing in Erlang

Processes in Erlang communicate by message passing. The ! operator sends a message:

    Receiver_Pid ! Message

Erlang uses asynchronous message passing, which means the sender does not wait for an acknowledgment from the receiver. Every process has a mailbox in which received messages are stored. A process uses pattern matching to select messages from its mailbox:

    receive
        pattern1 -> task1;
        ...
        patternN -> taskN
    end

We can also register a process under a name and use that name instead of the pid for communication.
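As a minimal sketch of the ideas above, the following module spawns a process, sends it a message with !, and selects the reply with receive. The module name echo_demo, the message shapes {From, {square, N}} and stop, and the 1000 ms timeout are assumptions made for illustration; they are not taken from the project code.

```erlang
-module(echo_demo).
-export([start/0, loop/0]).

%% Worker process: waits for messages and pattern-matches on them.
loop() ->
    receive
        {From, {square, N}} ->
            %% Reply to the sender, tagging the answer with our own pid.
            From ! {self(), N * N},
            loop();
        stop ->
            ok
    end.

start() ->
    %% spawn/3 returns the pid of the new process.
    Pid = spawn(echo_demo, loop, []),
    %% Sending is asynchronous: ! does not wait for the receiver.
    Pid ! {self(), {square, 7}},
    Result = receive
                 {Pid, Value} -> Value
             after 1000 ->
                 timeout
             end,
    Pid ! stop,
    Result.
```

After compiling the module, calling echo_demo:start() from the Erlang shell should return 49. To run the worker on another node, spawn/4 with the node name as first argument would replace the spawn/3 call, provided the module is also loaded on that node.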
Erlang Node

Distributed programming in Erlang is done using Erlang nodes, on which Erlang processes run. Nodes can be created on the same host or on different hosts. The syntax is

    erl -name node_name@host_name -setcookie cookie_name

where host_name is the IP address or the name of the machine. Cookies are used for security: a node can connect to another node only if both use the same cookie. A simpler form can be used to create nodes on the same machine:

    erl -sname node_name

An explicit cookie is not needed here, because nodes started by the same user on the same machine share a default cookie.

Association Rule

Association rule mining is used to capture the rules that explain the presence of some set of data items in the presence of another set of data items. Association rules are of the form

    $\{I_{a1}, I_{a2}, \ldots, I_{an}\} \Rightarrow \{I_{b1}, I_{b2}, \ldots, I_{bm}\}$

This says that whenever the set of items on the left-hand side occurs in a transaction, the itemset on the right-hand side also occurs. It is highly unlikely that such an association holds for every transaction, so we use thresholds to find strong association rules, which hold in at least a minimum number of transactions. For this purpose we define two terms, Support and Confidence, each with its own threshold: the support threshold and the confidence threshold. The support of an itemset is the ratio of the number of transactions containing the itemset to the total number of transactions. The confidence of a rule $A \Rightarrow B$ is the ratio of the number of transactions containing both A and B to the number of transactions containing A. Using these terms, a frequent itemset is an itemset whose support is greater than or equal to the support threshold, and a strong association rule is an association rule whose confidence is greater than or equal to the confidence threshold. Finally, the mining of association rules is done in two steps:

o Finding Frequent Itemsets
o Generating Strong Association Rules
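The definitions of support and confidence given above can be sketched in a few lines of Erlang. This is only an illustration: the module name assoc_measures, the representation of transactions as lists of atoms, and the sample data in demo/0 are assumptions, not part of the project.

```erlang
-module(assoc_measures).
-export([support/2, confidence/3, demo/0]).

%% True if every item of Itemset occurs in Transaction.
contains(Itemset, Transaction) ->
    lists:all(fun(I) -> lists:member(I, Transaction) end, Itemset).

%% Number of transactions that contain the whole itemset.
count(Itemset, Transactions) ->
    length([T || T <- Transactions, contains(Itemset, T)]).

%% support(X) = count(X) / total number of transactions.
%% Fails on an empty transaction list (division by zero).
support(Itemset, Transactions) ->
    count(Itemset, Transactions) / length(Transactions).

%% confidence(A => B) = count(A and B together) / count(A).
%% Fails if A never occurs (division by zero).
confidence(A, B, Transactions) ->
    count(lists:usort(A ++ B), Transactions) / count(A, Transactions).

%% Hypothetical sample data: four market-basket transactions.
demo() ->
    Ts = [[bread, milk], [bread, butter], [bread, milk, butter], [milk]],
    {support([bread, milk], Ts), confidence([bread], [milk], Ts)}.
```

In the sample data, {bread, milk} occurs in 2 of 4 transactions, so its support is 0.5; bread occurs in 3 transactions, 2 of which also contain milk, so the confidence of bread => milk is 2/3.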
The step of finding frequent itemsets is expensive in terms of time, as it involves scanning large data sets. Various algorithms exist for finding frequent itemsets. One of the basic and efficient algorithms is the Apriori algorithm, which is the algorithm we implement concurrently in this project.

Apriori Algorithm

Apriori is an algorithm for finding frequent itemsets from a given set of transactions in a large data set. Its approach is to find the frequency of sets of items and then prune the list using the given threshold value. It finds the frequency of all 1-item sets and prunes them, then does the same for 2-item sets, and continues until no further n-item sets can be formed or the size of the item sets has reached the total number of distinct items in the data set. To form the (k+1)-item sets, it uses the frequent k-item sets. The algorithm relies on the fact that all subsets of a frequent itemset are also frequent. Finally, the frequent n-itemsets for all n are combined to form the set of all frequent itemsets, which is the output of the algorithm.

Conventional Apriori Algorithm

The Apriori algorithm follows a candidate-generation-and-test paradigm. Its working principle is: if an itemset is frequent, then all its subsets are also frequent. Equivalently: if an itemset is not frequent, none of its supersets is frequent. The algorithm generates candidate itemsets in increasing order of length. It prunes the candidate itemsets using the given support threshold to form the frequent itemsets, and then uses all frequent itemsets of a particular length to form candidate itemsets that are one item longer. The algorithm terminates when there are no more candidate itemsets or the length of the candidate itemsets reaches the total number of items in the data set.

Stepwise Algorithm

Initialization
o 1-Candidate <- generate the 1-item candidate set by finding the frequency of each item
o Call Gen_freq_set(1-Candidate) and store the result as 1-Frequent

Gen_freq_set(k-candidate)
o If k-candidate is empty or k equals the number of items in the data set -> terminate
o k-frequent <- prune k-candidate using the support threshold
o Store the k-frequent set
o Call Gen_cand_set(k-frequent)

Gen_cand_set(k-frequent)
o Join two k-frequent itemsets that have k-1 items in common
o (k+1)-candidate <- the merge of two such k-frequent itemsets
o Create the subsets of size k of each (k+1)-candidate and check whether they are frequent. If they all are, keep the (k+1)-itemset in the candidate set; otherwise discard it
o Call Gen_freq_set((k+1)-candidate)

Combine all the frequent sets and return the result.

Parallelized Apriori Algorithm

To parallelize the above algorithm and run it on distributed systems, we use approaches that include dividing the large data set into smaller chunks and a multithreaded, memory-sharing parallel algorithm. The process of dividing the data set and combining the results is repeated for finding each k-item frequent set. For this purpose, many nodes are created on different systems; each one generates candidate itemsets for the part of the data set given to it. There is also a main node, which communicates with all its child nodes through message passing. The main node divides the large data set into smaller chunks and sends them to the computing nodes. It also collects the candidate itemsets found by all the computing nodes, combines them, and prunes them to form the frequent itemsets. It then sends these frequent itemsets back to each computing node to form candidate itemsets of the next size. The process ends when the main node terminates, which it does after outputting the final frequent itemsets.
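Before moving to the distributed steps, the conventional stepwise algorithm described earlier (Gen_freq_set and Gen_cand_set) can be sketched sequentially in Erlang. This is a sketch under stated assumptions, not the project's implementation: the module name apriori_seq is hypothetical, itemsets are represented as ordsets (sorted lists), and the support threshold is given as an absolute transaction count MinCount rather than a ratio.

```erlang
-module(apriori_seq).
-export([frequent_itemsets/2]).

%% Transactions: list of item lists. MinCount: support threshold as an
%% absolute number of transactions (an assumption of this sketch).
frequent_itemsets(Transactions, MinCount) ->
    Ts = [ordsets:from_list(T) || T <- Transactions],
    Items = lists:usort(lists:append(Ts)),
    Candidates1 = [[I] || I <- Items],
    gen_freq_set(Candidates1, Ts, MinCount, []).

%% Prune the k-candidates with the support threshold, store the k-frequent
%% sets, then build the (k+1)-candidates. Stops when no candidates remain.
gen_freq_set([], _Ts, _MinCount, Acc) ->
    lists:append(lists:reverse(Acc));
gen_freq_set(Candidates, Ts, MinCount, Acc) ->
    Frequent = [C || C <- Candidates, count(C, Ts) >= MinCount],
    gen_freq_set(gen_cand_set(Frequent), Ts, MinCount, [Frequent | Acc]).

%% Number of transactions that contain the whole candidate itemset.
count(Candidate, Ts) ->
    length([T || T <- Ts, ordsets:is_subset(Candidate, T)]).

%% Join two frequent k-itemsets that share k-1 items, then keep a joined
%% (k+1)-itemset only if every k-item subset of it is frequent.
gen_cand_set(Frequent) ->
    Joined = lists:usort([ordsets:union(A, B)
                          || A <- Frequent, B <- Frequent, A < B,
                             length(ordsets:union(A, B)) =:= length(A) + 1]),
    [C || C <- Joined, all_subsets_frequent(C, Frequent)].

all_subsets_frequent(Candidate, Frequent) ->
    lists:all(fun(Item) ->
                      lists:member(ordsets:del_element(Item, Candidate), Frequent)
              end, Candidate).
```

For example, with the transactions [[a,b,c],[a,b],[a,c],[b,c],[a,b,c]] and MinCount = 3, every single item and every pair is frequent, while [a,b,c] occurs only twice and is pruned, so the call returns [[a],[b],[c],[a,b],[a,c],[b,c]]. The explicit check "k equals the number of items" is not needed in this form, because no candidates can be generated beyond that size.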
Stepwise Algorithm

Create the Main node and N Computing nodes.

Task of Main Node
o Generation of 1-item frequent set
    - Divide the data set into N parts and send one part to each computing node
    - Wait to receive the 1-item candidate sets as messages from each computing node
    - Combine all the small candidate sets to form a new candidate set -> 1-Candidate
    - Call Gen_freq_set(1-Candidate)
    - Send the 1-Frequent set as a message to each computing node for finding the small 2-item candidate sets
o Gen_freq_set(k-candidate)
    - If k-candidate is empty or k equals the number of items in the data set -> terminate
    - k-frequent <- prune k-candidate using the support threshold
    - Store and return the k-frequent set
o Collect_cand_set()
    - Wait to receive the k-item candidate sets as messages from each computing node
    - Combine all the small candidate sets to form a new candidate set -> k-candidate
    - Call Gen_freq_set(k-candidate)
    - If Gen_freq_set terminates:
        combine all the frequent itemsets and return the result,
        send terminate as a message to each computing node,
        terminate
    - Else send the k-frequent set as a message to each computing node for finding the small (k+1)-item candidate sets
    - Call Collect_cand_set()

Task of Each Computing Node
o Generation of 1-item candidate set
    - Wait for the small data set as a message from the Main node
    - Count the occurrences of each item in the data set and store the result as the 1-item candidate set
    - Send the 1-item candidate set to the Main node for combining
o Gen_cand_set()
    - Wait for a message from the Main node
    - If the message is terminate: terminate this node
    - If the message is the k-frequent set:
        join two k-frequent itemsets that have k-1 items in common,
        (k+1)-candidate <- the merge of two such k-frequent itemsets,
        create the subsets of size k of each (k+1)-candidate and check whether they are frequent; if they all are, keep the (k+1)-itemset in the candidate set, otherwise discard it
    - Send the (k+1)-candidate set as a message to the Main node for combining
    - Call Gen_cand_set()

We have implemented the first part of the algorithm, that is, generating the 1-item frequent set. With this, the main framework of the algorithm is coded. The second part will require making the already coded processes run recursively.
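The first part described above (generation of the 1-item frequent set by a main process and N workers) might look roughly like the following sketch. It is not the authors' code: the module name apriori_par, the function names, the round-robin split, and the use of an absolute count MinCount as the support threshold are all assumptions, and the workers are started as local processes with spawn/3 to keep the example self-contained.

```erlang
-module(apriori_par).
-export([frequent_1/3, count_items/2]).

%% Main process: split the data into N chunks, start one worker per chunk,
%% merge the partial counts and prune with the support threshold.
frequent_1(Transactions, N, MinCount) ->
    Chunks = split(Transactions, N),
    Main = self(),
    Pids = [spawn(apriori_par, count_items, [Main, Chunk]) || Chunk <- Chunks],
    Total = collect(Pids, #{}),
    lists:sort([Item || {Item, Count} <- maps:to_list(Total), Count >= MinCount]).

%% Worker: count each item of its chunk (once per transaction) and send
%% the resulting 1-item candidate set back to the main process.
count_items(Main, Chunk) ->
    Items = lists:append([lists:usort(T) || T <- Chunk]),
    Counts = lists:foldl(
               fun(Item, Acc) ->
                       maps:update_with(Item, fun(C) -> C + 1 end, 1, Acc)
               end, #{}, Items),
    Main ! {self(), Counts}.

%% Wait for one message per worker and add the partial counts together.
collect([], Total) ->
    Total;
collect([Pid | Rest], Total) ->
    receive
        {Pid, Counts} ->
            Merged = maps:fold(
                       fun(Item, C, Acc) ->
                               maps:update_with(Item, fun(Old) -> Old + C end, C, Acc)
                       end, Total, Counts),
            collect(Rest, Merged)
    end.

%% Deal the transactions round-robin into N chunks.
split(Transactions, N) ->
    Indexed = lists:zip(lists:seq(0, length(Transactions) - 1), Transactions),
    [[T || {I, T} <- Indexed, I rem N =:= K] || K <- lists:seq(0, N - 1)].
```

For instance, apriori_par:frequent_1([[a,b],[a,c],[a,b,c],[b]], 2, 3) counts a = 3, b = 3 and c = 2 across the two chunks and returns [a,b]. To place the workers on other Erlang nodes, the spawn/3 call would become spawn/4 with a node name as its first argument, with the module loaded on every node.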
Observation

We created varying numbers of nodes on one system, ran a particular data set (number of transactions = 100,000) for each case, and recorded the time taken to find the 1-item frequent set. The readings are shown below.

[Table: No. of Nodes | Execution Time (ms)]

The table shows that the execution time decreases as the number of nodes increases. Beyond a certain number of nodes, however, the execution time starts increasing with the number of nodes. This happens because all the nodes are created on the same system, so creating them takes time. Also, more nodes means dividing the data into more parts and collecting results from more parts before proceeding, which further increases the overall execution time.

Conclusion

Based on the table above, distributed programming is a powerful feature of Erlang that can be used to improve the efficiency of data mining algorithms that work on large data sets. Erlang is a simple functional language with a small number of concepts, yet it can be used to code complex algorithms in fewer lines than many other programming languages. Apart from data mining algorithms, it can also be used for other software that can exploit its concurrency features to improve performance.
DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE
DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE SK MD OBAIDULLAH Department of Computer Science & Engineering, Aliah University, Saltlake, Sector-V, Kol-900091, West Bengal, India [email protected]
Building A Smart Academic Advising System Using Association Rule Mining
Building A Smart Academic Advising System Using Association Rule Mining Raed Shatnawi +962795285056 [email protected] Qutaibah Althebyan +962796536277 [email protected] Baraq Ghalib & Mohammed
Distributed Apriori in Hadoop MapReduce Framework
Distributed Apriori in Hadoop MapReduce Framework By Shulei Zhao (sz2352) and Rongxin Du (rd2537) Individual Contribution: Shulei Zhao: Implements centralized Apriori algorithm and input preprocessing
Project Report. 1. Application Scenario
Project Report In this report, we briefly introduce the application scenario of association rule mining, give details of apriori algorithm implementation and comment on the mined rules. Also some instructions
New Matrix Approach to Improve Apriori Algorithm
New Matrix Approach to Improve Apriori Algorithm A. Rehab H. Alwa, B. Anasuya V Patil Associate Prof., IT Faculty, Majan College-University College Muscat, Oman, [email protected] Associate
Improving Apriori Algorithm to get better performance with Cloud Computing
Improving Apriori Algorithm to get better performance with Cloud Computing Zeba Qureshi 1 ; Sanjay Bansal 2 Affiliation: A.I.T.R, RGPV, India 1, A.I.T.R, RGPV, India 2 ABSTRACT Cloud computing has become
MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM
MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM J. Arokia Renjit Asst. Professor/ CSE Department, Jeppiaar Engineering College, Chennai, TamilNadu,India 600119. Dr.K.L.Shunmuganathan
A Time Efficient Algorithm for Web Log Analysis
A Time Efficient Algorithm for Web Log Analysis Santosh Shakya Anju Singh Divakar Singh Student [M.Tech.6 th sem (CSE)] Asst.Proff, Dept. of CSE BU HOD (CSE), BUIT, BUIT,BU Bhopal Barkatullah University,
Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm
Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm Purpose: key concepts in mining frequent itemsets understand the Apriori algorithm run Apriori in Weka GUI and in programatic way 1 Theoretical
Data Mining Apriori Algorithm
10 Data Mining Apriori Algorithm Apriori principle Frequent itemsets generation Association rules generation Section 6 of course book TNM033: Introduction to Data Mining 1 Association Rule Mining (ARM)
Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm
R. Sridevi et al Int. Journal of Engineering Research and Applications RESEARCH ARTICLE OPEN ACCESS Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm R. Sridevi,*
Discovery of Maximal Frequent Item Sets using Subset Creation
Discovery of Maximal Frequent Item Sets using Subset Creation Jnanamurthy HK, Vishesh HV, Vishruth Jain, Preetham Kumar, Radhika M. Pai Department of Information and Communication Technology Manipal Institute
Data Mining: Foundation, Techniques and Applications
Data Mining: Foundation, Techniques and Applications Lesson 1b :A Quick Overview of Data Mining Li Cuiping( 李 翠 平 ) School of Information Renmin University of China Anthony Tung( 鄧 锦 浩 ) School of Computing
Mining Online GIS for Crime Rate and Models based on Frequent Pattern Analysis
, 23-25 October, 2013, San Francisco, USA Mining Online GIS for Crime Rate and Models based on Frequent Pattern Analysis John David Elijah Sandig, Ruby Mae Somoba, Ma. Beth Concepcion and Bobby D. Gerardo,
Classification and Prediction
Classification and Prediction Slides for Data Mining: Concepts and Techniques Chapter 7 Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser
Mining Interesting Medical Knowledge from Big Data
IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 1, Ver. II (Jan Feb. 2016), PP 06-10 www.iosrjournals.org Mining Interesting Medical Knowledge from
Mining an Online Auctions Data Warehouse
Proceedings of MASPLAS'02 The Mid-Atlantic Student Workshop on Programming Languages and Systems Pace University, April 19, 2002 Mining an Online Auctions Data Warehouse David Ulmer Under the guidance
Data Mining: Partially from: Introduction to Data Mining by Tan, Steinbach, Kumar
Data Mining: Association Analysis Partially from: Introduction to Data Mining by Tan, Steinbach, Kumar Association Rule Mining Given a set of transactions, find rules that will predict the occurrence of
Data Mining with R. Decision Trees and Random Forests. Hugh Murrell
Data Mining with R Decision Trees and Random Forests Hugh Murrell reference books These slides are based on a book by Graham Williams: Data Mining with Rattle and R, The Art of Excavating Data for Knowledge
ANALYSIS OF GRID COMPUTING AS IT APPLIES TO HIGH VOLUME DOCUMENT PROCESSING AND OCR
ANALYSIS OF GRID COMPUTING AS IT APPLIES TO HIGH VOLUME DOCUMENT PROCESSING AND OCR By: Dmitri Ilkaev, Stephen Pearson Abstract: In this paper we analyze the concept of grid programming as it applies to
A Fraud Detection Approach in Telecommunication using Cluster GA
A Fraud Detection Approach in Telecommunication using Cluster GA V.Umayaparvathi Dept of Computer Science, DDE, MKU Dr.K.Iyakutti CSIR Emeritus Scientist, School of Physics, MKU Abstract: In trend mobile
Databases - Data Mining. (GF Royle, N Spadaccini 2006-2010) Databases - Data Mining 1 / 25
Databases - Data Mining (GF Royle, N Spadaccini 2006-2010) Databases - Data Mining 1 / 25 This lecture This lecture introduces data-mining through market-basket analysis. (GF Royle, N Spadaccini 2006-2010)
A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains
A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains Dr. Kanak Saxena Professor & Head, Computer Application SATI, Vidisha, [email protected] D.S. Rajpoot Registrar,
Selection of Optimal Discount of Retail Assortments with Data Mining Approach
Available online at www.interscience.in Selection of Optimal Discount of Retail Assortments with Data Mining Approach Padmalatha Eddla, Ravinder Reddy, Mamatha Computer Science Department,CBIT, Gandipet,Hyderabad,A.P,India.
Binary Coded Web Access Pattern Tree in Education Domain
Binary Coded Web Access Pattern Tree in Education Domain C. Gomathi P.G. Department of Computer Science Kongu Arts and Science College Erode-638-107, Tamil Nadu, India E-mail: [email protected] M. Moorthi
ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL
International Journal Of Advanced Technology In Engineering And Science Www.Ijates.Com Volume No 03, Special Issue No. 01, February 2015 ISSN (Online): 2348 7550 ASSOCIATION RULE MINING ON WEB LOGS FOR
Performance Evaluation of some Online Association Rule Mining Algorithms for sorted and unsorted Data sets
Performance Evaluation of some Online Association Rule Mining Algorithms for sorted and unsorted Data sets Pramod S. Reader, Information Technology, M.P.Christian College of Engineering, Bhilai,C.G. INDIA.
Association Rule Mining
Association Rule Mining Association Rules and Frequent Patterns Frequent Pattern Mining Algorithms Apriori FP-growth Correlation Analysis Constraint-based Mining Using Frequent Patterns for Classification
MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH
MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH M.Rajalakshmi 1, Dr.T.Purusothaman 2, Dr.R.Nedunchezhian 3 1 Assistant Professor (SG), Coimbatore Institute of Technology, India, [email protected]
Data Mining Applications in Manufacturing
Data Mining Applications in Manufacturing Dr Jenny Harding Senior Lecturer Wolfson School of Mechanical & Manufacturing Engineering, Loughborough University Identification of Knowledge - Context Intelligent
Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm
Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm R.Karthiyayini 1, J.Jayaprakash 2 Assistant Professor, Department of Computer Applications, Anna University (BIT Campus),
Keywords: Mobility Prediction, Location Prediction, Data Mining etc
Volume 4, Issue 4, April 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Data Mining Approach
Clinic + - A Clinical Decision Support System Using Association Rule Mining
Clinic + - A Clinical Decision Support System Using Association Rule Mining Sangeetha Santhosh, Mercelin Francis M.Tech Student, Dept. of CSE., Marian Engineering College, Kerala University, Trivandrum,
Data Mining Approach in Security Information and Event Management
Data Mining Approach in Security Information and Event Management Anita Rajendra Zope, Amarsinh Vidhate, and Naresh Harale Abstract This paper gives an overview of data mining field & security information
COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction
COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised
Chapter 4 Data Mining A Short Introduction. 2006/7, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis Data Mining - 1
Chapter 4 Data Mining A Short Introduction 2006/7, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis Data Mining - 1 1 Today's Question 1. Data Mining Overview 2. Association Rule Mining
A Breadth-First Algorithm for Mining Frequent Patterns from Event Logs
A Breadth-First Algorithm for Mining Frequent Patterns from Event Logs Risto Vaarandi Department of Computer Engineering, Tallinn University of Technology, Estonia [email protected] Abstract. Today,
COMBINED METHODOLOGY of the CLASSIFICATION RULES for MEDICAL DATA-SETS
COMBINED METHODOLOGY of the CLASSIFICATION RULES for MEDICAL DATA-SETS V.Sneha Latha#, P.Y.L.Swetha#, M.Bhavya#, G. Geetha#, D. K.Suhasini# # Dept. of Computer Science& Engineering K.L.C.E, GreenFields-522502,
Classification of IDS Alerts with Data Mining Techniques
International Journal of Electronic Commerce Studies Vol.5, No.1, pp.1-6, 2014 Classification of IDS Alerts with Data Mining Techniques Hany Nashat Gabra Computer and Systems Engineering Department, Ain
Analysis of Customer Behavior using Clustering and Association Rules
Analysis of Customer Behavior using Clustering and Association Rules P.Isakki alias Devi, Research Scholar, Vels University,Chennai 117, Tamilnadu, India. S.P.Rajagopalan Professor of Computer Science
Data Mining Classification: Decision Trees
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
A Data Mining Tutorial
A Data Mining Tutorial Presented at the Second IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN 98) 14 December 1998 Graham Williams, Markus Hegland and Stephen
Using Data Mining Methods to Predict Personally Identifiable Information in Emails
Using Data Mining Methods to Predict Personally Identifiable Information in Emails Liqiang Geng 1, Larry Korba 1, Xin Wang, Yunli Wang 1, Hongyu Liu 1, Yonghua You 1 1 Institute of Information Technology,
Distributed Systems / Middleware Distributed Programming in Erlang
Distributed Systems / Middleware Distributed Programming in Erlang Alessandro Sivieri Dipartimento di Elettronica e Informazione Politecnico, Italy [email protected] http://corsi.dei.polimi.it/distsys
Static Data Mining Algorithm with Progressive Approach for Mining Knowledge
Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 85-93 Research India Publications http://www.ripublication.com Static Data Mining Algorithm with Progressive
Performance Analysis of Apriori Algorithm with Different Data Structures on Hadoop Cluster
Performance Analysis of Apriori Algorithm with Different Data Structures on Hadoop Cluster Sudhakar Singh Dept. of Computer Science Faculty of Science Banaras Hindu University Rakhi Garg Dept. of Computer
CHAPTER 1 INTRODUCTION
1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.
Case study: d60 Raptor smartadvisor. Jan Neerbek Alexandra Institute
Case study: d60 Raptor smartadvisor Jan Neerbek Alexandra Institute Agenda d60: A cloud/data mining case Cloud Data Mining Market Basket Analysis Large data sets Our solution 2 Alexandra Institute The
A hybrid algorithm combining weighted and hasht apriori algorithms in Map Reduce model using Eucalyptus cloud platform
A hybrid algorithm combining weighted and hasht apriori algorithms in Map Reduce model using Eucalyptus cloud platform 1 R. SUMITHRA, 2 SUJNI PAUL AND 3 D. PONMARY PUSHPA LATHA 1 School of Computer Science,
IJRFM Volume 2, Issue 1 (January 2012) (ISSN 2231-5985)
ASSOCIATION MODELS FOR MARKET BASKET ANALYSIS, CUSTOMER BEHAVIOUR ANALYSIS AND BUSINESS INTELLIGENCE SOLUTION EMBEDDED WITH ARIORI CONCEPT J.M. Lakshmi Mahesh* ABSTRACT This paper analyzes the customer
A Survey on Association Rule Mining in Market Basket Analysis
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 4, Number 4 (2014), pp. 409-414 International Research Publications House http://www. irphouse.com /ijict.htm A Survey
The Fuzzy Frequent Pattern Tree
The Fuzzy Frequent Pattern Tree STERGIOS PAPADIMITRIOU 1 SEFERINA MAVROUDI 2 1. Department of Information Management, Technological Educational Institute of Kavala, 65404 Kavala, Greece 2. Pattern Recognition
Web Document Clustering
Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,
Code and Process Migration! Motivation!
Code and Process Migration! Motivation How does migration occur? Resource migration Agent-based system Details of process migration Lecture 6, page 1 Motivation! Key reasons: performance and flexibility
So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)
Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we
Intelligent Log Analyzer. André Restivo <[email protected]>
Intelligent Log Analyzer André Restivo 9th January 2003 Abstract Server Administrators often have to analyze server logs to find if something is wrong with their machines.
AN APPLICATION OF INFORMATION RETRIEVAL IN P2P NETWORKS USING SOCKETS AND METADATA
AN APPLICATION OF INFORMATION RETRIEVAL IN P2P NETWORKS USING SOCKETS AND METADATA Ms. M. Kiruthika Asst. Professor, Fr.C.R.I.T, Vashi, Navi Mumbai. [email protected] Ms. Smita Dange Lecturer,
KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa Email: [email protected]
KNIME TUTORIAL Anna Monreale KDD-Lab, University of Pisa Email: [email protected] Outline Introduction on KNIME KNIME components Exercise: Market Basket Analysis Exercise: Customer Segmentation Exercise:
Searching frequent itemsets by clustering data
Towards a parallel approach using MapReduce Maria Malek Hubert Kadima LARIS-EISTI Ave du Parc, 95011 Cergy-Pontoise, FRANCE [email protected], [email protected] 1 Introduction and Related Work
Dual Mechanism to Detect DDOS Attack Priyanka Dembla, Chander Diwaker 2 1 Research Scholar, 2 Assistant Professor
International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Engineering, Business and Enterprise
