Distributed Data Mining Algorithm Parallelization

Size: px
Start display at page:

Download "Distributed Data Mining Algorithm Parallelization"

Transcription

1 Distributed Data Mining Algorithm Parallelization B.Tech Project Report By: Rishi Kumar Singh (Y6389) Abhishek Ranjan (10030) Project Guide: Prof. Satyadev Nandakumar Department of Computer Science and Engineering IIT Kanpur

2 Acknowledgements We would like to thank our project guide Prof. Satyadev Nandakumar for guiding us throughout the project tenure and giving necessary advice and instructions in this project. He has been truly supporting and helped us whenever we were in need. We thank him for allotting some precious time from his busy schedule without which it would be very hard for us to proceed in the project.

3 Abstract Data mining is processing of large amount of data and extracting useful information from it. The goal of the project is to use Erlang to parallelize and make the work simpler using distributed systems. To do so, we take an Association Rule Mining algorithm, Apriori, formulate its distributed version and implement them using erlang. We then check the code by running it on multiple machines setup with several nodes on each, in parallel using message passing from erlang. This algorithm is very easily coded in Erlang using very minimal number of lines of coding. Seeing this, we are encouraged to use Erlang to parallelize many other similar data mining algorithms. Introduction Data Mining and Association Rule Mining Data Mining is the computational process of discovering useful patterns in large data sets. These data patterns can be then transformed to some understandable form for analyzing or other further uses. Association Rule Mining is a type of Data Mining where specific relations between the items of large data sets are found and strong rules are formulated connecting some set of items to another. This method is widely used in super markets where they have large number of items and large data of the transactions made by many customers. The mining algorithms can be used in this case to find which the item sets are likely to be bought with some other sets of items. Erlang and Parallelization Data Mining algorithms are very important in the present world and are widely used. The main problem associated with it is the large data sets on which the algorithms are to be run. The solution to this is parallelization of the algorithms. This would increase the efficiency of the algorithms drastically in terms of time and memory. We can use a dynamically typed functional language, Erlang for parallelization of those algorithms. Erlang supports lightweight threads and message passing model of concurrency, which is what makes it suitable for the purpose.

4 Erlang Programming Language The efficiency in Erlang programming language is achieved by two methods. One is Concurrent Programming and the other is Distributed programming. Concurrent programming will make the algorithm time efficient and Distributed programming will make the algorithm utilize maximum of the resources in hand from all the connected systems. In Erlang, concurrency is achieved by Processes and Message Passing and distribution is achieved by creating Erlang Nodes. Erlang Processes Concurrency in erlang is done by message passing between processes for communication. Erlang processes are lightweight threads. There is inbuilt function spawn define in Erlang for creating process: spawn (Module_Name, Function_Name, Argument_List) We can create a process from the current node to another node by using one extra argument in spawn function: spawn (Node_Name, Module_Name, Function_Name, Argument_List) spawn returns the process identifier (pid) of the created process. Pids are used for communication with a process. Message Passing in Erlang Communication between processes is done by message passing in erlang. Receiver_Pid! Message! is the operator used for sending messages. Erlang uses asynchronous message passing mechanism, means sender will not wait for the acknowledgment from the receiver. Every process have a mailbox which they used for storing received messages. They used pattern matching for selecting messages from the mailbox. receive end pattern1 -> task1;... patternn -> taskn We can also register a process by a name and used that name instead of pid for communication.

5 Erlang Node Distributed programming in erlang is done by using erlang nodes. Erlang processes run on these nodes. We can create nodes on same host or different host. The syntax for this is erl -name -setcookie cookie_name host_name is the IP address of the machine or machine name. Cookies are used to handle the security issues. If any node want to connect to a particular node then it must use the same cookie. We can use simpler version of above syntax for creating nodes on same machine. The syntax is erl -sname node_name Security is not the problem here because nodes are created on the same machine. Association Rule Association rule mining is used to capture all possible rules that can explain the presence of some sets of data items in the presence of some other set of data items. Association Rules are of the form: {Ia1, Ia2,., Ian} {Ib1, Ib2,.., Ibm} This says that whenever the set of items on left hand side occur in a transaction, the second itemset will also occur. It is highly unlikely to find such association of two sets of data for every transaction. So we use thresholds to find such strong association rules which hold in at least the minimum threshold number of transactions. For this purpose we assign two terms, Support and Confidence. Both of these terms have a threshold, support threshold and confidence threshold. Support is defined as ratio of occurrence of a set of items to the total number of transactions. Confidence for a rule, A B, is defined as ratio of occurrence of B whenever A occurs to the total occurrence of A. Owing to these terms, Frequent itemset can be defined as an itemset whose support is more than or equal to the support threshold. Also, a strong association rule is said to be an association rule whose confidence is more than or equal to the confidence threshold. Finally, the mining of association rules is done in two steps: Finding Frequent Itemsets Generating Strong Association Rules

6 The step of finding frequent itemsets is heavy in terms of time consumption as it involves scanning of large data sets. Various algorithms are present to find frequent itemsets. One of the basic and efficient algorithm is Apriori Algorithm. This is the algorithm we are trying to implement concurrently in this project. Apriori Algorithm Apriori is an algorithm to find frequent itemsets from a given set of transactions present in large data sets. It approach is to find frequency of sets of items and then pruning the list by using the given threshold value. It finds the frequency of all 1- item sets and then prune it, then 2-item sets and then prune it and continue this till no further n-item sets can be formed or the size of item sets have reached the total number of distinct items in the data set. To form (k+1)-item set, it uses frequent k- item set. The algorithm uses the fact that all subsets of a frequent itemset are also frequent. Finally all the frequent n-itemsets are combined to form all the frequent itemsets which will be the output of the algorithm. Conventional Apriori Algorithm The Apriori Algorithm works is a Candidate-generation-and-test paradigm. Its working principle is: If an itemset is frequent, then all its subsets are also frequent. This can also be stated as: If an itemset is not frequent, all of its supersets are also not frequent. The Algorithm generates candidate itemsets in increasing order of length. It then prunes the candidate itemset using the given support threshold to for the frequent itemset. Then it uses all frequent itemsets of a particular length to form candidate itemsets having length one more. The algorithm terminates when there is no more candidate itemset or the length of candidate itemsets reach the total number of items in the dataset. Stepwise Algorithm Initialization o 1-Candidate <- generate 1-item candidate set by finding frequency of each item. o call Gen_freq_set( 1-Candidate) and store it as 1-Frequent Gen_freq_set( k-candidate) o If k-candidate is empty or k equals number of items in the data set -> terminate

7 o k-frequent <- prune k-candidate using support threshold o Store the k-frequent Set o call Gen_cand_set( k-frequent) Gen_cand_set( k-frequent) o Join two candidates whose k-1 items are common o (k+1)-candidate <- merge two k-frequent such that k-1 items in both are same o Create subsets of (k+1)-candidate of size k and check if they are frequent. If true then keep the (k+1)-itemset in the candidate set or else discard it o call Gen_freq_set( (k+1)-candidate) Combine all the Frequent sets and return it. Parallelized Apriori Algorithm To parallelize the above algorithm and running it on distributed systems, we have to use approaches which includes division of large data sets in smaller chunks and multithreaded memory-sharing parallel algorithm. The process of dividing the data set and combining it again is repeatedly done for finding each k-item frequent set. For this purpose, many nodes are created on different systems which would carry out the operations of generating and candidate itemsets for the part of data set given to each one of them. There is also a main node which is there to communicate with all its child nodes through message passing. The function of the main node is to divide the large data set into smaller chunks and send it to all the computing nodes. The main node also collects candidate itemsets found from all the computing nodes, combine them and perform pruning to form the frequent itemsets. It then again sends this frequent itemset to each of the computing nodes to form candidate itemsets of size one higher. The process terminates with the termination of the main node which terminates after outputting the final frequent itemsets.

8 Stepwise Algorithm Create Main node and N Computing nodes Task of Main Node o Generation of 1-item frequent set Divide the data set in N parts and send it to all computing nodes Wait for receiving 1-item Candidate sets as messages from each computing node Combine all the small candidate sets to form a new candidate set -> 1-Candidate Call Gen_freq_set( 1-Candidate) Send 1-Frequent set as message to each computing node for finding small 2-item candidate sets. o Gen_freq_set( k-candidate) If k-candidate is empty or k equals number of items in the data set -> terminate k-frequent <- prune k-candidate using support threshold Store and return the k-frequent set o Collect_cand_set( ) Wait for receiving k-item Candidate set as messages from each computing node Combine all the small candidate sets to form a new candidate set -> k-candidate Call Gen_freq_set( k-candidate) If Gen_freq_set terminates Combine all the Frequent itemsets and return it Send terminate as message to each computing node terminate Else send the k-frequent set as message to each computing node for finding small (k+1)-item candidate sets Call Collect_cand_set( )

9 Task of Each Computing Node o Generation of 1-item candidate set Wait for the small data set as message from the Main node Count the occurrence of each item in the data set and store it as 1-item candidate set Send the 1-item candidate set to the main node for combining. o Gen_cand_set( ) Wait for terminate as message from Main node Terminate this node Wait for the k-frequent set as message from the Main node Join two candidates whose k-1 items are common (k+1)-candidate <- merge two k-frequent such that k-1 items in both are same Create subsets of (k+1)-candidate of size k and check if they are frequent If true then keep the (k+1)-itemset in the candidate set or else discard it Send the (k+1)-candidate set as message to the Main node for combining. Call Gen_cand_set( ) We have implement the first part of the algorithm that is to generate 1-item Frequent set. With this, the main frame of the algorithm is coded. The second part will require making the already coded processes to run recursively.

10 Observation We created varying number of nodes on one system, ran a particular Data set (no. of transactions = 100,000) for each case and recorded the time taken in each case to find 1-item Frequent set. The readings are as shown below No. of Nodes Execution Time (ms) The table shows that the execution time decreases with increase in the number of node. But for much higher number of nodes, the execution time starts increasing with increase in the number of nodes. This happens as we are creating all the nodes on the same system, so, the creation takes time. Also, more number of nodes means division of data in more parts and collection of data from more parts before proceeding. This also increases the overall execution time considerably. Conclusion Observing the data from the above table, we can say that distributed programming is a very powerful feature of Erlang which can be used to significantly improve the efficiency of some Data Mining algorithm which uses large data sets. Erlang is a very simple functional language with small number of concepts. Yet, it can be used to code very complex algorithm in very few number of lines compared to other programming language. Apart from data mining algorithms, it can also be used to program softwares which can use its concurrency feature to improve performance.

11 References: A parallel Association Rule Mining algorithm by Zhi-gang Wang and Chi-She Wang. In: Web information System and Mining Lecture Notes in Computer Science Volume 7529, 2012, pp Concurrent Programming in Erlang by Joe Armstrong, Robert Virding, Claes Wikstrom and Mike Williams Data Mining and Analysis: Fundamental Concepts and Algorithms by Mohammed J. Zaki and Wagner Meira Jr.

DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE

DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE SK MD OBAIDULLAH Department of Computer Science & Engineering, Aliah University, Saltlake, Sector-V, Kol-900091, West Bengal, India [email protected]

More information

Building A Smart Academic Advising System Using Association Rule Mining

Building A Smart Academic Advising System Using Association Rule Mining Building A Smart Academic Advising System Using Association Rule Mining Raed Shatnawi +962795285056 [email protected] Qutaibah Althebyan +962796536277 [email protected] Baraq Ghalib & Mohammed

More information

Distributed Apriori in Hadoop MapReduce Framework

Distributed Apriori in Hadoop MapReduce Framework Distributed Apriori in Hadoop MapReduce Framework By Shulei Zhao (sz2352) and Rongxin Du (rd2537) Individual Contribution: Shulei Zhao: Implements centralized Apriori algorithm and input preprocessing

More information

Project Report. 1. Application Scenario

Project Report. 1. Application Scenario Project Report In this report, we briefly introduce the application scenario of association rule mining, give details of apriori algorithm implementation and comment on the mined rules. Also some instructions

More information

New Matrix Approach to Improve Apriori Algorithm

New Matrix Approach to Improve Apriori Algorithm New Matrix Approach to Improve Apriori Algorithm A. Rehab H. Alwa, B. Anasuya V Patil Associate Prof., IT Faculty, Majan College-University College Muscat, Oman, [email protected] Associate

More information

Improving Apriori Algorithm to get better performance with Cloud Computing

Improving Apriori Algorithm to get better performance with Cloud Computing Improving Apriori Algorithm to get better performance with Cloud Computing Zeba Qureshi 1 ; Sanjay Bansal 2 Affiliation: A.I.T.R, RGPV, India 1, A.I.T.R, RGPV, India 2 ABSTRACT Cloud computing has become

More information

MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM

MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM J. Arokia Renjit Asst. Professor/ CSE Department, Jeppiaar Engineering College, Chennai, TamilNadu,India 600119. Dr.K.L.Shunmuganathan

More information

A Time Efficient Algorithm for Web Log Analysis

A Time Efficient Algorithm for Web Log Analysis A Time Efficient Algorithm for Web Log Analysis Santosh Shakya Anju Singh Divakar Singh Student [M.Tech.6 th sem (CSE)] Asst.Proff, Dept. of CSE BU HOD (CSE), BUIT, BUIT,BU Bhopal Barkatullah University,

More information

Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm

Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm Purpose: key concepts in mining frequent itemsets understand the Apriori algorithm run Apriori in Weka GUI and in programatic way 1 Theoretical

More information

Data Mining Apriori Algorithm

Data Mining Apriori Algorithm 10 Data Mining Apriori Algorithm Apriori principle Frequent itemsets generation Association rules generation Section 6 of course book TNM033: Introduction to Data Mining 1 Association Rule Mining (ARM)

More information

Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm

Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm R. Sridevi et al Int. Journal of Engineering Research and Applications RESEARCH ARTICLE OPEN ACCESS Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm R. Sridevi,*

More information

Discovery of Maximal Frequent Item Sets using Subset Creation

Discovery of Maximal Frequent Item Sets using Subset Creation Discovery of Maximal Frequent Item Sets using Subset Creation Jnanamurthy HK, Vishesh HV, Vishruth Jain, Preetham Kumar, Radhika M. Pai Department of Information and Communication Technology Manipal Institute

More information

Data Mining: Foundation, Techniques and Applications

Data Mining: Foundation, Techniques and Applications Data Mining: Foundation, Techniques and Applications Lesson 1b :A Quick Overview of Data Mining Li Cuiping( 李 翠 平 ) School of Information Renmin University of China Anthony Tung( 鄧 锦 浩 ) School of Computing

More information

Mining Online GIS for Crime Rate and Models based on Frequent Pattern Analysis

Mining Online GIS for Crime Rate and Models based on Frequent Pattern Analysis , 23-25 October, 2013, San Francisco, USA Mining Online GIS for Crime Rate and Models based on Frequent Pattern Analysis John David Elijah Sandig, Ruby Mae Somoba, Ma. Beth Concepcion and Bobby D. Gerardo,

More information

Classification and Prediction

Classification and Prediction Classification and Prediction Slides for Data Mining: Concepts and Techniques Chapter 7 Jiawei Han and Micheline Kamber Intelligent Database Systems Research Lab School of Computing Science Simon Fraser

More information

Mining Interesting Medical Knowledge from Big Data

Mining Interesting Medical Knowledge from Big Data IOSR Journal of Computer Engineering (IOSR-JCE) e-issn: 2278-0661,p-ISSN: 2278-8727, Volume 18, Issue 1, Ver. II (Jan Feb. 2016), PP 06-10 www.iosrjournals.org Mining Interesting Medical Knowledge from

More information

Mining an Online Auctions Data Warehouse

Mining an Online Auctions Data Warehouse Proceedings of MASPLAS'02 The Mid-Atlantic Student Workshop on Programming Languages and Systems Pace University, April 19, 2002 Mining an Online Auctions Data Warehouse David Ulmer Under the guidance

More information

Data Mining: Partially from: Introduction to Data Mining by Tan, Steinbach, Kumar

Data Mining: Partially from: Introduction to Data Mining by Tan, Steinbach, Kumar Data Mining: Association Analysis Partially from: Introduction to Data Mining by Tan, Steinbach, Kumar Association Rule Mining Given a set of transactions, find rules that will predict the occurrence of

More information

Data Mining with R. Decision Trees and Random Forests. Hugh Murrell

Data Mining with R. Decision Trees and Random Forests. Hugh Murrell Data Mining with R Decision Trees and Random Forests Hugh Murrell reference books These slides are based on a book by Graham Williams: Data Mining with Rattle and R, The Art of Excavating Data for Knowledge

More information

ANALYSIS OF GRID COMPUTING AS IT APPLIES TO HIGH VOLUME DOCUMENT PROCESSING AND OCR

ANALYSIS OF GRID COMPUTING AS IT APPLIES TO HIGH VOLUME DOCUMENT PROCESSING AND OCR ANALYSIS OF GRID COMPUTING AS IT APPLIES TO HIGH VOLUME DOCUMENT PROCESSING AND OCR By: Dmitri Ilkaev, Stephen Pearson Abstract: In this paper we analyze the concept of grid programming as it applies to

More information

A Fraud Detection Approach in Telecommunication using Cluster GA

A Fraud Detection Approach in Telecommunication using Cluster GA A Fraud Detection Approach in Telecommunication using Cluster GA V.Umayaparvathi Dept of Computer Science, DDE, MKU Dr.K.Iyakutti CSIR Emeritus Scientist, School of Physics, MKU Abstract: In trend mobile

More information

Databases - Data Mining. (GF Royle, N Spadaccini 2006-2010) Databases - Data Mining 1 / 25

Databases - Data Mining. (GF Royle, N Spadaccini 2006-2010) Databases - Data Mining 1 / 25 Databases - Data Mining (GF Royle, N Spadaccini 2006-2010) Databases - Data Mining 1 / 25 This lecture This lecture introduces data-mining through market-basket analysis. (GF Royle, N Spadaccini 2006-2010)

More information

A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains

A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains A Way to Understand Various Patterns of Data Mining Techniques for Selected Domains Dr. Kanak Saxena Professor & Head, Computer Application SATI, Vidisha, [email protected] D.S. Rajpoot Registrar,

More information

Selection of Optimal Discount of Retail Assortments with Data Mining Approach

Selection of Optimal Discount of Retail Assortments with Data Mining Approach Available online at www.interscience.in Selection of Optimal Discount of Retail Assortments with Data Mining Approach Padmalatha Eddla, Ravinder Reddy, Mamatha Computer Science Department,CBIT, Gandipet,Hyderabad,A.P,India.

More information

Binary Coded Web Access Pattern Tree in Education Domain

Binary Coded Web Access Pattern Tree in Education Domain Binary Coded Web Access Pattern Tree in Education Domain C. Gomathi P.G. Department of Computer Science Kongu Arts and Science College Erode-638-107, Tamil Nadu, India E-mail: [email protected] M. Moorthi

More information

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL

ASSOCIATION RULE MINING ON WEB LOGS FOR EXTRACTING INTERESTING PATTERNS THROUGH WEKA TOOL International Journal Of Advanced Technology In Engineering And Science Www.Ijates.Com Volume No 03, Special Issue No. 01, February 2015 ISSN (Online): 2348 7550 ASSOCIATION RULE MINING ON WEB LOGS FOR

More information

Performance Evaluation of some Online Association Rule Mining Algorithms for sorted and unsorted Data sets

Performance Evaluation of some Online Association Rule Mining Algorithms for sorted and unsorted Data sets Performance Evaluation of some Online Association Rule Mining Algorithms for sorted and unsorted Data sets Pramod S. Reader, Information Technology, M.P.Christian College of Engineering, Bhilai,C.G. INDIA.

More information

Association Rule Mining

Association Rule Mining Association Rule Mining Association Rules and Frequent Patterns Frequent Pattern Mining Algorithms Apriori FP-growth Correlation Analysis Constraint-based Mining Using Frequent Patterns for Classification

More information

MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH

MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH M.Rajalakshmi 1, Dr.T.Purusothaman 2, Dr.R.Nedunchezhian 3 1 Assistant Professor (SG), Coimbatore Institute of Technology, India, [email protected]

More information

Data Mining Applications in Manufacturing

Data Mining Applications in Manufacturing Data Mining Applications in Manufacturing Dr Jenny Harding Senior Lecturer Wolfson School of Mechanical & Manufacturing Engineering, Loughborough University Identification of Knowledge - Context Intelligent

More information

Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm

Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm Association Technique on Prediction of Chronic Diseases Using Apriori Algorithm R.Karthiyayini 1, J.Jayaprakash 2 Assistant Professor, Department of Computer Applications, Anna University (BIT Campus),

More information

Keywords: Mobility Prediction, Location Prediction, Data Mining etc

Keywords: Mobility Prediction, Location Prediction, Data Mining etc Volume 4, Issue 4, April 2014 ISSN: 2277 128X International Journal of Advanced Research in Computer Science and Software Engineering Research Paper Available online at: www.ijarcsse.com Data Mining Approach

More information

Clinic + - A Clinical Decision Support System Using Association Rule Mining

Clinic + - A Clinical Decision Support System Using Association Rule Mining Clinic + - A Clinical Decision Support System Using Association Rule Mining Sangeetha Santhosh, Mercelin Francis M.Tech Student, Dept. of CSE., Marian Engineering College, Kerala University, Trivandrum,

More information

Data Mining Approach in Security Information and Event Management

Data Mining Approach in Security Information and Event Management Data Mining Approach in Security Information and Event Management Anita Rajendra Zope, Amarsinh Vidhate, and Naresh Harale Abstract This paper gives an overview of data mining field & security information

More information

COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction

COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction COMP3420: Advanced Databases and Data Mining Classification and prediction: Introduction and Decision Tree Induction Lecture outline Classification versus prediction Classification A two step process Supervised

More information

Chapter 4 Data Mining A Short Introduction. 2006/7, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis Data Mining - 1

Chapter 4 Data Mining A Short Introduction. 2006/7, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis Data Mining - 1 Chapter 4 Data Mining A Short Introduction 2006/7, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis Data Mining - 1 1 Today's Question 1. Data Mining Overview 2. Association Rule Mining

More information

A Breadth-First Algorithm for Mining Frequent Patterns from Event Logs

A Breadth-First Algorithm for Mining Frequent Patterns from Event Logs A Breadth-First Algorithm for Mining Frequent Patterns from Event Logs Risto Vaarandi Department of Computer Engineering, Tallinn University of Technology, Estonia [email protected] Abstract. Today,

More information

COMBINED METHODOLOGY of the CLASSIFICATION RULES for MEDICAL DATA-SETS

COMBINED METHODOLOGY of the CLASSIFICATION RULES for MEDICAL DATA-SETS COMBINED METHODOLOGY of the CLASSIFICATION RULES for MEDICAL DATA-SETS V.Sneha Latha#, P.Y.L.Swetha#, M.Bhavya#, G. Geetha#, D. K.Suhasini# # Dept. of Computer Science& Engineering K.L.C.E, GreenFields-522502,

More information

Classification of IDS Alerts with Data Mining Techniques

Classification of IDS Alerts with Data Mining Techniques International Journal of Electronic Commerce Studies Vol.5, No.1, pp.1-6, 2014 Classification of IDS Alerts with Data Mining Techniques Hany Nashat Gabra Computer and Systems Engineering Department, Ain

More information

Analysis of Customer Behavior using Clustering and Association Rules

Analysis of Customer Behavior using Clustering and Association Rules Analysis of Customer Behavior using Clustering and Association Rules P.Isakki alias Devi, Research Scholar, Vels University,Chennai 117, Tamilnadu, India. S.P.Rajagopalan Professor of Computer Science

More information

Data Mining Classification: Decision Trees

Data Mining Classification: Decision Trees Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous

More information

A Data Mining Tutorial

A Data Mining Tutorial A Data Mining Tutorial Presented at the Second IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN 98) 14 December 1998 Graham Williams, Markus Hegland and Stephen

More information

Using Data Mining Methods to Predict Personally Identifiable Information in Emails

Using Data Mining Methods to Predict Personally Identifiable Information in Emails Using Data Mining Methods to Predict Personally Identifiable Information in Emails Liqiang Geng 1, Larry Korba 1, Xin Wang, Yunli Wang 1, Hongyu Liu 1, Yonghua You 1 1 Institute of Information Technology,

More information

Distributed Systems / Middleware Distributed Programming in Erlang

Distributed Systems / Middleware Distributed Programming in Erlang Distributed Systems / Middleware Distributed Programming in Erlang Alessandro Sivieri Dipartimento di Elettronica e Informazione Politecnico, Italy [email protected] http://corsi.dei.polimi.it/distsys

More information

Static Data Mining Algorithm with Progressive Approach for Mining Knowledge

Static Data Mining Algorithm with Progressive Approach for Mining Knowledge Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 85-93 Research India Publications http://www.ripublication.com Static Data Mining Algorithm with Progressive

More information

Performance Analysis of Apriori Algorithm with Different Data Structures on Hadoop Cluster

Performance Analysis of Apriori Algorithm with Different Data Structures on Hadoop Cluster Performance Analysis of Apriori Algorithm with Different Data Structures on Hadoop Cluster Sudhakar Singh Dept. of Computer Science Faculty of Science Banaras Hindu University Rakhi Garg Dept. of Computer

More information

CHAPTER 1 INTRODUCTION

CHAPTER 1 INTRODUCTION 1 CHAPTER 1 INTRODUCTION 1.1 MOTIVATION OF RESEARCH Multicore processors have two or more execution cores (processors) implemented on a single chip having their own set of execution and architectural recourses.

More information

Case study: d60 Raptor smartadvisor. Jan Neerbek Alexandra Institute

Case study: d60 Raptor smartadvisor. Jan Neerbek Alexandra Institute Case study: d60 Raptor smartadvisor Jan Neerbek Alexandra Institute Agenda d60: A cloud/data mining case Cloud Data Mining Market Basket Analysis Large data sets Our solution 2 Alexandra Institute The

More information

A hybrid algorithm combining weighted and hasht apriori algorithms in Map Reduce model using Eucalyptus cloud platform

A hybrid algorithm combining weighted and hasht apriori algorithms in Map Reduce model using Eucalyptus cloud platform A hybrid algorithm combining weighted and hasht apriori algorithms in Map Reduce model using Eucalyptus cloud platform 1 R. SUMITHRA, 2 SUJNI PAUL AND 3 D. PONMARY PUSHPA LATHA 1 School of Computer Science,

More information

IJRFM Volume 2, Issue 1 (January 2012) (ISSN 2231-5985)

IJRFM Volume 2, Issue 1 (January 2012) (ISSN 2231-5985) ASSOCIATION MODELS FOR MARKET BASKET ANALYSIS, CUSTOMER BEHAVIOUR ANALYSIS AND BUSINESS INTELLIGENCE SOLUTION EMBEDDED WITH ARIORI CONCEPT J.M. Lakshmi Mahesh* ABSTRACT This paper analyzes the customer

More information

A Survey on Association Rule Mining in Market Basket Analysis

A Survey on Association Rule Mining in Market Basket Analysis International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 4, Number 4 (2014), pp. 409-414 International Research Publications House http://www. irphouse.com /ijict.htm A Survey

More information

The Fuzzy Frequent Pattern Tree

The Fuzzy Frequent Pattern Tree The Fuzzy Frequent Pattern Tree STERGIOS PAPADIMITRIOU 1 SEFERINA MAVROUDI 2 1. Department of Information Management, Technological Educational Institute of Kavala, 65404 Kavala, Greece 2. Pattern Recognition

More information

Web Document Clustering

Web Document Clustering Web Document Clustering Lab Project based on the MDL clustering suite http://www.cs.ccsu.edu/~markov/mdlclustering/ Zdravko Markov Computer Science Department Central Connecticut State University New Britain,

More information

Code and Process Migration! Motivation!

Code and Process Migration! Motivation! Code and Process Migration! Motivation How does migration occur? Resource migration Agent-based system Details of process migration Lecture 6, page 1 Motivation! Key reasons: performance and flexibility

More information

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02)

So today we shall continue our discussion on the search engines and web crawlers. (Refer Slide Time: 01:02) Internet Technology Prof. Indranil Sengupta Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No #39 Search Engines and Web Crawler :: Part 2 So today we

More information

Intelligent Log Analyzer. André Restivo <[email protected]>

Intelligent Log Analyzer. André Restivo <andre.restivo@portugalmail.pt> Intelligent Log Analyzer André Restivo 9th January 2003 Abstract Server Administrators often have to analyze server logs to find if something is wrong with their machines.

More information

AN APPLICATION OF INFORMATION RETRIEVAL IN P2P NETWORKS USING SOCKETS AND METADATA

AN APPLICATION OF INFORMATION RETRIEVAL IN P2P NETWORKS USING SOCKETS AND METADATA AN APPLICATION OF INFORMATION RETRIEVAL IN P2P NETWORKS USING SOCKETS AND METADATA Ms. M. Kiruthika Asst. Professor, Fr.C.R.I.T, Vashi, Navi Mumbai. [email protected] Ms. Smita Dange Lecturer,

More information

KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa Email: [email protected]

KNIME TUTORIAL. Anna Monreale KDD-Lab, University of Pisa Email: annam@di.unipi.it KNIME TUTORIAL Anna Monreale KDD-Lab, University of Pisa Email: [email protected] Outline Introduction on KNIME KNIME components Exercise: Market Basket Analysis Exercise: Customer Segmentation Exercise:

More information

Searching frequent itemsets by clustering data

Searching frequent itemsets by clustering data Towards a parallel approach using MapReduce Maria Malek Hubert Kadima LARIS-EISTI Ave du Parc, 95011 Cergy-Pontoise, FRANCE [email protected], [email protected] 1 Introduction and Related Work

More information

Dual Mechanism to Detect DDOS Attack Priyanka Dembla, Chander Diwaker 2 1 Research Scholar, 2 Assistant Professor

Dual Mechanism to Detect DDOS Attack Priyanka Dembla, Chander Diwaker 2 1 Research Scholar, 2 Assistant Professor International Association of Scientific Innovation and Research (IASIR) (An Association Unifying the Sciences, Engineering, and Applied Research) International Journal of Engineering, Business and Enterprise

More information

Data Mining to Recognize Fail Parts in Manufacturing Process

Data Mining to Recognize Fail Parts in Manufacturing Process 122 ECTI TRANSACTIONS ON ELECTRICAL ENG., ELECTRONICS, AND COMMUNICATIONS VOL.7, NO.2 August 2009 Data Mining to Recognize Fail Parts in Manufacturing Process Wanida Kanarkard 1, Danaipong Chetchotsak

More information

Implementing Graph Pattern Mining for Big Data in the Cloud

Implementing Graph Pattern Mining for Big Data in the Cloud Implementing Graph Pattern Mining for Big Data in the Cloud Chandana Ojah M.Tech in Computer Science & Engineering Department of Computer Science & Engineering, PES College of Engineering, Mandya [email protected]

More information

MINING ASSOCIATION RULES FROM LARGE DATA BASES- A REVIEW

MINING ASSOCIATION RULES FROM LARGE DATA BASES- A REVIEW MINING ASSOCIATION RULES FROM LARGE DATA BASES- A REVIEW R.Priya 1, Ananthi Sheshasaayee 2 1 Research Scholar, Asst. Prof, M.C.A. Dept, VELS University, Chennai, India 2 Associate Professor and Head, Dept

More information

Map-Reduce for Machine Learning on Multicore

Map-Reduce for Machine Learning on Multicore Map-Reduce for Machine Learning on Multicore Chu, et al. Problem The world is going multicore New computers - dual core to 12+-core Shift to more concurrent programming paradigms and languages Erlang,

More information

Implementing Improved Algorithm Over APRIORI Data Mining Association Rule Algorithm

Implementing Improved Algorithm Over APRIORI Data Mining Association Rule Algorithm Implementing Improved Algorithm Over APRIORI Data Mining Association Rule Algorithm 1 Sanjeev Rao, 2 Priyanka Gupta 1,2 Dept. of CSE, RIMT-MAEC, Mandi Gobindgarh, Punjab, india Abstract In this paper we

More information

RDB-MINER: A SQL-Based Algorithm for Mining True Relational Databases

RDB-MINER: A SQL-Based Algorithm for Mining True Relational Databases 998 JOURNAL OF SOFTWARE, VOL. 5, NO. 9, SEPTEMBER 2010 RDB-MINER: A SQL-Based Algorithm for Mining True Relational Databases Abdallah Alashqur Faculty of Information Technology Applied Science University

More information

Chapter 13: Query Processing. Basic Steps in Query Processing

Chapter 13: Query Processing. Basic Steps in Query Processing Chapter 13: Query Processing! Overview! Measures of Query Cost! Selection Operation! Sorting! Join Operation! Other Operations! Evaluation of Expressions 13.1 Basic Steps in Query Processing 1. Parsing

More information

Association Analysis: Basic Concepts and Algorithms

Association Analysis: Basic Concepts and Algorithms 6 Association Analysis: Basic Concepts and Algorithms Many business enterprises accumulate large quantities of data from their dayto-day operations. For example, huge amounts of customer purchase data

More information

Operating Systems for Parallel Processing Assistent Lecturer Alecu Felician Economic Informatics Department Academy of Economic Studies Bucharest

Operating Systems for Parallel Processing Assistent Lecturer Alecu Felician Economic Informatics Department Academy of Economic Studies Bucharest Operating Systems for Parallel Processing Assistent Lecturer Alecu Felician Economic Informatics Department Academy of Economic Studies Bucharest 1. Introduction Few years ago, parallel computers could

More information

RESEARCH PAPER International Journal of Recent Trends in Engineering, Vol 1, No. 1, May 2009

RESEARCH PAPER International Journal of Recent Trends in Engineering, Vol 1, No. 1, May 2009 An Algorithm for Dynamic Load Balancing in Distributed Systems with Multiple Supporting Nodes by Exploiting the Interrupt Service Parveen Jain 1, Daya Gupta 2 1,2 Delhi College of Engineering, New Delhi,

More information

The basic data mining algorithms introduced may be enhanced in a number of ways.

The basic data mining algorithms introduced may be enhanced in a number of ways. DATA MINING TECHNOLOGIES AND IMPLEMENTATIONS The basic data mining algorithms introduced may be enhanced in a number of ways. Data mining algorithms have traditionally assumed data is memory resident,

More information

DATA ANALYSIS II. Matrix Algorithms

DATA ANALYSIS II. Matrix Algorithms DATA ANALYSIS II Matrix Algorithms Similarity Matrix Given a dataset D = {x i }, i=1,..,n consisting of n points in R d, let A denote the n n symmetric similarity matrix between the points, given as where

More information

Professor Anita Wasilewska. Classification Lecture Notes

Professor Anita Wasilewska. Classification Lecture Notes Professor Anita Wasilewska Classification Lecture Notes Classification (Data Mining Book Chapters 5 and 7) PART ONE: Supervised learning and Classification Data format: training and test data Concept,

More information

Scala Actors Library. Robert Hilbrich

Scala Actors Library. Robert Hilbrich Scala Actors Library Robert Hilbrich Foreword and Disclaimer I am not going to teach you Scala. However, I want to: Introduce a library Explain what I use it for My Goal is to: Give you a basic idea about

More information

Benchmarking Hadoop & HBase on Violin

Benchmarking Hadoop & HBase on Violin Technical White Paper Report Technical Report Benchmarking Hadoop & HBase on Violin Harnessing Big Data Analytics at the Speed of Memory Version 1.0 Abstract The purpose of benchmarking is to show advantages

More information

Chapter 20: Data Analysis

Chapter 20: Data Analysis Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification

More information

Data Mining for Knowledge Management. Classification

Data Mining for Knowledge Management. Classification 1 Data Mining for Knowledge Management Classification Themis Palpanas University of Trento http://disi.unitn.eu/~themis Data Mining for Knowledge Management 1 Thanks for slides to: Jiawei Han Eamonn Keogh

More information

Data Mining: An Overview. David Madigan http://www.stat.columbia.edu/~madigan

Data Mining: An Overview. David Madigan http://www.stat.columbia.edu/~madigan Data Mining: An Overview David Madigan http://www.stat.columbia.edu/~madigan Overview Brief Introduction to Data Mining Data Mining Algorithms Specific Eamples Algorithms: Disease Clusters Algorithms:

More information

PREDICTIVE MODELING OF INTER-TRANSACTION ASSOCIATION RULES A BUSINESS PERSPECTIVE

PREDICTIVE MODELING OF INTER-TRANSACTION ASSOCIATION RULES A BUSINESS PERSPECTIVE International Journal of Computer Science and Applications, Vol. 5, No. 4, pp 57-69, 2008 Technomathematics Research Foundation PREDICTIVE MODELING OF INTER-TRANSACTION ASSOCIATION RULES A BUSINESS PERSPECTIVE

More information

Overview Motivating Examples Interleaving Model Semantics of Correctness Testing, Debugging, and Verification

Overview Motivating Examples Interleaving Model Semantics of Correctness Testing, Debugging, and Verification Introduction Overview Motivating Examples Interleaving Model Semantics of Correctness Testing, Debugging, and Verification Advanced Topics in Software Engineering 1 Concurrent Programs Characterized by

More information

International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET

International Journal of World Research, Vol: I Issue XIII, December 2008, Print ISSN: 2347-937X DATA MINING TECHNIQUES AND STOCK MARKET DATA MINING TECHNIQUES AND STOCK MARKET Mr. Rahul Thakkar, Lecturer and HOD, Naran Lala College of Professional & Applied Sciences, Navsari ABSTRACT Without trading in a stock market we can t understand

More information

A Survey on Intrusion Detection System with Data Mining Techniques

A Survey on Intrusion Detection System with Data Mining Techniques A Survey on Intrusion Detection System with Data Mining Techniques Ms. Ruth D 1, Mrs. Lovelin Ponn Felciah M 2 1 M.Phil Scholar, Department of Computer Science, Bishop Heber College (Autonomous), Trichirappalli,

More information