Data Mining: Partially from: Introduction to Data Mining by Tan, Steinbach, Kumar
|
|
- Job Dennis
- 8 years ago
- Views:
Transcription
1 Data Mining: Association Analysis Partially from: Introduction to Data Mining by Tan, Steinbach, Kumar
2 Association Rule Mining Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction Market-Basket transactions TID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke Example of Association Rules {Bread} {Milk} {Diaper} {Beer}, 2
3 Definition: Frequent Itemset Itemset A collection of one or more items Example: {Milk, Bread, Diaper} k-itemset An itemset t that t contains k items Support count σ) Frequency of occurrence of an itemset E.g. σ{milk, Bread,Diaper}) = 2 Support Fraction of transactions that contain an itemset E.g. s{milk, Bread, Diaper}) = 2/5 Frequent Itemset An itemset whose support is greater than or equal to a minsup threshold TID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke 3
4 Definition: Association Rule Association Rule An implication expression of the form X Y, where X and Y are itemsets Example: {Milk, Diaper} {Beer} Rule Evaluation Metrics Support s) Fraction of transactions that contain both X and Y Confidence c) TID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke Example: { Milk, Diaper} Milk, Diaper, Beer) s = σ T 2 = = 5 Measures how often items in Y appear in transactions that 5 contain X Beer 0.4 σ Milk,Diaper,Beer) 2 c = = = 0.67 σ Milk, Diaper) 3 4
5 Association Rule Mining Task Given a set of transactions T, the goal of association rule mining i is to find all rules having support minsup threshold confidence minconf threshold Brute-force approach: List all possible association rules Compute the support and confidence for each rule Prune rules that fail the minsup and minconf thresholds Computationally prohibitive! 5
6 Mining Association Rules TID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke Example of Rules: {Milk,Diaper} {Beer} s=0.4, c=0.67) {Milk,Beer} {Diaper} s=0.4, c=1.0) {Diaper,Beer} {Milk} s=0.4, c=0.67) {Beer} {Milk,Diaper} s=0.4, c=0.67) {Diaper} {Milk,Beer} s=0.4, c=0.5) {Milk} {Diaper,Beer} s=0.4, c=0.5) Observations: All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer} Rules originating from the same itemset have identical support but can have different confidence Thus, we may decouple the support and confidence requirements 6
7 Association Rules Mining cont.) Problem Decomposition 1. Find all items itemsets) with support larger than s% large itemset L k ) 2. Use the large itemsets to generate the rules Example if ABCD and AB are large itemsets then rule AB -> CD holds if supportabcd)/supportab) >= c% Existing Mining Algorithms focused on sub-problem 1 features read the database several passes focused on reducing the computation cost 7
8 Frequent Itemset Generation null A B C D E AB AC AD AE BC BD BE CD CE DE ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE ABCD ABCE ABDE ACDE BCDE ABCDE Given d items, there are 2 d possible candidate itemsets t 8
9 Frequent Itemset Generation Brute-force approach: Each itemset in the lattice is a candidate frequent itemset Count the support of each candidate by scanning the database Transactions TID Items 1 Bread, Milk 2 Bread, Diaper, Beer, Eggs 3 Milk, Diaper, Beer, Coke 4 Bread, Milk, Diaper, Beer 5 Bread, Milk, Diaper, Coke Match each transaction against every candidate Complexity ~ ONMw) => Expensive since M = 2 d!!! 9
10 Frequent Itemset Generation Strategies Reduce the number of candidates M) Complete search: M=2 d Use pruning techniques to reduce M Reduce the number of transactions N) Reduce size of N as the size of itemset increases Used by DHP and vertical-based mining algorithms Reduce the number of comparisons NM) Use efficient data structures to store the candidates or transactions No need to match every candidate against every transaction 10
11 Reducing Number of Candidates Apriori principle: If an itemset is frequent, then all of its subsets must also be frequent Apriori principle holds due to the following property of fthe support measure: X, Y : X Y ) s X ) s Y Support of an itemset never exceeds the support of its subsets This is known as the anti-monotone property p of support ) 11
12 Illustrating Apriori Principle Found to be Infrequent Pruned supersets 12
13 Illustrating Apriori Principle Item Count Bread 4 Coke 2 Milk 4 Beer 3 Diaper 4 Eggs 1 Minimum Support = 3 Items 1-itemsets) Itemset Count {Bread,Milk} 3 {Bread,Beer} 2 {Bread,Diaper} 3 {Milk,Beer} 2 {Milk,Diaper} 3 {Beer,Diaper} 3 Pairs 2-itemsets) No need to generate candidates involving Coke or Eggs) Triplets 3-itemsets) If every subset is considered, 6 C C C 3 = 41 With support-based pruning, = 13 Item set Count {B read,m ilk,d iaper} 3 13
14 Apriori Algorithm Method: Let k=1 Generate frequent itemsets t of length 1 Repeat until no new frequent itemsets are identified Generate length k+1) candidate itemsets from length k frequent itemsets Prune candidate itemsets containing subsets of length k that are infrequent Count the support of each candidate by scanning the DB Eliminate candidates that are infrequent, leaving only those that are frequent 14
15 Association Rules Mining Generation of Candidate Itemsets C k 1) Join: L k-1 L k-1 2) Prune: delete all itemsets c C k such that some k-1)- subset of c is not in L k-1. <e.g.> If {A B} is large then both A and B are large Example Let L 3 be {{1 2 3}, {1 2 4}, {1 3 4}, {1 3 5}, {2 3 4}} 1) Join: C 4 = L 3 L 3 = {{ }, { }} 2) Prune: { } is deleted because {1 4 5} is not in L 3 => C 4 = {{ }} 15
16 Example of Apriori Mining Trans # Items 1 A B C D E 2 A B D 3 A C F 4 D G 5 A B H Minimum Support = 2/5 Step 1: Find L 1 L 1 =ABCD) Step 2: Find L 2 i) Generate Candidates C 2 = L 1 x L 1 = { A B), A C), A D),.} ii) Count SA B))=SA C))=SA D))=SB D))=2/5 SB C))=SC D))=1/5 = { A B), A C), A D), B D)} Step 3: Find L 3 i) C 3 = L 2 x L 2 = { A B C), A B D), A C D) } Prune: delete A B C), A C D) ii) Count SA B D)) = 2/5 L 3 = {A B D)} 16
17 Factors Affecting Complexity Choice of minimum support threshold lowering support threshold results in more frequent itemsets this may increase number of candidates and max length of frequent itemsets Dimensionality number of items) of the data set more space is needed to store support count of each item if number of frequent items also increases, both computation and I/O costs may also increase Size of database since Apriori makes multiple passes, run time of algorithm may increase with number of transactions Average transaction width transaction width increases with denser data sets This may increase max length of frequent itemsets and traversals of hash tree number of subsets in a transaction increases with its width) 17
18 Maximal Frequent Itemset An itemset is maximal frequent if none of its immediate supersets is frequent Maximal Itemsets Infrequent Itemsets Border 18
19 Closed Itemset An itemset is closed if none of its immediate supersets has the same support as the itemset TID Items 1 {A,B} 2 {B,C,D} 3 {A,B,C,D} 4 {A,B,D} 5 {A,B,C,D} Itemset Support {A} 4 {B} 5 {C} 3 {D} 4 {A,B} 4 {A,C} 2 {A,D} 3 {B,C} 3 {B,D} 4 {C,D} 3 Itemset Support {A,B,C} 2 {A,B,D} 3 {A,C,D} 2 {B,C,D} 3 {A,B,C,D} 2 19
20 Maximal vs Closed Itemsets TID Items 1 ABC 2 ABCD 3 BCE 4 ACDE 5 DE null A B C D E Transaction Ids AB AC AD AE BC BD BE CD CE DE ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE 2 4 ABCD ABCE ABDE ACDE BCDE Not supported by any transactions ABCDE 20
21 Maximal vs Closed Frequent Itemsets Minimum support = 2 null Closed but not maximal A B C D E Closed and maximal AB AC AD AE BC BD BE CD CE DE ABC ABD ABE ACD ACE ADE BCD BCE BDE CDE 2 4 ABCD ABCE ABDE ACDE BCDE # Closed = 9 # Maximal = 4 ABCDE 21
22 Association Rules Discovery Research Issues efficient i mining i methods eg) No candidates; I/O reduction Variations of association rules Multiple support, Multi-level rules rule interestingness & pruning incremental mining accuracy adjustment new constraints & measures integration considerations g with RDBMS fuzzy sets, rough sets,, etc. DB* DB DB + 22
23 What more applications for association rules mining? Discussions Fields, industry problems 23
24 Variations of Association Rules Multiple Support Multi-Level Association Rules 24
25 Effect of Support Distribution Many real data sets have skewed support distribution ib ti Support distribution of a retail data set 25
26 Effect of Support Distribution How to set the appropriate minsup threshold? If minsup is set too high, we could miss itemsets involving interesting rare items e.g., expensive products) If minsup is set too low, it is computationally ti expensive and the number of itemsets is very large Using a single minimum support threshold may not be effective 26
27 Multiple Minimum Support How to apply multiple minimum supports? MSi): minimum support for item i e.g.: MSMilk)=5%, MSCoke) = 3%, MSBroccoli)=0.1%, MSSalmon)=0.5% MS{Milk, Broccoli}) = min MSMilk), MSBroccoli)) = 01% 0.1% Challenge: Support is no longer anti-monotone t Suppose: SupportMilk, Coke) = 1.5% and SupportMilk, Coke, Broccoli) = 0.5% {Milk,Coke} is infrequent but {Milk,Coke,Broccoli} is frequent! 27
28 Multiple Minimum Support Liu 1999) Order the items according to their minimum support in ascending order) e.g.: MSMilk)=5%, MSCoke) = 3%, MSBroccoli)=0 0.1%, MSSalmon)=0 0.5% Ordering: Broccoli, Salmon, Coke, Milk Need to modify Apriori such that: L 1 : set of frequent items F 1 : set of items whose support is MS1) where MS1) is min i MSi)) ) C 2 : candidate itemsets of size 2 is generated from F 1 instead of L 1 28
29 Multiple Minimum Support Liu 1999) Modifications to Apriori: In traditional Apriori, i A candidate k+1)-itemset is generated by merging two frequent itemsets of size k The candidate is pruned if it contains any infrequent subsets of size k Pruning step has to be modified: d Prune only if subset contains the first item e.g.: Candidate={Broccoli {Broccoli, Coke, Milk} ordered according to minimum support) {Broccoli, Coke} and {Broccoli, Milk} are frequent but {Coke, Milk} is infrequent Candidate is not pruned because {Coke,Milk} does not contain the first item, i.e., Broccoli. 29
30 Multi-level Association Rules Food Electronics Bread Milk Computers Home Wheat White Skim 2% Desktop Laptop Accessory TV DVD Foremost Kemps Printer Scanner 30
31 Multi-level Association Rules Why should we incorporate concept hierarchy? Rules at lower levels may not have enough support to appear in any frequent itemsets Rules at lower levels of the hierarchy are overly specific e.g., skim milk white bread, 2% milk wheat bread, skim milk wheat bread, etc. are indicative of association between milk and bread 31
32 Multi-level Association Rules How do support and confidence vary as we traverse the concept hierarchy? h If X is the parent item for both X1 and X2, then σx) σx1) + σx2) If σx1 Y1) minsup, and X is parent of X1, Y is parent of Y1 then σx Y1) minsup, σx1 Y) minsup σx Y) minsup If confx1 Y1) minconf, then confx1 Y) minconf 32
33 Multi-level Association Rules Approach 1: Extend current association rule formulation by augmenting each transaction with higher level items Original Transaction: {skim milk, wheat bread} Augmented Transaction: {skim milk, wheat bread, milk, bread, food} Issues: Items that reside at higher levels have much higher support counts if support threshold is low, too many frequent patterns involving items from the higher levels Increased dimensionality of the data 33
34 Multi-level Association Rules Approach 2: Generate frequent patterns at highest level first Then, generate frequent patterns at the next highest level, and so on Issues: I/O requirements will increase dramatically because we need to perform more passes over the data May miss some potentially interesting cross-level association patterns Other approaches: [Paper Study] 34
35 Pattern Evaluation Association rule algorithms tend to produce too many rules many of them are uninteresting or redundant Redundant if {A,B,C} {D} and {A,B} {D} have same support & confidence Interestingness measures can be used to prune/rank the derived patterns In the original formulation of association rules, support & confidence are the only measures used 35
36 Application of Interestingness Measure Knowledge Interestingness Measures Postprocessing Patterns Measures Featur e Prod uct Prod uct Prod uct Prod uct Prod uct Prod uct Prod uct Prod uct Prod uct Prod uct Featur e Featur e Featur F t Mining Preprocessed Data e e Featur e Featur e Featur e Featur e Featur e Featur e Mining Selected Data Preprocessing Data Selection 36
37 Computing Interestingness Measure Given a rule X Y, information needed to compute rule interestingness can be obtained from a contingency table Contingency table for X Y Y Y f 11 : support of X and Y X f 11 f 10 f 1+ X f 01 f 00 f o+ f +1 f +0 T f 10 : support of X and Y f 01 : support of X and Y f 00 : support of X and Y Used to define various measures support, confidence, lift, Gini, J-measure, etc. 37
38 Drawback of Confidence Coffee Coffee Tea Tea Association Rule: Tea Coffee Confidence= PCoffee Tea) = 0.75 but PCoffee) = 0.9 Although confidence is high, rule is misleading PCoffee Tea) ea) =
39 Statistical Independence Population of 1000 students 600 students know how to swim S) 700 students know how to bike B) 420 students know how to swim and bike S,B) PS B) = 420/1000 = 0.42 PS) PB) = = 0.42 PS B) = PS) PB) => Statistical independence PS B) > PS) PB) => Positively correlated PS B) < PS) PB) => Negatively correlated 39
40 Statistical-based Measures Measures that take into account statistical d d dependence ) X Y P ) ) Y P X Y P Lift = ) ) ), Y P X P Y X P Interest = ) ) ) ) ) ), ) ) Y P X P Y X P Y P X P Y X P PS = )] )[1 )] )[1 ) ) ), Y P Y P X P X P Y P X P Y X P coefficient = φ 40
41 Example: Lift Coffee Coffee Tea Tea Association Rule: Tea Coffee Confidence= PCoffee Tea) = 0.75 but PCoffee) = 0.9 Lift = 0.75/0.9= < 1, therefore is negatively associated) 41
42 Example: Lift cont.) Y Y X X Y Y X X Lift = = Lift = = )0.1) 0.9)0.9) Statistical independence: If PX,Y)=PX)PY) => Lift = 1 42
43 Some Issues There are lots of measures proposed in the literature Some measures are good for certain applications, but not for others What criteria should we use to determine whether a measure is good or bad? What about Apriori- style support based pruning? How does it affect these measures?
Data Mining Association Analysis: Basic Concepts and Algorithms. Lecture Notes for Chapter 6. Introduction to Data Mining
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar Tan,Steinbach, Kumar Introduction to Data Mining 4/8/24
More informationData Mining Apriori Algorithm
10 Data Mining Apriori Algorithm Apriori principle Frequent itemsets generation Association rules generation Section 6 of course book TNM033: Introduction to Data Mining 1 Association Rule Mining (ARM)
More informationAssociation Analysis: Basic Concepts and Algorithms
6 Association Analysis: Basic Concepts and Algorithms Many business enterprises accumulate large quantities of data from their dayto-day operations. For example, huge amounts of customer purchase data
More informationMining Association Rules. Mining Association Rules. What Is Association Rule Mining? What Is Association Rule Mining? What is Association rule mining
Mining Association Rules What is Association rule mining Mining Association Rules Apriori Algorithm Additional Measures of rule interestingness Advanced Techniques 1 2 What Is Association Rule Mining?
More informationMarket Basket Analysis and Mining Association Rules
Market Basket Analysis and Mining Association Rules 1 Mining Association Rules Market Basket Analysis What is Association rule mining Apriori Algorithm Measures of rule interestingness 2 Market Basket
More informationUnique column combinations
Unique column combinations Arvid Heise Guest lecture in Data Profiling and Data Cleansing Prof. Dr. Felix Naumann Agenda 2 Introduction and problem statement Unique column combinations Exponential search
More informationData Mining2 Advanced Aspects and Applications
Data Mining2 Advanced Aspects and Applications Fosca Giannotti and Mirco Nanni Pisa KDD Lab, ISTI-CNR & Univ. Pisa http://www-kdd.isti.cnr.it/ DIPARTIMENTO DI INFORMATICA - Università di Pisa anno accademico
More informationAssociation Rule Mining
Association Rule Mining Association Rules and Frequent Patterns Frequent Pattern Mining Algorithms Apriori FP-growth Correlation Analysis Constraint-based Mining Using Frequent Patterns for Classification
More informationMAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH
MAXIMAL FREQUENT ITEMSET GENERATION USING SEGMENTATION APPROACH M.Rajalakshmi 1, Dr.T.Purusothaman 2, Dr.R.Nedunchezhian 3 1 Assistant Professor (SG), Coimbatore Institute of Technology, India, rajalakshmi@cit.edu.in
More informationMINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM
MINING THE DATA FROM DISTRIBUTED DATABASE USING AN IMPROVED MINING ALGORITHM J. Arokia Renjit Asst. Professor/ CSE Department, Jeppiaar Engineering College, Chennai, TamilNadu,India 600119. Dr.K.L.Shunmuganathan
More informationLaboratory Module 8 Mining Frequent Itemsets Apriori Algorithm
Laboratory Module 8 Mining Frequent Itemsets Apriori Algorithm Purpose: key concepts in mining frequent itemsets understand the Apriori algorithm run Apriori in Weka GUI and in programatic way 1 Theoretical
More informationProject Report. 1. Application Scenario
Project Report In this report, we briefly introduce the application scenario of association rule mining, give details of apriori algorithm implementation and comment on the mined rules. Also some instructions
More informationEffective Pruning for the Discovery of Conditional Functional Dependencies
Effective Pruning for the Discovery of Conditional Functional Dependencies Jiuyong Li 1, Jiuxue Liu 1, Hannu Toivonen 2, Jianming Yong 3 1 School of Computer and Information Science, University of South
More informationFrequent item set mining
Frequent item set mining Christian Borgelt Frequent item set mining is one of the best known and most popular data mining methods. Originally developed for market basket analysis, it is used nowadays for
More informationHOW TO USE MINITAB: DESIGN OF EXPERIMENTS. Noelle M. Richard 08/27/14
HOW TO USE MINITAB: DESIGN OF EXPERIMENTS 1 Noelle M. Richard 08/27/14 CONTENTS 1. Terminology 2. Factorial Designs When to Use? (preliminary experiments) Full Factorial Design General Full Factorial Design
More informationBoolean Algebra (cont d) UNIT 3 BOOLEAN ALGEBRA (CONT D) Guidelines for Multiplying Out and Factoring. Objectives. Iris Hui-Ru Jiang Spring 2010
Boolean Algebra (cont d) 2 Contents Multiplying out and factoring expressions Exclusive-OR and Exclusive-NOR operations The consensus theorem Summary of algebraic simplification Proving validity of an
More informationLecture Notes on Database Normalization
Lecture Notes on Database Normalization Chengkai Li Department of Computer Science and Engineering The University of Texas at Arlington April 15, 2012 I decided to write this document, because many students
More informationMining Association Rules to Evade Network Intrusion in Network Audit Data
Mining Association Rules to Evade Network Intrusion in Network Audit Data Kamini Nalavade 1, B.B. Meshram 2 Abstract With the growth of hacking and exploiting tools and invention of new ways of intrusion,
More informationFinding Frequent Itemsets using Apriori Algorihm to Detect Intrusions in Large Dataset
Finding Frequent Itemsets using Apriori Algorihm to Detect Intrusions in Large Dataset Kamini Nalavade1, B.B. Meshram2, Abstract With the growth of hacking and exploiting tools and invention of new ways
More informationBuilding A Smart Academic Advising System Using Association Rule Mining
Building A Smart Academic Advising System Using Association Rule Mining Raed Shatnawi +962795285056 raedamin@just.edu.jo Qutaibah Althebyan +962796536277 qaalthebyan@just.edu.jo Baraq Ghalib & Mohammed
More informationFuzzy Association Rules
Vienna University of Economics and Business Administration Fuzzy Association Rules An Implementation in R Master Thesis Vienna University of Economics and Business Administration Author Bakk. Lukas Helm
More informationChapter 6: Episode discovery process
Chapter 6: Episode discovery process Algorithmic Methods of Data Mining, Fall 2005, Chapter 6: Episode discovery process 1 6. Episode discovery process The knowledge discovery process KDD process of analyzing
More informationData Mining: Foundation, Techniques and Applications
Data Mining: Foundation, Techniques and Applications Lesson 1b :A Quick Overview of Data Mining Li Cuiping( 李 翠 平 ) School of Information Renmin University of China Anthony Tung( 鄧 锦 浩 ) School of Computing
More informationFinding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm
R. Sridevi et al Int. Journal of Engineering Research and Applications RESEARCH ARTICLE OPEN ACCESS Finding Frequent Patterns Based On Quantitative Binary Attributes Using FP-Growth Algorithm R. Sridevi,*
More informationDie Welt Multimedia-Reichweite
Die Welt Multimedia-Reichweite 1) Background The quantification of Die Welt s average daily audience (known as Multimedia-Reichweite, MMR) has been developed by Die Welt management, including the research
More informationAssociation Rule Mining: A Survey
Association Rule Mining: A Survey Qiankun Zhao Nanyang Technological University, Singapore and Sourav S. Bhowmick Nanyang Technological University, Singapore 1. DATA MINING OVERVIEW Data mining [Chen et
More informationData Mining: An Overview. David Madigan http://www.stat.columbia.edu/~madigan
Data Mining: An Overview David Madigan http://www.stat.columbia.edu/~madigan Overview Brief Introduction to Data Mining Data Mining Algorithms Specific Eamples Algorithms: Disease Clusters Algorithms:
More informationKnowledge Discovery and Data Mining
Knowledge Discovery and Data Mining Data Mining in a nutshell Dr. Osmar R. Zaïane Fall 2007 Extract interesting knowledge (rules, regularities, patterns, constraints) from data in large collections. Knowledge
More informationJordan University of Science & Technology Computer Science Department CS 728: Advanced Database Systems Midterm Exam First 2009/2010
Jordan University of Science & Technology Computer Science Department CS 728: Advanced Database Systems Midterm Exam First 2009/2010 Student Name: ID: Part 1: Multiple-Choice Questions (17 questions, 1
More informationSelection of Optimal Discount of Retail Assortments with Data Mining Approach
Available online at www.interscience.in Selection of Optimal Discount of Retail Assortments with Data Mining Approach Padmalatha Eddla, Ravinder Reddy, Mamatha Computer Science Department,CBIT, Gandipet,Hyderabad,A.P,India.
More informationRDB-MINER: A SQL-Based Algorithm for Mining True Relational Databases
998 JOURNAL OF SOFTWARE, VOL. 5, NO. 9, SEPTEMBER 2010 RDB-MINER: A SQL-Based Algorithm for Mining True Relational Databases Abdallah Alashqur Faculty of Information Technology Applied Science University
More informationAn Efficient Market Basket Analysis based on Adaptive Association Rule Mining with Faster Rule Generation Algorithm
An Efficient Market Basket Analysis based on Adaptive Association Rule Mining with Faster Rule Generation Algorithm Dr. M. Dhanabhakyam* & Dr. M. Punithavalli** *Assistant Professor, Department of Commerce,
More informationAn Improved Algorithm for Fuzzy Data Mining for Intrusion Detection
An Improved Algorithm for Fuzzy Data Mining for Intrusion Detection German Florez, Susan M. Bridges, and Rayford B. Vaughn Abstract We have been using fuzzy data mining techniques to extract patterns that
More informationGerry Hobbs, Department of Statistics, West Virginia University
Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit
More informationPREDICTIVE MODELING OF INTER-TRANSACTION ASSOCIATION RULES A BUSINESS PERSPECTIVE
International Journal of Computer Science and Applications, Vol. 5, No. 4, pp 57-69, 2008 Technomathematics Research Foundation PREDICTIVE MODELING OF INTER-TRANSACTION ASSOCIATION RULES A BUSINESS PERSPECTIVE
More informationMining Sequence Data. JERZY STEFANOWSKI Inst. Informatyki PP Wersja dla TPD 2009 Zaawansowana eksploracja danych
Mining Sequence Data JERZY STEFANOWSKI Inst. Informatyki PP Wersja dla TPD 2009 Zaawansowana eksploracja danych Outline of the presentation 1. Realtionships to mining frequent items 2. Motivations for
More informationWeb Users Session Analysis Using DBSCAN and Two Phase Utility Mining Algorithms
International Journal of Soft Computing and Engineering (IJSCE) ISSN: 2231-2307, Volume-1, Issue-6, January 2012 Web Users Session Analysis Using DBSCAN and Two Phase Utility Mining Algorithms G. Sunil
More informationMining the Most Interesting Web Access Associations
Mining the Most Interesting Web Access Associations Li Shen, Ling Cheng, James Ford, Fillia Makedon, Vasileios Megalooikonomou, Tilmann Steinberg The Dartmouth Experimental Visualization Laboratory (DEVLAB)
More informationFunctional Dependencies and Normalization
Functional Dependencies and Normalization 5DV119 Introduction to Database Management Umeå University Department of Computing Science Stephen J. Hegner hegner@cs.umu.se http://www.cs.umu.se/~hegner Functional
More informationPart 2: Community Detection
Chapter 8: Graph Data Part 2: Community Detection Based on Leskovec, Rajaraman, Ullman 2014: Mining of Massive Datasets Big Data Management and Analytics Outline Community Detection - Social networks -
More informationDiscovery of Maximal Frequent Item Sets using Subset Creation
Discovery of Maximal Frequent Item Sets using Subset Creation Jnanamurthy HK, Vishesh HV, Vishruth Jain, Preetham Kumar, Radhika M. Pai Department of Information and Communication Technology Manipal Institute
More informationMining Association Rules on Grid Platforms
UNIVERSITY OF TUNIS EL MANAR FACULTY OF SCIENCES OF TUNISIA Mining Association Rules on Grid Platforms Raja Tlili raja_tlili@yahoo.fr Yahya Slimani yahya.slimani@fst.rnu.tn CoreGrid 11 Plan Introduction
More informationData Mining Applications in Manufacturing
Data Mining Applications in Manufacturing Dr Jenny Harding Senior Lecturer Wolfson School of Mechanical & Manufacturing Engineering, Loughborough University Identification of Knowledge - Context Intelligent
More informationBig Data Analysis Technology
Big Data Analysis Technology Tobias Hardes (6687549) Email: Tobias.Hardes@autistici.org Seminar: Cloud Computing and Big Data Analysis, L.079.08013 Summer semester 2013 University of Paderborn Abstract
More informationDistributed Apriori in Hadoop MapReduce Framework
Distributed Apriori in Hadoop MapReduce Framework By Shulei Zhao (sz2352) and Rongxin Du (rd2537) Individual Contribution: Shulei Zhao: Implements centralized Apriori algorithm and input preprocessing
More informationBig Data Frequent Pattern Mining
Big Data Frequent Pattern Mining David C. Anastasiu and Jeremy Iverson and Shaden Smith and George Karypis Department of Computer Science and Engineering University of Minnesota, Twin Cities, MN 55455,
More informationUsers Interest Correlation through Web Log Mining
Users Interest Correlation through Web Log Mining F. Tao, P. Contreras, B. Pauer, T. Taskaya and F. Murtagh School of Computer Science, the Queen s University of Belfast; DIW-Berlin Abstract When more
More informationDiscovering Data Quality Rules
Discovering Data Quality Rules Fei Chiang University of Toronto fchiang@cs.toronto.edu Renée J. Miller University of Toronto miller@cs.toronto.edu ABSTRACT Dirty data is a serious problem for businesses
More informationNew Matrix Approach to Improve Apriori Algorithm
New Matrix Approach to Improve Apriori Algorithm A. Rehab H. Alwa, B. Anasuya V Patil Associate Prof., IT Faculty, Majan College-University College Muscat, Oman, rehab.alwan@majancolleg.edu.om Associate
More informationMASTER'S THESIS. Mining Changes in Customer Purchasing Behavior
MASTER'S THESIS 2009:097 Mining Changes in Customer Purchasing Behavior - a Data Mining Approach Samira Madani Luleå University of Technology Master Thesis, Continuation Courses Marketing and e-commerce
More informationIntroduction to Data Mining
Introduction to Data Mining 1 Why Data Mining? Explosive Growth of Data Data collection and data availability Automated data collection tools, Internet, smartphones, Major sources of abundant data Business:
More informationStatic Data Mining Algorithm with Progressive Approach for Mining Knowledge
Global Journal of Business Management and Information Technology. Volume 1, Number 2 (2011), pp. 85-93 Research India Publications http://www.ripublication.com Static Data Mining Algorithm with Progressive
More informationChapter 4 Data Mining A Short Introduction. 2006/7, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis Data Mining - 1
Chapter 4 Data Mining A Short Introduction 2006/7, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis Data Mining - 1 1 Today's Question 1. Data Mining Overview 2. Association Rule Mining
More informationData Preprocessing. Week 2
Data Preprocessing Week 2 Topics Data Types Data Repositories Data Preprocessing Present homework assignment #1 Team Homework Assignment #2 Read pp. 227 240, pp. 250 250, and pp. 259 263 the text book.
More informationMining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach
Data Mining and Knowledge Discovery, 8, 53 87, 2004 c 2004 Kluwer Academic Publishers. Manufactured in The Netherlands. Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach
More informationDistributed Data Mining Algorithm Parallelization
Distributed Data Mining Algorithm Parallelization B.Tech Project Report By: Rishi Kumar Singh (Y6389) Abhishek Ranjan (10030) Project Guide: Prof. Satyadev Nandakumar Department of Computer Science and
More informationBenchmark Databases for Testing Big-Data Analytics In Cloud Environments
North Carolina State University Graduate Program in Operations Research Benchmark Databases for Testing Big-Data Analytics In Cloud Environments Rong Huang Rada Chirkova Yahya Fathi ICA CON 2012 April
More information{ Mining, Sets, of, Patterns }
{ Mining, Sets, of, Patterns } A tutorial at ECMLPKDD2010 September 20, 2010, Barcelona, Spain by B. Bringmann, S. Nijssen, N. Tatti, J. Vreeken, A. Zimmermann 1 Overview Tutorial 00:00 00:45 Introduction
More informationData Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland
Data Mining and Knowledge Discovery in Databases (KDD) State of the Art Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland 1 Conference overview 1. Overview of KDD and data mining 2. Data
More informationTheory behind Normalization & DB Design. Satisfiability: Does an FD hold? Lecture 12
Theory behind Normalization & DB Design Lecture 12 Satisfiability: Does an FD hold? Satisfiability of FDs Given: FD X Y and relation R Output: Does R satisfy X Y? Algorithm: a.sort R on X b.do all the
More informationWeek 11: Normal Forms. Logical Database Design. Normal Forms and Normalization. Examples of Redundancy
Week 11: Normal Forms Database Design Database Redundancies and Anomalies Functional Dependencies Entailment, Closure and Equivalence Lossless Decompositions The Third Normal Form (3NF) The Boyce-Codd
More informationUnit 3 Boolean Algebra (Continued)
Unit 3 Boolean Algebra (Continued) 1. Exclusive-OR Operation 2. Consensus Theorem Department of Communication Engineering, NCTU 1 3.1 Multiplying Out and Factoring Expressions Department of Communication
More informationDatabases -Normalization III. (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 1 / 31
Databases -Normalization III (N Spadaccini 2010 and W Liu 2012) Databases - Normalization III 1 / 31 This lecture This lecture describes 3rd normal form. (N Spadaccini 2010 and W Liu 2012) Databases -
More informationChapter 20: Data Analysis
Chapter 20: Data Analysis Database System Concepts, 6 th Ed. See www.db-book.com for conditions on re-use Chapter 20: Data Analysis Decision Support Systems Data Warehousing Data Mining Classification
More informationDiscovery of Data Dependencies in Relational Databases
Discovery of Data Dependencies in Relational Databases Siegfried Bell & Peter Brockhausen Informatik VIII, University Dortmund 44221 Dortmund, Germany email: fbell; brockhg@ls8.informatik.uni-dortmund.de
More informationDatabases - Data Mining. (GF Royle, N Spadaccini 2006-2010) Databases - Data Mining 1 / 25
Databases - Data Mining (GF Royle, N Spadaccini 2006-2010) Databases - Data Mining 1 / 25 This lecture This lecture introduces data-mining through market-basket analysis. (GF Royle, N Spadaccini 2006-2010)
More informationIndex support for regular expression search. Alexander Korotkov PGCon 2012, Ottawa
Index support for regular expression search Alexander Korotkov PGCon 2012, Ottawa Introduction What is regular expressions? Regular expressions are: powerful tool for text processing based on formal language
More informationParallel Data Mining Algorithms for Association Rules and Clustering
Parallel Data Mining Algorithms for Association Rules and Clustering Jianwei Li Northwestern University Ying Liu DTKE Center and Grad Univ of CAS Wei-keng Liao Northwestern University Alok Choudhary Northwestern
More informationData Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier
Data Mining: Concepts and Techniques Jiawei Han Micheline Kamber Simon Fräser University К MORGAN KAUFMANN PUBLISHERS AN IMPRINT OF Elsevier Contents Foreword Preface xix vii Chapter I Introduction I I.
More informationAssociation Rule Mining: Exercises and Answers
Association Rule Mining: Exercises and Answers Contains both theoretical and practical exercises to be done using Weka. The exercises are part of the DBTech Virtual Workshop on KDD and BI. Exercise 1.
More informationA Survey on Association Rule Mining in Market Basket Analysis
International Journal of Information and Computation Technology. ISSN 0974-2239 Volume 4, Number 4 (2014), pp. 409-414 International Research Publications House http://www. irphouse.com /ijict.htm A Survey
More informationA Data Mining Tutorial
A Data Mining Tutorial Presented at the Second IASTED International Conference on Parallel and Distributed Computing and Networks (PDCN 98) 14 December 1998 Graham Williams, Markus Hegland and Stephen
More informationLimitations of E-R Designs. Relational Normalization Theory. Redundancy and Other Problems. Redundancy. Anomalies. Example
Limitations of E-R Designs Relational Normalization Theory Chapter 6 Provides a set of guidelines, does not result in a unique database schema Does not provide a way of evaluating alternative schemas Normalization
More informationEnhancement of Security in Distributed Data Mining
Enhancement of Security in Distributed Data Mining Sharda Darekar 1, Prof.D.K.Chitre, 2 1,2 Department Of Computer Engineering, Terna Engineering College,Nerul,Navi Mumbai. 1 sharda.darekar@gmail.com,
More informationChapter 7: Association Rule Mining in Learning Management Systems
1 Chapter 7: Association Rule Mining in Learning Management Systems Enrique García 1, Cristóbal Romero 1, Sebastián Ventura 1, Carlos de Castro 1, Toon Calders 2 1 Department of Computer Science and Numerical
More informationMining Association Rules: A Database Perspective
IJCSNS International Journal of Computer Science and Network Security, VOL.8 No.12, December 2008 69 Mining Association Rules: A Database Perspective Dr. Abdallah Alashqur Faculty of Information Technology
More informationDEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE
DEVELOPMENT OF HASH TABLE BASED WEB-READY DATA MINING ENGINE SK MD OBAIDULLAH Department of Computer Science & Engineering, Aliah University, Saltlake, Sector-V, Kol-900091, West Bengal, India sk.obaidullah@gmail.com
More informationMulti-table Association Rules Hiding
Multi-table Association Rules Hiding Shyue-Liang Wang 1 and Tzung-Pei Hong 2 1 Department of Information Management 2 Department of Computer Science and Information Engineering National University of Kaohsiung
More informationHow to bet using different NairaBet Bet Combinations (Combo)
How to bet using different NairaBet Bet Combinations (Combo) SINGLES Singles consists of single bets. I.e. it will contain just a single selection of any sport. The bet slip of a singles will look like
More informationStatistical Learning Theory Meets Big Data
Statistical Learning Theory Meets Big Data Randomized algorithms for frequent itemsets Eli Upfal Brown University Data, data, data In God we trust, all others (must) bring data Prof. W.E. Deming, Statistician,
More informationDatabase Management Systems. Redundancy and Other Problems. Redundancy
Database Management Systems Winter 2004 CMPUT 391: Database Design Theory or Relational Normalization Theory Dr. Osmar R. Zaïane Lecture 2 Limitations of Relational Database Designs Provides a set of guidelines,
More informationIncSpan: Incremental Mining of Sequential Patterns in Large Database
IncSpan: Incremental Mining of Sequential Patterns in Large Database Hong Cheng Department of Computer Science University of Illinois at Urbana-Champaign Urbana, Illinois 61801 hcheng3@uiuc.edu Xifeng
More informationA Data Mining Framework for Automatic Online Customer Lead Generation
A Data Mining Framework for Automatic Online Customer Lead Generation by Md. Abdur Rahman Submitted in partial fulfillment of the requirements for the degree of Master of Computer Science at Dalhousie
More informationIntroduction to Market Basket Analysis Bill Qualls, First Analytics, Raleigh, NC
Paper AA07-2013 Introduction to Market Basket Analysis Bill Qualls, First Analytics, Raleigh, NC ABSTRACT Market Basket Analysis (MBA) is a data mining technique which is widely used in the consumer package
More informationPractical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING
Practical Applications of DATA MINING Sang C Suh Texas A&M University Commerce r 3 JONES & BARTLETT LEARNING Contents Preface xi Foreword by Murat M.Tanik xvii Foreword by John Kocur xix Chapter 1 Introduction
More informationHow To Find Out What A Key Is In A Database Engine
Database design theory, Part I Functional dependencies Introduction As we saw in the last segment, designing a good database is a non trivial matter. The E/R model gives a useful rapid prototyping tool,
More informationSoccer Bet Types Content
Soccer Bet Types Content Content 1 1. Asian Handicap 2 Examples on Asian Handicap All-Up Win bets 3 2. Win/Draw/Win 5 3. All-Up Win 6 Banker Combo 9 Multiple Combo 12 4. Over/Under 14 5. Correct Scores
More informationDesign of Relational Database Schemas
Design of Relational Database Schemas T. M. Murali October 27, November 1, 2010 Plan Till Thanksgiving What are the typical problems or anomalies in relational designs? Introduce the idea of decomposing
More informationMining association rules with multiple minimum supports: a new mining algorithm and a support tuning mechanism
Decision Support Systems 42 (2006) 1 24 www.elsevier.com/locate/dsw Mining association rules with multiple minimum supports: a new mining algorithm and a support tuning mechanism Ya-Han Hu a, Yen-Liang
More informationData Mining: Introduction. Lecture Notes for Chapter 1. Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler
Data Mining: Introduction Lecture Notes for Chapter 1 Slides by Tan, Steinbach, Kumar adapted by Michael Hahsler Why Mine Data? Commercial Viewpoint Lots of data is being collected and warehoused - Web
More informationImproving Apriori Algorithm to get better performance with Cloud Computing
Improving Apriori Algorithm to get better performance with Cloud Computing Zeba Qureshi 1 ; Sanjay Bansal 2 Affiliation: A.I.T.R, RGPV, India 1, A.I.T.R, RGPV, India 2 ABSTRACT Cloud computing has become
More informationAssociation Rules Mining: A Recent Overview
GESTS International Transactions on Computer Science and Engineering, Vol.32 (1), 2006, pp. 71-82 Association Rules Mining: A Recent Overview Sotiris Kotsiantis, Dimitris Kanellopoulos Educational Software
More informationDatabase Design and Normalization
Database Design and Normalization Chapter 10 (Week 11) EE562 Slides and Modified Slides from Database Management Systems, R. Ramakrishnan 1 Computing Closure F + Example: List all FDs with: - a single
More informationCH3 Boolean Algebra (cont d)
CH3 Boolean Algebra (cont d) Lecturer: 吳 安 宇 Date:2005/10/7 ACCESS IC LAB v Today, you ll know: Introduction 1. Guidelines for multiplying out/factoring expressions 2. Exclusive-OR and Equivalence operations
More informationEFFECTIVE USE OF THE KDD PROCESS AND DATA MINING FOR COMPUTER PERFORMANCE PROFESSIONALS
EFFECTIVE USE OF THE KDD PROCESS AND DATA MINING FOR COMPUTER PERFORMANCE PROFESSIONALS Susan P. Imberman Ph.D. College of Staten Island, City University of New York Imberman@postbox.csi.cuny.edu Abstract
More informationKnowledge Discovery and Data Mining. Structured vs. Non-Structured Data
Knowledge Discovery and Data Mining Unit # 2 1 Structured vs. Non-Structured Data Most business databases contain structured data consisting of well-defined fields with numeric or alphanumeric values.
More informationData Mining Classification: Decision Trees
Data Mining Classification: Decision Trees Classification Decision Trees: what they are and how they work Hunt s (TDIDT) algorithm How to select the best split How to handle Inconsistent data Continuous
More informationData Mining to Recognize Fail Parts in Manufacturing Process
122 ECTI TRANSACTIONS ON ELECTRICAL ENG., ELECTRONICS, AND COMMUNICATIONS VOL.7, NO.2 August 2009 Data Mining to Recognize Fail Parts in Manufacturing Process Wanida Kanarkard 1, Danaipong Chetchotsak
More information澳 門 彩 票 有 限 公 司 SLOT Sociedade de Lotarias e Apostas Mútuas de Macau, Lda. Soccer Bet Types
Soccer Bet Types 1. Asian Handicap Bet on a team to win in a designated match. Bets will be fully refunded in the case of a draw result after calculating handicap-goal*. *Handicap-goal Handicap-goal applies
More informationData Mining/Knowledge Discovery/ Intelligent Data Analysis. Data Mining. Key Issue: Feedback. Applications
Data Mining Originally statistician s term for overuse of data to draw invalid inferences. ESP? 1/1000 people guessed color (red/black) of 10 out of 10 cards correctly! Data Mining/Knowledge Discovery/
More information