{ Mining, Sets, of, Patterns }

Similar documents
Association Rule Mining

So, how do you pronounce. Jilles Vreeken. Okay, now we can talk. So, what kind of data? binary. * multi-relational

Azure Machine Learning, SQL Data Mining and R

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Classification and Prediction

Learning Example. Machine learning and our focus. Another Example. An example: data (loan application) The data and the goal

Chapter 12 Discovering New Knowledge Data Mining

Data, Measurements, Features

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

Introduction to Data Mining

Data Mining Part 5. Prediction

Data Mining for Business Analytics

COMP3420: Advanced Databases and Data Mining. Classification and prediction: Introduction and Decision Tree Induction

Using multiple models: Bagging, Boosting, Ensembles, Forests

Graph Mining and Social Network Analysis

Principles of Dat Da a t Mining Pham Tho Hoan hoanpt@hnue.edu.v hoanpt@hnue.edu. n

Practical Applications of DATA MINING. Sang C Suh Texas A&M University Commerce JONES & BARTLETT LEARNING

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Professor Anita Wasilewska. Classification Lecture Notes

Machine Learning and Data Analysis overview. Department of Cybernetics, Czech Technical University in Prague.

Decision Tree Learning on Very Large Data Sets

Introduction to Pattern Recognition

ORIGINAL ARTICLE ENSEMBLE APPROACH FOR RULE EXTRACTION IN DATA MINING.

Data Mining for Knowledge Management. Classification

Lecture 10: Regression Trees

1. What are the uses of statistics in data mining? Statistics is used to Estimate the complexity of a data mining problem. Suggest which data mining

Principles of Data Mining by Hand&Mannila&Smyth

Information Management course

Data Mining of Web Access Logs

Learning Predictive Clustering Rules

Machine Learning using MapReduce

Data Preprocessing. Week 2

Social Media Mining. Data Mining Essentials

GPSQL Miner: SQL-Grammar Genetic Programming in Data Mining

T Non-discriminatory Machine Learning

CS 2750 Machine Learning. Lecture 1. Machine Learning. CS 2750 Machine Learning.

Machine Learning. CUNY Graduate Center, Spring Professor Liang Huang.

3F3: Signal and Pattern Processing

Data Mining Classification: Decision Trees

Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier

Introduction to Data Mining

Data Mining Applications in Manufacturing

Sanjeev Kumar. contribute

Introduction to Learning & Decision Trees

The Data Mining Process

Clustering. Danilo Croce Web Mining & Retrieval a.a. 2015/201 16/03/2016

COURSE RECOMMENDER SYSTEM IN E-LEARNING

PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION

Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

Reference Books. Data Mining. Supervised vs. Unsupervised Learning. Classification: Definition. Classification k-nearest neighbors

DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support

Machine Learning. Term 2012/2013 LSI - FIB. Javier Béjar cbea (LSI - FIB) Machine Learning Term 2012/ / 34

Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus

DATA MINING FOR THE MANAGEMENT OF SOFTWARE DEVELOPMENT PROCESS

Data Mining Jargon. Bob Muenchen The Statistical Consulting Center

The Data Mining Process

Unsupervised Learning and Data Mining. Unsupervised Learning and Data Mining. Clustering. Supervised Learning. Supervised Learning

Some Research Challenges for Big Data Analytics of Intelligent Security

Classification algorithm in Data mining: An Overview

Data Mining: Overview. What is Data Mining?

High-dimensional labeled data analysis with Gabriel graphs

The Minimum Consistent Subset Cover Problem and its Applications in Data Mining

Machine Learning: Overview

Data Mining: Partially from: Introduction to Data Mining by Tan, Steinbach, Kumar

Scoring the Data Using Association Rules

DATA MINING: AN OVERVIEW

Machine Learning. Chapter 18, 21. Some material adopted from notes by Chuck Dyer

Ensemble Methods. Knowledge Discovery and Data Mining 2 (VU) ( ) Roman Kern. KTI, TU Graz

Support Vector Machines with Clustering for Training with Very Large Datasets

Search and Data Mining: Techniques. Applications Anya Yarygina Boris Novikov

Data Mining in Social Networks

EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyr i g ht 2013, SAS Ins titut e Inc. All rights res er ve d.

Subgraph Patterns: Network Motifs and Graphlets. Pedro Ribeiro

Data Mining III: Numeric Estimation

Challenges and Lessons from NIST Data Science Pre-pilot Evaluation in Introduction to Data Science Course Fall 2015

Introduction. A. Bellaachia Page: 1

Big Data - Lecture 1 Optimization reminders

Chapter ML:XI (continued)

Decision Trees from large Databases: SLIQ

Efficient Integration of Data Mining Techniques in Database Management Systems

Protein Protein Interaction Networks

Data Mining Applications in Higher Education

Chapter 4 Data Mining A Short Introduction. 2006/7, Karl Aberer, EPFL-IC, Laboratoire de systèmes d'informations répartis Data Mining - 1

D-optimal plans in observational studies

Interactive Data Exploration using Pattern Mining

Predict Influencers in the Social Network

Data Mining for Customer Service Support. Senioritis Seminar Presentation Megan Boice Jay Carter Nick Linke KC Tobin

Microsoft Azure Machine learning Algorithms

Weight of Evidence Module

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

LABEL PROPAGATION ON GRAPHS. SEMI-SUPERVISED LEARNING. ----Changsheng Liu

Didacticiel Études de cas. Association Rules mining with Tanagra, R (arules package), Orange, RapidMiner, Knime and Weka.

from Larson Text By Susan Miertschin

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

TOWARDS SIMPLE, EASY TO UNDERSTAND, AN INTERACTIVE DECISION TREE ALGORITHM

Scheduling Home Health Care with Separating Benders Cuts in Decision Diagrams

Data Mining Techniques in CRM

DATA WAREHOUSE E KNOWLEDGE DISCOVERY

Transcription:

{ Mining, Sets, of, Patterns } A tutorial at ECMLPKDD2010 September 20, 2010, Barcelona, Spain by B. Bringmann, S. Nijssen, N. Tatti, J. Vreeken, A. Zimmermann 1

Overview Tutorial 00:00 00:45 Introduction Siegfried Nijssen Unsupervised, explorative pattern set mining Jilles Vreeken 01:30 Break 02:00 Supervised pattern set mining Björn Bringmann 02:45 End 2

Practical information Even though we did our best to achieve otherwise: WARNING This TUTORIAL is neither complete nor unbiased REFERENCES are not necessarily authoritative or complete More information (including references): http://www.cs.kuleuven.be/conference/msop/ 3

Part I Introduction 4

Overview part I Patterns Pattern sets Definitions Motivations Dimensions Algorithms 5

Overview part I Patterns Definitions Motivations Dimensions Algorithms 5

Overview part I Patterns Definitions Motivations Dimensions Algorithms 5

What is a pattern? Recurring structure Data Pattern 6

What is a pattern? 10,0 7,5 5,0 Data 2,5 0 0 2,5 5,0 7,5 10,0 Pattern y = x - 1 7

What is a pattern? 10,0 7,5 5,0 Data 2,5 0 0 2,5 5,0 7,5 10,0 Pattern y = x - 1 7

What is a pattern? In this tutorial we are looking for Recurring structures...... in enumerable, discrete domains Hence we do not consider a regression model to be a pattern... 8

Overview part I Patterns Definitions Motivations Dimensions Algorithms 9

Overview part I Patterns Definitions Motivations Dimensions Algorithms 9

What is a pattern? Example 1: Frequent Itemset in Market Basket Data 10

What is a pattern? Example 1: Frequent Itemset in Market Basket Data support( )=3 10

What is a pattern? Example 2: Co-cluster in Gene Expression Data Genes Conditions Lyssiotis et al. 11

What is a pattern? Example 3: Conjunctive Formula in UCI Data 4.9,3.1,1.5,0.1,Iris-setosa 5.0,3.2,1.2,0.2,Iris-setosa 5.5,3.5,1.3,0.2,Iris-setosa 4.9,3.1,1.5,0.1,Iris-setosa 4.4,3.0,1.3,0.2,Iris-setosa 5.1,3.4,1.5,0.2,Iris-setosa 5.0,3.5,1.3,0.3,Iris-setosa 4.5,2.3,1.3,0.3,Iris-setosa 4.4,3.2,1.3,0.2,Iris-setosa 5.0,3.5,1.6,0.6,Iris-setosa 5.1,3.8,1.9,0.4,Iris-setosa 4.8,3.0,1.4,0.3,Iris-setosa 5.1,3.8,1.6,0.2,Iris-setosa 4.6,3.2,1.4,0.2,Iris-setosa 5.3,3.7,1.5,0.2,Iris-setosa 5.0,3.3,1.4,0.2,Iris-setosa 7.0,3.2,4.7,1.4,Iris-versicolor 6.4,3.2,4.5,1.5,Iris-versicolor 6.9,3.1,4.9,1.5,Iris-versicolor 5.5,2.3,4.0,1.3,Iris-versicolor 6.5,2.8,4.6,1.5,Iris-versicolor 5.7,2.8,4.5,1.3,Iris-versicolor Petal length >= 2.0 and Petal width <= 0.5 12

What is a pattern? Example 4: Frequent Subgraph in Molecules 13

What is a pattern? Recurring structure in enumerable, discrete domain Enumerable, discrete domains: itemsets, graphs, sequences, trees,... Recurrence as determined by constraints: support constraint, size constraint, area constraint,... 14

The problem: too many patterns 15

Too many patterns... Solution 1: constraint-based mining Solution 2: pattern set mining 16

Overview part I Patterns Definitions Motivations Dimensions Algorithms 17

Overview part I Patterns Definitions Motivations Dimensions Algorithms 17

Solution 1: pattern constraints Constraint on each pattern individually based on background knowledge condensed representations class labels 18

Constraints: background knowledge Support constraints Syntactical constraints Statistical constraints difference with expectation taxonomies vs = diapers 19

Constraints: condensed representations If we pass a pattern through the data, we obtain another pattern derives 20

Constraints: condensed representations Closed patterns Pasquier et al. Free/generator patterns Pasquier et al. Maximal frequent patterns Bayardo Non-derivable patterns Calders et al. 21

Constraints: class labels vs 22

Constraints: class labels GIVEN database D, target c, threshold t, class of patterns FIND all patterns p with f(p,d,c)>t 23

Constraints: class labels Discriminative Pattern Subgroup Correlated Pattern Emerging Pattern 24

Constraints: class labels Many different names for this setting Novak, Webb and Lavrac Pattern name Emerging pattern Contrast set Correlated pattern Subgroup Discriminative pattern Class association rule Typical measure Growth rate Difference in rel support Chi2 Weighted relative accuracy Information gain Confidence Dong et al. Bay et al. Morishita et al. Kloesgen et al. Cheng et al. Liu et al. 25

Overview part I Patterns Definitions Motivations Dimensions Algorithms 26

Overview part I Patterns Definitions Motivations Dimensions Algorithms 26

How to find patterns? In principle two ways: Greedy / heuristic Fast Overlooks solutions Complete search Finds everything Slower 27

How to find patterns? Complete search under constraints often feasible GIVEN database D, constraint φ on D, class of patterns C FIND all patterns p in class C satisfying φ Key Observation: (Anti-)monotonicity 28

Lots of solutions... what s their problem? 29

Overview part I Patterns Motivations Definition Dimensions Algorithms 30

Overview part I Pattern sets Motivations Definition Dimensions Algorithms 30

The problem - complex pattern relationships Unsupervised descriptive task 31

The problem - complex pattern relationships Unsupervised descriptive task 31

The problem - complex pattern relationships Unsupervised descriptive task 31

The problem - complex pattern relationships Unsupervised descriptive task 31

The problem - complex pattern relationships Unsupervised descriptive task 31

The problem - complex pattern relationships Unsupervised descriptive task 31

The problem - complex pattern relationships Unsupervised descriptive task 31

The problem - complex pattern relationships Supervised predictive task 32

The problem - complex pattern relationships Supervised predictive task All patterns mined P1 P2 32

The problem - complex pattern relationships Supervised predictive task All patterns mined P1 P2 32

The problem - complex pattern relationships Supervised predictive task All patterns mined P1 P2 32

Overview part I Pattern sets Motivations Definitions Dimensions Algorithms 33

Overview part I Pattern sets Motivations Definitions Dimensions Algorithms 33

Pattern set mining GIVEN a data mining task FIND an interrelated set of patterns useful for this task 34

Overview part I Pattern sets Motivations Definition Dimensions Algorithms 35

Overview part I Pattern sets Motivations Definition Dimensions Algorithms 35

Patterns vs Pattern sets unsupervised supervised pattern mining no target no relationships relevant to target no relationships pattern set mining no target relationships part II relevant to target relationships part III 36

Task dimensions unsupervised supervised descriptive Association Analysis Tiling (Co-)Clustering Probabilistic models Subgroup discovery Exceptional model mining predictive Predictive clustering part II Classification Regression part III 37

Task dimensions Supervised vs unsupervised Predictive vs descriptive 38

Task dimensions Supervised vs unsupervised Predictive vs descriptive (Semi-)Structured data vs Binary data 38

Task dimensions Supervised vs unsupervised Predictive vs descriptive (Semi-)Structured data vs Binary data vs 38

Task dimensions Supervised vs unsupervised Predictive vs descriptive (Semi-)Structured data vs Binary data 38

Task dimensions Supervised vs unsupervised Predictive vs descriptive (Semi-)Structured data vs Binary data Constrained vs Unconstrained 38

Task dimensions Supervised vs unsupervised Predictive vs descriptive (Semi-)Structured data vs Binary data Constrained vs Unconstrained vs 38

Task dimensions Supervised vs unsupervised Predictive vs descriptive (Semi-)Structured data vs Binary data Constrained vs Unconstrained 38

Task dimensions Supervised vs unsupervised Predictive vs descriptive (Semi-)Structured data vs Binary data Constrained vs Unconstrained Interpretable model vs Black box 38

Overview part I Pattern sets Motivations Definitions Dimensions Algorithms 39

Overview part I Pattern sets Motivations Definitions Dimensions Algorithms 39

How to find pattern sets? Mining Constraint DB Pattern Mining PS Pattern Selection PS Model Induction M Optimisation Criteria 40

How to find pattern sets? Pattern set constraint Mining Constraint DB Pattern Mining PS Pattern Selection PS Model Induction M Optimisation Criteria 40

How to find pattern sets? Pattern set constraint Mining Constraint DB Pattern Mining PS Pattern Selection PS Model Induction M Optimisation Criteria Model constraint 40

How to find pattern sets? Pattern set constraint Mining Constraint DB Pattern Mining PS Pattern Selection PS Model Induction M Optimisation Criteria Model constraint 40

How to find pattern sets? Pattern set constraint Mining Constraint DB Pattern Mining PS Pattern Selection PS Model Induction M Optimisation Criteria Model constraint 40

How to find pattern sets? Pattern set constraint Pattern set constraint Mining Constraint DB Pattern Mining PS Pattern Selection PS Model Induction M Optimisation Criteria Model constraint Model constraint 41

How to find pattern sets? 1 Model Independent Iterative Mining Pattern set constraint Mining Constraint DB Pattern Mining PS Pattern Selection PS Model Induction M Optimisation Criteria Model constraint Model constraint 41

How to find pattern sets? 1 Model Independent Iterative Mining Pattern set constraint 2 Model Independent Post Processing Mining Constraint DB Pattern Mining PS Pattern Selection PS Model Induction M Optimisation Criteria Model constraint Model constraint 41

How to find pattern sets? 1 Model Independent Iterative Mining Pattern set constraint 2 Model Independent Post Processing Mining Constraint DB Pattern Mining PS Pattern Selection PS Model Induction M 3 Model Dependent Iterative Mining Optimisation Criteria Model constraint 41

How to find pattern sets? 1 Model Independent Iterative Mining Pattern set constraint 2 Model Independent Post Processing Mining Constraint DB Pattern Mining PS Pattern Selection PS Model Induction M 3 Model Dependent Iterative Mining 4 Model Dependent Post Processing Optimisation Criteria Model constraint 41

Pattern set = Feature set 42

Pattern set = Feature set 42

Feature vs pattern selection Feature Selection Binary Feature Selection Pattern Selection We know more about patterns constraints used generality relationships 43

Overview Unsupervised Pattern set mining Part II Supervised Pattern set mining Part III How to score pattern sets How to find pattern sets 44

End of Part I 45