Chapter 12: Discovering New Knowledge -- Data Mining
Becerra-Fernandez, et al. -- Knowledge Management 1/e -- 2004 Prentice Hall
Additional material 2007 Dekai Wu
Chapter Objectives
Introduce the student to the concept of Data Mining (DM), also known as Knowledge Discovery in Databases (KDD):
- How it is different from knowledge elicitation from experts
- How it is different from extracting existing knowledge from databases
The objectives of data mining:
- Explanation of past events (descriptive DM)
- Prediction of future events (predictive DM)
(continued)
Chapter Objectives (cont.)
Introduce the student to the different classes of statistical methods available for DM:
- Classical statistics (e.g., regression, curve fitting)
- Induction of symbolic rules
- Neural networks (a.k.a. connectionist models)
Introduce the student to the details of some of the methods described in the chapter.
Historical Perspective
DM, a.k.a. KDD, arose at the intersection of three independently evolved research directions:
- Classical statistics and statistical pattern recognition
- Machine learning (from symbolic AI)
- Neural networks
Objectives of Data Mining
Descriptive DM seeks patterns in past actions or activities in order to affect those actions or activities (e.g., seek patterns indicative of fraud in past records)
Predictive DM looks at past history to predict future behavior
Classification classifies a new instance into one of a set of discrete predefined categories
Clustering groups items in the data set into different categories
Affinity or association finds items closely associated in the data set
Classical statistics & statistical pattern recognition
Provide a survey of the most important statistical methods for data mining:
- Curve fitting with the least squares method
- Multivariate correlation
- K-Means clustering (see the sketch below)
- Market basket analysis
- Discriminant analysis
- Logistic regression
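Of the methods above, K-Means clustering is compact enough to sketch directly. A minimal Python sketch, assuming toy 2-D data, random initial centroids, and a fixed iteration count (none of which come from the chapter):

import random

def k_means(points, k, iterations=100):
    """Cluster 2-D points into k groups by repeatedly
    reassigning each point to its nearest centroid."""
    centroids = random.sample(points, k)               # initial guesses
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:                               # assignment step
            nearest = min(range(k),
                          key=lambda i: (p[0] - centroids[i][0]) ** 2 +
                                        (p[1] - centroids[i][1]) ** 2)
            clusters[nearest].append(p)
        for i, c in enumerate(clusters):               # update step
            if c:
                centroids[i] = (sum(x for x, _ in c) / len(c),
                                sum(y for _, y in c) / len(c))
    return centroids, clusters

# illustrative data: two loose groups in 2-D
data = [(1, 1), (1.5, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centroids, clusters = k_means(data, k=2)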
Figure 12.14 2-D input data (x, y) plotted on a graph
Figure 12.15 data and their deviations from the best-fitting equation
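The best-fitting line in Figure 12.15 is the one that minimizes the sum of squared deviations between the data and the equation. A minimal sketch of simple least-squares linear regression in Python; the data points and variable names are illustrative assumptions:

def least_squares_line(points):
    """Fit y = a + b*x by minimizing the sum of squared deviations."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    b = (n * sxy - sx * sy) / (n * sxx - sx * sx)   # slope
    a = (sy - b * sx) / n                           # intercept
    return a, b

points = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8)]   # illustrative data
a, b = least_squares_line(points)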
Induction of symbolic rules
Present a detailed description of the symbolic approach to data mining: rule induction by learning decision trees
Present the main algorithms for rule induction:
- C5.0 and its ancestors, ID3 and CLS (from machine learning)
- CART (Classification And Regression Trees) and CHAID, very similar algorithms for rule induction (independently developed in statistics)
Present several example applications of rule induction
Table 12.1 decision tables (if ordered, then decision lists)

Name           Outlook  Temperature  Humidity  Class
Data sample 1  Sunny    Mild         Dry       Enjoyable
Data sample 2  Cloudy   Cold         Humid     Not Enjoyable
Data sample 3  Rainy    Mild         Humid     Not Enjoyable
Data sample 4  Sunny    Hot          Humid     Not Enjoyable

Note: DS = Data Sample
Figure 12.1 decision trees (a.k.a. classification trees): a root node question (Is the stock's price/earnings ratio > 5?), internal node questions (Has the company's quarterly profit increased over the last year by 10% or more? Is the company's management stable?), and leaf nodes giving the decision (Buy / Don't buy)
Induction trees
An induction tree is a decision tree holding the data samples (of the training set). It is built progressively by gradually segregating the data samples.
Figure 12.2 simple induction tree (step 1)
Root: {DS1, DS2, DS3, DS4}, split on Outlook
- Outlook = Cloudy: DS2 - Not Enjoyable
- Outlook = Rainy: DS3 - Not Enjoyable
- Outlook = Sunny: DS1 - Enjoyable, DS4 - Not Enjoyable
Figure 12.3 simple induction tree (step 2)
Root: {DS1, DS2, DS3, DS4}, split on Outlook
- Outlook = Cloudy: DS2 - Not Enjoyable
- Outlook = Rainy: DS3 - Not Enjoyable
- Outlook = Sunny: split on Temperature
  - Temperature = Cold: none
  - Temperature = Mild: DS1 - Enjoyable
  - Temperature = Hot: DS4 - Not Enjoyable
Writing the induced tree as rules
Rule 1. If the Outlook is cloudy, then the Weather is not enjoyable.
Rule 2. If the Outlook is rainy, then the Weather is not enjoyable.
Rule 3. If the Outlook is sunny and the Temperature is mild, then the Weather is enjoyable.
Rule 4. If the Outlook is sunny and the Temperature is hot, then the Weather is not enjoyable.
(The sunny/cold branch contains no training samples, so it yields no rule.)
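Written as code, the induced rules form a small classifier. A sketch in Python; the function name and the string encodings of the attribute values are illustrative:

def enjoyable(outlook, temperature):
    """Classify the weather using the four rules induced above."""
    if outlook == "cloudy":                               # Rule 1
        return False
    if outlook == "rainy":                                # Rule 2
        return False
    if outlook == "sunny" and temperature == "mild":      # Rule 3
        return True
    if outlook == "sunny" and temperature == "hot":       # Rule 4
        return False
    return None   # no rule covers this case (e.g., sunny and cold)

print(enjoyable("sunny", "mild"))   # True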
Learning decision trees for classification into multiple classes
In the previous example, we were learning a function to predict a boolean (enjoyable = true/false) output.
The same approach can be generalized to learn a function that predicts a class (when there are multiple predefined classes/categories).
For example, suppose we are attempting to select a KBS shell for some application, with the following as our options: ThoughtGen, Offsite, Genie, SilverWorks, XS, MilliExpert
using the following attributes and ranges of values:
- Development language: { Java, C++, Lisp }
- Reasoning method: { forward, backward }
- External interfaces: { dbase, spreadsheetxl, ASCII file, devices }
- Cost: any positive number
- Memory: any positive number
Table 12.2 collection of data samples (training set) described as vectors of attributes (feature vectors)

Language  Reasoning Method  Interface      Cost   Memory  Classification
Java      Backward          SpreadsheetXL  250    128MB   MilliExpert
Java      Backward          ASCII          250    128MB   MilliExpert
Java      Backward          dbase          195    256MB   ThoughtGen
Java      *                 Devices        985    512MB   OffSite
C++       Forward           *              6500   640MB   Genie
LISP      Forward           *              15000  5GB     SilverWorks
C++       Backward          *              395    256MB   XS
LISP      Backward          *              395    256MB   XS
Figure 12.4 decision tree resulting from selection of the language attribute
- Language = Java: MilliExpert, ThoughtGen, OffSite
- Language = C++: Genie, XS
- Language = Lisp: SilverWorks, XS
Figure 12.5 decision tree resulting from addition of the reasoning method attribute
- Language = Java
  - Reasoning = Backward: MilliExpert, ThoughtGen, OffSite
  - Reasoning = Forward: OffSite
- Language = C++
  - Reasoning = Backward: XS
  - Reasoning = Forward: Genie
- Language = Lisp
  - Reasoning = Backward: XS
  - Reasoning = Forward: SilverWorks
Figure 12.6 final decision tree
- Language = Java
  - Reasoning = Backward: split on Interface
    - Interface = SpreadsheetXL: MilliExpert
    - Interface = ASCII: MilliExpert
    - Interface = dbase: ThoughtGen
    - Interface = Devices: OffSite
  - Reasoning = Forward: OffSite
- Language = C++
  - Reasoning = Backward: XS
  - Reasoning = Forward: Genie
- Language = Lisp
  - Reasoning = Backward: XS
  - Reasoning = Forward: SilverWorks
Order of choosing attributes Note that the decision tree that is built depends greatly on which attributes you choose first
Figure 12.2 simple induction tree (step 1), shown again for comparison
Root: {DS1, DS2, DS3, DS4}, split on Outlook
- Outlook = Cloudy: DS2 - Not Enjoyable
- Outlook = Rainy: DS3 - Not Enjoyable
- Outlook = Sunny: DS1 - Enjoyable, DS4 - Not Enjoyable
Figure 12.3 simple induction tree (step 2), shown again for comparison
Root: {DS1, DS2, DS3, DS4}, split on Outlook
- Outlook = Cloudy: DS2 - Not Enjoyable
- Outlook = Rainy: DS3 - Not Enjoyable
- Outlook = Sunny: split on Temperature
  - Temperature = Cold: none
  - Temperature = Mild: DS1 - Enjoyable
  - Temperature = Hot: DS4 - Not Enjoyable
Table 12.1 decision tables (if ordered, then decision lists), shown again for comparison

Name           Outlook  Temperature  Humidity  Class
Data sample 1  Sunny    Mild         Dry       Enjoyable
Data sample 2  Cloudy   Cold         Humid     Not Enjoyable
Data sample 3  Rainy    Mild         Humid     Not Enjoyable
Data sample 4  Sunny    Hot          Humid     Not Enjoyable

Note: DS = Data Sample
Figure 12.7 a smaller tree: splitting on Humidity classifies all samples in one step
Root: {DS1, DS2, DS3, DS4}, split on Humidity
- Humidity = Humid: DS2 - Not Enjoyable, DS3 - Not Enjoyable, DS4 - Not Enjoyable
- Humidity = Dry: DS1 - Enjoyable
Order of choosing attributes (cont.)
One sensible objective is to seek the minimal tree, i.e., the smallest tree that classifies all training set samples correctly
- Occam's Razor principle: the simplest explanation is the best
What order should you choose attributes in, so as to obtain the minimal tree?
- Finding the truly minimal tree is usually too complex to be feasible, so heuristics are used
- Information gain, computed using information-theoretic quantities, is the best heuristic in practice
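A sketch of the information-gain heuristic applied to the weather samples of Table 12.1; entropy is measured in bits, and the variable names are illustrative (this is the attribute-selection idea behind ID3/C5.0, not the full algorithm):

from math import log2
from collections import Counter

# Table 12.1: (Outlook, Temperature, Humidity) -> Enjoyable?
samples = [
    (("Sunny",  "Mild", "Dry"),   True),
    (("Cloudy", "Cold", "Humid"), False),
    (("Rainy",  "Mild", "Humid"), False),
    (("Sunny",  "Hot",  "Humid"), False),
]

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in counts.values())

def information_gain(samples, attr_index):
    """Entropy reduction obtained by splitting the samples on one attribute."""
    labels = [label for _, label in samples]
    split = {}
    for features, label in samples:
        split.setdefault(features[attr_index], []).append(label)
    remainder = sum(len(subset) / len(samples) * entropy(subset)
                    for subset in split.values())
    return entropy(labels) - remainder

for i, name in enumerate(["Outlook", "Temperature", "Humidity"]):
    print(name, round(information_gain(samples, i), 3))
# Humidity gives the largest gain, matching the one-step tree of Figure 12.7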
Artificial Neural Networks
Provide a detailed description of the connectionist approach to data mining: neural networks
- Present the basic neural network architecture: the multi-layer feed-forward neural network
- Present the main supervised learning algorithm: backpropagation
- Present the main unsupervised neural network architecture: the Kohonen network
Figure 12.8 simple model of a neuron: inputs x1, x2, ..., xn are multiplied by weights w1, w2, ..., wn, summed (Σ), and passed through an activation function f() to produce the output y
Figure 12.9 three common activation functions: the threshold function, the piece-wise linear function, and the sigmoid function
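The three activation functions of Figure 12.9 can be written directly; the breakpoints of the piece-wise linear function below are one common choice, assumed here rather than read off the figure:

from math import exp

def threshold(v):
    """Hard threshold: 0 below zero, 1 at or above zero."""
    return 1.0 if v >= 0 else 0.0

def piecewise_linear(v):
    """Linear between -0.5 and +0.5, saturating at 0 and 1."""
    if v <= -0.5:
        return 0.0
    if v >= 0.5:
        return 1.0
    return v + 0.5

def sigmoid(v):
    """Smooth S-shaped curve, output in (0, 1)."""
    return 1.0 / (1.0 + exp(-v))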
Figure 12.10 simple single-layer neural network, mapping inputs directly to outputs
Figure 12.11 two-layer neural network
Supervised Learning: Back Propagation
An iterative learning algorithm with three phases:
1. Presentation of the examples (input patterns with outputs) and feed-forward execution of the network
2. Calculation of the associated errors when the output of the previous step is compared with the expected output, and back propagation of this error
3. Adjustment of the weights
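A minimal sketch of these three phases on a tiny two-layer network (one hidden layer) trained on XOR; the network size, learning rate, epoch count, and random seed are illustrative assumptions, not settings from the book:

import random
from math import exp

def sigmoid(v):
    return 1.0 / (1.0 + exp(-v))

def forward(x, w_hidden, w_output):
    """Feed-forward pass; x already includes the bias input."""
    hidden = [sigmoid(sum(w * xi for w, xi in zip(wh, x))) for wh in w_hidden]
    output = sigmoid(sum(w * hi for w, hi in zip(w_output, hidden + [1.0])))
    return hidden, output

random.seed(0)
w_hidden = [[random.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # 2 hidden units
w_output = [random.uniform(-1, 1) for _ in range(3)]                      # 1 output unit
training_set = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]       # XOR
rate = 0.5

for epoch in range(10000):
    for inputs, target in training_set:
        # Phase 1: present the example and execute the network feed-forward
        x = inputs + [1.0]                                  # bias input
        hidden, output = forward(x, w_hidden, w_output)

        # Phase 2: compare with the expected output and back-propagate the error
        delta_out = (target - output) * output * (1.0 - output)
        delta_hid = [hidden[j] * (1.0 - hidden[j]) * delta_out * w_output[j]
                     for j in range(2)]

        # Phase 3: adjust the weights
        h = hidden + [1.0]
        for j in range(3):
            w_output[j] += rate * delta_out * h[j]
        for j in range(2):
            for i in range(3):
                w_hidden[j][i] += rate * delta_hid[j] * x[i]

# After training, the outputs should approximate XOR (0, 1, 1, 0),
# though convergence depends on the random initialization.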
Unsupervised Learning: Kohonen Networks
Clustering by an iterative competitive algorithm
Note the relation to case-based reasoning (CBR)
Figure 12.12 clusters of related data in 2-D space (Cluster #1 and Cluster #2 plotted against Variable A and Variable B)
Figure 12.13 Kohonen self-organizing map: inputs connected to a grid of units through weight vectors Wi
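A hedged sketch of the competitive, winner-take-all learning behind a Kohonen self-organizing map, here with a 1-D map; the map size, learning-rate schedule, and fixed neighborhood of one unit on each side are simplifying assumptions:

import random

def train_som(data, units=4, epochs=50, rate=0.3):
    """1-D self-organizing map: the winning unit and its immediate
    neighbors are pulled toward each presented input vector."""
    dim = len(data[0])
    weights = [[random.random() for _ in range(dim)] for _ in range(units)]
    for epoch in range(epochs):
        for x in data:
            # competition: find the unit whose weight vector is closest to x
            winner = min(range(units),
                         key=lambda u: sum((x[d] - weights[u][d]) ** 2
                                           for d in range(dim)))
            # adaptation: move the winner and its neighbors toward x
            for u in (winner - 1, winner, winner + 1):
                if 0 <= u < units:
                    for d in range(dim):
                        weights[u][d] += rate * (x[d] - weights[u][d])
        rate *= 0.95   # gradually shrink the learning rate
    return weights

# illustrative 2-D data forming two clusters, as in Figure 12.12
data = [[0.1, 0.2], [0.2, 0.1], [0.15, 0.15], [0.8, 0.9], [0.9, 0.8], [0.85, 0.9]]
som = train_som(data)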
When to use what Provide useful guidelines for determining what technique to use for specific problems
Table 12.3

Statistical technique: Discriminant Analysis
- Goal: Find the linear combination of predictors that best separates the population
- Input variables (predictors): Continuous
- Output variables (outcomes): Discrete

Statistical technique: Logistic and Multinomial Regression
- Goal: Predict the probability of the outcome being in a particular category
- Input variables (predictors): Continuous
- Output variables (outcomes): Discrete

Examples [SPSS, 2000]: Predict instances of fraud; predict whether customers will remain or leave (churners or not); predict which customers will respond to a new product or offer; predict outcomes of various medical procedures; predicting insurance policy renewal; predicting fraud; predicting which product a customer will buy; predicting that a product is likely to fail
Table 12.3 (cont.)

Statistical technique: Linear Regression
- Goal: Output is a linear combination of input variables
- Input variables (predictors): Continuous
- Output variables (outcomes): Continuous

Statistical technique: Analysis of Variance (ANOVA)
- Goal: For experiments and repeated measures of the same sample
- Input variables (predictors): Most inputs must be discrete
- Output variables (outcomes): Continuous

Statistical technique: Time Series Analysis
- Goal: To predict future events whose history has been collected at regular intervals
- Input variables (predictors): Continuous
- Output variables (outcomes): Continuous

Examples [SPSS, 2000]: Predict expected revenue in dollars from a new customer; predict sales revenue for a store; predict waiting time on hold for callers to an 800 number; predict length of stay in a hospital based on patient characteristics and medical condition; predict which environmental factors are likely to cause cancer; predict future sales data from past sales records
Table 12.4

Statistical technique: Memory-Based Reasoning (MBR)
- Goal: Predict outcome based on values of nearest neighbors
- Input variables (predictors): Continuous, discrete, and text
- Output variables (outcomes): Continuous or discrete
- Examples [SPSS, 2000]: Predicting medical outcomes

Statistical technique: Decision Trees
- Goal: Predict by splitting data into subgroups (branches)
- Input variables (predictors): Continuous or discrete (different techniques used based on data characteristics)
- Output variables (outcomes): Continuous or discrete (different techniques used based on data characteristics)
- Examples [SPSS, 2000]: Predicting which customers will leave; predicting instances of fraud

Statistical technique: Neural Networks
- Goal: Predict outcome in complex nonlinear environments
- Input variables (predictors): Continuous or discrete
- Output variables (outcomes): Continuous or discrete
- Examples [SPSS, 2000]: Predicting expected revenue; predicting credit risk
Table 12.5

Statistical technique: Chi-square Automatic Interaction Detection (CHAID)
- Goal: Predict by splitting data into more than two subgroups (branches)
- Input variables (predictors): Continuous, discrete, or ordinal
- Output variables (outcomes): Discrete
- Examples [SPSS, 2000]: Predict which demographic combinations of predictors yield the highest probability of a sale; predict which factors are causing product defects in manufacturing

Statistical technique: C5.0
- Goal: Predict by splitting data into more than two subgroups (branches)
- Input variables (predictors): Continuous
- Output variables (outcomes): Discrete
- Examples [SPSS, 2000]: Predict which loan customers are considered a good risk; predict which factors are associated with a country's investment risk
Table 12.5 (cont.)

Statistical technique: Classification and Regression Trees (CART)
- Goal: Predict by splitting data into binary subgroups (branches)
- Input variables (predictors): Continuous
- Output variables (outcomes): Continuous

Statistical technique: Quick, Unbiased, Efficient Statistical Tree (QUEST)
- Goal: Predict by splitting data into binary subgroups (branches)
- Input variables (predictors): Continuous
- Output variables (outcomes): Discrete

Examples [SPSS, 2000]: Predict which factors are associated with a country's competitiveness; discover which variables are predictors of increased customer profitability; predict who needs additional care after heart surgery
Table 12.6

Statistical technique: K-Means Cluster Analysis
- Goal: Find large groups of cases in large data files that are similar on a small set of input characteristics
- Input variables (predictors): Continuous or discrete
- Output variables (outcomes): No outcome variable
- Examples [SPSS, 2000]: Customer segments for marketing; groups of similar insurance claims

Statistical technique: Kohonen Neural Networks
- Goal: To create large cluster memberships
- Examples [SPSS, 2000]: Cluster customers into segments based on demographics and buying patterns

Statistical technique: Market Basket or Association Analysis with Apriori
- Goal: Create small set associations and look for patterns between many categories
- Input variables (predictors): Logical
- Output variables (outcomes): No outcome variable
- Examples [SPSS, 2000]: Identify which products are likely to be purchased together; identify which courses students are likely to take together
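The market basket (association) analysis in Table 12.6 rests on counting which items co-occur in the same transactions. A minimal sketch; the transactions and the support threshold are illustrative, and a full Apriori implementation would also generate larger itemsets and confidence values:

from itertools import combinations
from collections import Counter

transactions = [                      # illustrative purchase baskets
    {"bread", "milk"},
    {"bread", "diapers", "beer"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

pair_counts = Counter()
for basket in transactions:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

min_support = 2                       # pair must appear in at least 2 baskets
frequent_pairs = {p: c for p, c in pair_counts.items() if c >= min_support}
print(frequent_pairs)                 # e.g., ('beer', 'diapers') occurs twice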
Errors and their significance in DM
Discuss the importance of errors in data mining studies
Define the types of errors possible in data mining studies
Table 12.7 Confusion Matrix: Heart Disease Diagnostic

                             Predicted No Disease   Predicted Presence of Disease
Actual No Disease            118 (72%)              46 (28%)
Actual Presence of Disease   43 (30.9%)             96 (69.1%)
Table 12.7 Confusion Matrix: Heart Disease Diagnostic

                             Predicted No Disease   Predicted Presence of Disease
Actual No Disease            118 (72%)              46 (28%)
Actual Presence of Disease   43 (30.9%)             96 (69.1%)

The 43 cases with actual presence of disease but predicted no disease are false negatives.
Table 12.7 Confusion Matrix: Heart Disease Diagnostic

                             Predicted No Disease   Predicted Presence of Disease
Actual No Disease            118 (72%)              46 (28%)
Actual Presence of Disease   43 (30.9%)             96 (69.1%)

The 46 cases with actual no disease but predicted presence of disease are false positives.
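The error rates in Table 12.7 follow directly from the four cell counts. A minimal sketch, treating "presence of disease" as the positive class:

# counts taken from Table 12.7
true_negatives  = 118   # actual no disease, predicted no disease
false_positives = 46    # actual no disease, predicted presence of disease
false_negatives = 43    # actual presence of disease, predicted no disease
true_positives  = 96    # actual presence of disease, predicted presence of disease

total = true_negatives + false_positives + false_negatives + true_positives
accuracy            = (true_positives + true_negatives) / total
false_positive_rate = false_positives / (false_positives + true_negatives)
false_negative_rate = false_negatives / (false_negatives + true_positives)

print(round(accuracy, 3), round(false_positive_rate, 3), round(false_negative_rate, 3))
# accuracy ~ 0.706, false positive rate ~ 0.280 (28%), false negative rate ~ 0.309 (30.9%)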
Conclusions
You should know when to use:
- Curve-fitting algorithms
- Statistical methods for clustering
- The C5.0 algorithm to capture rules from examples
- Basic feed-forward neural networks with supervised learning
- Unsupervised learning, clustering techniques, and Kohonen networks
- Other statistical techniques