IT Applications in Business Analytics SS2016 / Lecture 07 Use Case 1 (Two Class Classification) Thomas Zeutschler


Hochschule Düsseldorf University of Applied Sciences Fachbereich Wirtschaftswissenschaften W Business Analytics (M.Sc.) IT in Business Analytics IT Applications in Business Analytics SS2016 / Lecture 07 Use Case 1 (Two Class Classification) SS 2016 - IT Applications in Business Analytics - 6. Analytical Use Case 1 1

Let's get started: be a business analytics consultant!

Case 1: Bike Sales

Point of Departure: 2016 Polygon
Whether you're making a go at XC mountain bike racing or simply looking to upgrade your confidence level on the trail, the Polygon hardtail mountain bike proves to be the perfect choice. The Polygon features our race-proven 29er geometry with a low-slung bottom bracket and incredibly short chainstays for a planted sensation, snappy handling, and efficient power transfer. It's the obvious mountain bike for anyone who demands speed and reliability.

Point of Departure: Bike Shop
We run a bike shop, both brick-and-mortar and online. Through an online competition we collected a number of new customer records. We want to send an email to the most promising new customers to advertise our new 2016 mountain bike model, the Polygon. Who are they?

The best team will win
Four teams volunteer to deliver the best proposal for the email campaign.
Main deliverable: a proposed list of new customers to send an email to.
Evaluate the best prediction model: use the ROC AUC (area under the curve) value.
Present your results (next week):
What have you done and why? (Use your Knime workflows to explain.)
What is your conclusion and proposal?
Compile a few slides; max. 10 minutes presentation.

CRISP-DM Phases and Tasks
Business Understanding: Determine Business Objectives (Background; Business Objectives; Business Success Criteria). Assess Situation (Inventory of Resources; Requirements, Assumptions and Constraints; Risks and Contingencies; Terminology; Costs and Benefits). Determine Data Mining Goals (Data Mining Goals; Data Mining Success Criteria). Produce Project Plan (Project Plan; Initial Assessment of Tools and Techniques).
Data Understanding: Collect Initial Data (Initial Data Collection Report). Describe Data (Data Description Report). Explore Data (Data Exploration Report). Verify Data Quality (Data Quality Report).
Data Preparation: Select Data (Rationale for Inclusion/Exclusion). Clean Data (Data Cleaning Report). Construct Data (Derived Attributes; Generated Records). Integrate Data (Merged Data). Format Data (Reformatted Data). Outputs: Dataset; Dataset Description.
Modelling: Select Modelling Technique (Modelling Technique; Modelling Assumptions). Generate Test Design (Test Design). Build Model (Parameter Settings; Models; Model Description). Assess Model (Model Assessment; Revised Parameter Settings).
Evaluation: Evaluate Results (Assessment of Data Mining Results w.r.t. Business Success Criteria; Approved Models). Review Process (Review of Process). Determine Next Steps (List of Possible Actions; Decision).
Deployment: Plan Deployment (Deployment Plan). Plan Monitoring and Maintenance (Monitoring and Maintenance Plan). Produce Final Report (Final Report; Final Presentation). Review Project (Experience Documentation).

Available Data
Sheet ExistingCustomers: use for model training and testing.
Sheet NewCustomers: select promising email recipients.
https://wiwi.hs-duesseldorf.de/personen/thomas.zeutschler/seiten/default.aspx

Knime Sample Implementation
Beat the teacher: Area Under Curve = 0.756
https://wiwi.hs-duesseldorf.de/personen/thomas.zeutschler/seiten/default.aspx
The receiver operating characteristic (ROC) curve is a graphical plot that illustrates the performance of a binary classifier as its discrimination threshold is varied.
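To make the AUC value concrete, here is a minimal sketch in plain Python. It is not the Knime implementation; it uses the standard interpretation of the ROC AUC as the probability that a randomly chosen positive example receives a higher score than a randomly chosen negative one. The function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def roc_auc(labels, scores):
    """ROC AUC via pairwise comparison.

    Counts, over all (positive, negative) pairs, how often the positive
    example has the higher score; ties count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = bike buyer, 0 = no buyer; scores are model outputs.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 0.889 (8 of 9 positive/negative pairs are ranked correctly)
```

An AUC of 0.5 corresponds to random ranking and 1.0 to a perfect ranking, so the 0.756 of the sample implementation lies between the two.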

Want to beat your teacher? (AUC 0.756)
Do you have a full understanding of the business problem?
What about data quality?
Do we need further data preparation?
What is the class of the problem to solve (tip: cheat sheet)?
How to select the right / best prediction model?

Cheating

Two Class Classification

Two Class Classification: Introduction
Also called binary classification.
Statistical problem: classify the elements of a given set into two groups by applying a certain classification method.
Applications in business:
Customer selection, e.g. whom to send an email?
Portfolio decisions, e.g. which stocks or products to buy?
Any kind of yes/no assignment.
Application in medical testing: does a patient have a certain disease or not?

Two Class Classification: Similar Problems
Super-problem: statistical classification.
One-class (unary) classification: identify specific elements among others. Applications: outlier detection, anomaly detection, novelty detection.
Multi-class (multinomial) classification: classify the elements of a given set into more than two groups by applying a certain classification method. Applications: clustering, attribute assignment, or simply more than 2 classes.

Two Class Classification: Confusion Matrix
Purpose: Evaluate the performance of a certain classification algorithm.
Bike Buyer? The matrix crosses the predicted class (Yes / No) with the actual class (Yes / No).

Two Class Classification: Confusion Matrix
Cells: true positives and true negatives (correct); false positives and false negatives (errors).

Two Class Classification: Confusion Matrix
Bike Buyer? Population = 3,017. Cell counts as listed on the slide: 96, 204, 77, 2,640.

Two Class Classification: Confusion Matrix
Purpose: Evaluate the performance of a certain classification algorithm.
Cells (predicted condition versus real condition, each positive or negative): true positive; false positive (type I error); false negative (type II error); true negative.
Prevalence = Σ Condition positive / Σ Total population
True Positive Rate (TPR) = Σ True positive / Σ Condition positive (also called Sensitivity, Recall)
False Positive Rate (FPR) = Σ False positive / Σ Condition negative (also called Fall-out)
False Negative Rate (FNR) = Σ False negative / Σ Condition positive (also called Miss rate)
True Negative Rate (TNR) = Σ True negative / Σ Condition negative (also called Specificity, SPC)
Accuracy (ACC) = (Σ True positive + Σ True negative) / Σ Total population
Positive Predictive Value (PPV) = Σ True positive / Σ Test outcome positive (also called Precision)
False Discovery Rate (FDR) = Σ False positive / Σ Test outcome positive
False Omission Rate (FOR) = Σ False negative / Σ Test outcome negative
Negative Predictive Value (NPV) = Σ True negative / Σ Test outcome negative
Positive Likelihood Ratio (LR+) = TPR / FPR
Negative Likelihood Ratio (LR−) = FNR / TNR
Diagnostic Odds Ratio (DOR) = LR+ / LR−
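The definitions above can be sketched in a few lines of Python, applied to the bike buyer counts (96, 204, 77, 2,640). Note the assumption: the transcript does not preserve which off-diagonal cell is which, so the sketch assumes 204 false positives and 77 false negatives. The function name `confusion_metrics` and the dictionary keys are hypothetical.

```python
def confusion_metrics(tp, fp, fn, tn):
    """Compute the rates defined on the slide from the four cell counts."""
    total = tp + fp + fn + tn
    tpr = tp / (tp + fn)   # sensitivity, recall
    fpr = fp / (fp + tn)   # fall-out
    fnr = fn / (tp + fn)   # miss rate
    tnr = tn / (fp + tn)   # specificity
    return {
        "prevalence": (tp + fn) / total,
        "accuracy": (tp + tn) / total,
        "tpr": tpr,
        "fpr": fpr,
        "fnr": fnr,
        "tnr": tnr,
        "ppv": tp / (tp + fp),   # precision
        "fdr": fp / (tp + fp),
        "for": fn / (fn + tn),
        "npv": tn / (fn + tn),
        "lr_plus": tpr / fpr,
        "lr_minus": fnr / tnr,
        "dor": (tpr / fpr) / (fnr / tnr),
    }


# ASSUMPTION: 204 = false positives, 77 = false negatives (orientation of the
# slide's matrix is not recoverable from the transcript).
tp, fp, fn, tn = 96, 204, 77, 2640
metrics = confusion_metrics(tp, fp, fn, tn)
for name, value in metrics.items():
    print(f"{name:10s} {value:.4f}")
```

The accuracy (2,736 / 3,017, about 0.907) does not depend on that assumption, since it only uses the diagonal cells. The precision of 0.32 under this assumption shows why accuracy alone is a weak measure when one class is rare.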

Classification Method Comparison
Linearly separable pattern: binary (2-class) classification
http://tjo-en.hatenablog.com/entry/2014/01/06/234155

Classification Method Comparison
Linearly inseparable pattern: binary classification for a simple XOR pattern
http://tjo-en.hatenablog.com/entry/2014/01/06/234155

Classification Method Comparison
Linearly separable pattern: 3-class classification
http://tjo-en.hatenablog.com/entry/2014/01/06/234155

Classification Method Comparison
Linearly inseparable pattern: binary classification for a complex XOR pattern
http://tjo-en.hatenablog.com/entry/2014/01/06/234155

Classification Method Comparison
4-class classification for a complex pattern
http://tjo-en.hatenablog.com/entry/2014/01/06/234155

Classification Method Comparison
Try to understand the pattern of your data by applying visual data analysis and pairwise comparison of attributes.
Is your data linearly separable?
Yes: Logistic Regression or Neural Networks; be cautious with Decision Trees or Random Forests.
No: Random Forest or SVM.
Unsure (???): Random Forest offers a good balance of generalization and accuracy, and its computational cost is relatively low.
But: Neural Networks can (but need not) be the best solution; they are not easy to tune to deliver good results (many parameters).
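One rough way to probe linear separability on a small data set is to train a perceptron and see whether it reaches zero training errors. The following is a sketch under stated assumptions, not a method from the lecture: the function name `perceptron_converges`, the epoch budget, and the AND/XOR toy patterns are all hypothetical. A perceptron is guaranteed to converge on linearly separable data, but failing to converge within a fixed budget is only a hint, not a proof, of inseparability.

```python
def perceptron_converges(data, max_epochs=100):
    """Return True if a perceptron separates `data` within `max_epochs`.

    `data` is a list of ((x1, x2), y) pairs with y in {0, 1}.
    """
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(max_epochs):
        errors = 0
        for (x1, x2), y in data:
            pred = 1 if w[0] * x1 + w[1] * x2 + b > 0 else 0
            if pred != y:
                # Classic perceptron update: move the boundary toward the example.
                delta = y - pred
                w[0] += delta * x1
                w[1] += delta * x2
                b += delta
                errors += 1
        if errors == 0:
            return True   # a separating line was found
    return False          # no separating line found within the budget


# Hypothetical toy patterns: AND is linearly separable, XOR is not.
AND_PATTERN = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
XOR_PATTERN = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]

print(perceptron_converges(AND_PATTERN))  # True
print(perceptron_converges(XOR_PATTERN))  # False
```

For the XOR pattern no linear model can succeed, which is the situation where the slide suggests Random Forest or SVM instead.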

Decision Tree Learning

Decision Tree Learning
A supervised learning method.
Purpose: Predict the value of a target variable of an item based on observations of other variables from other items.
If the target variable takes values from a finite set, the tree is called a classification tree; otherwise it is a regression tree.
Leaves represent class labels, whereas branches represent conjunctions of features (variables) that lead to those class labels.
Figure: Decision tree (partial) for the bike sales sample.

Decision Tree Learning
A decision tree describes data, not decisions. It can be used as input for decision making, e.g. as a prediction.
Computation: recursive partitioning. Recursively split the data set into subsets based on an attribute-value test.
The recursion is completed when all items in the subset at a node have the same value of the target variable, or when splitting no longer adds value to the predictions.
This approach is called top-down induction of decision trees.
Different algorithms and metrics have been developed to solve the core problem in decision tree generation: which variable best splits the set of items at each step?
Greedy algorithm: make the locally optimal choice at each stage of the recursive process.
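Recursive partitioning as described above can be sketched in plain Python. This is a simplified illustration, not the Knime learner: it handles only nominal attributes, uses the Gini impurity as the split metric, and does no pruning. The attribute names (`commute`, `cars`, `buyer`) and the six records are hypothetical and are not taken from the bike sales data.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def best_split(rows, target, attributes):
    """Greedy step: attribute whose partition has the lowest weighted Gini."""
    best_attr, best_score = None, gini([r[target] for r in rows])
    for attr in attributes:
        groups = {}
        for r in rows:
            groups.setdefault(r[attr], []).append(r[target])
        score = sum(len(g) / len(rows) * gini(g) for g in groups.values())
        if score < best_score - 1e-12:
            best_attr, best_score = attr, score
    return best_attr  # None if no attribute improves on the parent node


def build_tree(rows, target, attributes):
    """Top-down induction: returns a label (leaf) or (attr, children, majority)."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if len(set(labels)) == 1 or not attributes:
        return majority
    attr = best_split(rows, target, attributes)
    if attr is None:  # splitting no longer adds value
        return majority
    remaining = [a for a in attributes if a != attr]
    children = {}
    for value in sorted({r[attr] for r in rows}):
        subset = [r for r in rows if r[attr] == value]
        children[value] = build_tree(subset, target, remaining)
    return (attr, children, majority)


def predict(tree, row):
    """Follow the branches until a leaf label is reached."""
    while isinstance(tree, tuple):
        attr, children, majority = tree
        tree = children.get(row[attr], majority)
    return tree


# Hypothetical training records.
rows = [
    {"commute": "short", "cars": "0", "buyer": "yes"},
    {"commute": "short", "cars": "1", "buyer": "yes"},
    {"commute": "short", "cars": "2", "buyer": "no"},
    {"commute": "long", "cars": "0", "buyer": "yes"},
    {"commute": "long", "cars": "1", "buyer": "no"},
    {"commute": "long", "cars": "2", "buyer": "no"},
]
tree = build_tree(rows, "buyer", ["commute", "cars"])
print(tree[0])  # attribute chosen at the root
print(predict(tree, {"commute": "long", "cars": "1"}))
```

On this toy data the root split is on `cars`, because it yields purer subsets than `commute`; only the mixed subset (`cars` = 1) is split again.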

Decision Tree Learning in Knime
Metric (quality measure) for splitting:
Gini Index (Gini impurity): for a set of items whose target variable takes m possible values v_i, i ∈ {1, 2, ..., m}, let f_i be the fraction of items labeled with the value v_i. The Gini impurity is 1 − Σ f_i².
Information Gain Ratio: based on the entropy* of the information. Information Gain = Entropy(parent) − weighted sum of Entropy(children).
*Entropy: the expected value of the information content.
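Both split metrics can be computed by hand for a small example. The sketch below assumes the standard textbook definitions (Gini impurity 1 − Σ f_i², entropy −Σ f_i log2 f_i) and is not taken from Knime's source; the function names and the toy split are hypothetical. It computes the plain information gain; the gain ratio additionally normalizes this value by the entropy of the split itself.

```python
from collections import Counter
from math import log2


def gini(labels):
    """Gini impurity: 1 minus the sum of squared class fractions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def entropy(labels):
    """Shannon entropy in bits of the class distribution."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def information_gain(parent, children):
    """Entropy(parent) minus the size-weighted sum of Entropy(child)."""
    n = len(parent)
    weighted = sum(len(child) / n * entropy(child) for child in children)
    return entropy(parent) - weighted


# Hypothetical split: 4 buyers and 4 non-buyers divided into two subsets.
parent = ["yes"] * 4 + ["no"] * 4
children = [["yes", "yes", "yes", "no"], ["no", "no", "no", "yes"]]

print(gini(parent))     # 0.5, the maximum for two classes
print(entropy(parent))  # 1.0 bit, the maximum for two classes
gain = information_gain(parent, children)
print(round(gain, 4))   # 0.1887
```

A split that produced two pure subsets would reach the maximum gain of 1.0 bit here; a split that left both subsets at 50/50 would have a gain of 0.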

Decision Tree Learning in Knime
Pruning method: Pruning reduces tree size and helps avoid overfitting, which improves generalization performance and thus prediction quality. "Minimum Description Length" (MDL) pruning is available, or pruning can be switched off.
Reduced error pruning: only relevant if execution speed matters. Otherwise switch it off.
Skip nominal columns without domain information: always switch on. This ensures that columns with too many nominal values (e.g. the customer name in the bike sales sample) are automatically skipped.

Bike Sales Solutions

Bike Sales using Decision Tree

Bike Sales using Optimized Random Forest

Result Comparison: Decision Tree versus Optimized Random Forest

Bike Sales: re-evaluation by common sense
Just 2,000 new customers? Let's send everyone an email.

Lecture Summary & Homework

Lessons Learned
Try to understand the business problem end-to-end.
Try to think beyond the scope of your current knowledge and work. That's analytical thinking.
Even simple-looking analytical problems may get tricky.
You must follow multiple analytical paths to find the best solution.

Homework
Read the post "Classification performance comparison": http://tjo-en.hatenablog.com/entry/2014/01/06/234155
Read the article "Predicting Good Probabilities With Supervised Learning": http://machinelearning.wustl.edu/mlpapers/paper_files/icml2005_niculescu-mizilc05.pdf

Any Questions?