from Larson Text By Susan Miertschin



Similar documents
Data Mining Algorithms Part 1. Dejan Sarka

Index Contents Page No. Introduction . Data Mining & Knowledge Discovery

IT and CRM A basic CRM model Data source & gathering system Database system Data warehouse Information delivery system Information users

Introduction to Data Mining and Machine Learning Techniques. Iza Moise, Evangelos Pournaras, Dirk Helbing

Data Mining with SQL Server Data Tools

IT462 Lab 5: Clustering with MS SQL Server

DBTech Pro Workshop. Knowledge Discovery from Databases (KDD) Including Data Warehousing and Data Mining. Georgios Evangelidis

Delivering Business Intelligence With Microsoft SQL Server 2005 or 2008 HDT922 Five Days

Data Mining: Overview. What is Data Mining?

not possible or was possible at a high cost for collecting the data.

Introduction to Data Mining

In this presentation, you will be introduced to data mining and the relationship with meaningful use.

Data Mining Part 5. Prediction

Data Mining. SPSS Clementine Clementine Overview. Spring 2010 Instructor: Dr. Masoud Yaghini. Clementine

OLAP and Data Mining. Data Warehousing and End-User Access Tools. Introducing OLAP. Introducing OLAP

An Introduction to Data Mining. Big Data World. Related Fields and Disciplines. What is Data Mining? 2/12/2015

Predictive Analytics Techniques: What to Use For Your Big Data. March 26, 2014 Fern Halper, PhD

Machine Learning and Data Mining. Fundamentals, robotics, recognition

Data Mining for Business Intelligence. Concepts, Techniques, and Applications in Microsoft Office Excel with XLMiner. 2nd Edition

Azure Machine Learning, SQL Data Mining and R

MBA Data Mining & Knowledge Discovery

STATISTICA. Clustering Techniques. Case Study: Defining Clusters of Shopping Center Patrons. and

Data Mining with Weka

Quality Control of National Genetic Evaluation Results Using Data-Mining Techniques; A Progress Report

Data Mining. Vera Goebel. Department of Informatics, University of Oslo

Tutorials for Project on Building a Business Analytic Model Using Data Mining Tool and Data Warehouse and OLAP Cubes IST 734

Practical Data Science with Azure Machine Learning, SQL Data Mining, and R

A THREE-TIERED WEB BASED EXPLORATION AND REPORTING TOOL FOR DATA MINING

Chapter 20: Data Analysis

Data Mining as Part of Knowledge Discovery in Databases (KDD)

Model Deployment. Dr. Saed Sayad. University of Toronto

DATA MINING TECHNIQUES AND APPLICATIONS

STATISTICA. Financial Institutions. Case Study: Credit Scoring. and

8. Machine Learning Applied Artificial Intelligence

Pentaho Data Mining Last Modified on January 22, 2007

SQL SERVER TRAINING CURRICULUM

Introduction. A. Bellaachia Page: 1

Prerequisites. Course Outline

Data Mining Techniques in CRM

Predictive Data modeling for health care: Comparative performance study of different prediction models

Data Mining for Fun and Profit

WebFOCUS RStat. RStat. Predict the Future and Make Effective Decisions Today. WebFOCUS RStat

GETTING AHEAD OF THE COMPETITION WITH DATA MINING

Discovering, Not Finding. Practical Data Mining for Practitioners: Level II. Advanced Data Mining for Researchers : Level III

Data Mining Solutions for the Business Environment

The Data Mining Process

BOOSTING - A METHOD FOR IMPROVING THE ACCURACY OF PREDICTIVE MODEL

Principles of Data Mining by Hand&Mannila&Smyth

Distance Learning and Examining Systems

Welcome. Data Mining: Updates in Technologies. Xindong Wu. Colorado School of Mines Golden, Colorado 80401, USA

Data Mining and Predictive Modeling with Excel 2007

DATA MINING TECHNIQUES SUPPORT TO KNOWLEGDE OF BUSINESS INTELLIGENT SYSTEM

Adding New Level in KDD to Make the Web Usage Mining More Efficient. Abstract. 1. Introduction [1]. 1/10

Why include analytics as part of the School of Information Technology curriculum?

The basic data mining algorithms introduced may be enhanced in a number of ways.

Chapter 6 - Enhancing Business Intelligence Using Information Systems

Using Microsoft Dynamics CRM for Analytical CRM: A Curriculum Package for Business Intelligence or Data Mining Courses

Data Mining Application in Direct Marketing: Identifying Hot Prospects for Banking Product

Data Mining and Knowledge Discovery in Databases (KDD) State of the Art. Prof. Dr. T. Nouri Computer Science Department FHNW Switzerland

Data Mining. Knowledge Discovery, Data Warehousing and Machine Learning Final remarks. Lecturer: JERZY STEFANOWSKI

Final Project Report

Data Mining: Concepts and Techniques. Jiawei Han. Micheline Kamber. Simon Fräser University К MORGAN KAUFMANN PUBLISHERS. AN IMPRINT OF Elsevier

Learning is a very general term denoting the way in which agents:

International Journal of Computer Science Trends and Technology (IJCST) Volume 2 Issue 3, May-Jun 2014

SQL Server 2005 Features Comparison

Nine Common Types of Data Mining Techniques Used in Predictive Analytics

Data Mining: STATISTICA

Université de Montpellier 2 Hugo Alatrista-Salas : hugo.alatrista-salas@teledetection.fr

Prof. Pietro Ducange Students Tutor and Practical Classes Course of Business Intelligence

DMDSS: Data Mining Based Decision Support System to Integrate Data Mining and Decision Support

Data Mining Techniques

Introduction to Data Mining

Data Mining in CRM & Direct Marketing. Jun Du The University of Western Ontario jdu43@uwo.ca

Data quality in Accounting Information Systems

TIM 50 - Business Information Systems

Sunnie Chung. Cleveland State University

LVQ Plug-In Algorithm for SQL Server

An Overview of Knowledge Discovery Database and Data mining Techniques

Chapter 3: Cluster Analysis

Chapter 12 Discovering New Knowledge Data Mining

Introduction Predictive Analytics Tools: Weka

Quick Start. Creating a Scoring Application. RStat. Based on a Decision Tree Model

Social Media Mining. Data Mining Essentials

Mobile Phone APP Software Browsing Behavior using Clustering Analysis

Data Mining - Evaluation of Classifiers

Predictive Modeling and Big Data

Data Warehousing and Data Mining in Business Applications

Customer Analytics. Turn Big Data into Big Value

What is Data Mining? Data Mining (Knowledge discovery in database) Data mining: Basic steps. Mining tasks. Classification: YES, NO

Customer Classification And Prediction Based On Data Mining Technique

Information Management course

2015 Workshops for Professors

Role of Social Networking in Marketing using Data Mining

A Partially Supervised Metric Multidimensional Scaling Algorithm for Textual Data Visualization

Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone:

Transcription:

Decision Tree Data Mining Example from Larson Text By Susan Miertschin 1

Problem The Maximum Miniatures Marketing Department wants to do a targeted mailing gpromoting the Mythic World line of figurines. Previous research has shown that the buyers of the Mythic World line of product do not have any children living at home. Unfortunately, the list purchased for the Mythic World mailing does not include the statistic on the number of children at home for each household. It does include other facts about the household, namely number of cars owned, marital status, and whether the address is a rental property or occupied by the owner. The marketing department would like to find a way, using the three facts included in the mailing list data and the information known about current customer, to predict which households have no children living at home. The mailing will then be sent only to those households likely to have no children living at home. Wholesale customers have customer numbers below 5000. Retail customers have customer numbers of 5000 and above. Of course, we should only use retail customers as our data for this mining operation. 2

The Problem is One of Classification MaxMin wants to classify the households in the mailing list as either having no children living at home or having children living at home Even though this data point is not part of the mailing list data set 3

Recall - Data Mining - Task Types Classification Fit items into slots Larson Assign items in a collection to target categories or classes Oracle book (http://download.oracle.com/docs/cd/b28359_01/datamine.111/b28129/classify.htm) Applies to categorical attribute data Categorical the attribute takes on two or more discrete values Clustering Discovering Association Rules Discovering Sequential Patterns Sequence Analysis Regression Detective Deviations from Normal 4

Classification Here is data about loan applicants. Which are good risks and which are poor risks? Which of our current customers are likely to increase their current rate of spending if given an affinity card? Which consumers are likely to buy a new cell-phone product if I send them a direct mailing? Which potential customers in the new mailing list are likely to have no children at home? 5

Classification Predictive or Descriptive? Supervised or Unsupervised? 6

Classification Predictive Supervised 7

Classification Predictive - Supervised Predict which loan applicants are good risks and which are poor risks Predict which of our current customers are likely to increase their current rate of spending if given an affinity card Predict which consumers are likely to buy a new cell-phone product if I send them a direct mailing Predict which patients will respond better to treatment A or treatment B Predict which potential customers in the new mailing list have no children at home. 8

The Learn-by-Doing Section in Larson Creates a data mining model to determine whether the data included with the mailing list can predict whether the household has children or not, and thus, determine a more targeted subset of data for the mailing 9

Choose Classification Algorithms Available through MS Analysis Services MS Decision Trees MS Naïve Bayes MS Clustering MS Neural Network Other tools implement these (perhaps) and other algorithms Example: WEKA (Open Source) has 13 (or so) different algorithms based on Bayesian probabilities 16 (or so) different algorithms based on trees 12 (or so) different algorithms for clustering 10

WEKA (Waikato Environment for Knowledge Analysis) Classifiers Based on Bayesian Probabilities Classifiers Based on Trees 11

WEKA (Waikato Environment for Knowledge Analysis) Clusterers Classification Using a Neural Network 12

What Data to Mine? Customer data in the OLAP Cube created earlier Open MaxMinSalesDM in Visual Studio an archive file is available from Blackboard (ITEC4397MaxMinSalesDM.zip) M The cube defined here has been deployed to the class server DO NOT re-deploy, please I am not sure that each of you will be able to access the cube simultaneously we will have to see 13

The Data Mining i Model is Already Built The model is to predict whether number of children at home is 0 or more than 0 Number of children at home is the output variable What are the input variables? Expand Data Source Views in Solution Explorer double click MaxMinSalesDM.dsv to see the structure of the cube and see what data is available Is the output variable there? What input variables are also in the email list? 14

Data Mining Model Customer dimension is the main source of data Homeowner attribute what data is in there? 15

Data Mining Model Marital Status Num Cars Owned 16

Data Mining Model Number of children at home So can the other three data points predict the value of this attribute? That s the question we are to mine for 17

Data Mining Model We also want to use only data for retail customers Excluding wholesale customers Retail customer have a customer number of 5000 and above This query pulls the data that the mining model will examine 18

19 What Do the Models Tell Us?

Decision Tree Algorithm Results In Solution Explorer, expand the Mining Structures nodes and double click on the Children at Home data mining model Use the Model Mining View and select the Decision Tree model to examine the results Derived Rules: no cars + unmarried + not homeowner OR no cars + married + not homeowner Also look at the dependency network viewer 20

Decision Trees Most Popular Data Mining i Tool Easy to understand Easy to implement Easy to use Computationally thrifty Predicts categorical output from categorical and/or real input values 21

Naïve Bayes Algorithm Results select the Naïve Bayes model to examine the results Dependency network viewer is a primary tool Attribute Profiles, Attribute Characteristics, Attribute Discrimination lets you refine 22

Clustering Algorithm Results Cluster diagram shows data that clusters on number of children at home = 0 based on other characteristics Rename the two darkest clusters (No Children At Home 1 and 2, e.g.) Cluster Profiles Cluster Characteristics Cluster Discrimination 23

Decision Trees By Susan Miertschin 24