Enterprise Miner - Decision tree


ECLT5810 E-Commerce Data Mining Technique
SAS Enterprise Miner -- Decision Tree

I. Tree Node

Setting Tree Node Defaults
- Define the default options that you commonly use to build trees.
- Right-click the Tree node in the Project Navigator and select the Edit Defaults button.

Three tabs are enabled:

1. Basic Tab

A. Splitting criterion

B. General options
- Observations required for a split search: the minimum number of observations a node must contain before a split is attempted.
- Maximum depth of tree: the maximum height of the tree.
- Splitting rules saved in each node: the number of highest-scoring rules stored in the node.
- Surrogate rules saved in each node: when an observation is missing the splitting attribute, a surrogate rule is used in place of the main rule.
- Treat missing as an acceptable value: whether observations with a missing splitting attribute are used when assessing a candidate split.
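The surrogate-rule option above can be illustrated with a minimal Python sketch. It is not how Enterprise Miner implements surrogates internally; the attribute names (`duration`, `amount`), the cutoffs, and the `default` branch are hypothetical and chosen only for illustration.

```python
from typing import Callable, Optional, Sequence

# A rule maps an observation (a dict of attribute values) to "left" or "right",
# or returns None when the attribute it tests is missing.
Rule = Callable[[dict], Optional[str]]


def threshold_rule(attribute: str, cutoff: float) -> Rule:
    """Build a rule that sends values below `cutoff` left and the rest right."""
    def rule(obs: dict) -> Optional[str]:
        value = obs.get(attribute)
        if value is None:
            return None  # attribute missing: this rule cannot decide
        return "left" if value < cutoff else "right"
    return rule


def route(obs: dict, main_rule: Rule, surrogates: Sequence[Rule],
          default: str = "left") -> str:
    """Apply the main rule; fall back to the surrogates, in order, if it cannot decide."""
    for rule in (main_rule, *surrogates):
        branch = rule(obs)
        if branch is not None:
            return branch
    return default  # every rule hit a missing value (assumed fallback)


# Hypothetical attributes and cutoffs, for illustration only.
main = threshold_rule("duration", 24)
backup = threshold_rule("amount", 4000)

print(route({"duration": 12, "amount": 9000}, main, [backup]))    # main rule decides: left
print(route({"duration": None, "amount": 9000}, main, [backup]))  # surrogate decides: right
```

The more surrogate rules a node stores, the less often an observation with missing values falls through to the default branch.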

2. Advanced Tab

A. Model assessment

The Tree node supports the following model assessment criteria.

Interval targets
- Average profit or loss: evaluates the tree using the maximum average profit or minimum average loss when a profit or loss matrix is defined.
- Average square error: selects the tree with the smallest average square error between the actual and predicted values.
- Average, profit, or loss in the top 10, 25, or 50%: evaluates the tree based on the average predicted value of the target, the maximum average profit, or the minimum average loss for the top n% of the cases. Use this criterion when the overall goal is to create the tree with the best lift value.

Ordinal targets
- Ordinal proportion correctly classified: selects the tree with the best classification rate when weighted for the ordinal distances.
- Proportion of event, profit, or loss in the top 10, 25, or 50%: selects the tree that yields the maximum profit or minimum loss in the top n% of the data.
- Total leaf impurity (Gini index): chooses the tree with the lowest total leaf impurity (smallest Gini index).

Binary or nominal targets
- Proportion correctly classified: evaluates the tree based on the proportion correctly classified (default if you defined a decision matrix).
- Average profit or loss: evaluates the tree based on the maximum average profit or minimum average loss.
- Proportion of event, profit, or loss in the top 10, 25, or 50%: selects the tree that yields the maximum profit or minimum loss in the top n% of the data.
- Total leaf impurity (Gini index): chooses the tree with the lowest total leaf impurity (smallest Gini index).
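Two of the criteria above can be made concrete with a short Python sketch. It assumes that total leaf impurity is the size-weighted average of each leaf's Gini index and that the "top n%" measure ranks cases by a predicted score; the handout does not spell out either formula, so treat both as illustrative assumptions rather than the exact Enterprise Miner computation.

```python
from collections import Counter
from typing import Sequence


def gini(labels: Sequence[str]) -> float:
    """Gini index of one leaf: 1 minus the sum of squared class proportions."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def total_leaf_impurity(leaves: Sequence[Sequence[str]]) -> float:
    """Size-weighted average of the Gini index over all leaves (assumed weighting)."""
    total = sum(len(leaf) for leaf in leaves)
    return sum(len(leaf) / total * gini(leaf) for leaf in leaves)


def event_rate_in_top(scores: Sequence[float], labels: Sequence[str],
                      event: str, fraction: float) -> float:
    """Proportion of `event` among the `fraction` of cases with the highest score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    k = max(1, int(len(ranked) * fraction))  # keep at least one case
    return sum(1 for _, label in ranked[:k] if label == event) / k


# Made-up leaves: one pure leaf and one evenly mixed leaf.
leaves = [["good", "good"], ["good", "bad"]]
print(total_leaf_impurity(leaves))  # 0.25

# Made-up scores: event rate among the top 50% of cases by score.
print(event_rate_in_top([0.9, 0.8, 0.3, 0.1], ["bad", "good", "bad", "good"], "bad", 0.5))  # 0.5
```

A pure leaf contributes zero impurity, so a tree whose leaves separate the target levels cleanly has a small total.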

B. General options
- Sub-tree: the method used to select the sub-tree.
  1. Best assessment value: use the model assessment criterion specified above.
  2. The most leaves: use the fully built tree.
  3. At most indicated number of leaves: use the tree with n leaves.
- Observations sufficient for split search: the maximum number of observations used in a splitting test. More observations require more memory and longer CPU times.
- Maximum tries in an exhaustive split search: the maximum number of splitting tests performed in a node.

Input Data Source node
- Data Set: SAMPSIO.DMAGECR
- Target: GOOD_BAD

Data Partition node
- Method: stratified, using the GOOD_BAD variable
- Training: 70%
- Validate: 30%

Tree node
- Advanced setting, Model Assessment Measure: Misclassification

Running the Tree Node
- Open the Tree Results Browser.

Model Tab
In the Tree node, the Model tab contains the following sub-tabs:
- General: lists the model name, description, date created, date last modified, and the target variable. To browse the target profile, select the Profile information button.
- Basic: displays, in browse mode, the Basic tab settings that were used to create the tree.
- Advanced: displays, in browse mode, the Advanced tab settings that were used to create the tree.
- Import Tree: grayed out if you did not import a tree from a predecessor Variable Selection node, Tree node, or SAS Code node. To view the properties and a table view of the import tree data set, select the [Properties] button.
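The three sub-tree selection methods listed under General options can be sketched in Python as follows. The sketch assumes each candidate sub-tree is identified by its number of leaves, that a larger assessment value is better, that ties go to the smaller sub-tree, and that "at most n leaves" means the largest available sub-tree with no more than n leaves. The function name, method names, and assessment values are hypothetical.

```python
from typing import Dict, Optional


def select_subtree(assessment: Dict[int, float], method: str = "best",
                   max_leaves: Optional[int] = None) -> int:
    """Return the leaf count of the chosen sub-tree.

    `assessment` maps number of leaves -> assessment value, where larger
    is assumed to be better (e.g. proportion correctly classified).
    """
    if not assessment:
        raise ValueError("assessment must not be empty")
    if method == "best":
        # Highest assessment value wins; ties go to the smaller sub-tree.
        return max(assessment, key=lambda n: (assessment[n], -n))
    if method == "most_leaves":
        return max(assessment)  # the fully built tree
    if method == "at_most":
        if max_leaves is None:
            raise ValueError("max_leaves is required for method='at_most'")
        eligible = [n for n in assessment if n <= max_leaves]
        if not eligible:
            raise ValueError("no sub-tree has that few leaves")
        return max(eligible)  # largest sub-tree within the limit
    raise ValueError(f"unknown method: {method}")


# Hypothetical validation results: leaves -> proportion correctly classified.
validation = {1: 0.70, 2: 0.72, 3: 0.75, 5: 0.74, 8: 0.71}

print(select_subtree(validation, "best"))                   # 3
print(select_subtree(validation, "most_leaves"))            # 8
print(select_subtree(validation, "at_most", max_leaves=4))  # 3
```

In this made-up example the fully built tree (8 leaves) scores worse on validation data than the 3-leaf sub-tree, which is the usual reason for selecting by best assessment value rather than by size.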

After running the Tree node, you can investigate the results of the tree.

All Tab
- top left: Summary Table
- top right: Tree Ring
- lower left: Assessment Table
- lower right: Assessment Plot

Summary Tab
- Displays a table of summary statistics for the currently selected tree. The information in the table depends on whether the target is a categorical or an interval variable and on the model assessment measure you chose in the Advanced tab.

Summary Table for Categorical Targets
- Confusion matrix: pull-down menu View, Confusion matrix
- Leaf statistics: pull-down menu View, Leaf statistics

Tree Ring Tab
- Contains the tree ring, a graphical display of the tree complexity, split balance, and discriminatory power.
- Controls the table view in the Table tab.
- Center region: the root node (entire data set)
- Successive rings: each ring represents a split
- Segments: each segment represents a node
- Size of segments: proportional to the number of observations in the node
- Segment color: the assessment value for a categorical target; the target value for an interval target
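The confusion matrix shown in the Summary table for a categorical target is a count of (actual, predicted) pairs. A minimal Python sketch follows; the GOOD_BAD-style labels are made up and the function names are hypothetical.

```python
from collections import Counter
from typing import Sequence, Tuple


def confusion_matrix(actual: Sequence[str], predicted: Sequence[str]) -> Counter:
    """Count how often each (actual, predicted) pair of target levels occurs."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return Counter(zip(actual, predicted))


def classification_rates(matrix: Counter) -> Tuple[float, float]:
    """Return (proportion correctly classified, proportion misclassified)."""
    total = sum(matrix.values())
    correct = sum(n for (a, p), n in matrix.items() if a == p)
    return correct / total, (total - correct) / total


# Made-up labels, for illustration only.
actual = ["good", "good", "bad", "bad", "good"]
predicted = ["good", "bad", "bad", "good", "good"]

matrix = confusion_matrix(actual, predicted)
for (a, p), n in sorted(matrix.items()):
    print(f"actual={a:<4} predicted={p:<4} count={n}")

print(classification_rates(matrix))  # (0.6, 0.4)
```

The diagonal cells (actual equals predicted) give the proportion correctly classified; everything off the diagonal is the misclassification rate.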

Tree Ring Pop-Up Menu
- Right-click the tree ring:
  - Define colors: choose your own customized colors to provide better contrast between segments, and/or choose a different color criterion.
  - Node definition: displays the node definition (the English rule) for the selected node in the tree ring.
  - Probe: displays summary information in a text box when you click on a node (or slide your cursor over the surface) in the tree ring.
  - Pick: selects a node in the tree ring. The tree diagram is automatically updated based on the node you pick.
  - Redraw tree: redraws the tree diagram from the currently selected node of the tree ring.

Table Tab
- Contains a table that provides a measure of how well the tree describes the data. Assessment statistics for each sub-tree are listed for the training and validation data sets; validation statistics are not listed if you did not use a validation data set. The type of statistic in the table depends on the model assessment measure that you chose in the Advanced tab.
- For a categorical target, the default table statistic is the proportion of observations correctly classified.
- For an interval target, the default table statistic is the average of the squared differences between each observation and its predicted value.

Plot Tab
- Displays a plot of the assessment values, on the vertical axis, for the different sub-trees. The Tree node automatically chooses the sub-tree that optimizes the model assessment measure that you chose in the Advanced tab. The vertical reference line identifies this sub-tree.
- To set the vertical axis statistic for the assessment plot, right-click on the plot.
  Categorical targets:
  - Proportion misclassified
  - Proportion of event in either the top 10, 25, or 50%
  - Total leaf impurity (Gini index)
  Interval targets:
  - Average square error
  - Average profit or loss
  - Proportion of event in either the top 10, 25, or 50%

Tree Diagram
- Displays node (segment) statistics, the names of the variables used to split the data into nodes, and the variable values for several levels of nodes in the tree.
- Open it from the View pull-down menu, Tree.

A tree diagram contains the following items:
- Root node: the top node in the tree, which contains all observations.
- Internal nodes: non-terminal nodes (including the root node) that contain a splitting rule.
- Leaf nodes: terminal nodes that contain the final classification for a set of observations.
- Numeric labels: indicate the points at which the Tree node found significant splits in interval-level variable distributions, or in categorical splits for nominal- or ordinal-level distributions.
- Character labels: the name of the variable used to split.
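The default Table tab statistic for an interval target, the average of the squared differences between each observation and its predicted value, can be sketched in Python as below. The numbers are made up, and the function name is hypothetical.

```python
from typing import Sequence


def average_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


# Made-up target values and predictions.
actual = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 10.0]

# Squared differences are 1, 0, 1, 4, so the average is 6 / 4.
print(average_squared_error(actual, predicted))  # 1.5
```

Computing this value for each sub-tree on both the training and validation data gives the kind of series that the assessment table lists and the assessment plot draws.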

Right-click the tree diagram and choose Statistics. Select all the statistics to display in the tree diagram. Each node of the tree will then contain the following information:
- Node ID
- Counts for each target level and total counts
- Percentages for each target level
- Assigned decision
- Percentage of observations correctly classified within the node

Tree Diagram Pop-up Menus
- Specify tree customizations, save the tree, and print the tree.
- Right-click on a node:
  - View competing splits: opens the Input Selection window, which lists the splitting rules and their measure of worth. You must first select a node in the diagram before you can view competing splits.
  - View surrogate splits: displays the surrogate splits in the Input Selection window. You must first select a node in the diagram before you can view surrogate splits.
  - View split values: displays the split values (right-click on each splitting value).
  - Define colors: defines the tree diagram and tree ring colors.
  - Statistics: sets the node statistics.
  - Diagram-node types: specifies the information displayed in the diagram, such as node statistics, or node statistics and variables.
  - Tree options: sets the depth of the tree.
  - Node definition: displays a text box that contains the rule that defines the node.
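The per-node statistics listed above can be derived from the target values of the observations that fall into a node. The following Python sketch assumes the assigned decision is simply the most frequent target level (no profit or loss matrix); the labels, the node ID, and the dictionary keys are made up for illustration.

```python
from collections import Counter
from typing import Dict, Sequence


def node_statistics(node_id: int, labels: Sequence[str]) -> Dict[str, object]:
    """Summarize one tree node from the target values of its observations."""
    if not labels:
        raise ValueError("a node must contain at least one observation")
    counts = Counter(labels)
    total = len(labels)
    # Assumed decision rule: assign the most frequent target level.
    decision, decision_count = counts.most_common(1)[0]
    return {
        "node_id": node_id,
        "counts": dict(counts),
        "total": total,
        "percentages": {level: 100.0 * n / total for level, n in counts.items()},
        "assigned_decision": decision,
        "percent_correct": 100.0 * decision_count / total,
    }


# Made-up node containing 6 "good" and 2 "bad" observations.
stats = node_statistics(3, ["good"] * 6 + ["bad"] * 2)
print(stats["counts"])             # {'good': 6, 'bad': 2}
print(stats["percentages"])        # {'good': 75.0, 'bad': 25.0}
print(stats["assigned_decision"])  # good
print(stats["percent_correct"])    # 75.0
```

Under this majority rule, the percentage correctly classified within a node equals the percentage of its most frequent target level.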