EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER. Copyright 2013, SAS Institute Inc. All rights reserved.




EXPLORING & MODELING USING INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER

ANALYTICS LIFECYCLE: DECISION TREES CAN HELP IN VARIOUS STAGES
Stages of the cycle:
  Formulate Problem
  Data Preparation
  Data Exploration
  Transform & Select
  Develop Models
  Validate Models
  Deploy Model
  Evaluate & Monitor Model

WHY DECISION TREES?

DECISION TREES: ADVANTAGES
Decision trees are powerful predictive and explanatory modeling tools.
They are flexible in that they can model targets that are:
  Interval (regression trees)
  Ordinal, nominal, and binary (classification trees)
Trees can accommodate nonlinearities and interactions.
Trees are simple to understand and present.

DECISION TREE: EASY TO VISUALIZE

DECISION TREES: ENGLISH RULES
Node = 10
if Saving Balance >= 2615.09
AND Credit Card Balance < 641.915
then
  Tree Node Identifier = 10
  Number of Observations = 981
  Predicted: INS=1 = 0.68
  Predicted: INS=0 = 0.32
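An English rule of this kind translates directly into scoring logic. The sketch below is only an illustration of that idea: the function and variable names are hypothetical, and only the thresholds and predicted probabilities come from the Node 10 rule on the slide. Other leaves of the tree are not shown in the source, so the sketch returns None for them.

```python
# Hypothetical names; thresholds and probabilities are taken from the Node 10 rule.
NODE_10_PREDICTION = {"INS=1": 0.68, "INS=0": 0.32}


def node_10_rule(saving_balance, credit_card_balance):
    """Return True when an observation falls into the leaf described by Node 10."""
    return saving_balance >= 2615.09 and credit_card_balance < 641.915


def score(saving_balance, credit_card_balance):
    """Return the Node 10 probabilities if the rule fires, else None.

    The other leaves of the tree are not listed on the slide, so they are
    deliberately left out rather than guessed.
    """
    if node_10_rule(saving_balance, credit_card_balance):
        return NODE_10_PREDICTION
    return None


print(score(3000.0, 100.0))   # rule fires
print(score(1000.0, 100.0))   # rule does not fire
```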

DECISION TREE BACKGROUND

WHAT ARE DECISION TREES?
Decision trees are statistical models designed for supervised prediction problems.
The tree is fitted to data by recursive partitioning.
Partitioning refers to segmenting the data into subgroups that are as homogeneous as possible with respect to the target.
Many algorithms exist, including CHAID, CART, C4.5, and C5.0.
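One step of recursive partitioning can be sketched as a search for the split that makes the two resulting subgroups as homogeneous as possible. The sketch below assumes a single interval input, a binary split of the form "value < threshold", and Gini impurity as the homogeneity measure (the criterion CART uses; CHAID, for example, relies on chi-square tests instead). The function names and the toy data are hypothetical.

```python
from collections import Counter


def gini(labels):
    """Gini impurity of a list of target labels (0.0 means perfectly homogeneous)."""
    n = len(labels)
    if n == 0:
        return 0.0
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())


def best_split(values, labels):
    """Return (threshold, weighted_impurity) for the best split 'value < threshold'.

    Returns (None, impurity_of_unsplit_node) when no split improves on the parent.
    """
    n = len(values)
    best = (None, gini(labels))
    # Candidate thresholds: every distinct value except the smallest,
    # so that both sides of the split are non-empty.
    for threshold in sorted(set(values))[1:]:
        left = [y for x, y in zip(values, labels) if x < threshold]
        right = [y for x, y in zip(values, labels) if x >= threshold]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / n
        if weighted < best[1]:
            best = (threshold, weighted)
    return best


# Toy data (made up): saving balances and a binary target.
balances = [100, 200, 300, 2700, 2900, 3100]
ins = [0, 0, 0, 1, 1, 1]
print(best_split(balances, ins))
```

A full tree repeats this search inside each resulting subgroup until a stopping rule is met.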

2 TYPES OF TREES
Classification tree: target is categorical
Regression tree: target is continuous
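The difference between the two types shows up in what a leaf predicts. As a minimal sketch, assuming the common convention that a classification leaf reports the proportion of each target level among its training observations and a regression leaf reports their mean (function names are hypothetical):

```python
from collections import Counter
from statistics import mean


def classification_leaf(targets):
    """Predicted probability of each target level: its proportion in the leaf."""
    n = len(targets)
    return {level: count / n for level, count in Counter(targets).items()}


def regression_leaf(targets):
    """Predicted value for a continuous target: the mean of the leaf."""
    return mean(targets)


print(classification_leaf([1, 1, 1, 0]))
print(regression_leaf([10.0, 20.0, 30.0]))
```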

DECISION TREES: CLASSIFICATION TREE

DECISION TREES: MULTI-WAY SPLITS

DECISION TREES: REGRESSION TREE

DECISION TREES: PARTITIONED INPUT SPACE

DECISION TREES: MULTIVARIATE STEP FUNCTION

DECISION TREES: DECISION REGIONS

DECISION TREES: LEAVES OF A CLASSIFICATION TREE

USING DECISION TREES FOR INITIAL AND EXPLORATORY DATA ANALYSIS

DECISION TREES: INITIAL DATA ANALYSIS AND EXPLORATORY DATA ANALYSIS
Interpretability
No strict assumptions concerning the functional form of the model
Resistant to the curse of dimensionality
Robust to outliers in the input space
No need to create dummy variables for nominal inputs
Missing values do not need to be imputed
Computationally fast (usually)

USING DECISION TREES TO MODIFY INPUT SPACE

DECISION TREES: MODIFYING THE INPUT SPACE
Dimension Reduction
  Input subset selection
  Collapsing levels of nominal inputs
Dimension Enhancement
  Discretizing interval inputs
  Stratified modeling
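Two of these modifications are easy to picture once a tree has chosen its splits: the split points on an interval input become bin boundaries, and the nominal levels that a split sends down the same branch become one collapsed level. The sketch below illustrates both ideas. The function names, the level names, and the grouping are hypothetical; the two split points reuse the values from the English-rule slide purely as sample numbers.

```python
from bisect import bisect_right


def discretize(value, split_points):
    """Map an interval value to a bin index using sorted split points.

    A value equal to a split point goes to the upper bin, matching
    rules written as 'value >= split point'.
    """
    return bisect_right(split_points, value)


def collapse_level(level, groups, default="OTHER"):
    """Map a nominal level to the name of the group (branch) containing it."""
    for group_name, members in groups.items():
        if level in members:
            return group_name
    return default


splits = [641.915, 2615.09]                      # sample split points
print([discretize(v, splits) for v in (100, 641.915, 1000, 2615.09)])

# Hypothetical branch membership for a nominal input.
branches = {"BRANCH_A": {"NY", "NJ", "CT"}, "BRANCH_B": {"CA", "OR"}}
print(collapse_level("NJ", branches), collapse_level("TX", branches))
```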

DECISION TREES: INPUT SELECTION

DECISION TREES: COLLAPSING LEVELS

INTERACTIVE DECISION TREES IN SAS ENTERPRISE MINER

DECISION TREES: INTERACTIVE TRAINING
Force and remove inputs
Define split values
Manually prune branches and leaves

DECISION TREE: INTERACTIVE DECISION TREE
TIP: Prior to invoking interactive mode, modify the Decision Tree properties to reflect the type of tree you wish to build.

BUILDING SEGMENTATION TREES

DECISION TREES: SEGMENTATION TREES WITH MULTIPLE TARGETS
Interactively build trees while considering more than one target.

DEMONSTRATION

ADDITIONAL DECISION TREES

SAS ENTERPRISE MINER: BAGGING/BOOSTING TREES
Use the Start Groups and End Groups nodes.

SAS ENTERPRISE MINER: GRADIENT BOOSTING
Sequential ensemble of many trees
Extremely good predictions
Very effective at variable selection
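The phrase "sequential ensemble" can be made concrete with a toy version: each new tree is fitted to the errors left by the trees before it, and the predictions are summed. The sketch below is not the Gradient Boosting node's implementation; it assumes a single interval input, squared-error loss (for which the negative gradient is simply the residual), one-split trees (stumps), and hypothetical names and parameter values.

```python
def fit_stump(x, residuals):
    """Fit a one-split regression tree to the residuals, minimizing squared error.

    Assumes x contains at least two distinct values.
    """
    best = None
    for threshold in sorted(set(x))[1:]:
        left = [r for xi, r in zip(x, residuals) if xi < threshold]
        right = [r for xi, r in zip(x, residuals) if xi >= threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda v: left_mean if v < threshold else right_mean


def boost(x, y, n_trees=20, learning_rate=0.5):
    """Build the ensemble one stump at a time, each fitted to the current residuals."""
    base = sum(y) / len(y)              # start from the overall mean
    stumps = []
    pred = [base] * len(y)
    for _ in range(n_trees):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        pred = [pi + learning_rate * stump(xi) for pi, xi in zip(pred, x)]
    return lambda v: base + learning_rate * sum(s(v) for s in stumps)


# Toy data (made up).
model = boost([1, 2, 3, 4], [1.0, 1.0, 5.0, 5.0])
print(round(model(1), 3), round(model(4), 3))
```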

SAS ENTERPRISE MINER: RANDOM FOREST
Predictive model called a forest
Creates several trees
Training data sampled without replacement
Input variables sampled
Available in EM 13.1
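The two sampling steps named on this slide can be illustrated as follows. This is a sketch under assumptions, not the forest node's code: the function name, the sampling fraction, and the number of inputs are hypothetical, and inputs are drawn once per tree for brevity, whereas forest implementations commonly draw a fresh subset of inputs at each split.

```python
import random


def draw_tree_sample(n_rows, input_names, row_fraction=0.5, n_inputs=2, seed=None):
    """Pick the rows and inputs that one tree of the forest would be trained on."""
    rng = random.Random(seed)
    k = max(1, int(n_rows * row_fraction))
    # random.sample draws WITHOUT replacement, so no row appears twice.
    rows = rng.sample(range(n_rows), k)
    inputs = rng.sample(input_names, min(n_inputs, len(input_names)))
    return rows, inputs


# Hypothetical input names.
rows, inputs = draw_tree_sample(10, ["saving_bal", "cc_bal", "age"], seed=1)
print(sorted(rows), inputs)
```

Because each tree sees a different subset of rows and inputs, the trees disagree in useful ways, and averaging their predictions tends to be more stable than any single tree.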

TIPS AND RESOURCES

TIP: INTERACTIVE DECISION TREE
The Interactive Decision Tree may not use all of your data. It uses a sample of at most 20,000 observations to prevent the excessive time and memory consumption that can occur with large data sets.
You can control the size of the sample and the method for creating it with Project Start Code.

TIP: INTERACTIVE DECISION TREE
Syntax:
%let EM_INTERACTIVE_TREE_MAXOBS=<max-number-of-observations-in-sample>;
%let EM_INTERACTIVE_TREE_SAMPLEMETHOD=<RANDOM | FIRSTN | STRATIFY>;

TIP: INTERACTIVE DECISION TREE
Example:
%let EM_INTERACTIVE_TREE_MAXOBS = 100000;
%let EM_INTERACTIVE_TREE_SAMPLEMETHOD = RANDOM;
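The slides do not describe how SAS Enterprise Miner draws the sample for each method, so the following is only a conceptual illustration of what the three method names usually mean: take the first N rows, take N rows at random, or take rows at random within each target level so that the level proportions are preserved. The function name and the data are hypothetical.

```python
import random
from collections import defaultdict


def sample_rows(targets, max_obs, method="RANDOM", seed=0):
    """Return sorted row indices of a sample of roughly max_obs observations."""
    n = len(targets)
    if n <= max_obs:
        return list(range(n))            # small data: use everything
    rng = random.Random(seed)
    if method == "FIRSTN":
        return list(range(max_obs))
    if method == "RANDOM":
        return sorted(rng.sample(range(n), max_obs))
    if method == "STRATIFY":
        by_level = defaultdict(list)
        for i, level in enumerate(targets):
            by_level[level].append(i)
        chosen = []
        for rows in by_level.values():
            # Proportional allocation; rounding can make the total differ
            # slightly from max_obs.
            k = round(max_obs * len(rows) / n)
            chosen.extend(rng.sample(rows, min(k, len(rows))))
        return sorted(chosen)
    raise ValueError("method must be RANDOM, FIRSTN, or STRATIFY")


targets = [0] * 80 + [1] * 20            # made-up binary target, 20% events
stratified = sample_rows(targets, 10, "STRATIFY")
print(len(stratified), sum(targets[i] for i in stratified))
```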

LEARNING MORE: DOCUMENTATION
SAS Enterprise Miner In-product Help File
Documentation: http://support.sas.com/documentation/onlinedoc/miner/index.html
Getting Started with SAS Enterprise Miner (Documentation PDF, Sample Data ZIP)
Recorded Webinar: http://www.sas.com/reg/web/corp/1864003

LEARNING MORE: SAS EDUCATION COURSES
Decision Tree Modeling: https://support.sas.com/edu/schedules.html?ctry=us&id=1463
Data Mining Techniques: Theory and Practice: https://support.sas.com/edu/schedules.html?ctry=us&id=1244

LEARNING MORE: SAS PRESS
Decision Trees for Analytics Using SAS Enterprise Miner
By: Barry de Ville and Padraic Neville
ISBN: 978-1-61290-315-6
Copyright Date: July 2013
SAS Bookstore: https://support.sas.com/pubscat/bookdetails.jsp?catid=1&pc=63319
Also available: Table of Contents (PDF), Free Chapter (PDF), Example Code and Data

THANK YOU FOR USING SAS www.sas.com