RStat Quick Start: Creating a Scoring Application Based on a Decision Tree Model



This Quick Start guides you through creating a credit-scoring application in eight easy steps.

Century Corp., an electronics retailer, sells over 30,000 types of items (phones, small radios, computer systems, home appliances, and home theater systems) priced anywhere from five to 10,000 dollars. The lion's share of their revenue comes from 972 retail stores that operate in 23 states. About 20 percent comes from an online store, where the average purchase is $500. Most online sales originate in the United States.

Century Corp. also derives revenue from its Century Corp. Credit Card (4C). With a combination of special financial incentives, extra conveniences, and well-targeted marketing campaigns, Century Corp. has increased 4C applications by approximately 10 percent per year over the past three years. Over 15 percent of in-store purchases now go onto a Century Corp. Credit Card, and online purchases using 4C have exceeded 25 percent.

This successful program expansion also increased the company's exposure to bad debt, so the company created a project that would improve credit scoring during the application process. Century Corp. originally planned to obtain credit scores from a third party. Each time an application was processed, the third-party service would provide that individual's credit score. Unfortunately, with over 1,500 average daily 4C applications, the cost structure of this service substantially reduced the margin of the 4C program.

Instead, the company developed a completely different methodology that tapped into information they already had. Using existing customers' demographic and historical credit data, they created a predictive model that could determine a new applicant's credit risk relative to all other 4C customers' risk.

To develop and implement the model, we will use two data sources that contain information about current customers. The first file contains demographic data, such as income, occupation, education, gender, and age. The second file contains credit history. We will use Developer Studio to join these data sources, define virtual fields that will enhance our model, and extract training-sample data. Within RStat we will then build, refine, and evaluate our model. The final model will be deployed within a WebFOCUS scoring application.

1 Create the Procedure to Extract the Training-Data Model

Create a new procedure (FEX). All facilities of TABLE, JOIN, DEFINE, COMPUTE, and filtering can be used in the procedure.

Join the Customer data source to the Credit History data source using the common ID field.

FOCUS Code Generated Using the Developer Studio GUI Tool

    JOIN LEFT_OUTER AB_CUSTOMERS.SEG01.ID IN ab_customers
      TO UNIQUE AB_CREDITHISTORY.SEG01.ID IN ab_credithistory AS J0
    END

Create a virtual field to transform the credit score into an indicator flag. Credit score contains a probability between 0 and 1 indicating whether a consumer has paid off their credit line in a timely fashion based on their previous payment history. For our purposes, we will identify anyone with a credit score greater than 0.5 as a good credit risk and all others as a bad credit risk.

FOCUS Code Generated Using the Developer Studio GUI Tool

    DEFINE FILE AB_CUSTOMERS
    CREDIT_APPROVAL/I6=IF CREDIT_SCORE GT .5 THEN 1 ELSE 0;
    END
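For readers who want to see the logic of the join and the indicator flag outside WebFOCUS, the following is a minimal Python sketch. It is not part of the Developer Studio output: the sample rows, the helper names (left_outer_join, credit_approval), and the choice to flag a customer with no credit history as 0 are all assumptions made for illustration.

```python
# Hypothetical sample rows; the real AB_CUSTOMERS and AB_CREDITHISTORY
# contents are not shown in this guide.
customers = [
    {"ID": 1, "AGE": 34, "INCOME": 52000},
    {"ID": 2, "AGE": 51, "INCOME": 87000},
    {"ID": 3, "AGE": 27, "INCOME": 31000},
]
credit_history = [
    {"ID": 1, "CREDIT_SCORE": 0.82},
    {"ID": 2, "CREDIT_SCORE": 0.50},
]


def credit_approval(credit_score):
    """Mirror the DEFINE: 1 when CREDIT_SCORE is greater than .5, else 0.

    Assumption: a customer without a credit-history row is flagged 0.
    """
    return 1 if credit_score is not None and credit_score > 0.5 else 0


def left_outer_join(left, right, key):
    """Keep every left row; attach the single matching right row if any."""
    lookup = {row[key]: row for row in right}
    joined = []
    for row in left:
        match = lookup.get(row[key], {})
        joined.append({**row, "CREDIT_SCORE": match.get("CREDIT_SCORE")})
    return joined


# Join on the common ID field, then derive the indicator flag.
training_rows = [
    {**row, "CREDIT_APPROVAL": credit_approval(row["CREDIT_SCORE"])}
    for row in left_outer_join(customers, credit_history, "ID")
]

for row in training_rows:
    print(row["ID"], row["CREDIT_SCORE"], row["CREDIT_APPROVAL"])
```

Note that a score of exactly 0.5 is flagged 0 here, because the DEFINE uses a strict greater-than comparison.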

In Report Painter, add the fields you will use to build your model: ID, AGE, EDUCATION, MARITAL, GENDER, OCCUPATION, INCOME from the Customer data source, and CREDIT_APPROVAL from the Credit History data source. Select Run RStat from the toolbar to pass this data selection to RStat.

2 Define the Model Data

RStat opens with the selected data set loaded. RStat presents nine tabs that support the standard modeling workflow. The Data tab shows the variables and the roles each will play in building the model.

Ensure that the following default options are selected:
- ID as Ident. This is the identifier for each row of data.
- CREDIT_APPROVAL as the Target. This is the value you will be predicting.
- All other variables as Input. These will be used to predict the Target variable.

Select Sample to ensure that the data is split into two sample sets:
- A training-data set, comprising 70% of the original data, which is used to create the model.
- The remaining 30% of the data (called the evaluation test-data set), which will be used to test how well the model predicts.

Click Execute from the RStat toolbar. The status bar confirms that the variable roles have been set.
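The 70/30 sampling described above can be pictured with a short Python sketch. This only illustrates the idea of a random partition; how RStat actually draws its sample is not described in this guide, and the function name split_sample, the seed value, and the ten example row IDs are assumptions.

```python
import random


def split_sample(row_ids, train_fraction=0.7, seed=42):
    """Shuffle the row identifiers and cut them into training and test sets.

    The seed is a hypothetical value, used only to make the split repeatable.
    """
    shuffled = list(row_ids)
    random.Random(seed).shuffle(shuffled)
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Ten hypothetical customer IDs: seven go to training, three to testing.
train_ids, test_ids = split_sample(range(1, 11))
print(len(train_ids), len(test_ids))
```

Every row lands in exactly one of the two sets, so the model is never evaluated on rows it was trained on.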

3 Build a Decision Tree Model

Select the Model tab. Decision Tree is selected by default based on your input data. Click Execute to create the model. The model metadata or output appears.

4 Visualize the Decision Tree Model

The Decision Tree generates rules that predict the score. Click Rules to display the rules.

The Decision Tree divides the customers within the sample data into multiple segments (branches). Each branch terminates in a node that associates a subset of the customers with a predicted score. The rules describe the criteria that qualify customers for each node. The predicted score is a probability value between 0 and 1. Those with a probability of 0.5 or greater are predicted as good risk, and those with less than 0.5 as bad risk.

Click Draw to display the Tree diagram. The colored numbers at the end of each node correspond to the rules.

5 Evaluate the Decision Tree Model to See How Well the Model Predicts

Select the Evaluate tab. Ensure the following options are selected:
- Error Matrix as the evaluation type.
- Testing as the data source, to use the 30% sample segmented from the training data in Step 2.

Click Execute in the RStat toolbar. An error matrix shows the relationship between the actual data and the predicted values.

Two error matrices are displayed. The first matrix shows the count of cases and the second shows the percentage of cases. Looking at the second matrix, we can see that the model predicts the following:
- In 83% of the cases (cell (0,0)), the actual value of bad credit was matched by the predicted value.
- In 13% of the cases (cell (1,1)), people with good credit were correctly classified.
- The remaining 4% were misclassified.

Summing the correctly classified cases gives 83% + 13% = 96% correctly classified.

6 Export the Final Model to Build the Scoring Application

Once you have finalized your model, you will export the model formula as a routine that can be deployed within any WebFOCUS environment to build a scoring application:
- Select the Model tab.
- Click Export from the RStat toolbar.
- Define the scoring routine name as ab_creditscore_tree.
- Select the ibi\apps\_rstat directory as the export destination.
- Click Save.

The file containing your scoring routine is generated and saved under the selected location and file name.
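The error-matrix arithmetic from Step 5 can be checked with a few lines of Python. The guide reports only the two diagonal percentages (83% and 13%) and the 4% total misclassified, so the counts below, including the even 2/2 split of the misclassified cases, are hypothetical values chosen to reproduce those figures.

```python
def percent_matrix(counts):
    """Convert a matrix of case counts into percentages of all cases."""
    total = sum(sum(row) for row in counts)
    return [[100.0 * cell / total for cell in row] for row in counts]


def accuracy_percent(counts):
    """Share of cases on the diagonal, where actual equals predicted."""
    total = sum(sum(row) for row in counts)
    correct = sum(counts[i][i] for i in range(len(counts)))
    return 100.0 * correct / total


# Rows are actual values (0 = bad, 1 = good); columns are predicted values.
# Hypothetical counts out of 100 test cases; the off-diagonal split is assumed.
counts = [
    [83, 2],
    [2, 13],
]

print(percent_matrix(counts))
print(accuracy_percent(counts))
```

The same two functions apply unchanged to the count matrix that RStat displays, whatever its real cell values are.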

7 Compile and Deploy the Scoring Routine

- Exit RStat by clicking the Quit button in the toolbar.
- Close the training report to return to Developer Studio Explorer.
- Select RStat Model Deployment from the Command menu.
- Select the exported Scoring Routine file as the source.
- Select the WebFOCUS environment, server, and application path where the routine should be deployed. In this case, we will deploy in the EDASERVE server with the _RStat directory.
- Click Deploy.
- Verify that the deployment was completed successfully.

8 Create a Scoring Application to Apply the Model to New Customer Data

Create a new procedure in Report Painter and select the appropriate Master File for the new applicant data set. You can use any data source that contains the input variables defined for the model. For the purposes of this example, use the AB_NewCustomers data file.

Add the following fields to the report: ID, AGE, EDUCATION, MARITAL, GENDER, OCCUPATION, and INCOME.

Create a new Compute field. Define the expression as the scoring function, with your new data fields as the model input variables and the computed field (SCORE) as the final parameter:
- Set field name to SCORE.
- Set field format to A2.
- Build the following scoring expression: AB_CREDITSCORE_TREE(AGE, EDUCATION, MARITAL, GENDER, OCCUPATION, INCOME, SCORE)

Create a second Compute field to display YES if the score is 1 or NO if the score is 0:

    APPROVED/A3 = IF SCORE EQ 1 THEN 'YES' ELSE 'NO';

FOCUS Code Generated Using the Developer Studio GUI Tool

    COMPUTE SCORE/A2 = AB_CREDITSCORE_TREE(AGE, EDUCATION, MARITAL, GENDER, OCCUPATION, INCOME, SCORE);
    COMPUTE APPROVED/A3 = IF SCORE EQ 1 THEN 'YES' ELSE 'NO';
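As a rough picture of what the two Compute fields do for each new applicant, here is a Python sketch. The function score_applicant is only a stand-in for the exported AB_CREDITSCORE_TREE routine: its split values are invented for illustration and do not come from the model built in this guide, and the applicant rows are hypothetical as well.

```python
def score_applicant(age, education, marital, gender, occupation, income):
    """Stand-in for the deployed scoring routine (invented rules).

    Returns 1 for a predicted good risk and 0 for a predicted bad risk.
    """
    if income >= 50000:
        return 1
    return 1 if age >= 40 and education == "College" else 0


def approved_label(score):
    """Second Compute field: YES when the score is 1, otherwise NO."""
    return "YES" if score == 1 else "NO"


# Hypothetical rows standing in for the AB_NewCustomers data file.
new_customers = [
    {"ID": 101, "AGE": 45, "EDUCATION": "College", "MARITAL": "Married",
     "GENDER": "F", "OCCUPATION": "Professional", "INCOME": 42000},
    {"ID": 102, "AGE": 23, "EDUCATION": "HighSchool", "MARITAL": "Single",
     "GENDER": "M", "OCCUPATION": "Clerical", "INCOME": 28000},
    {"ID": 103, "AGE": 31, "EDUCATION": "College", "MARITAL": "Single",
     "GENDER": "F", "OCCUPATION": "Executive", "INCOME": 91000},
]

report = []
for row in new_customers:
    score = score_applicant(row["AGE"], row["EDUCATION"], row["MARITAL"],
                            row["GENDER"], row["OCCUPATION"], row["INCOME"])
    report.append({"ID": row["ID"], "SCORE": score,
                   "APPROVED": approved_label(score)})

for line in report:
    print(line["ID"], line["SCORE"], line["APPROVED"])
```

The point of the sketch is the shape of the flow: the input fields go into the scoring routine row by row, and the resulting score drives the YES/NO display field.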

Run the procedure to see the predicted values.

Using the steps described in this Quick Start, you can also implement scoring routines using linear regression, general linear model regression (GLM), logistic regression, Poisson regression, multinomial regression, hierarchical clustering, k-means clustering, and a wide array of other modeling techniques.

RStat brings the power of predictive analytics to the operational enterprise. Any WebFOCUS application can select new data to be scored and then provide ad hoc analytics through active reports, plot the prediction on a map or graph, or support real-time decision-making through KPI dashboards and transactional process flows.

Information Builders
Corporate Headquarters: Two Penn Plaza, New York, NY 10121-2898, (212) 736-4433, Fax (212) 967-6406
Canadian Headquarters: 150 York St., Suite 1000, Toronto, ON M5H 3S5, (416) 364-2760, Fax (416) 364-6552
For International Inquiries: +1 (212) 736-4433
informationbuilders.com, askinfo@informationbuilders.com
DN4601534.0109
Copyright 2009 by Information Builders. All rights reserved. All products and product names mentioned in this publication are trademarks or registered trademarks of their respective companies.