Creating a Scoring Application Based on a Decision Tree Model

This Quick Start guides you through creating a credit-scoring application in eight easy steps.

Century Corp., an electronics retailer, sells over 30,000 types of items (phones, small radios, computer systems, home appliances, and home theater systems) priced anywhere from five to 10,000 dollars. The lion's share of its revenue comes from 972 retail stores operating in 23 states. About 20 percent comes from an online store, where the average purchase is $500. Most online sales originate in the United States.

Century Corp. also derives revenue from its Century Corp. Credit Card (4C). With a combination of special financial incentives, extra conveniences, and well-targeted marketing campaigns, Century Corp. has increased 4C applications by approximately 10 percent per year over the past three years. Over 15 percent of in-store purchases now go onto a Century Corp. Credit Card, and online purchases using 4C have exceeded 25 percent. This successful program expansion also increased the company's exposure to bad debt, so the company created a project to improve credit scoring during the application process.

Century Corp. originally planned to obtain credit scores from a third party: each time an application was processed, the third-party service would provide that individual's credit score. Unfortunately, with over 1,500 4C applications per day on average, the cost structure of this service substantially reduced the margin of the 4C program. Instead, the company developed a completely different methodology that tapped into information it already had. Using existing customers' demographic and historical credit data, the company created a predictive model that could determine a new applicant's credit risk relative to the risk of all other 4C customers.
To develop and implement the model, we will use two data sources that contain information about current customers. The first file contains demographic data, such as income, occupation, education, gender, and age. The second file contains credit history. We will use Developer Studio to join these data sources, define virtual fields that will enhance our model, and extract training-sample data. Within RStat we will then build, refine, and evaluate our model. The final model will be deployed within a WebFOCUS scoring application.

1 Create the Procedure to Extract the Training Data

Create a new procedure (FEX). All facilities of TABLE, JOIN, DEFINE, COMPUTE, and filtering can be used in the procedure. Join the Customer data source to the Credit History data source using the common ID field.

FOCUS Code Generated Using the Developer Studio GUI Tool

JOIN LEFT_OUTER AB_CUSTOMERS.SEG01.ID IN ab_customers
TO UNIQUE AB_CREDITHISTORY.SEG01.ID IN ab_credithistory AS J0
END

Create a virtual field to transform the credit score into an indicator flag. The credit score is a probability between 0 and 1 indicating whether a consumer has paid off their credit line in a timely fashion, based on their previous payment history. For our purposes, we will identify anyone with a credit score greater than .5 as a good credit risk and all others as a bad credit risk.

FOCUS Code Generated Using the Developer Studio GUI Tool

DEFINE FILE AB_CUSTOMERS
CREDIT_APPROVAL/I6 = IF CREDIT_SCORE GT .5 THEN 1 ELSE 0;
END
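The logic of this step, a left outer join on ID followed by the indicator-flag DEFINE, can be sketched conceptually in Python. This is not FOCUS/WebFOCUS code; the field names mirror the example, and the sample values are invented for illustration.

```python
# Conceptual sketch (not FOCUS code): left outer join of customer
# records to credit-history records on ID, then derive the
# CREDIT_APPROVAL indicator from CREDIT_SCORE (> .5 means good risk).
# All data values here are invented.

customers = [
    {"ID": 1, "AGE": 34, "INCOME": 52000},
    {"ID": 2, "AGE": 51, "INCOME": 87000},
]
credit_history = {  # keyed by ID, like the TO UNIQUE join target
    1: {"CREDIT_SCORE": 0.72},
    2: {"CREDIT_SCORE": 0.31},
}

training_rows = []
for cust in customers:
    hist = credit_history.get(cust["ID"], {})          # left outer join
    row = {**cust, **hist}
    score = row.get("CREDIT_SCORE", 0.0)
    row["CREDIT_APPROVAL"] = 1 if score > 0.5 else 0   # indicator flag
    training_rows.append(row)

print(training_rows[0]["CREDIT_APPROVAL"])  # 1
print(training_rows[1]["CREDIT_APPROVAL"])  # 0
```

In the real procedure, the JOIN and DEFINE shown above perform the same two operations server-side against the actual data sources.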
In Report Painter, add the fields you will use to build your model: ID, AGE, EDUCATION, MARITAL, GENDER, OCCUPATION, and INCOME from the Customer data source, and CREDIT_APPROVAL from the Credit History data source. Select Run RStat from the toolbar to pass this data selection to RStat.

2 Define the Model Data

RStat opens with the selected data set loaded and presents nine tabs that support the standard modeling workflow. The Data tab shows the variables and the role each will play in building the model. Ensure that the following default options are selected:

- ID as Ident. This is the identifier for each row of data.
- CREDIT_APPROVAL as the Target. This is the value you will be predicting.
- All other variables as Input. These will be used to predict the Target variable.

Select Sample to ensure that the data is split into two sample sets:

- A training-data set, comprising 70% of the original data, used to create the model.
- The remaining 30% of the data (the evaluation test-data set), which will be used to test how well the model predicts.

Click Execute on the RStat toolbar. The status bar confirms that the variable roles have been set.
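The 70/30 sample split RStat performs here can be sketched in Python. This is a conceptual illustration only; the row data and seed are invented, not RStat internals.

```python
# Conceptual sketch of a 70/30 train/test split, as RStat's Sample
# option performs. The rows and the random seed are synthetic.
import random

rows = list(range(1000))            # stand-ins for customer records
random.seed(42)                     # fixed seed so the split is repeatable
random.shuffle(rows)

cut = int(len(rows) * 0.7)
training = rows[:cut]               # 70%: used to build the model
testing = rows[cut:]                # 30%: held out to evaluate the model

print(len(training), len(testing))  # 700 300
```

Holding out the 30% test set is what makes the evaluation in step 5 an honest estimate: the model never sees those rows during training.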
3 Build a Decision Tree Model

Select the Model tab. Decision Tree is selected by default based on your input data. Click Execute to create the model. The model metadata (output) appears.

4 Visualize the Decision Tree Model

The Decision Tree generates rules that predict the score. Click Rules to display the rules. The Decision Tree divides the customers within the sample data into multiple segments (branches). Each branch terminates in a node that associates a subset of the customers with a predicted score. The rules describe the criteria that qualify customers for each node. The predicted score is a probability value between 0 and 1: those with a probability of .5 or greater are predicted as good risk, and those below .5 as bad risk. Click Draw to display the Tree diagram. The colored numbers at the end of each node correspond to the rules.
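To make the rules-and-nodes idea concrete, here is a Python sketch of how a trained tree's rules score a customer. The split variables, thresholds, and node probabilities are invented for illustration; in practice they come from the model RStat fits.

```python
# Conceptual sketch of decision tree scoring. Each if/else branch is a
# rule; each return is a terminal node with a predicted probability.
# Variables, thresholds, and probabilities are hypothetical.

def tree_score(age, income):
    """Walk the branches to a terminal node; return its probability."""
    if income >= 40000:
        if age >= 30:
            return 0.82   # node: higher income, older -> likely good risk
        return 0.58       # node: higher income, younger
    if age >= 45:
        return 0.44       # node: lower income, older
    return 0.21           # node: lower income, younger -> likely bad risk

def classify(prob):
    """Apply the .5 cutoff described in the guide."""
    return "good risk" if prob >= 0.5 else "bad risk"

print(classify(tree_score(age=35, income=52000)))  # good risk
print(classify(tree_score(age=25, income=18000)))  # bad risk
```

Every customer who satisfies the same chain of rules lands in the same node and receives that node's probability, which is why the Rules view and the Tree diagram describe the same model.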
5 Evaluate the Decision Tree Model to See How Well the Model Predicts

Select the Evaluate tab. Ensure that the following options are selected:

- Error Matrix as the evaluation type.
- Testing as the data source, to use the 30% sample segmented from the training data in Step 2.

Click Execute on the RStat toolbar. An error matrix shows the relationship between the actual data and the predicted values. Two error matrices are displayed: the first shows the count of cases and the second shows the percentage of cases. Looking at the second matrix, we can see that the model predicts the following:

- In 83% of the cases {Cell (0,0)}, the actual value of bad credit was matched by the predicted value.
- In 13% of the cases {Cell (1,1)}, people with good credit were correctly classified.
- The remaining 4% were misclassified.

Summing the correctly classified cases: 83% + 13% = 96% of cases classified correctly.

6 Export the Final Model to Build the Scoring Application

Once you have finalized your model, export the model formula as a routine that can be deployed within any WebFOCUS environment to build a scoring application:

- Select the Model tab.
- Click Export from the RStat toolbar.
- Define the scoring routine name as ab_creditscore_tree.
- Select the ibi\apps\_rstat directory as the export destination.
- Click Save.

The file containing your scoring routine is generated and placed in the selected location under the specified file name.
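The error-matrix arithmetic from step 5 can be sketched in Python. The actual/predicted pairs below are synthetic, but they are chosen so the cell percentages match the example (83% + 13% = 96% correct).

```python
# Conceptual sketch of step 5's error (confusion) matrix on the
# held-out test sample. The 100 (actual, predicted) pairs are synthetic:
# 83 correctly predicted bad (0,0), 13 correctly predicted good (1,1),
# and 4 misclassified cases.
from collections import Counter

pairs = [(0, 0)] * 83 + [(1, 1)] * 13 + [(0, 1)] * 2 + [(1, 0)] * 2

matrix = Counter(pairs)          # count of cases per (actual, predicted) cell
total = len(pairs)
accuracy = (matrix[(0, 0)] + matrix[(1, 1)]) / total

print(matrix[(0, 0)], matrix[(1, 1)], accuracy)  # 83 13 0.96
```

Evaluating against the Testing sample rather than the training sample is what makes this 96% a fair estimate of how the model will perform on new applicants.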
7 Compile and Deploy the Scoring Routine

Exit RStat by clicking the Quit button in the toolbar. Close the training report to return to Developer Studio Explorer. Then:

- Select RStat Model Deployment from the Command menu.
- Select the exported scoring routine file as the source.
- Select the WebFOCUS environment, server, and application path where the routine should be deployed. In this case, we will deploy to the EDASERVE server with the _rstat directory.
- Click Deploy.
- Verify that the deployment completed successfully.
8 Create a Scoring Application to Apply the Model to New Customer Data

Create a new procedure in Report Painter and select the appropriate Master File for the new-applicant data set. You can use any data source that contains the input variables defined for the model. For the purposes of this example, use the AB_NewCustomers data file.

Add the following fields to the report: ID, AGE, EDUCATION, MARITAL, GENDER, OCCUPATION, and INCOME.

Create a new Compute field. Define the expression as the scoring function, with your new data fields as the model input variables and the computed field (SCORE) as the final parameter:

- Set the field name to SCORE.
- Set the field format to A2.
- Build the following scoring expression: AB_CREDITSCORE_TREE(AGE, EDUCATION, MARITAL, GENDER, OCCUPATION, INCOME, SCORE)

Create a second Compute field to display YES if the score is 1 or NO if the score is 0:

APPROVED/A3 = IF SCORE EQ 1 THEN 'YES' ELSE 'NO';

FOCUS Code Generated Using the Developer Studio GUI Tool

COMPUTE SCORE/A2 = AB_CREDITSCORE_TREE(AGE, EDUCATION, MARITAL, GENDER, OCCUPATION, INCOME, SCORE);
COMPUTE APPROVED/A3 = IF SCORE EQ 1 THEN 'YES' ELSE 'NO';
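Step 8's logic, calling the deployed scoring routine for each new applicant and deriving APPROVED from SCORE, can be sketched in Python. The scoring function below is a hypothetical stand-in for the exported AB_CREDITSCORE_TREE routine, and the applicant rows are invented.

```python
# Conceptual sketch of step 8 (not WebFOCUS code). The function is a
# hypothetical stand-in for the deployed AB_CREDITSCORE_TREE routine;
# the applicant records are invented sample data.

def ab_creditscore_tree(age, education, marital, gender, occupation, income):
    """Stand-in scoring routine; the real rules come from the model."""
    return 1 if income >= 40000 and age >= 30 else 0

new_applicants = [
    {"ID": 101, "AGE": 42, "EDUCATION": "BS", "MARITAL": "M",
     "GENDER": "F", "OCCUPATION": "Engineer", "INCOME": 95000},
    {"ID": 102, "AGE": 22, "EDUCATION": "HS", "MARITAL": "S",
     "GENDER": "M", "OCCUPATION": "Clerk", "INCOME": 18000},
]

for row in new_applicants:
    # Equivalent of the first COMPUTE: score each applicant
    row["SCORE"] = ab_creditscore_tree(
        row["AGE"], row["EDUCATION"], row["MARITAL"],
        row["GENDER"], row["OCCUPATION"], row["INCOME"])
    # Equivalent of the second COMPUTE: translate the score to a flag
    row["APPROVED"] = "YES" if row["SCORE"] == 1 else "NO"

print([r["APPROVED"] for r in new_applicants])  # ['YES', 'NO']
```

The two COMPUTE fields in the FOCUS code perform exactly this row-by-row scoring and flag derivation at report time.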
Run the procedure to see the predicted values.

Using the steps described in this Quick Start Guide, you can also implement scoring routines using linear regression, general linear model (GLM) regression, logistic regression, Poisson regression, multinomial regression, hierarchical clustering, k-means clustering, and a wide array of other modeling techniques. RStat brings the power of predictive analytics to the operational enterprise. Any WebFOCUS application can select new data to be scored and then provide ad hoc analytics through active reports, plot the prediction on a map or graph, or support real-time decision-making through KPI dashboards and transactional process flows.

Copyright 2009 by Information Builders. All rights reserved.