Measuring the Discrimination Quality of Suites of Scorecards:

1 Measuring the Discrimination Quality of Suites of Scorecards: ROCs, Ginis, Bounds and Segmentation
Lyn C Thomas
Quantitative Financial Risk Management Centre, University of Southampton, UK
CSCC X, Edinburgh, August 2007

2 Outline
- How to measure scorecards
- Measures of discrimination
  - Divergence: expectations of weight-of-evidence (woe) functions
  - Kolmogorov-Smirnov: difference in distribution functions
  - ROC curves: comparison of distribution functions / business measures
  - Gini coefficient / D-concordance statistic
  - Simple bound for ROC curve
- Relationship between measures of discrimination
- Segmenting and measures of segmentation
  - Why segment and build different scorecards on each segment
  - How much of the discrimination is due to segmentation and how much to the scorecard?
  - Some examples from behavioural scorecards

3 Measuring scorecards in credit scoring
Three different aspects of a scorecard's performance that one might want to measure:
- Discriminatory power (only uses scorecard)
  - How good is the system at separating the two classes of goods and bads
  - Divergence statistic
  - Mahalanobis distance
  - Somers' D-concordance statistic
  - Kolmogorov-Smirnov statistic
  - ROC curve
  - Gini coefficient
- Calibration of probability forecast (uses scorecard and population odds)
  - Not used much until Basel requirements, and so few tests
  - Chi-square (Hosmer-Lemeshow) test
  - Binomial and Normal tests
- Categorical prediction error (uses scorecard, population odds and cut-off score)
  - This requires the scorecard and the cut-off score so one can implement the decisions and see how many erroneous classifications there are
  - Error rates by tables and swap sets
  - Hypothesis tests

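4 Divergence
- Introduced by Kullback
- Continuous version of Information Value
- Let $f(s|G)$ ($f(s|B)$) be the density functions of the scores of goods (G) (bads (B)) in a scorecard. Divergence is then defined by

$$D = \int \big(f(s|G) - f(s|B)\big) \log\frac{f(s|G)}{f(s|B)}\,ds = \int \big(f(s|G) - f(s|B)\big)\,w(s)\,ds$$

  where $w(s)$ is the weight of evidence at score $s$.
- Like (expectation of the weights of evidence under the goods distribution) minus (expectation of the weights of evidence under the bads distribution)
- $D \ge 0$, and $D = 0 \iff f(s|G) = f(s|B)$
- $D \to \infty \iff$ no overlap between scores of goods and bads
- Really can only calculate the divergence by splitting scores into bands. If there are bands $i \in I$ with $g_i$ goods and $b_i$ bads in band $i$, where $\sum_{i \in I} g_i = n_G$ and $\sum_{i \in I} b_i = n_B$, then

$$\text{Information Value} = IV = \sum_{i \in I} \left(\frac{g_i}{n_G} - \frac{b_i}{n_B}\right) \ln\frac{g_i/n_G}{b_i/n_B} = \sum_{i \in I} \left(\frac{g_i}{n_G} - \frac{b_i}{n_B}\right) \ln\frac{g_i\,n_B}{b_i\,n_G}$$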
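The banded Information Value formula can be sketched in a few lines of Python. This is a minimal illustration rather than the speaker's implementation: the band counts are hypothetical, and every band is assumed to contain at least one good and one bad so that the logarithm is defined.

```python
import math

def information_value(goods, bads):
    """IV = sum_i (g_i/n_G - b_i/n_B) * ln((g_i/n_G) / (b_i/n_B))."""
    n_g, n_b = sum(goods), sum(bads)
    iv = 0.0
    for g_i, b_i in zip(goods, bads):
        share_g, share_b = g_i / n_g, b_i / n_b
        # Each band contributes (difference in shares) * (weight of evidence).
        iv += (share_g - share_b) * math.log(share_g / share_b)
    return iv

# Hypothetical counts per score band (not from the talk), lowest band first.
band_goods = [100, 300, 600]
band_bads = [50, 30, 20]
iv = information_value(band_goods, band_bads)
```

Every term in the sum is non-negative, so IV is zero only when goods and bads are spread identically across the bands, mirroring $D = 0 \iff f(s|G) = f(s|B)$.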

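5 Mahalanobis Distance and relationship with Divergence
- Suppose goods have total $n_G$, mean $\mu_G$ and variance $\sigma_G^2$, and bads have total $n_B$, mean $\mu_B$ and variance $\sigma_B^2$
- So assuming the same variance for G and B, the variance is

$$\sigma^2 = \frac{n_G\,\sigma_G^2 + n_B\,\sigma_B^2}{n_G + n_B}$$

- Mahalanobis distance is $D_M = (\mu_G - \mu_B)/\sigma$
- This is what discriminant analysis maximises
- If we assume $f(s|B)$ and $f(s|G)$ are normal, Divergence reduces to

$$D = \frac{(\sigma_G^2 - \sigma_B^2)^2}{2\,\sigma_G^2\,\sigma_B^2} + \frac{1}{2}\,(\mu_G - \mu_B)^2 \left(\frac{1}{\sigma_G^2} + \frac{1}{\sigma_B^2}\right)$$

- If $\sigma_G = \sigma_B$, then $D = D_M^2$

[Figure: probability density against score for bads and goods, showing the difference between the two distributions.]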
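As a quick numerical check of the relationship between the two measures, the sketch below evaluates the pooled variance, the Mahalanobis distance and the divergence of two normal score distributions. The function names and the example means and variances are illustrative assumptions, not values from the talk.

```python
import math

def pooled_variance(n_g, var_g, n_b, var_b):
    """sigma^2 = (n_G*var_G + n_B*var_B) / (n_G + n_B)."""
    return (n_g * var_g + n_b * var_b) / (n_g + n_b)

def mahalanobis(mu_g, mu_b, sigma):
    """D_M = (mu_G - mu_B) / sigma."""
    return (mu_g - mu_b) / sigma

def normal_divergence(mu_g, var_g, mu_b, var_b):
    """Divergence between two normal score densities."""
    spread_term = (var_g - var_b) ** 2 / (2 * var_g * var_b)
    shift_term = 0.5 * (mu_g - mu_b) ** 2 * (1 / var_g + 1 / var_b)
    return spread_term + shift_term

# Hypothetical scorecard: goods ~ N(600, 50^2), bads ~ N(500, 50^2).
sigma = math.sqrt(pooled_variance(900, 2500.0, 100, 2500.0))
d_m = mahalanobis(600.0, 500.0, sigma)
d = normal_divergence(600.0, 2500.0, 500.0, 2500.0)
```

With equal variances the first term of the divergence vanishes and the second collapses to $(\mu_G - \mu_B)^2/\sigma^2$, which is why `d` equals `d_m ** 2` here.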

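6 Kolmogorov-Smirnov statistic
- Not a difference in expectations but a difference in the distribution functions $F(s|G)$ and $F(s|B)$ (the maximum difference):

$$KS = \max_s \left| F(s|G) - F(s|B) \right|$$

[Figure: probability distribution functions $F(s|B)$ and $F(s|G)$ plotted against score $s$ on a 0 to 1 scale, with the K-S distance marked as the largest vertical gap.]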

7 Kolmogorov-Smirnov statistic
- Problem with KS is that it describes the situation at the optimal separating score, i.e. where the marginal good-bad odds equal the overall good-bad odds
- This is usually much higher than any cut-off score
- Strong relationship between KS and sensitivity and specificity:

$$F(s|B) - F(s|G) = \text{sensitivity} + \text{specificity} - 1$$

  and

$$KS = \max_s \big[F(s|B) - F(s|G)\big] = \max_s \big[\text{sensitivity} + \text{specificity} - 1\big]$$
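A minimal sketch of the empirical KS statistic follows. It assumes applicants scoring at or below the cut-off are rejected, so sensitivity is $F(s|B)$ (bads rejected) and specificity is $1 - F(s|G)$ (goods accepted); the score lists are made up for illustration.

```python
def ks_statistic(good_scores, bad_scores):
    """Largest gap between the empirical F(s|B) and F(s|G) over all cut-offs."""
    best_gap, best_cutoff = 0.0, None
    for s in sorted(set(good_scores) | set(bad_scores)):
        f_g = sum(x <= s for x in good_scores) / len(good_scores)
        f_b = sum(x <= s for x in bad_scores) / len(bad_scores)
        sensitivity, specificity = f_b, 1 - f_g
        gap = sensitivity + specificity - 1  # identical to f_b - f_g
        if abs(gap) > best_gap:
            best_gap, best_cutoff = abs(gap), s
    return best_gap, best_cutoff

# Hypothetical scores: two bads and four goods.
ks, cutoff = ks_statistic(good_scores=[2, 4, 5, 6], bad_scores=[1, 3])
```

For these scores the gap peaks at cut-off 3, where both bads but only one of the four goods would be rejected.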

8 ROC curve
- Receiver Operating Characteristic curve
- KS plots the two functions $F(s|G)$, $F(s|B)$ against score; the ROC curve plots the functions against each other
- Each point corresponds to a cut-off score $s$
  - Vertical is % of bads below that cut-off
  - Horizontal is % of goods below that cut-off
- Ideal scorecard gives ABC, passing through B, where $F(s|B) = 1$ and $F(s|G) = 0$
- Curve AC (diagonal) is like picking scores at random
- Gini coefficient (GC) asks how close to optimal the scorecard is

[Figure: ROC curve with corners A, B and C; horizontal axis $F(s|G)$, vertical axis $F(s|B)$.]

9 ROC curve relationship with business measures
- Going down the curve from top right to bottom left, the cut-off score decreases, so more applicants are accepted: volume of portfolio increases
- As the curve goes down, the number of bads accepted increases: losses increase
- Profit at cut-off score $s$ ($R$ profit on a good, $D$ loss on a bad, $p_G$ and $p_B$ the proportions of goods and bads) is

$$R\,p_G\,\big(1 - F(s|G)\big) - D\,p_B\,\big(1 - F(s|B)\big)$$

- So isobars of equal profit are the lines $-R\,p_G\,F(s|G) + D\,p_B\,F(s|B) = \text{constant}$
- As one goes to the north-west, profit increases

[Figure: ROC curve from A to C, horizontal axis $F(s|G)$, vertical axis $F(s|B)$, annotated with the directions in which volume, losses and profits increase.]
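The profit expression and its isobars can be checked with a short sketch. The values chosen for `R`, `D`, `p_g` and `p_b` are purely illustrative assumptions and do not come from the talk.

```python
def expected_profit(f_g, f_b, R, D, p_g, p_b):
    """Profit per applicant when scores at or below the cut-off are rejected.

    f_g = F(s|G) and f_b = F(s|B) are the coordinates of the ROC point.
    """
    return R * p_g * (1 - f_g) - D * p_b * (1 - f_b)

# Hypothetical economics: earn 1 per good, lose 10 per bad, 5% bads.
params = dict(R=1.0, D=10.0, p_g=0.95, p_b=0.05)

accept_all = expected_profit(0.0, 0.0, **params)   # bottom-left corner A
reject_all = expected_profit(1.0, 1.0, **params)   # top-right corner C
mid_point = expected_profit(0.1, 0.5, **params)    # 50% of bads for 10% of goods

# A point on the same isobar as A: D*p_b*f_b must equal R*p_g*f_g.
same_isobar = expected_profit(0.1, 0.19, **params)
```

Here `mid_point` exceeds `accept_all` because the ROC point (0.1, 0.5) lies north-west of the isobar through A, whereas (0.1, 0.19) lies exactly on it.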

10 Gini Coefficient
- Gini coefficient, GC, is 2 × (area between curve and diagonal), i.e. the ratio of the area between curve and diagonal to the area of triangle ABC
- If GC = 1 then perfect discrimination; GC = 0 no discrimination
- AUROC is the area under the ROC curve, so GC = 2(AUROC - 0.5) = 2 AUROC - 1
- K-S is the greatest vertical distance from diagonal to curve

[Figure: ROC curves with corners A, B and C; horizontal axis $F(s|G)$, vertical axis $F(s|B)$.]
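A small sketch of the AUROC-to-Gini conversion, using the trapezium rule on a piecewise-linear ROC curve. The point lists are hypothetical, and each point is assumed to be given as $(F(s|G), F(s|B))$ in increasing order from (0, 0) to (1, 1).

```python
def auroc(points):
    """Area under a piecewise-linear ROC curve by the trapezium rule."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area

def gini(points):
    """GC = 2 * AUROC - 1."""
    return 2 * auroc(points) - 1

random_card = [(0.0, 0.0), (1.0, 1.0)]              # diagonal AC
perfect_card = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0)]  # path ABC
kinked_card = [(0.0, 0.0), (0.1, 0.5), (1.0, 1.0)]   # single kink at (0.1, 0.5)

gini_random = gini(random_card)
gini_perfect = gini(perfect_card)
gini_kinked = gini(kinked_card)
```

The kinked curve has Gini 0.4, which is exactly the vertical distance 0.5 - 0.1 of its kink from the diagonal.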

11 Lift curve and Accuracy Ratio AR
- Lift curve looks similar to ROC curve (originated in marketing) but with subtle differences
- Plots $F(s|B)$, % of bads rejected, against $F(s)$, % of population rejected
- Ideal scorecard gives ABC (B is $p_B$ of the population in)
- Random scorecard given by diagonal AC
- Curve depends on population odds, BUT
- Accuracy ratio, AR = (area of curve above diagonal) / (area ABC)
- AR = GC (even though different curves)

[Figure: lift curve with corners A, B and C; horizontal axis $F(s)$, vertical axis $F(s|B)$.]

12 Somers' D-concordance statistic and relationship with Gini Coefficient
- Area under the ROC curve (AUROC) can be interpreted as the probability that, if a good and a bad are chosen at random, the good will have a higher score than the bad
- Consider the expectation, $D_S$, of a variable which is 1 if the good's score > the bad's score; $-1$ if the bad's score > the good's score; 0 if same score:

$$D_S = 1\cdot\int F(s|B)\,f(s|G)\,ds - 1\cdot\int \big(1 - F(s|B)\big)\,f(s|G)\,ds = 2\int F(s|B)\,f(s|G)\,ds - 1\cdot\int f(s|G)\,ds = 2\,AUROC - 1 = GC$$

- This $D_S$ is known as Somers' D-concordance statistic
- A useful way of calculating GC is to calculate the Mann-Whitney U: if there are $g$ goods and $b$ bads, let $S(G)$ ($S(B)$) be the sum of the rankings of the goods (bads)
- Example: scores in increasing order give (BGBGGG); $S(G) = 2 + 4 + 5 + 6 = 17$; $S(B) = 1 + 3 = 4$; $g = 4$, $b = 2$
  - $AUROC = U/gb = \big[S(G) - \tfrac{1}{2}\,g(g+1)\big]/gb = (17 - 10)/(4 \cdot 2) = 0.875$
  - $D_S = GC = 2\,AUROC - 1 = 2U/gb - 1 = 0.75$

[Figure: ROC curve from A to C; horizontal axis $F(s|G)$, vertical axis $F(s|B)$.]
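The rank-sum calculation can be reproduced with a short sketch. It assumes the sequence of goods and bads in increasing score order is B, G, B, G, G, G (the ordering consistent with $S(G) = 17$ and a bad rank sum of 4) and that there are no tied scores; the function name is illustrative.

```python
def mann_whitney_auroc(labels):
    """AUROC from 'G'/'B' labels listed in increasing score order (no ties)."""
    g = labels.count("G")
    b = labels.count("B")
    # Rank 1 is the lowest score; add up the ranks held by the goods.
    rank_sum_goods = sum(rank for rank, lab in enumerate(labels, start=1) if lab == "G")
    u = rank_sum_goods - g * (g + 1) / 2
    return u / (g * b)

auc = mann_whitney_auroc("BGBGGG")
somers_d = 2 * auc - 1
```

Of the 4 × 2 = 8 good-bad pairs, 7 are concordant and 1 is discordant, so AUROC is 7/8 and Somers' D is (7 - 1)/8.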

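13 Very Simple bound on Gini and its powerful consequences
- Assume the scorecard has monotonically increasing marginal odds (reasonable), so the ROC curve is concave
- Let E $= (g, b) = (F(s|G), F(s|B))$ be a point on the curve, F the point on the diagonal directly below E, and G the point on the horizontal axis AD directly below E
- Area of AFE $= (b - g)\cdot AG/2$; Area of FEC $= (b - g)\cdot GD/2$
- Area of the two triangles $= (b - g)/2 <$ area from curve to diagonal $= GC/2$
- $GC > (b - g)$ for any point on the curve

[Figure: ROC curve from A to C with the point E on the curve, F on the diagonal, and G and D on the horizontal axis.]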

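14 Very Simple bound on Gini and its powerful consequences
- $GC > (b - g)$, where $(g, b) = (F(s|G), F(s|B))$
- Take $(g, b)$ to be at the score $s$ which maximises $F(s|B) - F(s|G)$: $GC > KS$
- Good cards satisfy the rule (pick up 50% of bads in first 10% of goods); the rule gives $GC > .4$

[Figure: ROC curve with the points A, D, E and F as on the previous slide.]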
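The triangle bound can be checked numerically on any concave ROC curve. The sketch below uses the hypothetical curve $F(s|B) = F(s|G)^{0.25}$, chosen only because it is concave and has a known Gini of 0.6; it is not a curve from the talk.

```python
# Sample a concave ROC curve on a fine grid of F(s|G) values.
n = 1000
roc = [(i / n, (i / n) ** 0.25) for i in range(n + 1)]

# Gini from the trapezium-rule area under the sampled curve.
area = sum((x1 - x0) * (y0 + y1) / 2 for (x0, y0), (x1, y1) in zip(roc, roc[1:]))
gini = 2 * area - 1

# The bound b - g at every point; its maximum is the KS statistic.
bounds = [b - g for g, b in roc]
ks = max(bounds)

# The 50%-of-bads-in-10%-of-goods rule corresponds to the point (0.1, 0.5).
rule_bound = 0.5 - 0.1
```

On this curve the largest value of $b - g$ is about 0.47, comfortably below the Gini of roughly 0.6, and the inequality holds at every sampled point.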

15 Segmentation and Discrimination
- In reality a scoring system consists of not just one scorecard but a suite of scorecards built on different segments of the population
- Reasons for segmentation
  - System / data constraints: new accounts vs older accounts
  - Policy issues: want young people, so a different scorecard for under-25s
  - Significant interactions between variables and others
- Usually one only calculates the discrimination measure for each scorecard separately, but one should do it for the whole system after the scorecards have been calibrated on a common scale
- How much of the discrimination is due to the scorecards and how much to the segmentation?

16 Measuring power of segmentation in scorecards
- Measure segmentation power by taking the segments and choosing scorecards which discriminate no better than random in each segment
- Give borrower $j$ in segment $i$ a score of $s_i + \varepsilon u_j$, where $u_j$ has a uniform distribution on $[0,1]$ and $s_{i+1} > s_i + \varepsilon$
- Results are for the two-segment case but expand to the $k$-segment case
- Assume segment 1 has $g_1$ goods and $b_1$ bads and score $s_1$, segment 2 has $g_2$ goods and $b_2$ bads and score $s_2$, and assume $g_1/b_1 < g_2/b_2$
- Define $n_1 = g_1 + b_1$; $n_2 = g_2 + b_2$ and

$$p_1^b = \frac{b_1}{b_1 + b_2};\quad p_2^b = \frac{b_2}{b_1 + b_2};\quad p_1^g = \frac{g_1}{g_1 + g_2};\quad p_2^g = \frac{g_2}{g_1 + g_2};\quad p_1^t = \frac{n_1}{n_1 + n_2};\quad p_2^t = \frac{n_2}{n_1 + n_2}$$

17 Impact of segmentation on Gini coefficient
- AEC is the ROC curve for the segmented/random scorecard
- E is $(p_1^g, p_1^b)$, so by the same result as the triangle approximation, $GC = p_1^b - p_1^g$
- Example from behavioural scorecards, where one can segment on whether ever in arrears or not
  - In arrears: 830 goods; 160 bads
  - Never in arrears: 7170 goods; 40 bads
  - Segment Gini = (160/200 - 830/8000) = .8 - .10375 = .696
  - Actual Gini was .88
- Even if there is no segmentation, this gives a view of how much of the Gini this characteristic brings to the scorecard
- It is like using (approximate) D-concordance to decide which variables to choose
- Explains why behavioural scores have higher Ginis than application scores
  - In the example suppose the Gini for no arrears is like that for an application score, say GC; assume one cannot distinguish goods from bads in arrears; then provided the arrears score is small enough that the curve goes through E,

$$\text{Behavioural Gini} = (p_1^b - p_1^g) + p_2^b\,p_2^g\,GC$$

  - So application GC = .45 gives behavioural GC = .78

[Figure: ROC curve AEC given by segmentation only, with the additional points R, S and D marked.]
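The arrears example can be reproduced with the sketch below. Two things are assumptions: the counts (830 goods and 160 bads in arrears, 7170 goods and 40 bads never in arrears, which reproduce the quoted segment Gini of .696 and the 8000 goods in the denominator), and the decomposition used in `combined_gini`, which is a reconstruction that adds the within-segment Gini scaled by the segment's shares of goods and bads. The function names are illustrative.

```python
def segment_gini(g1, b1, g2, b2):
    """Gini of a scorecard that only separates segment 1 from segment 2."""
    p1_g = g1 / (g1 + g2)
    p1_b = b1 / (b1 + b2)
    return p1_b - p1_g

def combined_gini(g1, b1, g2, b2, gini_segment2):
    """Segment 1 scored at random below segment 2; segment 2 has its own Gini."""
    p2_g = g2 / (g1 + g2)
    p2_b = b2 / (b1 + b2)
    # The segment-2 ROC occupies a p2_g by p2_b rectangle above the point E.
    return segment_gini(g1, b1, g2, b2) + p2_g * p2_b * gini_segment2

seg = segment_gini(830, 160, 7170, 40)
behavioural = combined_gini(830, 160, 7170, 40, gini_segment2=0.45)
```

With an application-style Gini of .45 inside the never-in-arrears segment, the combined figure rounds to .78, matching the value quoted for the behavioural score.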

18 Impact of segmentation on KS statistic
- For a scorecard built just from the segments, it is easy to show that KS is maximised at the end of the first segment:

$$KS = p_1^b - p_1^g$$

- Note that for a two-segment scorecard KS = GC
- Behavioural example
  - In arrears: 830 goods; 160 bads
  - Never in arrears: 7170 goods; 40 bads
  - Segment KS = (160/200 - 830/8000) = .8 - .10375 = .696
  - Actual KS was .75

[Figure: cumulative probabilities $F(s|B)$ and $F(s|G)$ against score, reaching $p_1^b$ and $p_1^g$ at the end of the first segment.]

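19 Impact of segmentation on Divergence and Mahalanobis distance
- Mahalanobis distance for the segmentation-only scorecard, where all of segment 1 get $s_1$ and all of segment 2 get $s_2$ ($\varepsilon \to 0$):

$$\mu_B = p_1^b s_1 + p_2^b s_2;\qquad \mu_G = p_1^g s_1 + p_2^g s_2$$

$$\sigma_B^2 = p_1^b p_2^b (s_1 - s_2)^2;\qquad \sigma_G^2 = p_1^g p_2^g (s_1 - s_2)^2$$

$$\tilde\sigma^2 = (n_G \sigma_G^2 + n_B \sigma_B^2)/n = (p_G\,p_1^g p_2^g + p_B\,p_1^b p_2^b)(s_1 - s_2)^2,\qquad p_G = n_G/n,\ p_B = n_B/n$$

$$D_M(\tilde\sigma) = \frac{p_1^b p_2^g - p_1^g p_2^b}{\sqrt{p_G\,p_1^g p_2^g + p_B\,p_1^b p_2^b}}$$

- If instead we take the overall sample variance $\sigma^2 = p_1^t p_2^t (s_1 - s_2)^2$:

$$D_M(\sigma) = \frac{p_1^b p_2^g - p_1^g p_2^b}{\sqrt{p_1^t p_2^t}}$$

- Example: behavioural score
  - In arrears: 830 goods; 160 bads
  - Never in arrears: 7170 goods; 40 bads
  - $p_1^b = .8$; $p_2^b = .2$; $p_1^g = .10375$; $p_2^g = .89625$; $p_1^t = .1207$; $p_2^t = .8793$
  - $\sigma_B^2 = .16\,(s_1 - s_2)^2$; $\sigma_G^2 = .0930\,(s_1 - s_2)^2$
  - $\tilde\sigma = 0.307\,|s_1 - s_2|$; $\sigma = 0.326\,|s_1 - s_2|$
  - $D_M(\tilde\sigma) = 2.26$; $D_M(\sigma) = 2.14$
  - Actual Mahalanobis distance is 2.40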
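The two Mahalanobis distances for a segmentation-only scorecard can be sketched as follows. The counts (830 goods and 160 bads in arrears, 7170 goods and 40 bads never in arrears) are assumed from the proportions quoted in the example, and the function name is illustrative. The score gap $s_2 - s_1$ cancels between numerator and denominator, so it does not appear.

```python
import math

def segment_mahalanobis(g1, b1, g2, b2):
    """Return (D_M with pooled variance, D_M with overall sample variance)."""
    n_g, n_b = g1 + g2, b1 + b2
    n = n_g + n_b
    p1_g, p2_g = g1 / n_g, g2 / n_g
    p1_b, p2_b = b1 / n_b, b2 / n_b
    p1_t, p2_t = (g1 + b1) / n, (g2 + b2) / n

    mean_gap = p1_b * p2_g - p1_g * p2_b  # equals p1_b - p1_g
    pooled_var = (n_g * p1_g * p2_g + n_b * p1_b * p2_b) / n
    overall_var = p1_t * p2_t
    return mean_gap / math.sqrt(pooled_var), mean_gap / math.sqrt(overall_var)

d_pooled, d_overall = segment_mahalanobis(830, 160, 7170, 40)
```

Under these assumed counts the results round to 2.26 and 2.14.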

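20 Impact of segmentation on Divergence and Mahalanobis distance
- Divergence for the segmentation-only scorecard, where all of segment 1 get $s_1$ and all of segment 2 get $s_2$ ($\varepsilon \to 0$):

$$D = \frac{(\sigma_G^2 - \sigma_B^2)^2}{2\,\sigma_G^2\,\sigma_B^2} + \frac{1}{2}\,(\mu_G - \mu_B)^2\left(\frac{1}{\sigma_G^2} + \frac{1}{\sigma_B^2}\right) = \frac{(p_1^g p_2^g + p_1^b p_2^b)(p_1^g p_2^b - p_1^b p_2^g)^2 + (p_1^g p_2^g - p_1^b p_2^b)^2}{2\,p_1^b p_2^b\,p_1^g p_2^g}$$

- Example: behavioural score
  - In arrears: 830 goods; 160 bads
  - Never in arrears: 7170 goods; 40 bads
  - For segmentation just by itself, D = 4.27
  - Actual D = … (but done using the equal-variance approximation)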
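A minimal sketch of the segmentation-only divergence, obtained by substituting the two-point means and variances into the normal-approximation formula. The counts (830 goods and 160 bads in arrears, 7170 goods and 40 bads never in arrears) are assumed from the proportions quoted in the example, and the function name is illustrative.

```python
def segment_divergence(g1, b1, g2, b2):
    """Normal-approximation divergence when each segment gets a single score."""
    n_g, n_b = g1 + g2, b1 + b2
    p1_g, p2_g = g1 / n_g, g2 / n_g
    p1_b, p2_b = b1 / n_b, b2 / n_b

    var_g = p1_g * p2_g   # sigma_G^2 / (s1 - s2)^2
    var_b = p1_b * p2_b   # sigma_B^2 / (s1 - s2)^2
    mean_gap = p1_g * p2_b - p1_b * p2_g  # (mu_G - mu_B) / (s1 - s2)

    numerator = (var_g + var_b) * mean_gap ** 2 + (var_g - var_b) ** 2
    return numerator / (2 * var_g * var_b)

d_segment = segment_divergence(830, 160, 7170, 40)
```

The score gap cancels throughout, so the result depends only on how the goods and bads are split between the two segments; for these counts it rounds to 4.27.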

21 Conclusions
- There are connections between the different ways of measuring the discrimination of scorecards
- ROC curve is the most fundamental of the measures (does not depend on population odds); it includes KS and D-concordance
- Very simple triangle bound gives a quick indication of GC
  - Shows GC > KS
  - Triangle bound is the actual value if one only segments, with a random scorecard in each segment
- Allows one to recognise how much discrimination is built into segmentation alone, independent of the scorecards then built
- Even if there is no segmentation, gives the importance of the variable considered for segmentation in the full population scorecard


More information

Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541

Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541 Using An Ordered Logistic Regression Model with SAS Vartanian: SW 541 libname in1 >c:\=; Data first; Set in1.extract; A=1; PROC LOGIST OUTEST=DD MAXITER=100 ORDER=DATA; OUTPUT OUT=CC XBETA=XB P=PROB; MODEL

More information

Point Biserial Correlation Tests

Point Biserial Correlation Tests Chapter 807 Point Biserial Correlation Tests Introduction The point biserial correlation coefficient (ρ in this chapter) is the product-moment correlation calculated between a continuous random variable

More information

Lecture 10: Depicting Sampling Distributions of a Sample Proportion

Lecture 10: Depicting Sampling Distributions of a Sample Proportion Lecture 10: Depicting Sampling Distributions of a Sample Proportion Chapter 5: Probability and Sampling Distributions 2/10/12 Lecture 10 1 Sample Proportion 1 is assigned to population members having a

More information

Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank

Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing. C. Olivia Rud, VP, Fleet Bank Data Mining: An Overview of Methods and Technologies for Increasing Profits in Direct Marketing C. Olivia Rud, VP, Fleet Bank ABSTRACT Data Mining is a new term for the common practice of searching through

More information

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences.

Once saved, if the file was zipped you will need to unzip it. For the files that I will be posting you need to change the preferences. 1 Commands in JMP and Statcrunch Below are a set of commands in JMP and Statcrunch which facilitate a basic statistical analysis. The first part concerns commands in JMP, the second part is for analysis

More information

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test

Experimental Design. Power and Sample Size Determination. Proportions. Proportions. Confidence Interval for p. The Binomial Test Experimental Design Power and Sample Size Determination Bret Hanlon and Bret Larget Department of Statistics University of Wisconsin Madison November 3 8, 2011 To this point in the semester, we have largely

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

Research Methods & Experimental Design

Research Methods & Experimental Design Research Methods & Experimental Design 16.422 Human Supervisory Control April 2004 Research Methods Qualitative vs. quantitative Understanding the relationship between objectives (research question) and

More information

LOGIT AND PROBIT ANALYSIS

LOGIT AND PROBIT ANALYSIS LOGIT AND PROBIT ANALYSIS A.K. Vasisht I.A.S.R.I., Library Avenue, New Delhi 110 012 amitvasisht@iasri.res.in In dummy regression variable models, it is assumed implicitly that the dependent variable Y

More information

Financial Market Efficiency and Its Implications

Financial Market Efficiency and Its Implications Financial Market Efficiency: The Efficient Market Hypothesis (EMH) Financial Market Efficiency and Its Implications Financial markets are efficient if current asset prices fully reflect all currently available

More information

Modeling Lifetime Value in the Insurance Industry

Modeling Lifetime Value in the Insurance Industry Modeling Lifetime Value in the Insurance Industry C. Olivia Parr Rud, Executive Vice President, Data Square, LLC ABSTRACT Acquisition modeling for direct mail insurance has the unique challenge of targeting

More information

1 Sufficient statistics

1 Sufficient statistics 1 Sufficient statistics A statistic is a function T = rx 1, X 2,, X n of the random sample X 1, X 2,, X n. Examples are X n = 1 n s 2 = = X i, 1 n 1 the sample mean X i X n 2, the sample variance T 1 =

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

Appendix B D&B Rating & Score Explanations

Appendix B D&B Rating & Score Explanations Appendix B D&B Rating & Score Explanations 1 D&B Rating & Score Explanations Contents Rating Keys 3-7 PAYDEX Key 8 Commercial Credit/Financial Stress Scores 9-17 Small Business Risk Account Solution 18-19

More information

Data Preprocessing. Week 2

Data Preprocessing. Week 2 Data Preprocessing Week 2 Topics Data Types Data Repositories Data Preprocessing Present homework assignment #1 Team Homework Assignment #2 Read pp. 227 240, pp. 250 250, and pp. 259 263 the text book.

More information

Some Statistical Applications In The Financial Services Industry

Some Statistical Applications In The Financial Services Industry Some Statistical Applications In The Financial Services Industry Wenqing Lu May 30, 2008 1 Introduction Examples of consumer financial services credit card services mortgage loan services auto finance

More information

The Wilcoxon Rank-Sum Test

The Wilcoxon Rank-Sum Test 1 The Wilcoxon Rank-Sum Test The Wilcoxon rank-sum test is a nonparametric alternative to the twosample t-test which is based solely on the order in which the observations from the two samples fall. We

More information

Normal distribution. ) 2 /2σ. 2π σ

Normal distribution. ) 2 /2σ. 2π σ Normal distribution The normal distribution is the most widely known and used of all distributions. Because the normal distribution approximates many natural phenomena so well, it has developed into a

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

Adverse Impact Ratio for Females (0/ 1) = 0 (5/ 17) = 0.2941 Adverse impact as defined by the 4/5ths rule was not found in the above data.

Adverse Impact Ratio for Females (0/ 1) = 0 (5/ 17) = 0.2941 Adverse impact as defined by the 4/5ths rule was not found in the above data. 1 of 9 12/8/2014 12:57 PM (an On-Line Internet based application) Instructions: Please fill out the information into the form below. Once you have entered your data below, you may select the types of analysis

More information

Lecture Notes on Elasticity of Substitution

Lecture Notes on Elasticity of Substitution Lecture Notes on Elasticity of Substitution Ted Bergstrom, UCSB Economics 210A March 3, 2011 Today s featured guest is the elasticity of substitution. Elasticity of a function of a single variable Before

More information

AN IMPROVED CREDIT SCORING METHOD FOR CHINESE COMMERCIAL BANKS

AN IMPROVED CREDIT SCORING METHOD FOR CHINESE COMMERCIAL BANKS AN IMPROVED CREDIT SCORING METHOD FOR CHINESE COMMERCIAL BANKS Jianping Li Jinli Liu Weixuan Xu 1.University of Science & Technology of China, Hefei, 230026, P.R. China 2.Institute of Policy and Management

More information

CREDIT SCORING MODEL APPLICATIONS:

CREDIT SCORING MODEL APPLICATIONS: Örebro University Örebro University School of Business Master in Applied Statistics Thomas Laitila Sune Karlsson May, 2014 CREDIT SCORING MODEL APPLICATIONS: TESTING MULTINOMIAL TARGETS Gabriela De Rossi

More information

Normal Distribution. Definition A continuous random variable has a normal distribution if its probability density. f ( y ) = 1.

Normal Distribution. Definition A continuous random variable has a normal distribution if its probability density. f ( y ) = 1. Normal Distribution Definition A continuous random variable has a normal distribution if its probability density e -(y -µ Y ) 2 2 / 2 σ function can be written as for < y < as Y f ( y ) = 1 σ Y 2 π Notation:

More information

17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II 17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

More information

Environmental Remote Sensing GEOG 2021

Environmental Remote Sensing GEOG 2021 Environmental Remote Sensing GEOG 2021 Lecture 4 Image classification 2 Purpose categorising data data abstraction / simplification data interpretation mapping for land cover mapping use land cover class

More information

Calibration of Uncertainty (P10/P90) in Exploration Prospects*

Calibration of Uncertainty (P10/P90) in Exploration Prospects* Calibration of Uncertainty (P10/P90) in Exploration Prospects* Robert Otis 1 and Paul Haryott 2 Search and Discovery Article #40609 (2010) Posted October 29, 2010 *Adapted from oral presentation at AAPG

More information

Confidence Intervals for the Difference Between Two Means

Confidence Intervals for the Difference Between Two Means Chapter 47 Confidence Intervals for the Difference Between Two Means Introduction This procedure calculates the sample size necessary to achieve a specified distance from the difference in sample means

More information

Introduction to Analysis of Variance (ANOVA) Limitations of the t-test

Introduction to Analysis of Variance (ANOVA) Limitations of the t-test Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only

More information

Analysis of Variance ANOVA

Analysis of Variance ANOVA Analysis of Variance ANOVA Overview We ve used the t -test to compare the means from two independent groups. Now we ve come to the final topic of the course: how to compare means from more than two populations.

More information

Statistics Review PSY379

Statistics Review PSY379 Statistics Review PSY379 Basic concepts Measurement scales Populations vs. samples Continuous vs. discrete variable Independent vs. dependent variable Descriptive vs. inferential stats Common analyses

More information

1 The Brownian bridge construction

1 The Brownian bridge construction The Brownian bridge construction The Brownian bridge construction is a way to build a Brownian motion path by successively adding finer scale detail. This construction leads to a relatively easy proof

More information

MULTIPLE REGRESSION WITH CATEGORICAL DATA

MULTIPLE REGRESSION WITH CATEGORICAL DATA DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting

More information

Gini in a Bottle: The Mathematics of Income Inequality

Gini in a Bottle: The Mathematics of Income Inequality Gini in a Bottle: The Mathematics of Income Inequality Rich Beveridge Clatsop Community College rbeveridge@clatsopcc.edu https://www.clatsopcc.edu/rich-beveridges-homepage Statistics and Social Justice

More information

Lecture notes for Choice Under Uncertainty

Lecture notes for Choice Under Uncertainty Lecture notes for Choice Under Uncertainty 1. Introduction In this lecture we examine the theory of decision-making under uncertainty and its application to the demand for insurance. The undergraduate

More information