SAMPLE SELECTION BIAS IN CREDIT SCORING MODELS
|
|
|
- Derick Collins
- 10 years ago
- Views:
Transcription
1 SAMPLE SELECTION BIAS IN CREDIT SCORING MODELS John Banasik, Jonathan Crook Credit Research Centre, University of Edinburgh Lyn Thomas University of Southampton ssm0
2 The Problem We wish to estimate an accept-reject model applicable to ALL applicants but we only know the performance of those accepted in the past. Conventional Treatment Reject Inference Extrapolation Multinomial regression (various) Augmentation Others ssm1
3 Weaknesses of Reject Inference Extrapolation (Hand & Henley 1993) Provided population model for all applicants is same as that for accepts only then Direct posterior modelling (Logistic, Probit) does not give biased estimates of parameters. Linear Discriminant Analysis will give biased estimates because LDA assumes p(x G) and p(x B) distributions are normal but they are not. ssm2
4 A More General Approach to Missing Data Let D=(D 0, D m ) D denotes whether a case has defaulted subscripts: 0 denotes value is observed m denotes missing Let S = {0,1} 0 denotes value of D is missing 1 denotes value of D is observed Little & Rubin s (1987) Categories Missing Completely At Random (MCAR): S is independent of D and X Missing at Random(MAR): S depends on X but not on D Non-ignorably Missing (NIM): S depends on D and on X ssm3
5 MCAR D 0 values are a random sample from the D values. Estimated parameters from this sample does not give biased estimates of parameters applicable to population of all D values MAR Can be shown (Hand & Henley, Little& Rubin) that if sampling is conditional only on X then using only observed values of D will not give biased estimates of the parameters of a population model applicable to all D values NIM D depends on X values and on other variables (called them Z). So S depends on X and Z. Relating D 0 values only to X WILL give biased estimates of population of all applicants model because the chance of observation of D values depends on Z and X not just X. Z values appear in the likelihood function and so ignoring them by relating D 0 merely to X will lead to omitted variable bias. Known as sample selection bias in the econometric literature. ssm5
6 Sample Selection Techniques We are interested in parameterising a model relating to a population where Y 1 = β 1 X 1 + ε 1....(1) We observe only those values of Y 1 when another variable, Y 2 is positive Suppose Y 2 is a linear model Y 2 = β 2 X 2 + ε 2 Call observed values of Y 1, Y* 1. Then: Y 1 = Y* 1 if Y 2 > 0 Y 1 is unobserved if Y 2 0 If we apply OLS to eqtn 1 using only the selected sample the regression line is E(Y* 1 X 1, Y 2 > 0) = β 1 X 1 + E(ε 1 ε 2 > - β 2 X 2 ) which differs from the regression for the population of interest E(Y 1 ) = β 1 X 1 ssm6
7 Further effect Suppose there are variables in X 2 (and so affect Y 2 ) but they are not in X 1 (they do not affect Y 1 ) Is possible that if estimate Eqtn 1 using only the selected sample these variables may appear statistically significant when in fact they do not affect Y 1. ssm7
8 Bivariate Probit Model with Sample Selection ssm8 Suppose Y 3 and Y 4 are continuous unobserved variables with Y 3 = β 3 X 3 + ε 3 Y 4 = β 4 X 4 + ε 4 and Y 3 =1 if Y 3 > 0 0 if Y 3 0 and Y 4 =1 if Y 4 > 0 0 if Y 4 0 with E(ε 3 ) = E(ε 4 ) Cov(ε 3,ε 4 ) = ρ (ε 3,ε 4 ) ~ bivariate normal Also Y* 3 is observed only if Y* 4 = 1 Y* 4 is observed in all cases Appears to be the credit scoring problem
9 Information Observed Good Bad Total Accepted observed A observed B observed C Rejected not observed D not observed E observed F Given that (ε 3, ε 4 ) ~ bivariate probit the unconstrained observations with associated probabilities are Y* 3 =1 Y* 4 =1: P(Y* 3 =1 Y* 4 =1) = Φ B (β 3 X 3, β 4 X 4,ρ) Cell A Y* 3 =0, Y* 4 =1: P(Y* 3 =0, Y* 4 =1) = Φ B (-β 3 X 3, β 4 X 4, -ρ) Cell B Y* 4 =0 : P(Y* 4 =0) =Φ U ( β 4 X 4 ) Cell F where Φ B is CDF of bivarariate normal distribution with density φ B (β 3 X 3, β 4 X 4,ρ) Φ U is CDF of univariate normal distribution with density φ U ( β 4 X 4 ) ssm9
10 Notice One cause of ρ=0 is if at least one variable is omitted from both the default and the AR equation or if the omitted variables are correlated Sample selection bias may occur unless ρ=0 If accept-reject decision is deterministic (no overrides) and if the accept-reject model can be estimated perfectly then ρ=0 and no sample selection bias occurs But overrides do occur the accept-reject model may not be perfectly estimatable (eg may have been estimated by neural network algorithm) ssm10
11 Previous Studies Boyes, et al (1989) Used bivariate probit to estimate parameters in a default equation but do not compare the predictive performance of this model with one based on aacepts only using a holdout of all applicants Greene (1992, 1998) Compares the predicted conditional probabilities ( P(Y* 3 X 3, Y 4 =1) with unconditional probabilities - but does not compare the predictive performance using a sample of all applicants Other Relevant Studies
12 Methodology: Initial Analysis Data 1. Complete sample comprises applicants virtually all of whom are accepted, distinguishing those who would normally be accepted. 2. Course classification of 46 variables (typically 6 categories each): Classification designed to be equally applicable to All applicants Accepted applicants in isolation Each variable to be considered alternatively as Weights of evidence Sets of binary variables 3. Training sample obtained by proportional stratified sampling to enhance comparability with holdout sample. 2/3 of all applicants in the training sample 2/3 of all accepted applicants in the training sample 2/3 of all good applicants in the training sample 2/3 of all good accepted applicants in the training sample
13 Methodology: Initial Analysis Model Estimation and Evaluation 1. All explanatory variables deployed to estimate logistic regression model using both types of variable (weights of evidence and binary sets) on the basis of training cases alone for both All applicants Accepted applicants only Weights of evidence separately estimated for both applicant categories on the basis of training cases. 2. Each model used to predict corresponding holdout cases to assess the possible extent of over-fitting. 3. The model for accepted applicants only used to predict all holdout cases to assess its general applicability. Preliminary indications are that Models based on accepted applicants only seem generally applicable Reject inference affords very modest prospects for improved predictive performance.
14 Table 2: Prediction Performance for Original Models based on Weights of Evidence Training Sample - Accepted Good % Bad % Total % Training Sample - Accepted Good % Bad % Total % Holdout Sample - Accepted Good % Bad % Total % Holdout Sample - All Cases Good % Bad % Total % Holdout Sample - All Cases Good % Bad % Total % Holdout Sample - All Cases Good % Bad % Total % Table 3: Prediction Performance for Original Models based on Binary Variables Training Sample - Accepted Good % Bad % Total % Training Sample - Accepted Good % Bad % Total % Holdout Sample - Accepted Good % Bad % Total % Holdout Sample - All Cases Good % Bad % Total % Holdout Sample - All Cases Good % Bad % Total % Holdout Sample - All Cases Good % Bad % Total %
15 Table 4: Comparative Prediction Performance of the Bivariate Probit Model Ref Bivariate Probit With Selection Probit (Accepts only) Probit (All cases) Variables Number Number Number Included of Cases Rho Signif AUROC of Cases AUROC of Cases AUROC Binary Variables LL35 All LL30 Those in SW LL31 SW less Weights of Evidence LL36 All LL33 Those in SW LL34 SW less
16 Methodology: Subsequent Analysis General Approach Subsequent analysis empirically examines two aspects of the foregoing conclusions. Perhaps the limited scope for reject inference occurs only because the acceptance threshold is too low. The variable sets for models previously used to accept or reject applicants may diverge from those of subsequent models used to predict how good or bad will be the accepted applicants. To examine these aspects 1. The data set was recast to distinguish 5 acceptance thresholds, one for each couple of deciles. The same proportional stratified sampling was adopted. The original course classification was retained. Weights of evidence were re-estimated for each training sample. 2. Distinct Accept-Reject and Good-Bad models were constructed on the basis of distinct explanatory variable sets.
17 Methodology: Subsequent Analysis Distinct Variable Sets The Accept-Reject model was based on 2540 Scottish applicants. The remaining 9668 English applicants comprised 6446 training cases to estimate a Good-Bad model and 3222 holdout cases. 1. An eligible pool of variables for the Accept-Reject model was discerned by stepwise regressions where Accept-Reject was the dependent variable, one each for Scotland, England, and the UK. 2. A variable surviving in any of these three was entered into a stepwise regression on Scottish data where Good-Bad was the dependent variable. 3. The Scottish model based only on significant variables was used to provide a ranking of acceptability used to group English applicants into 5 bands. 4. An English model was discerned on the basis of stepwise regression where Good-Bad was the dependent variable. This determined the variable set to be used for each of the five bands.
18 Table 5: Variables Included in the Accept-Reject and Good-Bad Equations Good-Bad Accept-Reject Ref Variable Equation Equation 43 B1 20 Time and Present Add 33 B2 40 Weeks since last CCJ 22 No of children under B3 11 Television area code 32 B4 31 P1 35 P2 6 B5 19 Accommodation Type 17 Age of applicant (years) 26 Has Telephone 23 P3 34 B6 21 Type of Bank/Building Soc Accts 16 B7 28 Current Electoral Role Category 38 P4 25 Occupation Code 36 B8 30 Years on Electoral Role at current address 48 No of searches in last 6 months 47 B9 7 B10 9 B11 46 B12 44 B13 27 P5 Bn = bureau variable n; P = proprietary variable n; denotes variable is included
19 Table 6: Prediction Performance for Models based on Weights of Evidence Training Sample - Band 1 Good % Bad % Total % Training Sample - Band 2 Good % Bad % Total % Training Sample - Band 3 Good % Bad % Total % Training Sample - Band 4 Good % Bad % Total % Training Sample - Band 5 Good % Bad % Total % Holdout Sample - Band 1 Good % Bad % Total % Holdout Sample - Band 2 Good % Bad % Total % Holdout Sample - Band 3 Good % Bad % Total % Holdout Sample - Band 4 Good % Bad % Total % Good % Bad % Total % Good % Bad % Total % Good % Bad % Total % Good % Bad % Total % Good % Bad % Total % Good % Bad % Total %
20 Table 7: Prediction Performance for Models based on Binary Variables Training Sample - Band 1 Good % Bad % Total % Training Sample - Band 2 Good % Bad % Total % Training Sample - Band 3 Good % Bad % Total % Training Sample - Band 4 Good % Bad % Total % Training Sample - Band 5 Good % Bad % Total % Holdout Sample - Band 1 Good % Bad % Total % Holdout Sample - Band 2 Good % Bad % Total % Holdout Sample - Band 3 Good % Bad % Total % Holdout Sample - Band 4 Good % Bad % Total % Good % Bad % Total % Good % Bad % Total % Good % Bad % Total % Good % Bad % Total % Good % Bad % Total % Good % Bad % Total %
21 Table 8: Comparative Prediction Performance of the Bivariate Probit Model for each Band Ref Bivariate Probit With Selection Probit (Accepts only) Probit (All cases) Weights of Variables Number Number Number Evidence Included of Cases Rho Signif AUROC of Cases AUROC of Cases AUROC LL39 Band 1 All Band Band Band Band LL42 Band 1 As Table Band Band Band Band LL43 Band 1 As Table 5 less Band Band Band Band
22 Conclusions The scope for improved predictive performance by any form of reject inference is modest The scope depends on the cut-off score adopted. Use of the bivariate probit model with selection only marginally improves predictive performance although this depends on the variables included in the model.
Reject Inference in Credit Scoring. Jie-Men Mok
Reject Inference in Credit Scoring Jie-Men Mok BMI paper January 2009 ii Preface In the Master programme of Business Mathematics and Informatics (BMI), it is required to perform research on a business
Credit scoring and the sample selection bias
Credit scoring and the sample selection bias Thomas Parnitzke Institute of Insurance Economics University of St. Gallen Kirchlistrasse 2, 9010 St. Gallen, Switzerland this version: May 31, 2005 Abstract
A Basic Introduction to Missing Data
John Fox Sociology 740 Winter 2014 Outline Why Missing Data Arise Why Missing Data Arise Global or unit non-response. In a survey, certain respondents may be unreachable or may refuse to participate. Item
CREDIT SCORING MODEL APPLICATIONS:
Örebro University Örebro University School of Business Master in Applied Statistics Thomas Laitila Sune Karlsson May, 2014 CREDIT SCORING MODEL APPLICATIONS: TESTING MULTINOMIAL TARGETS Gabriela De Rossi
Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus
Auxiliary Variables in Mixture Modeling: 3-Step Approaches Using Mplus Tihomir Asparouhov and Bengt Muthén Mplus Web Notes: No. 15 Version 8, August 5, 2014 1 Abstract This paper discusses alternatives
Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.
Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C
Measuring the Discrimination Quality of Suites of Scorecards:
Measuring the Discrimination Quality of Suites of Scorecards: ROCs Ginis, ounds and Segmentation Lyn C Thomas Quantitative Financial Risk Management Centre, University of Southampton UK CSCC X, Edinburgh
Average Redistributional Effects. IFAI/IZA Conference on Labor Market Policy Evaluation
Average Redistributional Effects IFAI/IZA Conference on Labor Market Policy Evaluation Geert Ridder, Department of Economics, University of Southern California. October 10, 2006 1 Motivation Most papers
Classification Problems
Classification Read Chapter 4 in the text by Bishop, except omit Sections 4.1.6, 4.1.7, 4.2.4, 4.3.3, 4.3.5, 4.3.6, 4.4, and 4.5. Also, review sections 1.5.1, 1.5.2, 1.5.3, and 1.5.4. Classification Problems
problem arises when only a non-random sample is available differs from censored regression model in that x i is also unobserved
4 Data Issues 4.1 Truncated Regression population model y i = x i β + ε i, ε i N(0, σ 2 ) given a random sample, {y i, x i } N i=1, then OLS is consistent and efficient problem arises when only a non-random
Credit Risk Scorecard Design, Validation and User Acceptance
Credit Risk Scorecard Design, Validation and User Acceptance - A Lesson for Modellers and Risk Managers 1. Introduction Edward Huang and Christopher Scott Retail Decision Science, HBOS Bank, UK Credit
Despite its emphasis on credit-scoring/rating model validation,
RETAIL RISK MANAGEMENT Empirical Validation of Retail Always a good idea, development of a systematic, enterprise-wide method to continuously validate credit-scoring/rating models nonetheless received
Econometrics Simple Linear Regression
Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight
INTRODUCTION TO RATING MODELS
INTRODUCTION TO RATING MODELS Dr. Daniel Straumann Credit Suisse Credit Portfolio Analytics Zurich, May 26, 2005 May 26, 2005 / Daniel Straumann Slide 2 Motivation Due to the Basle II Accord every bank
Statistical Machine Learning
Statistical Machine Learning UoC Stats 37700, Winter quarter Lecture 4: classical linear and quadratic discriminants. 1 / 25 Linear separation For two classes in R d : simple idea: separate the classes
Statistics in Retail Finance. Chapter 2: Statistical models of default
Statistics in Retail Finance 1 Overview > We consider how to build statistical models of default, or delinquency, and how such models are traditionally used for credit application scoring and decision
The Impact of Sample Bias on Consumer Credit Scoring Performance and Profitability
The Impact of Sample Bias on Consumer Credit Scoring Performance and Profitability Geert Verstraeten, Dirk Van den Poel Corresponding author: Dirk Van den Poel Department of Marketing, Ghent University,
Developing Credit Scorecards Using Credit Scoring for SAS Enterprise Miner TM 12.1
Developing Credit Scorecards Using Credit Scoring for SAS Enterprise Miner TM 12.1 SAS Documentation The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2012. Developing
USING LOGIT MODEL TO PREDICT CREDIT SCORE
USING LOGIT MODEL TO PREDICT CREDIT SCORE Taiwo Amoo, Associate Professor of Business Statistics and Operation Management, Brooklyn College, City University of New York, (718) 951-5219, [email protected]
STATISTICA. Financial Institutions. Case Study: Credit Scoring. and
Financial Institutions and STATISTICA Case Study: Credit Scoring STATISTICA Solutions for Business Intelligence, Data Mining, Quality Control, and Web-based Analytics Table of Contents INTRODUCTION: WHAT
MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group
MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could
Predict Influencers in the Social Network
Predict Influencers in the Social Network Ruishan Liu, Yang Zhao and Liuyu Zhou Email: rliu2, yzhao2, [email protected] Department of Electrical Engineering, Stanford University Abstract Given two persons
An Application of the Cox Proportional Hazards Model to the Construction of Objective Vintages for Credit in Financial Institutions, Using PROC PHREG
Paper 3140-2015 An Application of the Cox Proportional Hazards Model to the Construction of Objective Vintages for Credit in Financial Institutions, Using PROC PHREG Iván Darío Atehortua Rojas, Banco Colpatria
Handling attrition and non-response in longitudinal data
Longitudinal and Life Course Studies 2009 Volume 1 Issue 1 Pp 63-72 Handling attrition and non-response in longitudinal data Harvey Goldstein University of Bristol Correspondence. Professor H. Goldstein
1 Maximum likelihood estimation
COS 424: Interacting with Data Lecturer: David Blei Lecture #4 Scribes: Wei Ho, Michael Ye February 14, 2008 1 Maximum likelihood estimation 1.1 MLE of a Bernoulli random variable (coin flips) Given N
Credit Risk Analysis Using Logistic Regression Modeling
Credit Risk Analysis Using Logistic Regression Modeling Introduction A loan officer at a bank wants to be able to identify characteristics that are indicative of people who are likely to default on loans,
Lecture 3: Linear methods for classification
Lecture 3: Linear methods for classification Rafael A. Irizarry and Hector Corrada Bravo February, 2010 Today we describe four specific algorithms useful for classification problems: linear regression,
Overview. Longitudinal Data Variation and Correlation Different Approaches. Linear Mixed Models Generalized Linear Mixed Models
Overview 1 Introduction Longitudinal Data Variation and Correlation Different Approaches 2 Mixed Models Linear Mixed Models Generalized Linear Mixed Models 3 Marginal Models Linear Models Generalized Linear
Bayes and Naïve Bayes. cs534-machine Learning
Bayes and aïve Bayes cs534-machine Learning Bayes Classifier Generative model learns Prediction is made by and where This is often referred to as the Bayes Classifier, because of the use of the Bayes rule
Regression with a Binary Dependent Variable
Regression with a Binary Dependent Variable Chapter 9 Michael Ash CPPA Lecture 22 Course Notes Endgame Take-home final Distributed Friday 19 May Due Tuesday 23 May (Paper or emailed PDF ok; no Word, Excel,
MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS
MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance
VI. Introduction to Logistic Regression
VI. Introduction to Logistic Regression We turn our attention now to the topic of modeling a categorical outcome as a function of (possibly) several factors. The framework of generalized linear models
Handling missing data in Stata a whirlwind tour
Handling missing data in Stata a whirlwind tour 2012 Italian Stata Users Group Meeting Jonathan Bartlett www.missingdata.org.uk 20th September 2012 1/55 Outline The problem of missing data and a principled
Reject Inference Methodologies in Credit Risk Modeling Derek Montrichard, Canadian Imperial Bank of Commerce, Toronto, Canada
Paper ST-160 Reject Inference Methodologies in Credit Risk Modeling Derek Montrichard, Canadian Imperial Bank of Commerce, Toronto, Canada ABSTRACT In the banking industry, scorecards are commonly built
Weight of Evidence Module
Formula Guide The purpose of the Weight of Evidence (WoE) module is to provide flexible tools to recode the values in continuous and categorical predictor variables into discrete categories automatically,
Problem of Missing Data
VASA Mission of VA Statisticians Association (VASA) Promote & disseminate statistical methodological research relevant to VA studies; Facilitate communication & collaboration among VA-affiliated statisticians;
LOGISTIC REGRESSION. Nitin R Patel. where the dependent variable, y, is binary (for convenience we often code these values as
LOGISTIC REGRESSION Nitin R Patel Logistic regression extends the ideas of multiple linear regression to the situation where the dependent variable, y, is binary (for convenience we often code these values
Lecture 9: Introduction to Pattern Analysis
Lecture 9: Introduction to Pattern Analysis g Features, patterns and classifiers g Components of a PR system g An example g Probability definitions g Bayes Theorem g Gaussian densities Features, patterns
STA 4273H: Statistical Machine Learning
STA 4273H: Statistical Machine Learning Russ Salakhutdinov Department of Statistics! [email protected]! http://www.cs.toronto.edu/~rsalakhu/ Lecture 6 Three Approaches to Classification Construct
What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling
What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and
Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, and Discrete Changes
Using the Delta Method to Construct Confidence Intervals for Predicted Probabilities, Rates, Discrete Changes JunXuJ.ScottLong Indiana University August 22, 2005 The paper provides technical details on
Sales forecasting # 2
Sales forecasting # 2 Arthur Charpentier [email protected] 1 Agenda Qualitative and quantitative methods, a very general introduction Series decomposition Short versus long term forecasting
On Marginal Effects in Semiparametric Censored Regression Models
On Marginal Effects in Semiparametric Censored Regression Models Bo E. Honoré September 3, 2008 Introduction It is often argued that estimation of semiparametric censored regression models such as the
Multinomial and Ordinal Logistic Regression
Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,
Logistic Regression. http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests
Logistic Regression http://faculty.chass.ncsu.edu/garson/pa765/logistic.htm#sigtests Overview Binary (or binomial) logistic regression is a form of regression which is used when the dependent is a dichotomy
CCNY. BME I5100: Biomedical Signal Processing. Linear Discrimination. Lucas C. Parra Biomedical Engineering Department City College of New York
BME I5100: Biomedical Signal Processing Linear Discrimination Lucas C. Parra Biomedical Engineering Department CCNY 1 Schedule Week 1: Introduction Linear, stationary, normal - the stuff biology is not
Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)
Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the
Probabilistic Models for Big Data. Alex Davies and Roger Frigola University of Cambridge 13th February 2014
Probabilistic Models for Big Data Alex Davies and Roger Frigola University of Cambridge 13th February 2014 The State of Big Data Why probabilistic models for Big Data? 1. If you don t have to worry about
Multiple Imputation for Missing Data: A Cautionary Tale
Multiple Imputation for Missing Data: A Cautionary Tale Paul D. Allison University of Pennsylvania Address correspondence to Paul D. Allison, Sociology Department, University of Pennsylvania, 3718 Locust
LOGIT AND PROBIT ANALYSIS
LOGIT AND PROBIT ANALYSIS A.K. Vasisht I.A.S.R.I., Library Avenue, New Delhi 110 012 [email protected] In dummy regression variable models, it is assumed implicitly that the dependent variable Y
Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios
Accurately and Efficiently Measuring Individual Account Credit Risk On Existing Portfolios By: Michael Banasiak & By: Daniel Tantum, Ph.D. What Are Statistical Based Behavior Scoring Models And How Are
Nominal and ordinal logistic regression
Nominal and ordinal logistic regression April 26 Nominal and ordinal logistic regression Our goal for today is to briefly go over ways to extend the logistic regression model to the case where the outcome
arxiv:1206.6666v1 [stat.ap] 28 Jun 2012
The Annals of Applied Statistics 2012, Vol. 6, No. 2, 772 794 DOI: 10.1214/11-AOAS521 In the Public Domain arxiv:1206.6666v1 [stat.ap] 28 Jun 2012 ANALYZING ESTABLISHMENT NONRESPONSE USING AN INTERPRETABLE
Statistics in Retail Finance. Chapter 7: Fraud Detection in Retail Credit
Statistics in Retail Finance Chapter 7: Fraud Detection in Retail Credit 1 Overview > Detection of fraud remains an important issue in retail credit. Methods similar to scorecard development may be employed,
Question 2 Naïve Bayes (16 points)
Question 2 Naïve Bayes (16 points) About 2/3 of your email is spam so you downloaded an open source spam filter based on word occurrences that uses the Naive Bayes classifier. Assume you collected the
These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher Bishop
Music and Machine Learning (IFT6080 Winter 08) Prof. Douglas Eck, Université de Montréal These slides follow closely the (English) course textbook Pattern Recognition and Machine Learning by Christopher
Generating Random Numbers Variance Reduction Quasi-Monte Carlo. Simulation Methods. Leonid Kogan. MIT, Sloan. 15.450, Fall 2010
Simulation Methods Leonid Kogan MIT, Sloan 15.450, Fall 2010 c Leonid Kogan ( MIT, Sloan ) Simulation Methods 15.450, Fall 2010 1 / 35 Outline 1 Generating Random Numbers 2 Variance Reduction 3 Quasi-Monte
Latent Class Regression Part II
This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike License. Your use of this material constitutes acceptance of that license and the conditions of use of materials on this
Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares
Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation
Data Preparation and Statistical Displays
Reservoir Modeling with GSLIB Data Preparation and Statistical Displays Data Cleaning / Quality Control Statistics as Parameters for Random Function Models Univariate Statistics Histograms and Probability
CORPORATE CREDIT RISK MODELING: QUANTITATIVE RATING SYSTEM AND PROBABILITY OF DEFAULT ESTIMATION
CORPORATE CREDIT RISK MODELING: QUANTITATIVE RATING SYSTEM AND PROBABILITY OF DEFAULT ESTIMATION João Eduardo Fernandes 1 April 2005 (Revised October 2005) ABSTRACT: Research on corporate credit risk modeling
Credit Scorecards for SME Finance The Process of Improving Risk Measurement and Management
Credit Scorecards for SME Finance The Process of Improving Risk Measurement and Management April 2009 By Dean Caire, CFA Most of the literature on credit scoring discusses the various modelling techniques
Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study)
Cairo University Faculty of Economics and Political Science Statistics Department English Section Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study) Prepared
The Loss in Efficiency from Using Grouped Data to Estimate Coefficients of Group Level Variables. Kathleen M. Lang* Boston College.
The Loss in Efficiency from Using Grouped Data to Estimate Coefficients of Group Level Variables Kathleen M. Lang* Boston College and Peter Gottschalk Boston College Abstract We derive the efficiency loss
Multiple Choice Models II
Multiple Choice Models II Laura Magazzini University of Verona [email protected] http://dse.univr.it/magazzini Laura Magazzini (@univr.it) Multiple Choice Models II 1 / 28 Categorical data Categorical
IMPACT EVALUATION: INSTRUMENTAL VARIABLE METHOD
REPUBLIC OF SOUTH AFRICA GOVERNMENT-WIDE MONITORING & IMPACT EVALUATION SEMINAR IMPACT EVALUATION: INSTRUMENTAL VARIABLE METHOD SHAHID KHANDKER World Bank June 2006 ORGANIZED BY THE WORLD BANK AFRICA IMPACT
Predicting Successful Completion of the Nursing Program: An Analysis of Prerequisites and Demographic Variables
Predicting Successful Completion of the Nursing Program: An Analysis of Prerequisites and Demographic Variables Introduction In the summer of 2002, a research study commissioned by the Center for Student
Revealing Taste-Based Discrimination in Hiring: A Correspondence Testing Experiment with Geographic Variation
D I S C U S S I O N P A P E R S E R I E S IZA DP No. 6153 Revealing Taste-Based Discrimination in Hiring: A Correspondence Testing Experiment with Geographic Variation Magnus Carlsson Dan-Olof Rooth November
Multivariate Statistical Inference and Applications
Multivariate Statistical Inference and Applications ALVIN C. RENCHER Department of Statistics Brigham Young University A Wiley-Interscience Publication JOHN WILEY & SONS, INC. New York Chichester Weinheim
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION
PATTERN RECOGNITION AND MACHINE LEARNING CHAPTER 4: LINEAR MODELS FOR CLASSIFICATION Introduction In the previous chapter, we explored a class of regression models having particularly simple analytical
A Property & Casualty Insurance Predictive Modeling Process in SAS
Paper AA-02-2015 A Property & Casualty Insurance Predictive Modeling Process in SAS 1.0 ABSTRACT Mei Najim, Sedgwick Claim Management Services, Chicago, Illinois Predictive analytics has been developing
Response to Critiques of Mortgage Discrimination and FHA Loan Performance
A Response to Comments Response to Critiques of Mortgage Discrimination and FHA Loan Performance James A. Berkovec Glenn B. Canner Stuart A. Gabriel Timothy H. Hannan Abstract This response discusses the
Part 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus
Facebook Friend Suggestion Eytan Daniyalzade and Tim Lipus 1. Introduction Facebook is a social networking website with an open platform that enables developers to extract and utilize user information
Standard errors of marginal effects in the heteroskedastic probit model
Standard errors of marginal effects in the heteroskedastic probit model Thomas Cornelißen Discussion Paper No. 320 August 2005 ISSN: 0949 9962 Abstract In non-linear regression models, such as the heteroskedastic
Disability and Job Mismatches in the Australian Labour Market
D I S C U S S I O N P A P E R S E R I E S IZA DP No. 6152 Disability and Job Mismatches in the Australian Labour Market Melanie Jones Kostas Mavromaras Peter Sloane Zhang Wei November 2011 Forschungsinstitut
Missing Data. A Typology Of Missing Data. Missing At Random Or Not Missing At Random
[Leeuw, Edith D. de, and Joop Hox. (2008). Missing Data. Encyclopedia of Survey Research Methods. Retrieved from http://sage-ereference.com/survey/article_n298.html] Missing Data An important indicator
Module 14: Missing Data Stata Practical
Module 14: Missing Data Stata Practical Jonathan Bartlett & James Carpenter London School of Hygiene & Tropical Medicine www.missingdata.org.uk Supported by ESRC grant RES 189-25-0103 and MRC grant G0900724
Missing data and net survival analysis Bernard Rachet
Workshop on Flexible Models for Longitudinal and Survival Data with Applications in Biostatistics Warwick, 27-29 July 2015 Missing data and net survival analysis Bernard Rachet General context Population-based,
Part 2: One-parameter models
Part 2: One-parameter models Bernoilli/binomial models Return to iid Y 1,...,Y n Bin(1, θ). The sampling model/likelihood is p(y 1,...,y n θ) =θ P y i (1 θ) n P y i When combined with a prior p(θ), Bayes
STATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
A Review of Methods for Missing Data
Educational Research and Evaluation 1380-3611/01/0704-353$16.00 2001, Vol. 7, No. 4, pp. 353±383 # Swets & Zeitlinger A Review of Methods for Missing Data Therese D. Pigott Loyola University Chicago, Wilmette,
Gerry Hobbs, Department of Statistics, West Virginia University
Decision Trees as a Predictive Modeling Method Gerry Hobbs, Department of Statistics, West Virginia University Abstract Predictive modeling has become an important area of interest in tasks such as credit
Application of discriminant analysis to predict the class of degree for graduating students in a university system
International Journal of Physical Sciences Vol. 4 (), pp. 06-0, January, 009 Available online at http://www.academicjournals.org/ijps ISSN 99-950 009 Academic Journals Full Length Research Paper Application
Data Mining: Overview. What is Data Mining?
Data Mining: Overview What is Data Mining? Recently * coined term for confluence of ideas from statistics and computer science (machine learning and database methods) applied to large databases in science,
Comparing Features of Convenient Estimators for Binary Choice Models With Endogenous Regressors
Comparing Features of Convenient Estimators for Binary Choice Models With Endogenous Regressors Arthur Lewbel, Yingying Dong, and Thomas Tao Yang Boston College, University of California Irvine, and Boston
Joint Exam 1/P Sample Exam 1
Joint Exam 1/P Sample Exam 1 Take this practice exam under strict exam conditions: Set a timer for 3 hours; Do not stop the timer for restroom breaks; Do not look at your notes. If you believe a question
1. Suppose that a score on a final exam depends upon attendance and unobserved factors that affect exam performance (such as student ability).
Examples of Questions on Regression Analysis: 1. Suppose that a score on a final exam depends upon attendance and unobserved factors that affect exam performance (such as student ability). Then,. When
Analyzing Structural Equation Models With Missing Data
Analyzing Structural Equation Models With Missing Data Craig Enders* Arizona State University [email protected] based on Enders, C. K. (006). Analyzing structural equation models with missing data. In G.
