# Calculating the Probability of Returning a Loan with Binary Probability Models

Save this PDF as:

Size: px
Start display at page:

Download "Calculating the Probability of Returning a Loan with Binary Probability Models"

## Transcription

1 Calculating the Probability of Returning a Loan with Binary Probability Models Associate Professor PhD Julian VASILEV ( Varna University of Economics, Bulgaria ABSTRACT The purpose of this article is to give a new approach in calculating the probability of returning a loan. A lot of factors affect the value of the probability. In this article by using statistical and econometric models some infl uencing factors are proved. The main approach is concerned with applying probit and logit models in loan management institutions. A new aspect of the credit risk analysis is given. Calculating the probability of returning a loan is a diffi cult task. We assume that specifi c data fi elds concerning the contract (month of signing, year of signing, given sum) and data fi elds concerning the borrower of the loan (month of birth, year of birth (age), gender, region, where he/ she lives) may be independent variables in a binary logistics model with a dependent variable the probability of returning a loan. It is proved that the month of signing a contract, the year of signing a contract, the gender and the age of the loan owner do not affect the probability of returning a loan. It is proved that the probability of returning a loan depends on the sum of contract, the remoteness of the loan owner and the month of birth. The probability of returning a loan increases with the increase of the given sum, decreases with the proximity of the customer, increases for people born in the beginning of the year and decreases for people born at the end of the year. Key words: probability models, probit model, logit model, econometrics, credit risk analysis, SPSS JEL classification: C Mathematical and Quantitative Methods INTRODUCTION Giving a loan is a risky operation. Credit risk analysis is usually focused on the factors influencing the decision of a credit institution to give or not to give a loan. Risk management calculates the portfolio selection. Calculating the probability of returning a loan may be done in two aspects full payment of a loan or partial payment. Making clusters of customers and making customer profiles are usual techniques. Financial analysis focuses on special instruments but not on the face-to-face connection between the person who gives the loan and the person who takes the loan. Expectations from both sides have to be evaluated. More over sound statistics and econometrics have Romanian Statistical Review nr. 4 /

2 to be applied in order to find an appropriate model which allows calculating the probability of returning a loan. Analysis of an actual economic phenomenon has to be made giving a loan. Appropriate methods of observation have to be applied. Mathematical and statistical tools have to be used to analyse an economic phenomenon. Empirical determination of economic laws is an appropriate approach in the studied case. The art of making social research of economic phenomena consists of standard methods as well as a great deal of creative inspiration. Finding some assumptions and checking their validity for a certain case study may lead to conclusions which may be applied to other money lending institutions. The right interpretation of available data will give the researcher the possibility to accept or reject initially formulated hypotheses. So calculating the probability of returning a loan is a specific econometrics research. Calculating the probability of returning a loan has several aspects: economic, mathematical and statistical. Economic theory is quite reserved in giving some dependencies between the probability of returning a loan and the factors affecting the decision of the credit institution. Economic theory says that some people are given loans, others not. Some loans are returned as whole, others partially, others not. This is the obvious situation in economics. We have to find some explanation of theses phenomena. Monetary theory as part of economic theory has an explanation. Since the country does not emit new banknotes, loan owners cannot pay the principal and the interest as a whole sum, because new banknotes are not emitted. The explanation is a bit rough, but it is clear. It does not describe the economics as a whole system with its specific features. The mathematical theory may build an equation with several independent variables (factors) and with one dependent variable (the resulting variable) which best describes the phenomenon. Since we have to calculate the probability of returning a loan, probit and logit models are the most appropriate ones. They are based on the binary regression. Since we may predict some but not all factors a coefficient of determination is calculated. According to Gujarati (2012) econometrics is mainly interested in empirical verification of economic theory. Data analysis is mainly concerned with collecting, processing, and presenting data in the form of charts and tables. Mathematical statistics gives a lot of examples in the sphere of trade, but not in loan estimation. The main methodology of an econometric research consists of the following steps: (1) defining a hypothesis, (2) making a mathematical model, (3) obtaining data, (4) an estimation of the parameters of the model, (5) testing of hypothesis, (6) making a prediction and (7) adapting the model into practice. 56 Romanian Statistical Review nr. 4 / 2014

3 It should be mentioned that the probability of returning a loan is a statistical phenomenon, not a deterministic one. The probability may depend on various factors such as sum of the loan, gender of the customer, age of the customer, year of contract, month of contract, region where the customer lives. All these data are stored in the databases of credit institutions. Historical data may be used to calculate the probability of returning a loan. The relationship between the independent (input) variables and the dependent (output) variable may be strong or weak. Statistics can never establish a causal connection. We have to make assumptions and check them. We may use correlation analysis to check the strength of an association between two variables for instance between the sum of the loan and the probability to be returned. In regression analysis we try to estimate the average value of one variable on the basis of the exact values of other variable. We may predict the probability of returning a loan only by the sum of contract. This is a one-factor model. We may continue with two-factor models by adding other variables to the model. We may continue with three-factor models by adding other variables to the model. In regression analysis the dependent variable is assumed to be random. It has to have a probability distribution. THEORETICAL BACKGROUND Probit and logit models are part of qualitative response regression models (QRRM). The dependent variable (the probability of returning a loan) has two values yes or no. The dependent variable is binary. QRRM are used in political science where voters have to choose between two parties. QRRM are used in social analysis of households. A family may has a house or it may not have a house. A certain medicine is effective or not for a certain illness. It should be marked that the regressand Y is qualitative. QRRM are also called probability models. The linear probability model (LPM) has the following equation: Y is the dependent variable. The values of X i are the different independents variables. The LPM seems to be like a standard regression model. The Y variable is dichotomous. Some problems with LPM have to be marked. It is assumed that u i values have normal distribution. But when the sample size is large the OLS (ordinary least squares) estimators are usually normally distributed. The disturbances in LPM are usually homoscedastic. The third problem is not fulfilment of 0<=Y i <=1. Calculating the R 2 of the [1] Romanian Statistical Review nr. 4 /

4 LPM is a disputable question. Some authors (John Aldrich and Forrest Nelson) think that the coefficient of determination (R 2 ) should not be calculated in qualitative dependent variable models. LPM is simple but not very correct. The prediction is that with the increase of X, Y also increases or decreases. So the problem is with the assumption that there is a linear dependency between X and Y. So we may assume that with the increase of the sum of a loan, the probability to be returned increases or decreases. The equation of the logistic distribution function (LDF) is the following: Where [2] P i is between 0 and 1. In the logistic distribution function B 1 and B 2 cannot be calculated by the OLS method. The LDF function works with grouped data showing the relative frequency. For each value of X total number of cases is counted and the number of cases with Y value equal to 1. Or in other words, for each level of X, we compute the probability of returning a loan. The LDF describes the probit model. It is sometimes called a normit model. A comparison of binary logit and probit models is done by Cakmakyapan and Goktas (2013). Both models are appropriate when the dependent variable is binary. They use Monte Carlo simulation to compare both models. Probit models in credit institutions are used by Teker et.al. (2013). They try to compose a new rating methodology and provide credit notches to 23 countries which of 13 are developed and 10 are emerging. They use 11 variables in the probit model. The factor analysis eliminates 5 of them. Logit regression is used in credit institutions to find the factors affecting the decision of customers to use e-banking (Annin, K. et. al, 2014). They used a sample with 241 customers. They used binary logistics regression to analyze responses. The most significant factors are the chosen bank, occupational status and monthly income. Probit models are used in loan repayment of yam farmers in Ghana (Wongnaa and Awunyo, 2013). A random sampling technique is used to select 100 farmers. A questionnaire is made to collect data. They found the factors affecting the loan repayment performance by the adoption of a probit model. They proved that the age has a positive effect on loan repayment. 58 Romanian Statistical Review nr. 4 / 2014

5 Estimating the coefficients in a binary regression may be made by their statistical significance. Some articles (Fitrianto and Cing, 2014) focus on theoretical aspects of binary logistics regression. METHODOLOGY OF THE RESEARCH AND INTERMEDIATE RESULTS Analyzing the structure of the dataset We have cross-sectional data from a credit institution. The dataset consists of 699 records for contracts signed between June 2013 and March The usual period of a contract is 30 days. Customers have to return the loan within 30 days. So we have actual data on the first of August Some of the fields in the dataset concern the contract: month of contract, year of contract, sum of contract, finished (yes or no). Other fields in the dataset concern the customer: age, gender, month of birth, region (where he/she lives). We use individual data (for the entire population). A population includes each element from the set of observations. The statistics term population represents all possible measurements. We do not make a sample. The inferences in this paper are limited to the observed credit institution. They may be valid for other credit institutions or not. Since credit institutions work with confidential data (names of customers, their addresses, number of contracts, credit history), we do not use customer specific data. We have individual (ungrouped) data. We may easily make pivot tables to get grouped data. Month of contract Part of the initial dataset with values of individual cases Year of contract Sum contract Finished (0 - false; 1 - true) Age Month of birth Gender (0-male; 1-female) Region Table 1 All the fields have numeric values. The region of the loan owner is saved in the database as the name of the town/village, where he/she lives. We have made coding of the region by replacing the text of the town/village with a corresponding number. People who live in the town of the credit institution have region 1 and those who live at most remote places 16. So the region coding shows the proximity to the credit institution with a numeric value on an ordinal scale. Romanian Statistical Review nr. 4 /

6 Using aggregated data Before testing the probit model we need to have aggregated data. We have computed aggregated data in Excel. Aggregated data version 1 Table 2 Sum Finished Total If we look at the first row, if the sum of contract is less than or equal to 100, 555 loans are given, 263 of them are returned. So the probability of returning a loan, if the sum of contract is less than or equal to 100, is 47%. The aggregated data are copied into SPSS (Hadzhiev et. al., 2009). The analysis starts with a probit model (Analyze/Regression/Probit). Parameters of probit analysis 1 within SPSS Figure 1 60 Romanian Statistical Review nr. 4 / 2014

7 The Chi-square tests show that the model is adequate. The probit parameters (sum and intercept) are significant at Alfa = Their significance is and Parameter estimates of the probit model 1 Table 3 95% Confidence Interval Parameter Estimate Std. Error Z Sig. Lower Upper Bound Bound PROBIT a sum,001,000 2,421,015,000,002 Intercept -,186,081-2,303,021 -,266 -,105 a. PROBIT model: PROBIT(p) = Intercept + BX The interpretation of the probit model estimates is the following. If the sum of loan increases with 100, the probability of returning it increases with 10% (0.001 * 100). Expected responses with the probit model are calculated by SPSS. PROBIT Number Cell count and residuals of probit model 1 sum Number of Subjects Observed Responses Expected Responses Residual Table 4 Probability 1 100, ,917 2,083, , ,341 2,659, , ,911-7,911, , ,819 2,181, , ,148 -,148, , ,103 -,103, , ,055,945, , ,758,242,758 If we look at the sum 300, we take into consideration loans with sums between 200 and 300. The empirical observation shows that 20 loans of 50 were returned or it is 40%. But the probit model calculates that the probability of returning a loan with sum of contract between 200 and 300 is 55.8%. It is obvious that most of the loans 555 of 699 are with sum less than or equal 100. It is appropriate to make a second probit model with loans below 100. In MS Excel we filter the dataset and we use a formula to calculate the rank. Romanian Statistical Review nr. 4 /

8 Filtering the dataset and calculating the rang Table 5 Sum_contract Finished (0 - false; 1 - true) Rank The formula for calculating the rank is the following: =ROUND(C3/10;0) C3 is the sum of contract for the first loan. With a Pivot table we compute total observed and finished. Row labels Aggregated dataset 2 for probit model 2 Count of finished (0 - false; 1 - true) Total observed (0 - false; 1 - true) Grand total Table 6 Now we may use again SPSS to calculate and test the probit model 2 (Analyze/Regression/Probit). 62 Romanian Statistical Review nr. 4 / 2014

9 Parameters of probit analysis 2 within SPSS Figure 2 The parameter estimates table shows the significance of the probit model parameters (rang and intercept). Both of them have significance greater than 0.05 (relatively and 0.894). The conclusion is that the second probit model cannot be made. So we cannot expect the probability of returning a loan to increase or decrease with the increase with 10 levs of the sum of contract. Our conclusion may be confirmed by the Cell counts and residual table for the second probit model. Cell counts and residual table for the second probit model Table 7 Number rang Number of Subjects Observed Responses Expected Responses Residual Probability 1 1, ,569-1,569, , ,153-1,153, , ,320-4,320, , ,123 2,877, , ,707 10,293, , ,663-3,663, , ,701-1,701, , ,805 3,195, , ,791-1,791, , ,165-2,165,448 It is obvious that for contracts with sums between 1 and 100 levs, the probability of returning a loan is between 44.8% and 49%. PROBIT Romanian Statistical Review nr. 4 /

10 Using cross-sectional data values of individual cases Estimating one-factor models The binary logistics model may be estimated in SPSS. We may use the values of individual cases instead of using aggregated data. Parameters of the binary logistics model in SPSS Figure 3 SPSS says Estimation terminated at iteration number 3 because parameter estimates changed by less than The variables in the equation are the following: B 0 = and B 1 = The significance of B 0 is The significance of B 1 is A binary logistics model with an independent variable sum of contract and including the constant in the equation is adequate. The binary logistics equation is mentioned in equation [2]. Where X is the sum of contract. For a loan of 100 levs (approximately EURO), the probability of returning it is computed in the following way. [3] [4] 64 Romanian Statistical Review nr. 4 / 2014

11 For a loan of 50 levs, the probability of returning it is computed in the following way. If we exclude the constant from the equation, the B 1 variable in the equation is not statistically significant (Sig. = 0.134). We may make a binary logistics regression with an independent variable region. Parameters of the binary logistics regression with an independent variable region Figure 4 [5] We try the regression including the constant in the equation. B 0 = B 1 = B 0 is not statistically significant (Sig = 0.285). B 1 is statistically significant (Sig = 0.016). We try the regression excluding the constant in the equation. B 1 = B 1 is statistically significant (Sig = 0.024). A binary logistics model with an independent variable region and excluding the constant in the equation is adequate. The equation is the following: Romanian Statistical Review nr. 4 /

12 [6] Where X is the region. In our case study the region changes from 1 to 16, where 1 is relative to people living in the town of the credit institution, and 16 for people living far away. For X = 1 the calculation is the following. So for people living in the town of the credit institution the probability of returning a loan is 46%. For X = 8 we calculate the probability of returning a loan. [7] For people living far from the credit institution, the probability of returning a loan decreases. А binary logistics model may be checked by entering the month of the contract as an independent variable. B 0 (Sig=0.513) and B 1 (Sig=0.659) are not statistically significant. The month of the contract cannot be an independent variable in a binary logistics model. А binary logistics model may be checked by entering the year of the contract as an independent variable. B 0 (Sig=0.489) and B 1 (Sig=0.489) are not statistically significant. The year of the contract cannot be an independent variable in a binary logistics model. А binary logistics model may be checked by entering the age as an independent variable. B 0 (Sig=0.122) and B 1 (Sig=0.076) are not statistically significant. The age cannot be an independent variable in a binary logistics model. [8] 66 Romanian Statistical Review nr. 4 / 2014

13 А binary logistics model may be checked by entering the month of birth as an independent variable. B 0 (Sig=0.031) and B 1 (Sig=0.006) are statistically significant. The month of birth can be an independent variable in a binary logistics model. B 0 = B 1 = The equation is the following: Where X is the month of birth. In our case study the month of birth changes from 1 to 12, where 1 is relative to people born in January and 12 for people born in December. For X = 1 (people born in the first month of the year) we calculate the probability of returning a loan. [9] [10] loans. 57% of the people born in January are expected to return their For X = 6 (people born in the sixth month of the year) we calculate the probability of returning a loan. [11] 50% of the people born in June are expected to return their loans. А binary logistics model may be checked by entering the gender as an independent variable. B 0 (Sig=0.097) is not significant, but B 1 (Sig=0.020) is statistically significant. The gender cannot be an independent variable in a binary logistics model. Romanian Statistical Review nr. 4 /

14 Estimating a three-factor model We may conclude that the sum of contract, the region and the month of birth influence in great extend the probability of returning a loan. We may try to make a three factor binary logistics model by including and by excluding the constant in the equation. Step 1 a A three factor binary logistics model including the constant in the equation Table 8 B S.E. Wald df Sig. Exp(B) region -,069,026 6,936 1,008,934 month_of_birth -,056,021 6,915 1,009,945 sum_of_contract,002,001 6,646 1,010 1,002 Constant,328,176 3,468 1,063 1,388 a. Variable(s) entered on step 1: sum_of_contract, region, month_of_birth. The constant is not statistically significant (Sig=0.63). The other variables (B 1, B 2 and B 3 ) are statistically significant. We may not make an equation. We may try a three factor model by excluding the constant from the equation. A three factor model by excluding the constant from the equation Table 9 Variables in the Equation B S.E. Wald df Sig. Exp(B) Step 1 a region -,056,025 4,979 1,026,946 sum_of_contract,002,001 10,619 1,001 1,002 month_of_birth -,025,013 3,579 1,059,975 a. Variable(s) entered on step 1: sum_of_contract, region, month_of_birth. We may construct an equation at Alfa = [12] Where X 1 is the month of birth, X 2 is the region and X 3 is the sum of contract. A numeric example may be given. For instance a borrower, born in June (X 1 = 6), living in the town of the credit institution (X 2 = 1), wants to take a loan of 100 levs (X 3 = 100). 68 Romanian Statistical Review nr. 4 / 2014

15 1.806 [13] The probability to return the loan for the mentioned borrower (born in June, living in the town of the credit institution and wants to take a loan of 100 levs) is 86%. RESULTS If we look at the signs of B 1, B 2 and B 3 in the equation, we may conclude the following. With the increase of the month of birth, the probability of returning a loan decreases ( is a negative number). With the increase of the remoteness of a loan owner, the probability of returning it decreases ( is a negative number). With the increase of given sum, the probability of returning a loan increases (0.02 is a positive number). The conducted methodology of research in this paper proves that the month of signing a contract, the year of signing a contract, the gender and the age of the loan owner do not affect the probability of returning a loan. The probability of returning a loan depends on the sum of contract, the remoteness of the loan owner and the month of birth. The probability of returning a loan increases with the increase of the given sum, decreases with the proximity of the customer, increases for people born in the beginning of the year and decreases for people born at the end of the year. People born in the beginning of the year are more willing to return their loan than those who are born at the end of the year. People who live in the town of the credit institution are more inclined to return their loans than people living far away. Loans with comparatively greater sums are more likely to be returned that loans with comparatively small amount of money. CONCLUSIONS Calculating the probability of returning a loan may be done by using the whole dataset from a loan institution (micro data for the entire population). Applying statistical and econometric methods the probability may be calculated. The most appropriate method concerns the qualitative response regression models. The simplest model is the linear probability model (LPM). Taking into consideration the problems with the LPM some other models may be used such as logit and probit models. Data at individual level and grouped data are analyzed. Romanian Statistical Review nr. 4 /

16 It is proved that the month of signing a contract, the year of signing a contract, the gender and the age of the loan owner do not affect the probability of returning a loan. It is proved that the probability of returning a loan depends on the sum of contract, the remoteness of the loan owner and the month of birth. The probability of returning a loan increases with the increase of the given sum, decreases with the proximity of the customer, increases for people born in the beginning of the year and decreases for people born at the end of the year. The analysis conducted in this paper is quantitative but several qualitative conclusions may be brought out. People born in the beginning of the year are more willing to return their loan than those who are born at the end of the year. People who live in the town of the credit institution are more inclined to return their loans than people living far away. Loans with comparatively greater sums are more likely to be returned that loans with comparatively small amount of money. Future research may be focused on the applying a tobit model or an ordered probit model or a nominal probit model on the same dataset. Duration models are also appropriate and applicable on a dataset containing data for given loans. Instead of calculating the probability of returning a loan with a duration model we can find the most important factors determining the duration of a loan contract. The inferences in this paper are limited to the observed credit institution. They may be valid for other credit institutions or not. Future research may be focused on the analysis of datasets of several credit institutions operating within a geographic region. The same hypotheses may be tested with the same methodology. In this case a sampling method may be applied. The conclusions will be valid for several credit institutions, not only for one as it is in this paper. 70 Romanian Statistical Review nr. 4 / 2014

17 REFERENCES 1. Annin, K. and others. (2014). Applying Logistic Regression to E-Banking Usage in Kumasi Metropolis, Ghana. International Journal of Marketing Studies; Vol. 6, No. 2, pp Cakmakyapan, S. and Goktas, A. (2013). A comparison of binary logit and probit models with a simulation study. Journal of social and economic statistics, Vol. 2, No 1, pp Fitrianto, A. and Cing, N Empirical Distributions of Parameter Estimates in Binary Logistic Regression Using Bootstrap. Int. Journal of Math. Analysis, Vol. 8, no. 15, Gujarati, D. (2004). Basic econometrics. 4 th edition Available: vse.cz/english/wp-content/uploads/2012/08/basic-econometrics.pdf < >. 5. Hadzhiev, V. and others. (2009). Statistical and econometric software, Varna: Nauka I ikonomika (in Bulgarian). 6. Teker, D. et.al. (2013). Determination of Sovereign Rating: Factor Based Ordered Probit Models for Panel Data Analysis Modelling Framework. International Journal of Economics and Financial Issues. Vol. 3, No. 1, pp Wongnaa, C. and Awunyo-Vitor, D. (2013). Factors Affecting Loan Repayment Performance Among Yam Farmers in the Sene District, Ghana. Agris on-line Papers in Economics and Informatics, Vol. 5, N 2, p Romanian Statistical Review nr. 4 /

### Time series analysis in loan management information systems

Theoretical and Applied Economics Volume XXI (2014), No. 3(592), pp. 57-66 Time series analysis in loan management information systems Julian VASILEV Varna University of Economics, Bulgaria vasilev@ue-varna.bg

### Module 9: Nonparametric Tests. The Applied Research Center

Module 9: Nonparametric Tests The Applied Research Center Module 9 Overview } Nonparametric Tests } Parametric vs. Nonparametric Tests } Restrictions of Nonparametric Tests } One-Sample Chi-Square Test

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### 2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

### 11. Analysis of Case-control Studies Logistic Regression

Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:

### Comparison of sales forecasting models for an innovative agro-industrial product: Bass model versus logistic function

The Empirical Econometrics and Quantitative Economics Letters ISSN 2286 7147 EEQEL all rights reserved Volume 1, Number 4 (December 2012), pp. 89 106. Comparison of sales forecasting models for an innovative

### Chapter 1 Introduction to Econometrics

Chapter 1 Introduction to Econometrics Econometrics deals with the measurement of economic relationships. It is an integration of economics, mathematical economics and statistics with an objective to provide

### Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study)

Cairo University Faculty of Economics and Political Science Statistics Department English Section Students' Opinion about Universities: The Faculty of Economics and Political Science (Case Study) Prepared

Mplus Short Courses Topic 2 Regression Analysis, Eploratory Factor Analysis, Confirmatory Factor Analysis, And Structural Equation Modeling For Categorical, Censored, And Count Outcomes Linda K. Muthén

### Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

### Introduction to Regression and Data Analysis

Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

### Simple Predictive Analytics Curtis Seare

Using Excel to Solve Business Problems: Simple Predictive Analytics Curtis Seare Copyright: Vault Analytics July 2010 Contents Section I: Background Information Why use Predictive Analytics? How to use

### Ordinal Regression. Chapter

Ordinal Regression Chapter 4 Many variables of interest are ordinal. That is, you can rank the values, but the real distance between categories is unknown. Diseases are graded on scales from least severe

### Chapter 23. Inferences for Regression

Chapter 23. Inferences for Regression Topics covered in this chapter: Simple Linear Regression Simple Linear Regression Example 23.1: Crying and IQ The Problem: Infants who cry easily may be more easily

### CHAPTER 11 CHI-SQUARE: NON-PARAMETRIC COMPARISONS OF FREQUENCY

CHAPTER 11 CHI-SQUARE: NON-PARAMETRIC COMPARISONS OF FREQUENCY The hypothesis testing statistics detailed thus far in this text have all been designed to allow comparison of the means of two or more samples

### THE IMPACT OF MACROECONOMIC FACTORS ON NON-PERFORMING LOANS IN THE REPUBLIC OF MOLDOVA

Abstract THE IMPACT OF MACROECONOMIC FACTORS ON NON-PERFORMING LOANS IN THE REPUBLIC OF MOLDOVA Dorina CLICHICI 44 Tatiana COLESNICOVA 45 The purpose of this research is to estimate the impact of several

### User Behaviour on Google Search Engine

104 International Journal of Learning, Teaching and Educational Research Vol. 10, No. 2, pp. 104 113, February 2014 User Behaviour on Google Search Engine Bartomeu Riutord Fe Stamford International University

### COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.

277 CHAPTER VI COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. This chapter contains a full discussion of customer loyalty comparisons between private and public insurance companies

### LOGIT AND PROBIT ANALYSIS

LOGIT AND PROBIT ANALYSIS A.K. Vasisht I.A.S.R.I., Library Avenue, New Delhi 110 012 amitvasisht@iasri.res.in In dummy regression variable models, it is assumed implicitly that the dependent variable Y

### When to use Excel. When NOT to use Excel 9/24/2014

Analyzing Quantitative Assessment Data with Excel October 2, 2014 Jeremy Penn, Ph.D. Director When to use Excel You want to quickly summarize or analyze your assessment data You want to create basic visual

### Introduction to Analysis of Variance (ANOVA) Limitations of the t-test

Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only

### Association Between Variables

Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

### Directions for using SPSS

Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...

### Bivariate Statistics Session 2: Measuring Associations Chi-Square Test

Bivariate Statistics Session 2: Measuring Associations Chi-Square Test Features Of The Chi-Square Statistic The chi-square test is non-parametric. That is, it makes no assumptions about the distribution

### SocioBrains NETWORK SECURITY BY ANALISIS OF LOG FILES OF APACHE WEB SERVER

NETWORK SECURITY BY ANALISIS OF LOG FILES OF APACHE WEB SERVER Julian Vasilev Associate professor PhD Department of Informatics Varna University of Economics BULGARIA vasilev@ue-varna.bg ABSTRACT: The

### End-User Satisfaction in ERP System: Application of Logit Modeling

Applied Mathematical Sciences, Vol. 8, 2014, no. 24, 1187-1192 HIKARI Ltd, www.m-hikari.com http://dx.doi.org/10.12988/ams.2014.4284 End-User Satisfaction in ERP System: Application of Logit Modeling Hashem

### Chapter 7. One-way ANOVA

Chapter 7 One-way ANOVA One-way ANOVA examines equality of population means for a quantitative outcome and a single categorical explanatory variable with any number of levels. The t-test of Chapter 6 looks

### Linear Regression in SPSS

Linear Regression in SPSS Data: mangunkill.sav Goals: Examine relation between number of handguns registered (nhandgun) and number of man killed (mankill) checking Predict number of man killed using number

### Poisson Models for Count Data

Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the

### From the help desk: Bootstrapped standard errors

The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

### Lesson Lesson Outline Outline

Lesson 15 Linear Regression Lesson 15 Outline Review correlation analysis Dependent and Independent variables Least Squares Regression line Calculating l the slope Calculating the Intercept Residuals and

### Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

### Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

### Intro to Data Analysis, Economic Statistics and Econometrics

Intro to Data Analysis, Economic Statistics and Econometrics Statistics deals with the techniques for collecting and analyzing data that arise in many different contexts. Econometrics involves the development

### One-Way Analysis of Variance

One-Way Analysis of Variance Note: Much of the math here is tedious but straightforward. We ll skim over it in class but you should be sure to ask questions if you don t understand it. I. Overview A. We

### SPSS Guide: Regression Analysis

SPSS Guide: Regression Analysis I put this together to give you a step-by-step guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar

### An analysis method for a quantitative outcome and two categorical explanatory variables.

Chapter 11 Two-Way ANOVA An analysis method for a quantitative outcome and two categorical explanatory variables. If an experiment has a quantitative outcome and two categorical explanatory variables that

### Statistics in Retail Finance. Chapter 2: Statistical models of default

Statistics in Retail Finance 1 Overview > We consider how to build statistical models of default, or delinquency, and how such models are traditionally used for credit application scoring and decision

### " Y. Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

### HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

### Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

### Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1. 1. Introduction p. 2. 2. Statistical Methods Used p. 5. 3. 10 and under Males p.

Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1 Table of Contents 1. Introduction p. 2 2. Statistical Methods Used p. 5 3. 10 and under Males p. 8 4. 11 and up Males p. 10 5. 10 and under

### Section Format Day Begin End Building Rm# Instructor. 001 Lecture Tue 6:45 PM 8:40 PM Silver 401 Ballerini

NEW YORK UNIVERSITY ROBERT F. WAGNER GRADUATE SCHOOL OF PUBLIC SERVICE Course Syllabus Spring 2016 Statistical Methods for Public, Nonprofit, and Health Management Section Format Day Begin End Building

### SPSS for Exploratory Data Analysis Data used in this guide: studentp.sav (http://people.ysu.edu/~gchang/stat/studentp.sav)

Data used in this guide: studentp.sav (http://people.ysu.edu/~gchang/stat/studentp.sav) Organize and Display One Quantitative Variable (Descriptive Statistics, Boxplot & Histogram) 1. Move the mouse pointer

### Research Methods & Experimental Design

Research Methods & Experimental Design 16.422 Human Supervisory Control April 2004 Research Methods Qualitative vs. quantitative Understanding the relationship between objectives (research question) and

### CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES

Examples: Monte Carlo Simulation Studies CHAPTER 12 EXAMPLES: MONTE CARLO SIMULATION STUDIES Monte Carlo simulation studies are often used for methodological investigations of the performance of statistical

### The Probit Link Function in Generalized Linear Models for Data Mining Applications

Journal of Modern Applied Statistical Methods Copyright 2013 JMASM, Inc. May 2013, Vol. 12, No. 1, 164-169 1538 9472/13/\$95.00 The Probit Link Function in Generalized Linear Models for Data Mining Applications

### Chi Square Tests. Chapter 10. 10.1 Introduction

Contents 10 Chi Square Tests 703 10.1 Introduction............................ 703 10.2 The Chi Square Distribution.................. 704 10.3 Goodness of Fit Test....................... 709 10.4 Chi Square

### CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

### Statistical tests for SPSS

Statistical tests for SPSS Paolo Coletti A.Y. 2010/11 Free University of Bolzano Bozen Premise This book is a very quick, rough and fast description of statistical tests and their usage. It is explicitly

### Household debt and spending in the United Kingdom

Household debt and spending in the United Kingdom Phil Bunn and May Rostom Bank of England Norges Bank Workshop: 24 March 2015 1 Outline Motivation Literature/theory Data/methodology Econometric results

### Factor B: Curriculum New Math Control Curriculum (B (B 1 ) Overall Mean (marginal) Females (A 1 ) Factor A: Gender Males (A 2) X 21

1 Factorial ANOVA The ANOVA designs we have dealt with up to this point, known as simple ANOVA or oneway ANOVA, had only one independent grouping variable or factor. However, oftentimes a researcher has

### Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

### MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

### Marginal Person. Average Person. (Average Return of College Goers) Return, Cost. (Average Return in the Population) (Marginal Return)

1 2 3 Marginal Person Average Person (Average Return of College Goers) Return, Cost (Average Return in the Population) 4 (Marginal Return) 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27

### January 26, 2009 The Faculty Center for Teaching and Learning

THE BASICS OF DATA MANAGEMENT AND ANALYSIS A USER GUIDE January 26, 2009 The Faculty Center for Teaching and Learning THE BASICS OF DATA MANAGEMENT AND ANALYSIS Table of Contents Table of Contents... i

### APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING

APPLICATION OF LINEAR REGRESSION MODEL FOR POISSON DISTRIBUTION IN FORECASTING Sulaimon Mutiu O. Department of Statistics & Mathematics Moshood Abiola Polytechnic, Abeokuta, Ogun State, Nigeria. Abstract

### Econometric Analysis of Cross Section and Panel Data Second Edition. Jeffrey M. Wooldridge. The MIT Press Cambridge, Massachusetts London, England

Econometric Analysis of Cross Section and Panel Data Second Edition Jeffrey M. Wooldridge The MIT Press Cambridge, Massachusetts London, England Preface Acknowledgments xxi xxix I INTRODUCTION AND BACKGROUND

### Factors affecting online sales

Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4

### SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

### Northumberland Knowledge

Northumberland Knowledge Know Guide How to Analyse Data - November 2012 - This page has been left blank 2 About this guide The Know Guides are a suite of documents that provide useful information about

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### Independent t- Test (Comparing Two Means)

Independent t- Test (Comparing Two Means) The objectives of this lesson are to learn: the definition/purpose of independent t-test when to use the independent t-test the use of SPSS to complete an independent

### Pondicherry University 605014 India- Abstract

International Journal of Management and International Business Studies. ISSN 2277-3177 Volume 4, Number 3 (2014), pp. 309-316 Research India Publications http://www.ripublication.com Management Information

### Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

### Doing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices:

Doing Multiple Regression with SPSS Multiple Regression for Data Already in Data Editor Next we want to specify a multiple regression analysis for these data. The menu bar for SPSS offers several options:

### Bivariate Regression Analysis. The beginning of many types of regression

Bivariate Regression Analysis The beginning of many types of regression TOPICS Beyond Correlation Forecasting Two points to estimate the slope Meeting the BLUE criterion The OLS method Purpose of Regression

### Generalized Linear Models

Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the

### Multinomial and Ordinal Logistic Regression

Multinomial and Ordinal Logistic Regression ME104: Linear Regression Analysis Kenneth Benoit August 22, 2012 Regression with categorical dependent variables When the dependent variable is categorical,

### Module 4 - Multiple Logistic Regression

Module 4 - Multiple Logistic Regression Objectives Understand the principles and theory underlying logistic regression Understand proportions, probabilities, odds, odds ratios, logits and exponents Be

### Elements of statistics (MATH0487-1)

Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -

### Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

### ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R.

ANALYSING LIKERT SCALE/TYPE DATA, ORDINAL LOGISTIC REGRESSION EXAMPLE IN R. 1. Motivation. Likert items are used to measure respondents attitudes to a particular question or statement. One must recall

### INTRODUCTORY STATISTICS

INTRODUCTORY STATISTICS FIFTH EDITION Thomas H. Wonnacott University of Western Ontario Ronald J. Wonnacott University of Western Ontario WILEY JOHN WILEY & SONS New York Chichester Brisbane Toronto Singapore

### Web Service for Calculating the Probability of Returning a Loan Design, Implementation and Deployment

Informatica Economică vol. 18, no. 4/2014 111 Web Service for Calculating the Probability of Returning a Loan Design, Implementation and Deployment Julian VASILEV Varna University of Economics, Varna,

### LOGISTIC REGRESSION ANALYSIS

LOGISTIC REGRESSION ANALYSIS C. Mitchell Dayton Department of Measurement, Statistics & Evaluation Room 1230D Benjamin Building University of Maryland September 1992 1. Introduction and Model Logistic

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### IBM SPSS Statistics for Beginners for Windows

ISS, NEWCASTLE UNIVERSITY IBM SPSS Statistics for Beginners for Windows A Training Manual for Beginners Dr. S. T. Kometa A Training Manual for Beginners Contents 1 Aims and Objectives... 3 1.1 Learning

### Technology Step-by-Step Using StatCrunch

Technology Step-by-Step Using StatCrunch Section 1.3 Simple Random Sampling 1. Select Data, highlight Simulate Data, then highlight Discrete Uniform. 2. Fill in the following window with the appropriate

### Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

### Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.)

Unit 12 Logistic Regression Supplementary Chapter 14 in IPS On CD (Chap 16, 5th ed.) Logistic regression generalizes methods for 2-way tables Adds capability studying several predictors, but Limited to

### Regression Analysis. Data Calculations Output

Regression Analysis In an attempt to find answers to questions such as those posed above, empirical labour economists use a useful tool called regression analysis. Regression analysis is essentially a

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

### ECON 523 Applied Econometrics I /Masters Level American University, Spring 2008. Description of the course

ECON 523 Applied Econometrics I /Masters Level American University, Spring 2008 Instructor: Maria Heracleous Lectures: M 8:10-10:40 p.m. WARD 202 Office: 221 Roper Phone: 202-885-3758 Office Hours: M W

### Bivariate Analysis. Correlation. Correlation. Pearson's Correlation Coefficient. Variable 1. Variable 2

Bivariate Analysis Variable 2 LEVELS >2 LEVELS COTIUOUS Correlation Used when you measure two continuous variables. Variable 2 2 LEVELS X 2 >2 LEVELS X 2 COTIUOUS t-test X 2 X 2 AOVA (F-test) t-test AOVA

### Recommend Continued CPS Monitoring. 63 (a) 17 (b) 10 (c) 90. 35 (d) 20 (e) 25 (f) 80. Totals/Marginal 98 37 35 170

Work Sheet 2: Calculating a Chi Square Table 1: Substance Abuse Level by ation Total/Marginal 63 (a) 17 (b) 10 (c) 90 35 (d) 20 (e) 25 (f) 80 Totals/Marginal 98 37 35 170 Step 1: Label Your Table. Label

### Introduction to Quantitative Methods

Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

### Statistics and research

Statistics and research Usaneya Perngparn Chitlada Areesantichai Drug Dependence Research Center (WHOCC for Research and Training in Drug Dependence) College of Public Health Sciences Chulolongkorn University,

### Categorical Data Analysis

Richard L. Scheaffer University of Florida The reference material and many examples for this section are based on Chapter 8, Analyzing Association Between Categorical Variables, from Statistical Methods

### Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com

SPSS-SA Silvermine House Steenberg Office Park, Tokai 7945 Cape Town, South Africa Telephone: +27 21 702 4666 www.spss-sa.com SPSS-SA Training Brochure 2009 TABLE OF CONTENTS 1 SPSS TRAINING COURSES FOCUSING

### ASSESSING FINANCIAL EDUCATION: EVIDENCE FROM BOOTCAMP. William Skimmyhorn. Online Appendix

ASSESSING FINANCIAL EDUCATION: EVIDENCE FROM BOOTCAMP William Skimmyhorn Online Appendix Appendix Table 1. Treatment Variable Imputation Procedure Step Description Percent 1 Using administrative data,

### Teaching model: C1 a. General background: 50% b. Theory-into-practice/developmental 50% knowledge-building: c. Guided academic activities:

1. COURSE DESCRIPTION Degree: Double Degree: Derecho y Finanzas y Contabilidad (English teaching) Course: STATISTICAL AND ECONOMETRIC METHODS FOR FINANCE (Métodos Estadísticos y Econométricos en Finanzas

Statistics Graduate Courses STAT 7002--Topics in Statistics-Biological/Physical/Mathematics (cr.arr.).organized study of selected topics. Subjects and earnable credit may vary from semester to semester.

### Simulating Chi-Square Test Using Excel

Simulating Chi-Square Test Using Excel Leslie Chandrakantha John Jay College of Criminal Justice of CUNY Mathematics and Computer Science Department 524 West 59 th Street, New York, NY 10019 lchandra@jjay.cuny.edu

### Chapter Four. Data Analyses and Presentation of the Findings

Chapter Four Data Analyses and Presentation of the Findings The fourth chapter represents the focal point of the research report. Previous chapters of the report have laid the groundwork for the project.

### Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This