Statistics 104 Final Project
A Culture of Debt: A Study of Credit Card Spending in America
TF: Kevin Rader
Anonymous Students: LD, MH, IW, MY


ABSTRACT:

This project attempted to determine the relationship between race and the existence and amount of credit card debt, along with the associations between debt and secondary variables such as income, age, education, and kids. The analysis was split into three stages: first conducting a logistic regression on data obtained from a survey conducted by the Federal Reserve, then performing a log-transformed multiple linear regression on the data, and finally incorporating interaction terms into the multiple linear regression. According to the logistic regression, income and age were negatively associated with taking on credit card debt, while the variables kids and education were positively associated with taking on debt; the racial categories African-American and Hispanic were not statistically significant, and the racial category Other was less likely to take on credit card debt compared to whites with all else held constant. The log-transformed multiple linear regression demonstrated that kids, education, and age are positively associated with the amount of credit card debt. The racial category for blacks was negatively associated with the amount of credit card debt compared to whites, while Hispanic and Other were not statistically significant. Considering interaction terms, an increase in income had a greater effect (positive coefficient) for blacks than it would for whites, and a similar relationship was observed for Hispanics. However, Other has a negative coefficient, demonstrating that the effect of income on credit card balance for Other is less than the effect for the racial category white. This study may ultimately offer insight into how race is correlated with patterns of credit card spending and lead to future research into the nuances of immigrant versus domestic categories of the same race.
INTRODUCTION:

This study investigated how race and other factors may affect credit card debt and, where applicable, the extent of the accumulated debt. The purpose of the initial logistic regression run in our two-part study was to determine the relationship between race as the explanatory variable and the existence of credit card debt. To control for potential confounding variables that may distort this association, income, number of kids, level of education, and age were also included as explanatory variables, and their relationship with the existence of credit card debt was evaluated. The hypothesis for the first part of this report predicted a negative association between income and the existence of credit card debt; that is, a higher level of income would be correlated with a lower probability of having credit card debt. Number of kids was predicted to be positively associated with the existence of credit card debt, while education and age were predicted to have a negative association with the response variable.

The second stage of the study was conducted to determine the association between the same set of explanatory variables and the amount of credit card debt accumulated, given that an unpaid balance existed. As in the first part of the study, the primary relationship explored was the potential effect of race on the quantity of credit card debt. The hypotheses for this stage predicted a negative association between income and credit card balance; that is, a higher level of income would be correlated with a lower credit card balance. Number of kids was predicted to be positively associated with the amount of debt, while education and age were predicted to have a negative association with the amount of debt. The possible interactions that may exist between explanatory variables were also explored within this report.

The analysis conducted within this report would likely be of great interest to credit card companies in terms of evaluating the likelihood that their customers might accrue unpaid credit card balances. Although race was the principal factor considered, the other explanatory variables included may also provide insight into how credit card debt may be determined. Given the recent financial crisis and the lingering impact on consumers, the hope is that this report may shed light on the various factors associated with the existence and accumulation of credit card debt.

METHODS:

The data used in this report were obtained from the Federal Reserve's 2007 Survey of Consumer Finances, in which 4,418 families were surveyed. The response variable ccbal represents credit card balance in dollars, and five explanatory variables were examined. The primary explanatory variable of interest, race, was divided into the subcategories white, African-American/black, Hispanic, and Other (the racial category Asian was incorporated into Other in the original data set). Income was defined as the total income of the household in dollars, while the variable kids represented the total number of children in the household. The variable EDUC indicated the total number of years of education completed by the head of the household. Finally, the explanatory variable age represented the age of the head of household. Although the survey data included many other explanatory variables, these predictors were chosen based on their relevance to daily spending and the interests of the authors of this report. Statistical analysis was conducted on the data using the statistical package StataSE.
A logistic regression was run to investigate which of the variables accounted for an individual's likelihood of taking on credit card debt, a binary response variable (the subcategories of race were incorporated into the model as dummy variables). Next, a multiple linear regression was run to determine how the predictor variables affected the amount of credit card debt incurred, given that an individual had credit card debt; this was done by restricting the regression to observations with ccbal>0. Since the survey data were right-skewed, a log transformation was performed on ccbal to compress the scale for the multiple linear regression. The criterion for statistical significance of variables in both models was p<0.05.

Possible interactions between predictor variables were also addressed. Since the effect of race was the primary concern of the report, twelve regressions were run, each addressing an interaction between one of African-American, Hispanic, or Other (white being the baseline for all comparisons) and one of education, age, kids, or income.

Several diagnostic tests were conducted to evaluate the regression models. Heteroskedasticity was tested for using the Breusch-Pagan/Cook-Weisberg test given ccbal>0. The Shapiro-Francia and Shapiro-Wilk normality tests were then executed to check normality.

RESULTS:

Part 1: Logistic Regression

The first stage of this study used a logistic regression to determine who might take on credit card debt (Y=1) and who would not (Y=0).
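The logistic model used in this stage turns a linear combination of the predictors into a probability between 0 and 1. The following is a minimal Python sketch of that mapping, not the Stata analysis itself. The coefficient values in `HYPOTHETICAL_COEFS` are invented for illustration (only their signs follow the directions reported in this study), and the function names are assumptions of this sketch.

```python
import math

# HYPOTHETICAL coefficients: magnitudes are made up for illustration.
# Only the signs mirror the reported directions (income and age negative,
# kids and education positive, Other negative relative to white).
HYPOTHETICAL_COEFS = {
    "_cons": 0.5,
    "income": -4.0e-07,
    "kids": 0.10,
    "educ": 0.05,
    "age": -0.02,
    "black": 0.0,
    "hispanic": 0.0,
    "other": -0.30,
}


def linear_predictor(coefs, household):
    """Intercept plus the sum of coefficient * value for each predictor."""
    eta = coefs["_cons"]
    for name, value in household.items():
        eta += coefs[name] * value
    return eta


def probability_of_debt(coefs, household):
    """P(Y=1) = exp(eta) / (1 + exp(eta)), written as 1 / (1 + exp(-eta))."""
    eta = linear_predictor(coefs, household)
    return 1.0 / (1.0 + math.exp(-eta))


# Two hypothetical households that differ only in the race dummy.
baseline = {"income": 60000, "kids": 2, "educ": 14, "age": 45,
            "black": 0, "hispanic": 0, "other": 0}
other_household = dict(baseline, other=1)

p_white = probability_of_debt(HYPOTHETICAL_COEFS, baseline)
p_other = probability_of_debt(HYPOTHETICAL_COEFS, other_household)
print(f"baseline: {p_white:.3f}, Other: {p_other:.3f}")
```

With white as the omitted baseline category, a negative coefficient on a race dummy lowers the predicted probability relative to an otherwise identical white household, which is how the race coefficients in this part are read.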
From Table 1, the logistic regression model is as follows:

\[ P(Y=1) = \frac{\exp(\eta)}{1 + \exp(\eta)}, \qquad \eta = \beta_0 + \beta_1\,\text{income} + \beta_2\,\text{kids} + \beta_3\,\text{education} + \beta_4\,\text{age} + \beta_5\,\text{black} + \beta_6\,\text{Hispanic} + \beta_7\,\text{other} \]

where the coefficients are the estimates reported in Table 1 (the income coefficient is on the order of 1e-07).

Race        Significance
Black       p-value = 0.394, not significant
Hispanic    p-value = 0.223, not significant
Other       p-value = 0.004, significant

The variables black and Hispanic were not statistically significant in the model, as the p-values for their coefficients are above 0.05. This indicates that blacks and Hispanics are not detectably more or less likely to take on credit card debt than whites, holding all other x variables constant. For the racial category Other (Asians, etc.), the variable was found to be significant in the model and the coefficient was negative, demonstrating that this racial category would be less likely to take on debt than whites (and blacks and Hispanics), holding all else constant.

As expected, the coefficient for income was negative, demonstrating that an increase in income is associated with being less likely to take on debt, with all else constant. Also as expected, the coefficient for kids was positive and the coefficient for age was negative: as the number of kids increases, the probability of taking on credit card debt increases (holding all else constant), and as age increases, the probability of taking on credit card debt decreases (controlling for all other variables). The positive coefficient for education did not correspond to the original hypothesis that an increase in education would be related to a lower likelihood of debt; instead, an increase in years of education is associated with a greater likelihood of debt.

Part 2: Linear Regression

Due to the right-skewed nature of the response variable, credit card balance (ccbal), the y variable was transformed using a logarithmic (base 10) transformation. A multiple regression was run with log10ccbal and the five x variables: race, income, number of kids, years of education, and age.
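The effect of the base-10 log transformation on a right-skewed variable can be illustrated with a small Python sketch. The balances below are made-up toy values, not survey data, and the helper name `log10_positive_balances` is an assumption of this sketch; the filter mirrors the ccbal>0 restriction used for the linear regression.

```python
import math
import statistics


def log10_positive_balances(balances):
    """Drop zero balances (the ccbal>0 restriction) and take log base 10."""
    return [math.log10(b) for b in balances if b > 0]


# Toy, invented balances in dollars: many small values and a few large ones.
toy_balances = [0, 150, 400, 900, 2500, 12000, 48000]

positive = [b for b in toy_balances if b > 0]
logged = log10_positive_balances(toy_balances)

# In a right-skewed sample the mean sits far above the median;
# after the log transformation the two are much closer together.
raw_mean, raw_median = statistics.mean(positive), statistics.median(positive)
log_mean, log_median = statistics.mean(logged), statistics.median(logged)

print(f"raw:   mean={raw_mean:.1f}, median={raw_median:.1f}")
print(f"log10: mean={log_mean:.3f}, median={log_median:.3f}")
```

Comparing mean and median is only a rough indicator of skew; a histogram of the transformed values gives a fuller picture.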
The results are presented in Table 2. The estimated regression model from Table 2 is as follows:

\[ \log_{10}(\text{ccbal}) = \beta_0 + \beta_1\,\text{income} + \beta_2\,\text{kids} + \beta_3\,\text{education} + \beta_4\,\text{age} + \beta_5\,\text{black} + \beta_6\,\text{Hispanic} + \beta_7\,\text{other} \]

where the coefficients are the estimates reported in Table 2 (the income coefficient is on the order of 1e-09).

In this linear regression model, the variables kids, education, age, and the racial subcategory black were significant factors with p-value < .01, whereas income (p=0.136), Hispanic (p=0.518), and Other (p=0.977) are not significant. Among the significant predictors, kids, education, and age are positively associated with credit card balance, whereas black is negatively associated. For every child a white family has, credit card balance is multiplied by 10^0.05 = 1.12, holding all other factors constant. For every 1-year increase in years of education, credit card balance experiences a multiplicative increase of 10^(.151). For every 1-year increase in age, a white household has a 10^(.00386) = 1.01 multiplicative increase in debt, all else being the same, and blacks take on 10^(0.196) = 1.57 times less debt than whites, given that all other factors are the same.

Part 3: Interaction Terms

A. Interaction between Income and Race

Race        Significance
Black       P < 0.001, significant
Hispanic    P = 0.019, significant
Other       P < 0.001, significant

Regression formula from Table 3.1 [See Appendix]:

\[ \log_{10}(\text{ccbal}) = \beta_0 + \beta_1\,\text{income} + \beta_2\,\text{kids} + \beta_3\,\text{edu} + \beta_4\,\text{age} + \beta_5\,\text{black} + \beta_6\,\text{Hispanic} + \beta_7\,\text{other} + \beta_8\,\text{income\_black} \]

where the black coefficient has magnitude .307 and the income_black coefficient is 1.80e-06.

Regression formula from Table 3.2 [See Appendix]:

\[ \log_{10}(\text{ccbal}) = \beta_0 + \beta_1\,\text{income} + \beta_2\,\text{kids} + \beta_3\,\text{edu} + \beta_4\,\text{age} + \beta_5\,\text{black} + \beta_6\,\text{Hispanic} + \beta_7\,\text{other} + \beta_8\,\text{income\_hispanic} \]

where the black coefficient has magnitude .189 and the income_hispanic coefficient is 3.48e-07.

The interaction term income_black has a positive coefficient and is significant with p-value less than 0.001, showing that an increase in income for blacks increases credit card balance by a slight but statistically significant multiplicative value compared to whites, all else constant. The result for income_hispanic is similar to that for income_black.

Regression formula from Table 3.3 [See Appendix]:

\[ \log_{10}(\text{ccbal}) = \beta_0 + \beta_1\,\text{income} + \beta_2\,\text{kids} + \beta_3\,\text{edu} + \beta_4\,\text{age} + \beta_5\,\text{black} + \beta_6\,\text{Hispanic} + \beta_7\,\text{other} + \beta_8\,\text{income\_other} \]

where the black coefficient has magnitude .187, the other coefficient is +.144, and the income_other coefficient is negative with magnitude 1.06e-06.

The interaction term income_other has a negative coefficient and is significant with p-value less than 0.001, showing that an increase in income for the racial category Other decreases credit card balance by a slight but statistically significant multiplicative value compared to whites, with all else constant.

B. Interaction between Kids and Race

Race        Significance
Black       P < 0.001, significant
Hispanic    P = 0.4, not significant
Other       P = 0.25, not significant
The interaction term kids_black has a negative coefficient and is significant with p-value less than 0.001, showing that an additional child for blacks changes credit card balance by a factor of 10^(-.137) = 0.729 compared to whites, all else constant. The effects for Hispanic and Other are not significant, indicating that the effect of kids on credit card balance for these groups is about the same as for whites.

C. Interaction between Education and Race

Race        Significance
Black       P = 0.005, significant
Hispanic    P < 0.001, significant
Other       P = 0.731, not significant

The interaction term education_black has a negative coefficient and is significant, showing that an additional year of education for blacks changes credit card balance by a factor of .935 compared to whites, with all else constant. The effect for Hispanic is about the same as the effect just discussed for blacks. The effect for Other is not significant, indicating that the effect of education on credit card balance for Other is about the same as for whites.

D. Interaction between Age and Race

Race        Significance
Black       P < 0.001, significant
Hispanic    P = 0.011, significant
Other       P = 0.971, not significant

The interaction term age_black has a positive coefficient and is significant, showing that an additional year of age for blacks increases credit card balance by a factor of 10^.00871 = 1.02 compared to whites, with all else constant. The effect for Hispanic is about the same as the effect just discussed for blacks. The effect for Other is not significant, indicating that the effect of age on credit card balance for Other is about the same as for whites.

CONCLUSION/DISCUSSION

In this study, a logarithmic transformation was performed on the data set obtained from the Federal Reserve since the data were very right-skewed.
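Because the response was modeled on a log10 scale, each coefficient quoted in Parts 2 and 3 translates into a multiplicative factor on credit card balance. A minimal Python sketch of that back-transformation follows; it uses only exponents quoted in the text, and the function names are illustrative rather than part of the original Stata analysis.

```python
def multiplicative_effect(log10_coef):
    """Factor applied to the balance for a one-unit increase in the predictor."""
    return 10.0 ** log10_coef


def percent_change(log10_coef):
    """The same effect expressed as a percentage change in the balance."""
    return (multiplicative_effect(log10_coef) - 1.0) * 100.0


# Exponents quoted in the Results section.
quoted_coefs = {
    "kids": 0.05,
    "age": 0.00386,
    "kids_black": -0.137,
    "age_black": 0.00871,
}

for name, coef in quoted_coefs.items():
    print(f"{name}: factor {multiplicative_effect(coef):.3f}, "
          f"change {percent_change(coef):+.1f}%")

# "1.57 times less debt" for blacks corresponds to a factor of 1 / 10**0.196.
black_factor = multiplicative_effect(-0.196)
print(f"black: factor {black_factor:.3f}, i.e. 1/{1.0 / black_factor:.2f}")
```

A factor above 1 means the balance grows with the predictor and a factor below 1 means it shrinks, so a negative log10 coefficient always corresponds to a factor between 0 and 1.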
Furthermore, since the validity of the model may be affected by the normality of values, two normality tests were performed to see if the residuals were normally distributed (see Table 4 in Appendix). The Shapiro-Francia and Shapiro-Wilk normality tests yielded two different results (see Appendix), and the histogram (Graph 1 in Appendix) shows that the residuals are not normally distributed. This may be due to the very large sample size, which makes such tests sensitive to small departures from normality, and to the exclusion of factors not included in this model, such as personal preference. These diagnostic results should be taken into account when considering the predictive power of the current models.

The Breusch-Pagan/Cook-Weisberg test (hettest) for heteroskedasticity had a p-value above 0.05 for the multiple linear regression, so the null hypothesis of constant variance fails to be rejected. There is therefore no evidence that the multiple linear regression is heteroskedastic. The scatter plot of residuals against fitted values (Graph 2 in Appendix) looks normal except for an abnormal straight line. Independence among the samples is assumed in the model.

Multicollinearity was tested for by correlating the explanatory variables against each other (Table 5 in Appendix). The largest correlation among the explanatory variables is between age and kids, and it falls below the 0.5 cutoff for very significant multicollinearity. Hence, although there are some collinear relationships among the explanatory variables, they are not strong enough to severely affect our model.

In the linear regression model, income is surprisingly not significant. This may be due to collinearity that may exist among the explanatory variables. For example, if income can be partially explained by education and age, then it may lose its explanatory power for credit card balance. According to the model, kids, education, and age are positively associated with the response variable. One explanation may be that as families have more children, consumption increases and they take on more credit card debt. Contradicting the original hypotheses, the more years of education one has and the older one gets, the more outstanding credit card debt one has (although age had a negative association with the existence of debt, it had a positive association with the accumulation of credit card debt).
This may be because educated individuals can rely on higher, more consistent income sources, so they have a greater ability to pay back their debt in the future and consume more as a result. Furthermore, financial concerns may increase with age as people begin to pay for cars, materials for their children, and house mortgages. These factors may all contribute to credit card debt.

One interesting observation in this model is that, while Hispanic and Other (Asians) are not significant, black is significantly negatively associated with credit card debt. This might indicate that Others' and Hispanics' consumption patterns are roughly the same as those of whites if they take on debt, whereas blacks take on less debt. The negative coefficients for the interaction terms would seem to indicate that blacks spend less on kids and education, so they have less outstanding credit card debt.

A logistic regression was conducted to investigate how each explanatory variable contributes to explaining the chance of taking on credit card debt. In this model, Other is a significant factor along with income, kids, education, and age, whereas black and Hispanic are not significant. One interesting observation is that an Asian family is less likely to take on debt compared to a white family when all other factors are held constant. Other races have roughly the same chance of taking on debt as white families.

The addition of an interaction term between income and race in the linear regression model indicates that there are positive associations between income_hispanic and debt and between income_black and debt, while there is a negative association between income_other and debt. One likely explanation for these observations may be cultural differences.
For instance, the regressions performed would lend themselves to the broad interpretation that Asians tend to save money (relative to other races) and use cash or debit cards instead of credit cards, whereas other racial groups may be more comfortable with using credit cards or spending ahead of time. As their income increases, Asians are more likely to save and spend less compared to whites, while Hispanics and blacks tend to spend more comparatively. Therefore, this model mainly shows the differences in consumption patterns and credit card debt holding between the racial category Other and the races Hispanic, black, and white.

One possible option for future studies could be comparing the credit card debt holding behaviors of Asian immigrants and American-born Asians. The current study would indicate that Asians are less likely to take on debt, but if they do, they take on roughly the same debt as whites. Further research could be done to see if Asian immigrants tend not to take on any credit card debt, while American-born Asians share similar debt-holding behaviors with other racial groups in the U.S.
Appendix:

(Table titles and Stata commands are listed below; apart from the values shown, the numeric output was not preserved in this transcription.)

Table 1: Logistic Regression
. logit logisticccbal income kids educ age black hispanic other
Rows: income, kids, educ, age, black, hispanic, other, _cons. Income coefficient: -4.52e-07.
Note: 113 failures and 0 successes completely determined.

Table 2: Linear Regression
. regress log10ccbal income kids educ age black hispanic other if ccbal>0
Number of obs = 8484; F(7, 8476). Income coefficient magnitude: 8.70e-09.

Table 3: Interaction Terms

Table 3.1
. regress logccbal1 income kids educ age black hispanic other income_black if ccbal>0
Number of obs = 8484; F(8, 8475). Income coefficient magnitude: 9.12e-09; income_black coefficient: 1.80e-06.

Table 3.2
. gen income_hispanic = income*hispanic
. regress logccbal1 income kids educ age black hispanic other income_hispanic if ccbal>0
Number of obs = 8484; F(8, 8475). Income coefficient magnitude: 9.14e-09; income_hispanic coefficient: 3.48e-07.

Table 3.3
. regress logccbal1 income kids educ age black hispanic other income_other if ccbal>0
Number of obs = 8484; F(8, 8475). Income coefficient magnitude: 8.24e-09; income_other coefficient: -1.06e-06.

Table 3.4 Interaction Term: kids_black
. regress log10ccbal income kids educ age black hispanic other kids_black if ccbal>0
F(8, 8475); Root MSE = .71862. Income coefficient magnitude: 8.64e-09.

Table 3.5 Interaction Term: kids_hispanic
. regress log10ccbal income kids educ age black hispanic other kids_hispanic if ccbal>0
F(8, 8475). Income coefficient magnitude: 8.69e-09.

Table 3.6 Interaction Term: kids_other
. regress log10ccbal income kids educ age black hispanic other kids_other if ccbal>0
F(8, 8475). Income coefficient magnitude: 8.71e-09.

Table 3.7, Table 3.8, Table 3.9
(Output not preserved.)

Table 3.10
. gen age_black = age*black
. regress logccbal1 income kids educ age black hispanic other age_black if ccbal>0
Number of obs = 8484; F(8, 8475). Income coefficient magnitude: 8.28e-09.

Table 3.11
. gen age_hispanic = age*hispanic
. regress logccbal1 income kids educ age black hispanic other age_hispanic if ccbal>0
Number of obs = 8484; F(8, 8475). Income coefficient magnitude: 8.59e-09.

Table 3.12
. gen age_other = age*other
. regress logccbal1 income kids educ age black hispanic other age_other if ccbal>0
Number of obs = 8484; F(8, 8475); Root MSE = .7203. Income coefficient magnitude: 8.72e-09.

Diagnostic Graph 1: Histogram of Residuals (x-axis: Residuals; y-axis: Density)

Graph 2: Scatter Plot of Residuals vs. Fitted Values (x-axis: Fitted values; y-axis: Residuals)

Table 4: Normality test Stata output
. swilk logccbal1
Shapiro-Wilk W test for normal data
. sfrancia logccbal1
Shapiro-Francia W' test for normal data

Table 5: Test for multicollinearity among explanatory variables
. corr income kids educ age
(obs=22090)

References:

Federal Reserve. (2009). Survey of Consumer Finances [Data file].
StataCorp. (2010). StataSE (Version 11) [Computer software]. College Station, TX: StataCorp LP.
More informationIntroduction to Stata
Introduction to Stata September 23, 2014 Stata is one of a few statistical analysis programs that social scientists use. Stata is in the midrange of how easy it is to use. Other options include SPSS,
More informationGeneralized Linear Models
Generalized Linear Models We have previously worked with regression models where the response variable is quantitative and normally distributed. Now we turn our attention to two types of models where the
More informationStata Walkthrough 4: Regression, Prediction, and Forecasting
Stata Walkthrough 4: Regression, Prediction, and Forecasting Over drinks the other evening, my neighbor told me about his 25yearold nephew, who is dating a 35yearold woman. God, I can t see them getting
More informationLinear Regression Models with Logarithmic Transformations
Linear Regression Models with Logarithmic Transformations Kenneth Benoit Methodology Institute London School of Economics kbenoit@lse.ac.uk March 17, 2011 1 Logarithmic transformations of variables Considering
More informationData Analysis Methodology 1
Data Analysis Methodology 1 Suppose you inherited the database in Table 1.1 and needed to find out what could be learned from it fast. Say your boss entered your office and said, Here s some software project
More informationis paramount in advancing any economy. For developed countries such as
Introduction The provision of appropriate incentives to attract workers to the health industry is paramount in advancing any economy. For developed countries such as Australia, the increasing demand for
More informationRegression stepbystep using Microsoft Excel
Step 1: Regression stepbystep using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression
More informationRockefeller College University at Albany
Rockefeller College University at Albany PAD 705 Handout: Hypothesis Testing on Multiple Parameters In many cases we may wish to know whether two or more variables are jointly significant in a regression.
More informationThe average hotel manager recognizes the criticality of forecasting. However, most
Introduction The average hotel manager recognizes the criticality of forecasting. However, most managers are either frustrated by complex models researchers constructed or appalled by the amount of time
More informationQuantitative Methods for Economics Tutorial 9. Katherine Eyal
Quantitative Methods for Economics Tutorial 9 Katherine Eyal TUTORIAL 9 4 October 2010 ECO3021S Part A: Problems 1. In Problem 2 of Tutorial 7, we estimated the equation ŝleep = 3, 638.25 0.148 totwrk
More informationEcon 371 Problem Set #3 Answer Sheet
Econ 371 Problem Set #3 Answer Sheet 4.3 In this question, you are told that a OLS regression analysis of average weekly earnings yields the following estimated model. AW E = 696.7 + 9.6 Age, R 2 = 0.023,
More informationHURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009
HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Introduction 2. A General Formulation 3. Truncated Normal Hurdle Model 4. Lognormal
More informationA Panel Data Analysis of Corporate Attributes and Stock Prices for Indian Manufacturing Sector
Journal of Modern Accounting and Auditing, ISSN 15486583 November 2013, Vol. 9, No. 11, 15191525 D DAVID PUBLISHING A Panel Data Analysis of Corporate Attributes and Stock Prices for Indian Manufacturing
More informationGLM I An Introduction to Generalized Linear Models
GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial
More informationc 2015, Jeffrey S. Simonoff 1
Modeling Lowe s sales Forecasting sales is obviously of crucial importance to businesses. Revenue streams are random, of course, but in some industries general economic factors would be expected to have
More informationTitle. Syntax. stata.com. fp Fractional polynomial regression. Estimation
Title stata.com fp Fractional polynomial regression Syntax Menu Description Options for fp Options for fp generate Remarks and examples Stored results Methods and formulas Acknowledgment References Also
More informationUsing Minitab for Regression Analysis: An extended example
Using Minitab for Regression Analysis: An extended example The following example uses data from another text on fertilizer application and crop yield, and is intended to show how Minitab can be used to
More informationData Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression
Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationGETTING STARTED: STATA & R BASIC COMMANDS ECONOMETRICS II. Stata Output Regression of wages on education
GETTING STARTED: STATA & R BASIC COMMANDS ECONOMETRICS II Stata Output Regression of wages on education. sum wage educ Variable Obs Mean Std. Dev. Min Max +
More informationRegression. Name: Class: Date: Multiple Choice Identify the choice that best completes the statement or answers the question.
Class: Date: Regression Multiple Choice Identify the choice that best completes the statement or answers the question. 1. Given the least squares regression line y8 = 5 2x: a. the relationship between
More informationAn Analysis of the Undergraduate Tuition Increases at the University of Minnesota Duluth
Proceedings of the National Conference On Undergraduate Research (NCUR) 2012 Weber State University March 2931, 2012 An Analysis of the Undergraduate Tuition Increases at the University of Minnesota Duluth
More informationQuick Stata Guide by Liz Foster
by Liz Foster Table of Contents Part 1: 1 describe 1 generate 1 regress 3 scatter 4 sort 5 summarize 5 table 6 tabulate 8 test 10 ttest 11 Part 2: Prefixes and Notes 14 by var: 14 capture 14 use of the
More informationTesting for serial correlation in linear paneldata models
The Stata Journal (2003) 3, Number 2, pp. 168 177 Testing for serial correlation in linear paneldata models David M. Drukker Stata Corporation Abstract. Because serial correlation in linear paneldata
More informationDetermining Factors of a Quick Sale in Arlington's Condo Market. Team 2: Darik Gossa Roger Moncarz Jeff Robinson Chris Frohlich James Haas
Determining Factors of a Quick Sale in Arlington's Condo Market Team 2: Darik Gossa Roger Moncarz Jeff Robinson Chris Frohlich James Haas Executive Summary The real estate market for condominiums in Northern
More informationVOL. 4, NO. 4, September 2015 ISSN 23072466 International Journal of Economics, Finance and Management 20112015. All rights reserved.
Credit Information Sharing and its Impact on Access to Bank Credit across Income Bracket Groupings Baah Aye Kusi, Kwadjo AnsahAdu University of Ghana Business School, Department of Finance, Ghana Valley
More informationAddressing Alternative. Multiple Regression. 17.871 Spring 2012
Addressing Alternative Explanations: Multiple Regression 17.871 Spring 2012 1 Did Clinton hurt Gore example Did Clinton hurt Gore in the 2000 election? Treatment is not liking Bill Clinton 2 Bivariate
More informationLab 5 Linear Regression with Withinsubject Correlation. Goals: Data: Use the pig data which is in wide format:
Lab 5 Linear Regression with Withinsubject Correlation Goals: Data: Fit linear regression models that account for withinsubject correlation using Stata. Compare weighted least square, GEE, and random
More informationQuantitative Methods for Economics Tutorial 12. Katherine Eyal
Quantitative Methods for Economics Tutorial 12 Katherine Eyal TUTORIAL 12 25 October 2010 ECO3021S Part A: Problems 1. State with brief reason whether the following statements are true, false or uncertain:
More informationChapter 13 Introduction to Linear Regression and Correlation Analysis
Chapter 3 Student Lecture Notes 3 Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing
More informationYiming Peng, Department of Statistics. February 12, 2013
Regression Analysis Using JMP Yiming Peng, Department of Statistics February 12, 2013 2 Presentation and Data http://www.lisa.stat.vt.edu Short Courses Regression Analysis Using JMP Download Data to Desktop
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More informationMultiple Linear Regression
Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is
More informationS TAT E P LA N N IN G OR G A N IZAT IO N
S TAT E P LA N N IN G OR G A N IZAT IO N D G FOR REGIO N A L D E V E LO P MENT A N D STRUCTUR AL A DJ USTMENT W O RKING PA PER AN ECONOMETRIC ANALYSIS OF SURVEY STUDY ON BILKENT CYBERPARK AND BATI AKDENIZ
More informationCHAPTER 5. Exercise Solutions
CHAPTER 5 Exercise Solutions 91 Chapter 5, Exercise Solutions, Principles of Econometrics, e 9 EXERCISE 5.1 (a) y = 1, x =, x = x * * i x i 1 1 1 1 1 1 1 1 1 1 1 1 1 1 y * i (b) (c) yx = 1, x = 16, yx
More information1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ
STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material
More informationBRIEF OVERVIEW ON INTERPRETING COUNT MODEL RISK RATIOS
BRIEF OVERVIEW ON INTERPRETING COUNT MODEL RISK RATIOS An Addendum to Negative Binomial Regression Cambridge University Press (2007) Joseph M. Hilbe 2008, All Rights Reserved This short monograph is intended
More informationBasic Statistical and Modeling Procedures Using SAS
Basic Statistical and Modeling Procedures Using SAS OneSample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom
More informationKSTAT MINIMANUAL. Decision Sciences 434 Kellogg Graduate School of Management
KSTAT MINIMANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To
More informationHow to set the main menu of STATA to default factory settings standards
University of Pretoria Data analysis for evaluation studies Examples in STATA version 11 List of data sets b1.dta (To be created by students in class) fp1.xls (To be provided to students) fp1.txt (To be
More informationFrom this it is not clear what sort of variable that insure is so list the first 10 observations.
MNL in Stata We have data on the type of health insurance available to 616 psychologically depressed subjects in the United States (Tarlov et al. 1989, JAMA; Wells et al. 1989, JAMA). The insurance is
More informationSUMAN DUVVURU STAT 567 PROJECT REPORT
SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.
More informationFrom the help desk: hurdle models
The Stata Journal (2003) 3, Number 2, pp. 178 184 From the help desk: hurdle models Allen McDowell Stata Corporation Abstract. This article demonstrates that, although there is no command in Stata for
More information2. Simple Linear Regression
Research methods  II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according
More informationFIRM SPECIFIC FACTORS THAT DETERMINE INSURANCE COMPANIES PERFORMANCE IN ETHIOPIA
FIRM SPECIFIC FACTORS THAT DETERMINE INSURANCE COMPANIES PERFORMANCE IN ETHIOPIA Daniel Mehari, MSc Arba Minch University, Arba Minch, Ethiopia Tilahun Aemiro, Msc Bahir Dar University, Bahir Dar, Ethiopia
More informationModule 5: Multiple Regression Analysis
Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College
More informationSydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1. 1. Introduction p. 2. 2. Statistical Methods Used p. 5. 3. 10 and under Males p.
Sydney Roberts Predicting Age Group Swimmers 50 Freestyle Time 1 Table of Contents 1. Introduction p. 2 2. Statistical Methods Used p. 5 3. 10 and under Males p. 8 4. 11 and up Males p. 10 5. 10 and under
More informationChapter 18. Effect modification and interactions. 18.1 Modeling effect modification
Chapter 18 Effect modification and interactions 18.1 Modeling effect modification weight 40 50 60 70 80 90 100 male female 40 50 60 70 80 90 100 male female 30 40 50 70 dose 30 40 50 70 dose Figure 18.1:
More informationMEASURING THE INVENTORY TURNOVER IN DISTRIBUTIVE TRADE
MEASURING THE INVENTORY TURNOVER IN DISTRIBUTIVE TRADE Marijan Karić, Ph.D. Josip Juraj Strossmayer University of Osijek Faculty of Economics in Osijek Gajev trg 7, 31000 Osijek, Croatia Phone: +385 31
More informationDeveloping Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@
Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@ Yanchun Xu, Andrius Kubilius Joint Commission on Accreditation of Healthcare Organizations,
More informationMultiple Regression: What Is It?
Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in
More informationxtmixed & denominator degrees of freedom: myth or magic
xtmixed & denominator degrees of freedom: myth or magic 2011 Chicago Stata Conference Phil Ender UCLA Statistical Consulting Group July 2011 Phil Ender xtmixed & denominator degrees of freedom: myth or
More informationA Predictive Model for NFL Rookie Quarterback Fantasy Football Points
A Predictive Model for NFL Rookie Quarterback Fantasy Football Points Steve Bronder and Alex Polinsky Duquesne University Economics Department Abstract This analysis designs a model that predicts NFL rookie
More information2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
More informationImplied Volatility Skews in the Foreign Exchange Market. Empirical Evidence from JPY and GBP: 19972002
Implied Volatility Skews in the Foreign Exchange Market Empirical Evidence from JPY and GBP: 19972002 The Leonard N. Stern School of Business Glucksman Institute for Research in Securities Markets Faculty
More informationEARLY VS. LATE ENROLLERS: DOES ENROLLMENT PROCRASTINATION AFFECT ACADEMIC SUCCESS? 200708
EARLY VS. LATE ENROLLERS: DOES ENROLLMENT PROCRASTINATION AFFECT ACADEMIC SUCCESS? 200708 PURPOSE Matthew Wetstein, Alyssa Nguyen & Brianna Hays The purpose of the present study was to identify specific
More informationGetting Correct Results from PROC REG
Getting Correct Results from PROC REG Nathaniel Derby, Statis Pro Data Analytics, Seattle, WA ABSTRACT PROC REG, SAS s implementation of linear regression, is often used to fit a line without checking
More informationDEMOGRAPHICS OF PAYDAY LENDING IN OKLAHOMA
DEMOGRAPHICS OF PAYDAY LENDING IN OKLAHOMA Haydar Kurban, PhD Adji Fatou Diagne HOWARD UNIVERSITY CENTER ON RACE AND WEALTH 1840 7th street NW Washington DC, 20001 TABLE OF CONTENTS 1. Executive Summary
More informationIntroduction to Quantitative Methods
Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................
More informationOutline. Topic 4  Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares
Topic 4  Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test  Fall 2013 R 2 and the coefficient of correlation
More informationModule 14: Missing Data Stata Practical
Module 14: Missing Data Stata Practical Jonathan Bartlett & James Carpenter London School of Hygiene & Tropical Medicine www.missingdata.org.uk Supported by ESRC grant RES 189250103 and MRC grant G0900724
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jintselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More informationApplied Regression Analysis Using STATA
Applied Regression Analysis Using STATA Josef Brüderl Regression analysis is the statistical method most often used in social research. The reason is that most social researchers are interested in identifying
More informationEconometrics II. Lecture 9: Sample Selection Bias
Econometrics II Lecture 9: Sample Selection Bias Måns Söderbom 5 May 2011 Department of Economics, University of Gothenburg. Email: mans.soderbom@economics.gu.se. Web: www.economics.gu.se/soderbom, www.soderbom.net.
More informationSIMPLE LINEAR CORRELATION. r can range from 1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables.
SIMPLE LINEAR CORRELATION Simple linear correlation is a measure of the degree to which two variables vary together, or a measure of the intensity of the association between two variables. Correlation
More information