Statistics 104 Final Project
A Culture of Debt: A Study of Credit Card Spending in America
TF: Kevin Rader
Anonymous Students: LD, MH, IW, MY
ABSTRACT: This project attempted to determine the relationship between race and both the existence and the amount of credit card debt. It also examined the associations between debt and secondary variables such as income, age, education, and number of kids. The analysis was split into three stages:

- a logistic regression on data from a survey conducted by the Federal Reserve;
- a multiple linear regression on the log-transformed credit card balance;
- the same multiple linear regression with interaction terms added.

According to the logistic regression, income and age were negatively associated with taking on credit card debt, while kids and education were positively associated with it. The racial categories African-American and Hispanic were not statistically significant. The racial category Other was less likely than whites to take on credit card debt, with all else held constant.

The log-transformed multiple linear regression showed that kids, education, and age are positively associated with the amount of credit card debt. Compared to whites, the racial category black was negatively associated with the amount of credit card debt, while Hispanic and Other were not statistically significant.

In the models with interaction terms, an increase in income had a greater (positive) effect on credit card balance for blacks than for whites, and a similar relationship was observed for Hispanics. The income interaction for Other, however, had a negative coefficient, indicating that the effect of income on credit card balance is smaller for Other than for whites.

This study may ultimately offer insight into how race is correlated with patterns of credit card spending. It may also motivate future research into the differences between immigrant and domestic-born members of the same racial group.
INTRODUCTION: This study investigated how race and other factors may affect whether a household carries credit card debt and, if so, how much debt it has accumulated. The purpose of the initial logistic regression in our two-part study was to determine the relationship between race, as the explanatory variable, and the existence of credit card debt. To control for potential confounding variables that may distort this association, income, number of kids, level of education, and age were also included as explanatory variables, and their relationship with the existence of credit card debt was evaluated.

The hypothesis for the first part of this report predicted a negative association between income and the existence of credit card debt; that is, a higher level of income would be correlated with a lower probability of having credit card debt. Number of kids was predicted to be positively associated with the existence of credit card debt, while education and age were predicted to have a negative association with the response variable.

The second stage of the study was conducted to determine the association between the same set of explanatory variables and the amount of credit card debt accumulated, given that an unpaid balance existed. As in the first part of the study, the primary relationship explored was the potential effect of race, here on the amount of credit card debt. The hypotheses for this stage predicted a negative association between income and credit card balance; that is, a higher level of income would be correlated with a lower credit card balance. Number of kids was predicted to be positively associated with the amount of debt, while education and age were predicted to have a negative
association with the amount of debt. Possible interactions between explanatory variables were also explored in this report. The analysis would likely be of interest to credit card companies evaluating the likelihood that their customers will accrue unpaid credit card balances. Although race was the principal factor considered, the other explanatory variables included may also provide insight into what determines credit card debt. Given the recent financial crisis and its lingering impact on consumers, we hope this report sheds light on the factors associated with the existence and accumulation of credit card debt.

METHODS: The data used in this report was obtained from the Federal Reserve's 2007 Survey of Consumer Finances, in which 4,418 families were surveyed. The response variable ccbal represents credit card balance in dollars, and five explanatory variables were examined.

The primary explanatory variable of interest, race, was divided into the subcategories white, African-American/black, Hispanic, and Other (the racial category Asian was incorporated into Other in the original data set). Income was defined as the total household income in dollars, and the variable kids represented the total number of children in the household. The variable EDUC indicated the total number of years of education completed by the head of household. Finally, the explanatory variable age represented the age of the head of household.

Although the survey data included many other explanatory variables, these predictors were chosen for their relevance to daily spending and the interests of the authors. Statistical analysis was conducted using the statistical package StataSE.
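As an illustration of how a race variable with four categories can enter a regression as three 0/1 dummies (white as the omitted baseline), and how race-by-covariate interaction terms are formed, here is a minimal sketch. The record layout, field names, and example values are hypothetical. The products mirror the Stata `gen income_hispanic = income*hispanic` commands shown in the Appendix.

```python
# Sketch: dummy-code race with "white" as the omitted baseline and build
# covariate-by-race interaction terms, analogous to Stata's
#   . gen income_black = income*black
# The record layout and example values below are hypothetical.

RACE_DUMMIES = ("black", "hispanic", "other")  # white = all three dummies 0
COVARIATES = ("income", "kids", "educ", "age")


def dummy_code(record: dict) -> dict:
    """Return a copy of `record` with 0/1 indicators for each non-baseline race."""
    race = record["race"]
    if race != "white" and race not in RACE_DUMMIES:
        raise ValueError(f"unknown race category: {race!r}")
    coded = dict(record)
    for category in RACE_DUMMIES:
        coded[category] = 1 if race == category else 0
    return coded


def add_interactions(record: dict) -> dict:
    """Add every covariate*race product, e.g. income_black = income * black."""
    out = dict(record)
    for cov in COVARIATES:
        for category in RACE_DUMMIES:
            out[f"{cov}_{category}"] = record[cov] * record[category]
    return out


household = {"race": "black", "income": 52000, "kids": 2, "educ": 14, "age": 41}
row = add_interactions(dummy_code(household))
print(row["black"], row["hispanic"], row["income_black"], row["kids_hispanic"])
```

Four covariates times three non-baseline race dummies gives twelve interaction columns, one per interaction regression. In the report, each regression adds only one of them to the main-effects model.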
A logistic regression was run to investigate which variables accounted for an individual's likelihood of taking on credit card debt, a binary response variable. The subcategories of race were entered into the model as dummy variables, with white as the baseline.

Next, a multiple linear regression was run to determine how the predictor variables affected the amount of credit card debt incurred, given that the individual had credit card debt. This was done by restricting the sample to observations with ccbal>0. Since the balance data was right-skewed, a log transformation was applied to ccbal to reduce the skew before running the multiple linear regression. The criterion for statistical significance in both models was p<0.05.

Possible interactions between predictor variables were also addressed. Since the effect of race was the primary concern of the report, twelve regressions were run. Each added one interaction between African-American, Hispanic, or Other (with white as the baseline for all comparisons) and education, age, kids, or income.

Several diagnostic tests were conducted to evaluate the regression models. Heteroskedasticity was tested with the Breusch-Pagan/Cook-Weisberg test on the ccbal>0 sample. The Shapiro-Francia and Shapiro-Wilk normality tests were then used to check normality.

RESULTS: Part 1: Logistic Regression
The first stage of this study used a logistic regression to determine who would take on credit card debt (Y=1) and who would not (Y=0).
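As a sketch of how a logistic model of this kind turns a linear predictor into a probability, consider the function below. It also shows why a negative dummy coefficient (such as the one reported for Other) means lower odds of carrying debt than the white baseline. The coefficient values are hypothetical illustrations, not the Table 1 estimates.

```python
import math

# Sketch of the logistic link used in Part 1:
#   P(Y=1) = exp(eta) / (1 + exp(eta)),  eta = b0 + sum_j b_j * x_j
# The coefficient values below are hypothetical, NOT the Table 1 estimates.


def logistic_probability(intercept: float, coefs: dict, x: dict) -> float:
    """Probability of carrying credit card debt (Y=1) for covariates x."""
    eta = intercept + sum(b * x[name] for name, b in coefs.items())
    # Algebraically equal to exp(eta)/(1+exp(eta)), but avoids overflow.
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    z = math.exp(eta)
    return z / (1.0 + z)


def odds(p: float) -> float:
    """Convert a probability to odds p/(1-p)."""
    return p / (1.0 - p)


coefs = {"kids": 0.2, "other": -0.4}  # hypothetical coefficients
white = {"kids": 2, "other": 0}
other = {"kids": 2, "other": 1}

p_white = logistic_probability(-0.5, coefs, white)
p_other = logistic_probability(-0.5, coefs, other)
# Holding kids fixed, the odds ratio for Other vs. white equals exp(b_other).
print(round(odds(p_other) / odds(p_white), 4), round(math.exp(-0.4), 4))
```

Both printed numbers agree. This is the usual reading of a logistic coefficient: exp(b) is the multiplicative change in the odds of carrying debt, with the other covariates held fixed.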
From Table 1, the logistic regression model is as follows:

$$P(Y=1) = \frac{\exp(\beta_0 + \beta_1\,\text{income} + \beta_2\,\text{kids} + \beta_3\,\text{education} + \beta_4\,\text{age} + \beta_5\,\text{black} + \beta_6\,\text{Hispanic} + \beta_7\,\text{other})}{1 + \exp(\beta_0 + \beta_1\,\text{income} + \beta_2\,\text{kids} + \beta_3\,\text{education} + \beta_4\,\text{age} + \beta_5\,\text{black} + \beta_6\,\text{Hispanic} + \beta_7\,\text{other})}$$

The estimated coefficients are reported in Table 1. Significance of the race terms:

Race      Significance
Black     P = 0.394, not significant
Hispanic  P = 0.223, not significant
Other     P = 0.004, significant

The variables black and Hispanic were not statistically significant in the model, as the p-values for their coefficients are above 0.05. There is therefore no significant evidence that blacks or Hispanics differ from whites in their likelihood of taking on credit card debt, holding all other x variables constant. For the racial category Other (Asians, etc.), the variable was significant and its coefficient was negative. This indicates that this group is less likely than whites to take on debt, holding all else constant.

As expected, the coefficient for income was negative: higher income is associated with a lower likelihood of taking on debt, with all else constant. Also as expected, the coefficient for kids was positive and the coefficient for age was negative. As the number of kids increases, the probability of taking on credit card debt increases (holding all else constant). As age increases, the probability of taking on credit card debt decreases (controlling for all other variables). The positive coefficient for education contradicted the original hypothesis that more education would be related to a lower likelihood of debt. Instead, more years of education are associated with a greater likelihood of debt.

Part 2: Linear Regression
Because the response variable, credit card balance (ccbal), is right-skewed, the y variable was transformed using a base-10 logarithm.
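Because the response in this part is log10(ccbal), a coefficient b translates into a multiplicative effect: a one-unit increase in the predictor multiplies the predicted balance by 10^b. (It would be e^b if a natural log had been used.) The short sketch below applies this conversion to coefficients quoted in the text: kids, education, age, black, kids_black, and age_black. The sign of the black coefficient (-0.196) is inferred from the text's statement that blacks carry less debt than whites.

```python
# Sketch: with log10(ccbal) as the response, a coefficient b means a one-unit
# increase in x multiplies the predicted balance by 10**b (not e**b, which
# would apply to a natural-log response). Values are coefficients quoted in
# the text; the minus sign on "black" is inferred from its description.


def multiplicative_effect(coef: float) -> float:
    """Factor applied to ccbal for a one-unit increase in the predictor."""
    return 10.0 ** coef


quoted = {
    "kids": 0.05,
    "educ": 0.151,
    "age": 0.00386,
    "black": -0.196,
    "kids_black": -0.137,
    "age_black": 0.00871,
}

for name, b in quoted.items():
    factor = multiplicative_effect(b)
    print(f"{name:10s} 10^({b:+.5f}) = {factor:.3f}  ({(factor - 1) * 100:+.1f}%)")
```

For an interaction term such as kids_black, 10^b is the extra factor for blacks on top of the effect for whites. The total effect of one more child for a black household would be 10^(b_kids + b_kids_black), which requires the kids main-effect estimate from the same interaction model.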
A multiple regression was run with log10ccbal as the response and the five x variables: race, income, number of kids, years of education, and age. The results are presented in Table 2. The estimated regression model from Table 2 is as follows:

$$\log_{10}(\text{ccbal}) = \beta_0 + \beta_1\,\text{income} + \beta_2\,\text{kids} + \beta_3\,\text{education} + \beta_4\,\text{age} + \beta_5\,\text{black} + \beta_6\,\text{Hispanic} + \beta_7\,\text{other}$$

In this linear regression model, the variables kids, education, age, and the racial subcategory black were significant (p<0.01). Income (p=0.136), Hispanic (p=0.518), and Other (p=0.977) were not significant. Among the significant predictors, kids, education, and age are positively associated with credit card balance, whereas black is negatively associated with it. For each additional child, credit card balance is multiplied by 10^0.05 = 1.12, holding all other factors constant. For every 1 year
increase in years of education, credit card balance is multiplied by 10^0.151 ≈ 1.42, all else being the same. For every 1 year increase in age, credit card balance is multiplied by 10^0.00386 = 1.01, all else being the same. Finally, the black coefficient of -0.196 means that blacks' credit card balances are multiplied by 10^(-0.196) ≈ 0.64 relative to whites' balances, given that all other factors are the same. Equivalently, whites' balances are 10^0.196 = 1.57 times those of blacks.

Part 3: Interaction Terms

A. Interaction between Income and Race

Race      Significance
Black     P < 0.001, significant
Hispanic  P = 0.019, significant
Other     P < 0.001, significant

Regression formula from Table 3.1 [See Appendix]:

$$\log_{10}(\text{ccbal}) = \beta_0 + \beta_1\,\text{income} + \beta_2\,\text{kids} + \beta_3\,\text{edu} + \beta_4\,\text{age} - 0.307\,\text{black} + \beta_6\,\text{Hispanic} + \beta_7\,\text{other} + \beta_8\,\text{income\_black}$$

Regression formula from Table 3.2 [See Appendix]:

$$\log_{10}(\text{ccbal}) = \beta_0 + \beta_1\,\text{income} + \beta_2\,\text{kids} + \beta_3\,\text{edu} + \beta_4\,\text{age} - 0.189\,\text{black} + \beta_6\,\text{Hispanic} + \beta_7\,\text{other} + \beta_8\,\text{income\_hispanic}$$

The interaction term income_black has a positive coefficient and is significant (p<0.001). This shows that the effect of income on credit card balance is larger for blacks than for whites: a given increase in income multiplies blacks' balances by a slightly but significantly larger factor, all else constant. The result for income_hispanic is similar to that for income_black.

Regression formula from Table 3.3 [See Appendix]:

$$\log_{10}(\text{ccbal}) = \beta_0 + \beta_1\,\text{income} + \beta_2\,\text{kids} + \beta_3\,\text{edu} + \beta_4\,\text{age} - 0.187\,\text{black} + \beta_6\,\text{Hispanic} + 0.144\,\text{other} + \beta_8\,\text{income\_other}$$

The interaction term income_other has a negative coefficient and is significant (p<0.001). This shows that the effect of income on credit card balance is smaller for the racial category Other than for whites: a given increase in income multiplies their balances by a slightly but significantly smaller factor, all else constant.

B. Interaction between Kids and Race

Race      Significance
Black     P < 0.001, significant
Hispanic  P = 0.4, not significant
Other     P = 0.25, not significant
The interaction term kids_black has a negative coefficient and is significant (p<0.001). For blacks, each additional child multiplies credit card balance by an extra factor of 10^(-0.137) = 0.729 relative to the corresponding effect for whites, all else constant. The interactions for Hispanic and Other are not significant, indicating that the effect of kids on credit card balance for these groups is not distinguishable from the effect for whites.

C. Interaction between Education and Race

Race      Significance
Black     P = 0.005, significant
Hispanic  P < 0.001, significant
Other     P = 0.731, not significant

The interaction term education_black has a negative coefficient and is significant. For blacks, each additional year of education multiplies credit card balance by an extra factor of about 0.935 relative to the effect for whites, with all else constant. The interaction for Hispanic is similar to the one just discussed for blacks. The interaction for Other is not significant, indicating that the effect of education on credit card balance for Other is not distinguishable from that for whites.

D. Interaction between Age and Race

Race      Significance
Black     P < 0.001, significant
Hispanic  P = 0.011, significant
Other     P = 0.971, not significant

The interaction term age_black has a positive coefficient and is significant. For blacks, each additional year of age multiplies credit card balance by an extra factor of 10^0.00871 = 1.02 relative to the effect for whites, with all else constant. The interaction for Hispanic is similar to the one just discussed for blacks. The interaction for Other is not significant, indicating that the effect of age on credit card balance for Other is not distinguishable from that for whites.

CONCLUSION/DISCUSSION
In this study, a logarithmic transformation was first applied to credit card balance in the Federal Reserve data set, since that variable was strongly right-skewed.
Furthermore, since the validity of the model may depend on the normality of the residuals, two normality tests were performed (see Table 4 in the Appendix). The Shapiro-Francia and Shapiro-Wilk tests yielded different results, and the histogram of residuals (Graph 1 in the Appendix) shows that the residuals are not normally distributed. This may be due in part to the very large sample size: with thousands of observations, formal normality tests flag even small departures from normality. It may also reflect the exclusion of factors not included in this model, such as personal preference. When
considering the predictive power of the current models, these diagnostic results should be taken into account.

The Breusch-Pagan/Cook-Weisberg test for heteroskedasticity (hettest) suggests that the residuals have constant variance. The test's p-value for the multiple linear regression was above 0.05, so the null hypothesis of constant variance is not rejected, and there is no significant evidence that the multiple linear regression is heteroskedastic. The scatter plot of residuals against fitted values (Graph 2 in the Appendix) shows no clear pattern except for an unusual straight line of points. Independence among the observations is assumed by the model rather than tested.

Multicollinearity was checked by correlating the explanatory variables with each other (Table 5 in the Appendix). The largest correlation among the explanatory variables is between age and kids, and it falls below the 0.5 cutoff used here for serious multicollinearity. Hence, although there are some collinear relationships among the explanatory variables, they do not appear strong enough to severely affect our model.

In the linear regression model, income is surprisingly not significant. This may be due to collinearity among the explanatory variables. For example, if income is partially explained by education and age, it may lose its explanatory power for credit card balance once those variables are included.

According to the model, kids, education, and age have positive associations with the response variable. One explanation may be that as families have more children, consumption increases and they take on more credit card debt. Contrary to the original hypotheses, more years of education and greater age are associated with larger outstanding credit card balances. (Although age had a negative association with the existence of debt, it had a positive association with the amount of credit card debt.)
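The pairwise-correlation screen for multicollinearity described above (the Appendix runs `corr income kids educ age` in Stata) can be sketched as follows. The sketch flags any pair whose absolute Pearson correlation reaches the 0.5 cutoff. The data are made-up toy values, not the SCF sample. Pairwise correlations can miss collinearity that involves three or more variables, such as income being jointly predicted by education and age, as suggested above. Variance inflation factors (in Stata, `estat vif` after `regress`) are a common complement.

```python
from itertools import combinations
from math import sqrt

# Sketch of a pairwise-correlation screen for multicollinearity, flagging
# |r| >= cutoff (0.5, the rule of thumb used in the report). The data are
# made-up toy values, not the SCF sample.


def pearson(x: list, y: list) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def flag_collinear(columns: dict, cutoff: float = 0.5) -> list:
    """Return (var1, var2, r) for every pair whose |r| meets the cutoff."""
    flagged = []
    for a, b in combinations(columns, 2):
        r = pearson(columns[a], columns[b])
        if abs(r) >= cutoff:
            flagged.append((a, b, round(r, 3)))
    return flagged


toy = {
    "age": [30, 40, 50, 60, 70],
    "kids": [0, 2, 1, 3, 4],
    "educ": [12, 14, 16, 12, 16],
}
print(flag_collinear(toy))
```

With these toy values, only the age-kids pair (r = 0.9) is flagged. The age-education (about 0.47) and kids-education (about 0.32) pairs fall below the cutoff.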
This may be because more educated individuals can rely on higher and more consistent incomes. That gives them a greater ability to repay debt in the future, so they may consume more in the present. Furthermore, financial obligations may increase with age as people begin to pay for cars, expenses for their children, and home mortgages. These factors may all contribute to credit card debt.

One interesting observation in this model is that, while Hispanic and Other (which includes Asians) are not significant, black is significantly negatively associated with the amount of credit card debt. This might indicate that, among households that carry debt, the consumption patterns of Others and Hispanics are roughly the same as those of whites, whereas blacks carry less debt. The negative coefficients on the kids_black and education_black interaction terms may suggest that blacks spend less on kids and education, and so carry less outstanding credit card debt.

A logistic regression was conducted to investigate how each explanatory variable contributes to the chance of taking on credit card debt. In this model, Other becomes a significant factor along with income, kids, education, and age, whereas black and Hispanic remain insignificant. One interesting observation is that a family in the Other category (which includes Asians) is less likely than a white family to take on debt when all other factors are held constant. Black and Hispanic families have roughly the same chance of taking on debt as white families.

Adding an interaction term between income and race to the linear regression model yields positive coefficients for income_hispanic and income_black and a negative coefficient for income_other. One possible explanation for these observations is cultural differences.
For instance, the regressions lend themselves to the broad interpretation that Asians may tend to save money (relative to other races) and use cash or debit cards instead of credit cards. Other racial groups may be more comfortable using credit cards or spending ahead of their income. As their income increases, Asians may be more likely to save and spend less
compared to whites, while Hispanics and blacks tend to spend comparatively more. Overall, these models mainly show differences in consumption patterns and credit card debt holding between the racial category Other and the Hispanic, black, and white categories.

One option for future studies would be to compare the credit card debt holding behavior of Asian immigrants and American-born Asians. The current study suggests that Asians are less likely to take on debt but, if they do, take on roughly the same amount of debt as whites. Further research could examine whether Asian immigrants tend not to take on any credit card debt, while American-born Asians share similar debt-holding behaviors with other racial groups in the U.S.
Appendix:

Table 1: Logistic Regression
. logit logisticccbal income kids educ age black hispanic other
Logistic regression, LR chi2(7)
Note: 113 failures and 0 successes completely determined.

Table 2: Linear Regression
. regress log10ccbal income kids educ age black hispanic other if ccbal>0
Number of obs = 8484, F(7, 8476)

Table 3: Interaction Terms

Table 3.1
. regress logccbal1 income kids educ age black hispanic other income_black if ccbal>0
Number of obs = 8484, F(8, 8475)
Table 3.2
. gen income_hispanic = income*hispanic
. regress logccbal1 income kids educ age black hispanic other income_hispanic if ccbal>0
Number of obs = 8484, F(8, 8475)

Table 3.3
. regress logccbal1 income kids educ age black hispanic other income_other if ccbal>0
Number of obs = 8484, F(8, 8475)

Table 3.4 Interaction Term: kids_black
. regress log10ccbal income kids educ age black hispanic other kids_black if ccbal>0
F(8, 8475), Root MSE = .71862
Table 3.5 Interaction Term: kids_hispanic
. regress log10ccbal income kids educ age black hispanic other kids_hispanic if ccbal>0
F(8, 8475)
Table 3.6 Interaction Term: kids_other
. regress log10ccbal income kids educ age black hispanic other kids_other if ccbal>0
F(8, 8475)
Table 3.7
Table 3.8
Table 3.9

Table (Interaction Term: age_black)
. gen age_black = age*black
. regress logccbal1 income kids educ age black hispanic other age_black if ccbal>0
Number of obs = 8484, F(8, 8475)
Table (Interaction Term: age_hispanic)
. gen age_hispanic = age*hispanic
. regress logccbal1 income kids educ age black hispanic other age_hispanic if ccbal>0
Number of obs = 8484, F(8, 8475)

Table (Interaction Term: age_other)
. gen age_other = age*other
. regress logccbal1 income kids educ age black hispanic other age_other if ccbal>0
Number of obs = 8484, F(8, 8475), Root MSE = .7203
Diagnostic Graph 1: Histogram of Residuals (x-axis: residuals; y-axis: density)
Graph 2: Scatter Plot of Residuals vs. Fitted Values (x-axis: fitted values; y-axis: residuals)
Table 4: Normality tests (Stata output)
. swilk logccbal1
Shapiro-Wilk W test for normal data
. sfrancia logccbal1
Shapiro-Francia W' test for normal data

Table 5: Test for multicollinearity among explanatory variables
. corr income kids educ age
(obs=22090)

References:
Federal Reserve. (2009). Survey of Consumer Finances [Data file]. Retrieved from
StataCorp. (2010). StataSE (Version 11) [Computer software]. College Station, TX: StataCorp LP.
Journal of Modern Accounting and Auditing, ISSN 1548-6583 November 2013, Vol. 9, No. 11, 1519-1525 D DAVID PUBLISHING A Panel Data Analysis of Corporate Attributes and Stock Prices for Indian Manufacturing
More informationTitle. Syntax. stata.com. fp Fractional polynomial regression. Estimation
Title stata.com fp Fractional polynomial regression Syntax Menu Description Options for fp Options for fp generate Remarks and examples Stored results Methods and formulas Acknowledgment References Also
More informationHURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009
HURDLE AND SELECTION MODELS Jeff Wooldridge Michigan State University BGSE/IZA Course in Microeconometrics July 2009 1. Introduction 2. A General Formulation 3. Truncated Normal Hurdle Model 4. Lognormal
More informationRegression step-by-step using Microsoft Excel
Step 1: Regression step-by-step using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression
More informationData Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression
Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction
More informationGLM I An Introduction to Generalized Linear Models
GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial
More informationAn Analysis of the Undergraduate Tuition Increases at the University of Minnesota Duluth
Proceedings of the National Conference On Undergraduate Research (NCUR) 2012 Weber State University March 29-31, 2012 An Analysis of the Undergraduate Tuition Increases at the University of Minnesota Duluth
More informationc 2015, Jeffrey S. Simonoff 1
Modeling Lowe s sales Forecasting sales is obviously of crucial importance to businesses. Revenue streams are random, of course, but in some industries general economic factors would be expected to have
More informationNCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )
Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates
More informationQuick Stata Guide by Liz Foster
by Liz Foster Table of Contents Part 1: 1 describe 1 generate 1 regress 3 scatter 4 sort 5 summarize 5 table 6 tabulate 8 test 10 ttest 11 Part 2: Prefixes and Notes 14 by var: 14 capture 14 use of the
More informationLab 5 Linear Regression with Within-subject Correlation. Goals: Data: Use the pig data which is in wide format:
Lab 5 Linear Regression with Within-subject Correlation Goals: Data: Fit linear regression models that account for within-subject correlation using Stata. Compare weighted least square, GEE, and random
More informationAddressing Alternative. Multiple Regression. 17.871 Spring 2012
Addressing Alternative Explanations: Multiple Regression 17.871 Spring 2012 1 Did Clinton hurt Gore example Did Clinton hurt Gore in the 2000 election? Treatment is not liking Bill Clinton 2 Bivariate
More informationSAS Software to Fit the Generalized Linear Model
SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling
More informationS TAT E P LA N N IN G OR G A N IZAT IO N
S TAT E P LA N N IN G OR G A N IZAT IO N D G FOR REGIO N A L D E V E LO P MENT A N D STRUCTUR AL A DJ USTMENT W O RKING PA PER AN ECONOMETRIC ANALYSIS OF SURVEY STUDY ON BILKENT CYBERPARK AND BATI AKDENIZ
More informationVOL. 4, NO. 4, September 2015 ISSN 2307-2466 International Journal of Economics, Finance and Management 2011-2015. All rights reserved.
Credit Information Sharing and its Impact on Access to Bank Credit across Income Bracket Groupings Baah Aye Kusi, Kwadjo Ansah-Adu University of Ghana Business School, Department of Finance, Ghana Valley
More informationKSTAT MINI-MANUAL. Decision Sciences 434 Kellogg Graduate School of Management
KSTAT MINI-MANUAL Decision Sciences 434 Kellogg Graduate School of Management Kstat is a set of macros added to Excel and it will enable you to do the statistics required for this course very easily. To
More informationSimple Linear Regression Inference
Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation
More informationMultiple Linear Regression
Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is
More informationFIRM SPECIFIC FACTORS THAT DETERMINE INSURANCE COMPANIES PERFORMANCE IN ETHIOPIA
FIRM SPECIFIC FACTORS THAT DETERMINE INSURANCE COMPANIES PERFORMANCE IN ETHIOPIA Daniel Mehari, MSc Arba Minch University, Arba Minch, Ethiopia Tilahun Aemiro, Msc Bahir Dar University, Bahir Dar, Ethiopia
More informationSUMAN DUVVURU STAT 567 PROJECT REPORT
SUMAN DUVVURU STAT 567 PROJECT REPORT SURVIVAL ANALYSIS OF HEROIN ADDICTS Background and introduction: Current illicit drug use among teens is continuing to increase in many countries around the world.
More informationChapter 18. Effect modification and interactions. 18.1 Modeling effect modification
Chapter 18 Effect modification and interactions 18.1 Modeling effect modification weight 40 50 60 70 80 90 100 male female 40 50 60 70 80 90 100 male female 30 40 50 70 dose 30 40 50 70 dose Figure 18.1:
More informationChapter 13 Introduction to Linear Regression and Correlation Analysis
Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing
More information1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ
STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material
More informationBasic Statistical and Modeling Procedures Using SAS
Basic Statistical and Modeling Procedures Using SAS One-Sample Tests The statistical procedures illustrated in this handout use two datasets. The first, Pulse, has information collected in a classroom
More informationDeveloping Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@
Developing Risk Adjustment Techniques Using the SAS@ System for Assessing Health Care Quality in the lmsystem@ Yanchun Xu, Andrius Kubilius Joint Commission on Accreditation of Healthcare Organizations,
More informationDetermining Factors of a Quick Sale in Arlington's Condo Market. Team 2: Darik Gossa Roger Moncarz Jeff Robinson Chris Frohlich James Haas
Determining Factors of a Quick Sale in Arlington's Condo Market Team 2: Darik Gossa Roger Moncarz Jeff Robinson Chris Frohlich James Haas Executive Summary The real estate market for condominiums in Northern
More informationFrom this it is not clear what sort of variable that insure is so list the first 10 observations.
MNL in Stata We have data on the type of health insurance available to 616 psychologically depressed subjects in the United States (Tarlov et al. 1989, JAMA; Wells et al. 1989, JAMA). The insurance is
More informationxtmixed & denominator degrees of freedom: myth or magic
xtmixed & denominator degrees of freedom: myth or magic 2011 Chicago Stata Conference Phil Ender UCLA Statistical Consulting Group July 2011 Phil Ender xtmixed & denominator degrees of freedom: myth or
More informationMultiple Regression: What Is It?
Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in
More informationHow to set the main menu of STATA to default factory settings standards
University of Pretoria Data analysis for evaluation studies Examples in STATA version 11 List of data sets b1.dta (To be created by students in class) fp1.xls (To be provided to students) fp1.txt (To be
More informationMEASURING THE INVENTORY TURNOVER IN DISTRIBUTIVE TRADE
MEASURING THE INVENTORY TURNOVER IN DISTRIBUTIVE TRADE Marijan Karić, Ph.D. Josip Juraj Strossmayer University of Osijek Faculty of Economics in Osijek Gajev trg 7, 31000 Osijek, Croatia Phone: +385 31
More information2. Simple Linear Regression
Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according
More informationFrom the help desk: hurdle models
The Stata Journal (2003) 3, Number 2, pp. 178 184 From the help desk: hurdle models Allen McDowell Stata Corporation Abstract. This article demonstrates that, although there is no command in Stata for
More informationModule 14: Missing Data Stata Practical
Module 14: Missing Data Stata Practical Jonathan Bartlett & James Carpenter London School of Hygiene & Tropical Medicine www.missingdata.org.uk Supported by ESRC grant RES 189-25-0103 and MRC grant G0900724
More informationApplied Regression Analysis Using STATA
Applied Regression Analysis Using STATA Josef Brüderl Regression analysis is the statistical method most often used in social research. The reason is that most social researchers are interested in identifying
More informationOutline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares
Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation
More informationModule 5: Multiple Regression Analysis
Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College
More informationIntroduction to Quantitative Methods
Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................
More informationSIMPLE LINEAR CORRELATION. r can range from -1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables.
SIMPLE LINEAR CORRELATION Simple linear correlation is a measure of the degree to which two variables vary together, or a measure of the intensity of the association between two variables. Correlation
More informationAdditional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm
Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm
More informationPoisson Models for Count Data
Chapter 4 Poisson Models for Count Data In this chapter we study log-linear models for count data under the assumption of a Poisson error structure. These models have many applications, not only to the
More informationPanel Data Analysis Fixed and Random Effects using Stata (v. 4.2)
Panel Data Analysis Fixed and Random Effects using Stata (v. 4.2) Oscar Torres-Reyna otorres@princeton.edu December 2007 http://dss.princeton.edu/training/ Intro Panel data (also known as longitudinal
More informationSolución del Examen Tipo: 1
Solución del Examen Tipo: 1 Universidad Carlos III de Madrid ECONOMETRICS Academic year 2009/10 FINAL EXAM May 17, 2010 DURATION: 2 HOURS 1. Assume that model (III) verifies the assumptions of the classical
More informationA Predictive Model for NFL Rookie Quarterback Fantasy Football Points
A Predictive Model for NFL Rookie Quarterback Fantasy Football Points Steve Bronder and Alex Polinsky Duquesne University Economics Department Abstract This analysis designs a model that predicts NFL rookie
More information2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
More informationSTAT 350 Practice Final Exam Solution (Spring 2015)
PART 1: Multiple Choice Questions: 1) A study was conducted to compare five different training programs for improving endurance. Forty subjects were randomly divided into five groups of eight subjects
More informationClass 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)
Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the
More informationGetting Correct Results from PROC REG
Getting Correct Results from PROC REG Nathaniel Derby, Statis Pro Data Analytics, Seattle, WA ABSTRACT PROC REG, SAS s implementation of linear regression, is often used to fit a line without checking
More informationDEMOGRAPHICS OF PAYDAY LENDING IN OKLAHOMA
DEMOGRAPHICS OF PAYDAY LENDING IN OKLAHOMA Haydar Kurban, PhD Adji Fatou Diagne HOWARD UNIVERSITY CENTER ON RACE AND WEALTH 1840 7th street NW Washington DC, 20001 TABLE OF CONTENTS 1. Executive Summary
More informationCorrelation and Simple Linear Regression
Correlation and Simple Linear Regression We are often interested in studying the relationship among variables to determine whether they are associated with one another. When we think that changes in a
More informationThe following postestimation commands for time series are available for regress:
Title stata.com regress postestimation time series Postestimation tools for regress with time series Description Syntax for estat archlm Options for estat archlm Syntax for estat bgodfrey Options for estat
More informationSimple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
More informationUnderstanding Characteristics of Caravan Insurance Policy Buyer
Understanding Characteristics of Caravan Insurance Policy Buyer May 10, 2007 Group 5 Chih Hau Huang Masami Mabuchi Muthita Songchitruksa Nopakoon Visitrattakul Executive Summary This report is intended
More informationI n d i a n a U n i v e r s i t y U n i v e r s i t y I n f o r m a t i o n T e c h n o l o g y S e r v i c e s
I n d i a n a U n i v e r s i t y U n i v e r s i t y I n f o r m a t i o n T e c h n o l o g y S e r v i c e s Linear Regression Models for Panel Data Using SAS, Stata, LIMDEP, and SPSS * Hun Myoung Park,
More informationEARLY VS. LATE ENROLLERS: DOES ENROLLMENT PROCRASTINATION AFFECT ACADEMIC SUCCESS? 2007-08
EARLY VS. LATE ENROLLERS: DOES ENROLLMENT PROCRASTINATION AFFECT ACADEMIC SUCCESS? 2007-08 PURPOSE Matthew Wetstein, Alyssa Nguyen & Brianna Hays The purpose of the present study was to identify specific
More informationOverview Classes. 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7)
Overview Classes 12-3 Logistic regression (5) 19-3 Building and applying logistic regression (6) 26-3 Generalizations of logistic regression (7) 2-4 Loglinear models (8) 5-4 15-17 hrs; 5B02 Building and
More informationStatistics 305: Introduction to Biostatistical Methods for Health Sciences
Statistics 305: Introduction to Biostatistical Methods for Health Sciences Modelling the Log Odds Logistic Regression (Chap 20) Instructor: Liangliang Wang Statistics and Actuarial Science, Simon Fraser
More informationSTATISTICA Formula Guide: Logistic Regression. Table of Contents
: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary
More informationRegression Analysis (Spring, 2000)
Regression Analysis (Spring, 2000) By Wonjae Purposes: a. Explaining the relationship between Y and X variables with a model (Explain a variable Y in terms of Xs) b. Estimating and testing the intensity
More informationMGT 267 PROJECT. Forecasting the United States Retail Sales of the Pharmacies and Drug Stores. Done by: Shunwei Wang & Mohammad Zainal
MGT 267 PROJECT Forecasting the United States Retail Sales of the Pharmacies and Drug Stores Done by: Shunwei Wang & Mohammad Zainal Dec. 2002 The retail sale (Million) ABSTRACT The present study aims
More informationBasic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
More informationIntroduction to structural equation modeling using the sem command
Introduction to structural equation modeling using the sem command Gustavo Sanchez Senior Econometrician StataCorp LP Mexico City, Mexico Gustavo Sanchez (StataCorp) November 13, 2014 1 / 33 Outline Outline
More informationMultiple Linear Regression in Data Mining
Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple
More informationA Primer on Forecasting Business Performance
A Primer on Forecasting Business Performance There are two common approaches to forecasting: qualitative and quantitative. Qualitative forecasting methods are important when historical data is not available.
More informationThe Volatility Index Stefan Iacono University System of Maryland Foundation
1 The Volatility Index Stefan Iacono University System of Maryland Foundation 28 May, 2014 Mr. Joe Rinaldi 2 The Volatility Index Introduction The CBOE s VIX, often called the market fear gauge, measures
More information