Statistics 104 Final Project A Culture of Debt: A Study of Credit Card Spending in America TF: Kevin Rader Anonymous Students: LD, MH, IW, MY


ABSTRACT: This project attempted to determine the relationship between race and the existence and amount of credit card debt, along with the associations between debt and secondary variables such as income, age, education, and number of kids. The analysis was split into three stages: first, a logistic regression on data obtained from a survey conducted by the Federal Reserve; then a log-transformed multiple linear regression on the same data; and finally the incorporation of interaction terms into the multiple linear regression. According to the logistic regression, income and age were negatively associated with taking on credit card debt, while kids and education were positively associated with taking on debt; the racial categories African-American and Hispanic were not statistically significant, and the racial category Other was less likely to take on credit card debt than whites with all else held constant. The log-transformed multiple linear regression demonstrated that kids, education, and age are positively associated with the amount of credit card debt. The racial category black was negatively associated with the amount of credit card debt compared to whites, while Hispanic and Other were not statistically significant. Considering interaction terms, an increase in income had a greater effect (positive interaction coefficient) for blacks than for whites, and a similar relationship was observed for Hispanics. However, the interaction term for Other has a negative coefficient, indicating that the effect of income on credit card balance for Other is smaller than the effect for whites. This study may ultimately offer insight into how race is correlated with patterns of credit card spending and lead to future research into the nuances of immigrant versus domestic categories of the same race.
INTRODUCTION: This study investigated how race and other factors may affect credit card debt and, where debt exists, the extent of the accumulated debt. The purpose of the initial logistic regression was to determine the relationship between race, the primary explanatory variable, and the existence of credit card debt. To control for potential confounding variables that may distort this association, income, number of kids, level of education, and age were also included as explanatory variables, and their relationships with the existence of credit card debt were evaluated. The hypothesis for the first part of this report predicted a negative association between income and the existence of credit card debt; that is, a higher level of income would be correlated with a lower probability of having credit card debt. Number of kids was predicted to be positively associated with the existence of credit card debt, while education and age were predicted to have a negative association with the response variable.

The second stage of the study was conducted to determine the association between the same set of explanatory variables and the amount of credit card debt accumulated, given that an unpaid balance existed. As in the first part of the study, the primary relationship explored was the potential effect of race on the quantity of credit card debt. The hypotheses for this stage predicted a negative association between income and credit card balance; that is, a higher level of income would be correlated with a lower credit card balance. Number of kids was predicted to be positively associated with the amount of debt, while education and age were predicted to have a negative association with the amount of debt. The possible interactions between explanatory variables were also explored within this report.

The analysis conducted within this report would likely be of great interest to credit card companies evaluating the likelihood that their customers might accrue unpaid credit card balances. Although race was the principal factor considered, the other explanatory variables may also provide insight into the determinants of credit card debt. Given the recent financial crisis and its lingering impact on consumers, the hope is that this report may shed light on the various factors associated with the existence and accumulation of credit card debt.

METHODS: The data used in this report were obtained from the Federal Reserve's 2007 Survey of Consumer Finances, in which 4,418 families were surveyed. The response variable ccbal represents credit card balance in dollars, and five explanatory variables were examined. The primary explanatory variable of interest, race, was divided into the subcategories white, African-American/black, Hispanic, and Other (the racial category Asian was incorporated into Other in the original data set). Income was defined as the total income of the household in dollars, while the variable kids represented the total number of children in the household. The variable EDUC indicated the total number of years of education completed by the head of the household. Finally, the explanatory variable age represented the age of the head of household. Although the survey data included many other explanatory variables, these predictors were chosen based on their relevance to daily spending and the interests of the authors of this report. Statistical analysis was conducted on the data using the statistical package StataSE.
A logistic regression was run to investigate which of the variables accounted for an individual's likelihood of taking on credit card debt, a binary response variable (the subcategories of race were incorporated into the model as dummy variables). Next, a multiple linear regression was run to determine how the predictor variables affected the amount of credit card debt incurred, given that an individual had credit card debt; this was done by restricting the sample to ccbal>0. Since the survey data were right-skewed, a log transformation was applied to ccbal to reduce the skew before the multiple linear regression was run. The criterion for statistical significance of variables in both models was p<0.05. Possible interactions between predictor variables were also addressed. Since the effect of race was the primary concern of the report, twelve regressions were run, each addressing an interaction between African-American, Hispanic, or Other (white being the baseline for all comparisons) and education, age, kids, or income. Several diagnostic tests were conducted to evaluate the regression models. Heteroskedasticity was tested for using the Breusch-Pagan/Cook-Weisberg test given ccbal>0. The Shapiro-Francia and Shapiro-Wilk normality tests were then run to assess normality.

RESULTS:

Part 1: Logistic Regression

The first stage of this study used a logistic regression to determine who might take on credit card debt (Y=1) and who would not (Y=0).
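As a minimal sketch of how a logistic model turns predictor values into a probability of holding credit card debt (Y=1), the Python snippet below evaluates the logistic function for one household. The function name, the coefficient values, and the example household are all hypothetical and chosen only for illustration; they are not the estimates reported in Table 1.

```python
import math


def logistic_probability(coefficients, intercept, predictors):
    """Return P(Y=1) for one household under a logistic model.

    1 / (1 + exp(-x)) is algebraically identical to exp(x) / (1 + exp(x)).
    """
    linear_predictor = intercept + sum(
        coefficients[name] * value for name, value in predictors.items()
    )
    return 1.0 / (1.0 + math.exp(-linear_predictor))


# Hypothetical coefficients for illustration only (NOT the Table 1 estimates).
example_coefficients = {
    "income": -4.0e-07, "kids": 0.10, "educ": 0.05, "age": -0.02,
    "black": 0.0, "hispanic": 0.0, "other": -0.40,
}
example_intercept = 0.5

# Hypothetical household; race enters as 0/1 dummies with white as baseline.
white_household = {
    "income": 60000, "kids": 2, "educ": 14, "age": 45,
    "black": 0, "hispanic": 0, "other": 0,
}
other_household = dict(white_household, other=1)

p_white = logistic_probability(example_coefficients, example_intercept, white_household)
p_other = logistic_probability(example_coefficients, example_intercept, other_household)
print(f"P(debt | white) = {p_white:.3f}, P(debt | other) = {p_other:.3f}")
```

With a negative coefficient on a race dummy, the predicted probability for that group falls below the baseline group's probability when every other predictor is held fixed, which is the sense in which the report compares each racial category to whites.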

From Table 1, the logistic regression model is as follows:

$$P(Y=1) = \frac{\exp(\eta)}{1 + \exp(\eta)}, \qquad \eta = \hat\beta_0 + \hat\beta_1\,\mathrm{income} + \hat\beta_2\,\mathrm{kids} + \hat\beta_3\,\mathrm{education} + \hat\beta_4\,\mathrm{age} + \hat\beta_5\,\mathrm{black} + \hat\beta_6\,\mathrm{Hispanic} + \hat\beta_7\,\mathrm{other}$$

where the $\hat\beta$ values are the coefficient estimates in Table 1 (the income coefficient is on the order of $10^{-7}$).

Race        Significance
Black       P-value = 0.394, not significant
Hispanic    P-value = 0.223, not significant
Other       P-value = 0.004, significant

The variables black and Hispanic were not statistically significant in the model, as the p-values for their coefficients are above 0.05. The data therefore provide no evidence that blacks and Hispanics are any more likely to take on credit card debt than whites, holding all other x variables constant. For the racial category Other (Asians, etc.), the variable was found to be significant in the model and the coefficient was negative, indicating that this racial category would be less likely to take on debt than whites (and blacks and Hispanics), holding all else constant. As expected, the coefficient for income was negative, indicating that an increase in income is associated with being less likely to take on debt, with all else constant. Also as expected, the coefficient for kids was positive and the coefficient for age was negative: as the number of kids increases, the probability of taking on credit card debt increases (holding all else constant), and as age increases, the probability of taking on credit card debt decreases (controlling for all other variables). The positive coefficient for education did not correspond to the original hypothesis that an increase in education would be related to a lower likelihood of debt; instead, an increase in years of education is associated with a greater likelihood of debt.

Part 2: Linear Regression

Due to the right-skewed nature of the response variable, credit card balance (ccbal), the y variable was transformed using a logarithmic (base 10) transformation.
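To illustrate why a base-10 log transformation helps with a right-skewed balance variable, the sketch below compares a rough skew indicator before and after the transformation. The balances are made-up numbers, not survey data, and the helper `mean_median_gap` is a hypothetical name introduced here. Only positive balances are used, since log10 is undefined at zero, which matches the restriction to ccbal>0.

```python
import math
import statistics

# Hypothetical positive balances in dollars (illustration only, not survey data).
balances = [120, 300, 450, 800, 1500, 2500, 6000, 40000]

# Base-10 log transformation of each positive balance.
log_balances = [math.log10(balance) for balance in balances]


def mean_median_gap(values):
    """(mean - median) / standard deviation: a rough indicator of right skew."""
    gap = statistics.mean(values) - statistics.median(values)
    return gap / statistics.pstdev(values)


raw_gap = mean_median_gap(balances)
log_gap = mean_median_gap(log_balances)
print(f"raw scale gap: {raw_gap:.3f}, log10 scale gap: {log_gap:.3f}")
```

On the raw scale the single large balance pulls the mean far above the median; on the log10 scale the mean and median sit much closer together, so a linear model fitted to the transformed response is less dominated by a few very large balances.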
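A multiple regression was run with log10ccbal as the response and the five x variables: race, income, number of kids, years of education, and age. The results are presented in Table 2. The estimated regression model from Table 2 is as follows:

$$\log_{10}(\mathrm{ccbal}) = \hat\beta_0 + \hat\beta_1\,\mathrm{income} + \hat\beta_2\,\mathrm{kids} + \hat\beta_3\,\mathrm{education} + \hat\beta_4\,\mathrm{age} + \hat\beta_5\,\mathrm{black} + \hat\beta_6\,\mathrm{Hispanic} + \hat\beta_7\,\mathrm{other}$$

where the $\hat\beta$ values are the coefficient estimates in Table 2 (the income coefficient is on the order of $10^{-9}$).

In this linear regression model, the variables kids, education, age, and the racial subcategory black were significant with p < 0.01, whereas income (p=0.136), Hispanic (p=0.518), and Other (p=0.977) were not significant. Among the significant predictors, kids, education, and age are positively associated with credit card balance, whereas black is negatively associated. For each additional child in a white family, credit card balance is multiplied by 10^0.05 = 1.12, holding all other factors constant. For every 1-year increase in years of education, credit card balance is multiplied by 10^(0.151) = 1.42. For every 1-year increase in age, a white household has a 10^(0.00386) = 1.01 multiplicative increase in debt, all else being the same, and blacks take on 10^(0.196) = 1.57 times less debt than whites (that is, about 0.64 times the balance), given that all other factors are the same.

Part 3: Interaction Terms

A. Interaction between Income and Race

Race        Significance
Black       P < 0.001, significant
Hispanic    P = 0.019, significant
Other       P < 0.001, significant

Regression formula from Table 3.1 [See Appendix]:

$$\log_{10}(\mathrm{ccbal}) = \hat\beta_0 + \hat\beta_1\,\mathrm{income} + \hat\beta_2\,\mathrm{kids} + \hat\beta_3\,\mathrm{edu} + \hat\beta_4\,\mathrm{age} - 0.307\,\mathrm{black} + \hat\beta_6\,\mathrm{Hispanic} + \hat\beta_7\,\mathrm{other} + \hat\beta_8\,\mathrm{income\_black}$$

with the remaining estimates as given in Table 3.1 (income on the order of $10^{-9}$, income_black on the order of $10^{-6}$).

Regression formula from Table 3.2 [See Appendix]:

$$\log_{10}(\mathrm{ccbal}) = \hat\beta_0 + \hat\beta_1\,\mathrm{income} + \hat\beta_2\,\mathrm{kids} + \hat\beta_3\,\mathrm{edu} + \hat\beta_4\,\mathrm{age} - 0.189\,\mathrm{black} + \hat\beta_6\,\mathrm{Hispanic} + \hat\beta_7\,\mathrm{other} + \hat\beta_8\,\mathrm{income\_hispanic}$$

with the remaining estimates as given in Table 3.2 (income on the order of $10^{-9}$, income_hispanic on the order of $10^{-7}$).

The interaction term income_black has a positive coefficient and is significant with a p-value less than 0.001, showing that an increase in income raises credit card balance by a slightly but statistically significantly larger multiplicative factor for blacks than for whites, all else constant. The result for income_hispanic is similar to that for income_black.

Regression formula from Table 3.3 [See Appendix]:

$$\log_{10}(\mathrm{ccbal}) = \hat\beta_0 + \hat\beta_1\,\mathrm{income} + \hat\beta_2\,\mathrm{kids} + \hat\beta_3\,\mathrm{edu} + \hat\beta_4\,\mathrm{age} - 0.187\,\mathrm{black} + \hat\beta_6\,\mathrm{Hispanic} + 0.144\,\mathrm{other} + \hat\beta_8\,\mathrm{income\_other}$$

with the remaining estimates as given in Table 3.3 (income on the order of $10^{-9}$, income_other on the order of $10^{-6}$).

The interaction term income_other has a negative coefficient and is significant with a p-value less than 0.001, showing that an increase in income raises credit card balance by a slightly but statistically significantly smaller multiplicative factor for the racial category Other than for whites, with all else constant.

B. Interaction between Kids and Race

Race        Significance
Black       P < 0.001, significant
Hispanic    P = 0.4, not significant
Other       P = 0.25, not significant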
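The multiplicative factors quoted in Part 2 come from back-transforming coefficients of a log10 response: a coefficient b on the log10 scale corresponds to multiplying the balance by 10^b per unit increase in the predictor. The short Python sketch below reproduces that arithmetic using the coefficient magnitudes stated in the text; the sign of the black coefficient is taken from the text's description of a negative association, and the function name is a hypothetical one introduced here.

```python
def multiplicative_effect(log10_coefficient, units=1.0):
    """Factor by which the balance changes for a `units` increase in a predictor
    when the response is log10(balance)."""
    return 10 ** (log10_coefficient * units)


# Coefficient magnitudes as quoted in the text; the negative sign on "black"
# reflects the reported negative association (an assumption about the sign only).
reported_coefficients = {"kids": 0.05, "educ": 0.151, "age": 0.00386, "black": -0.196}

effects = {
    name: round(multiplicative_effect(coefficient), 2)
    for name, coefficient in reported_coefficients.items()
}
print(effects)

# "1.57 times less" is the reciprocal of the factor for the black dummy.
times_less = round(1 / multiplicative_effect(reported_coefficients["black"]), 2)
print(times_less)
```

Passing `units=10` to `multiplicative_effect` would give the factor associated with a ten-year age difference rather than a one-year difference, since effects compound multiplicatively on the original dollar scale.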

The interaction term kids_black has a negative coefficient and is significant with a p-value less than 0.001, showing that an additional child for blacks changes credit card balance by a factor of 10^-.137 = 0.729 relative to the effect for whites, all else constant. The effects for Hispanic and Other are not significant, indicating that the effect of kids on credit card balance for these groups is about the same as for whites.

C. Interaction between Education and Race:

Race        Significance
Black       P = 0.005, significant
Hispanic    P < 0.001, significant
Other       P = 0.731, not significant

The interaction term education_black has a negative coefficient and is significant, showing that an additional year of education for blacks changes credit card balance by a factor of 0.935 relative to the effect for whites, with all else constant. The effect for Hispanic is about the same as the effect just discussed for blacks. The effect for Other is not significant, indicating that the effect of education on credit card balance for Other is about the same as for whites.

D. Interaction between Age and Race:

Race        Significance
Black       P < 0.001, significant
Hispanic    P = 0.011, significant
Other       P = 0.971, not significant

The interaction term age_black has a positive coefficient and is significant, showing that an additional year of age for blacks increases credit card balance by a factor of 10^.00871 = 1.02 relative to the effect for whites, with all else constant. The effect for Hispanic is about the same as the effect just discussed for blacks. The effect for Other is not significant, indicating that the effect of age on credit card balance for Other is about the same as for whites.

CONCLUSION/DISCUSSION

In this study, a logarithmic transformation was performed on the credit card balance data obtained from the Federal Reserve, since the data were very right-skewed.
Furthermore, since the validity of the model may be affected by the normality of values, two normality tests were performed to see whether the residuals were normally distributed (see Table 4 in Appendix). The Shapiro-Francia and Shapiro-Wilk normality tests yielded two different results (see Appendix), and the histogram (Graph 1 in Appendix) shows that the residuals are not normally distributed. This may be due to the huge sample size and the exclusion of other factors not included in this model, such as personal preference. When considering the predictive power of the current models, these diagnostic results should be taken into account.

The Breusch-Pagan/Cook-Weisberg test (hettest) for heteroskedasticity yielded a p-value greater than 0.05 for the multiple linear regression, so the test's null hypothesis of constant variance is not rejected. The test therefore gives no evidence that the multiple linear regression is heteroskedastic. The scatter plot of residuals against fitted values (Graph 2 in Appendix) looks unremarkable except for an abnormal straight line. Independence among the samples is assumed in the model.

Multicollinearity was tested for by correlating the explanatory variables against each other (Table 5 in Appendix). The largest correlation among the explanatory variables is between age and kids, and it falls below the 0.5 cutoff for very significant multicollinearity. Hence, although there are some collinear relationships among the explanatory variables, they are not strong enough to severely affect the model.

In the linear regression model, income is surprisingly not significant. This may be due to whatever collinearity does exist among the explanatory variables. For example, if income can be partially explained by education and age, then it may lose its explanatory power for credit card balance. According to the model, kids, education, and age are positively associated with the response variable. One explanation may be that as families have more children, consumption increases and they take on more credit card debt. Contradicting the original hypotheses, more years of education and greater age are both associated with more outstanding credit card debt (although age had a negative association with the existence of debt, it had a positive association with the accumulation of credit card debt).
This may be because educated individuals can rely on higher, more consistent income sources, so they have a greater ability to pay back their debt in the future and consume more as a result. Furthermore, financial concerns may increase with age as people begin to pay for cars, materials for their children, and house mortgages. These factors may all contribute to credit card debt. One interesting observation in this model is that, while Hispanic and Other (Asians, etc.) are not significant, black is significantly negatively associated with credit card debt. This might indicate that the consumption patterns of Others and Hispanics are roughly the same as those of whites if they take on debt, whereas blacks take on less debt. The negative coefficients for the kids and education interaction terms would seem to indicate that blacks spend less on kids and education, so they have less outstanding credit card debt.

A logistic regression was conducted to investigate how each explanatory variable contributes to explaining the chance of taking on credit card debt. In this model, Other is a significant factor along with income, kids, education, and age, whereas black and Hispanic are not significant. One interesting observation is that a family in the Other category (which includes Asian families) is less likely to take on debt than a white family when all other factors are held constant. Black and Hispanic families have roughly the same chance of taking on debt as white families. The addition of an interaction term between income and race in the linear regression model indicates positive associations between income_hispanic and debt and between income_black and debt, and a negative association between income_other and debt. One likely explanation for these observations may be cultural differences.
For instance, the regressions performed would lend themselves to the broad interpretation that Asians tend to save money (relative to other races) and use cash or debit cards instead of credit cards, whereas other racial groups may be more comfortable with using credit cards or spending ahead of time. As their income increases, Asians are more likely to save and spend less compared to whites, while Hispanics and blacks tend to spend more comparatively. Therefore, this model mainly shows the differences in consumption patterns and credit card debt holding between the racial category Other and the Hispanic, black, and white categories considered.

One possible option for future studies would be to compare the credit card debt holding behaviors of Asian immigrants and American-born Asians. The current study would indicate that Asians are less likely to take on debt, but that if they do, they take on roughly the same debt as whites. Further research could examine whether Asian immigrants tend not to take on any credit card debt, while American-born Asians share similar debt-holding behaviors with other racial groups in the U.S.

Appendix:

Table 1: Logistic Regression

. logit logisticccbal income kids educ age black hispanic other

Iteration 0: log likelihood =
Iteration 1: log likelihood =
Iteration 2: log likelihood =
Iteration 3: log likelihood =
Iteration 4: log likelihood =
Iteration 5: log likelihood =
Iteration 6: log likelihood =
Iteration 7: log likelihood =

Logistic regression
Number of obs =
LR chi2(7) =
Prob > chi2 =
Log likelihood =
Pseudo R2 =

logisticcc~l | Coef. Std. Err. z P>|z| [95% Conf. Interval]
income | -4.52e e e e-07
kids educ age black hispanic other _cons

Note: 113 failures and 0 successes completely determined.

Table 2: Linear Regression

. regress log10ccbal income kids educ age black hispanic other if ccbal>0

Source | SS df MS
Model Residual Total
Number of obs = 8484
F( 7, 8476) =
Prob > F =
R-squared =
Adj R-squared =
Root MSE =

log10ccbal | Coef. Std. Err. t P>|t| [95% Conf. Interval]
income | -8.70e e e e-09
kids educ age black hispanic other _cons

Table 3: Interaction Terms

Table 3.1

. regress logccbal1 income kids educ age black hispanic other income_black if ccbal>0

Source | SS df MS
Model Residual Total
Number of obs = 8484
F( 8, 8475) =
Prob > F =
R-squared =
Adj R-squared =
Root MSE =

logccbal1 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
income | -9.12e e e e-09
kids educ age black hispanic other
income_black | 1.80e e e e-06
_cons

Table 3.2

. gen income_hispanic = income*hispanic
. regress logccbal1 income kids educ age black hispanic other income_hispanic if ccbal>0

Source | SS df MS
Model Residual Total
Number of obs = 8484
F( 8, 8475) =
Prob > F =
R-squared =
Adj R-squared =
Root MSE =

logccbal1 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
income | -9.14e e e e-09
kids educ age black hispanic other
income_his~c | 3.48e e e e-07
_cons

Table 3.3

. regress logccbal1 income kids educ age black hispanic other income_other if ccbal>0

Source | SS df MS
Model Residual Total
Number of obs = 8484
F( 8, 8475) =
Prob > F =
R-squared =
Adj R-squared =
Root MSE =

logccbal1 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
income | -8.24e e e e-09
kids educ age black hispanic other
income_other | -1.06e e e e-07
_cons

Table 3.4 Interaction Term: kids_black

. regress log10ccbal income kids educ age black hispanic other kids_black if ccbal>0

Source | SS df MS
Model Residual Total
Number of obs =
F( 8, 8475) =
Prob > F =
R-squared =
Adj R-squared =
Root MSE = .71862

log10ccbal | Coef. Std. Err. t P>|t| [95% Conf. Interval]
income | -8.64e e e e-09
kids educ age black hispanic other kids_black _cons

Table 3.5 Interaction Term: kids_hispanic

. regress log10ccbal income kids educ age black hispanic other kids_hispanic if ccbal>0

Source | SS df MS
Model Residual Total
Number of obs =
F( 8, 8475) =
Prob > F =
R-squared =
Adj R-squared =
Root MSE =

log10ccbal | Coef. Std. Err. t P>|t| [95% Conf. Interval]
income | -8.69e e e e-09
kids educ age black hispanic other kids_hispa~c _cons

Table 3.6 Interaction Term: kids_other

. regress log10ccbal income kids educ age black hispanic other kids_other if ccbal>0

Source | SS df MS
Model Residual Total
Number of obs =
F( 8, 8475) =
Prob > F =
R-squared =
Adj R-squared =
Root MSE =

log10ccbal | Coef. Std. Err. t P>|t| [95% Conf. Interval]
income | -8.71e e e e-09
kids educ age black hispanic other kids_other _cons

Table 3.7

Table 3.8

Table 3.9

Table

. gen age_black = age*black
. regress logccbal1 income kids educ age black hispanic other age_black if ccbal>0

Source | SS df MS
Model Residual Total
Number of obs = 8484
F( 8, 8475) =
Prob > F =
R-squared =
Adj R-squared =
Root MSE =

logccbal1 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
income | -8.28e e e e-09
kids educ age black hispanic other age_black _cons

Table

. gen age_hispanic = age*hispanic
. regress logccbal1 income kids educ age black hispanic other age_hispanic if ccbal>0

Source | SS df MS
Model Residual Total
Number of obs = 8484
F( 8, 8475) =
Prob > F =
R-squared =
Adj R-squared =
Root MSE =

logccbal1 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
income | -8.59e e e e-09
kids educ age black hispanic other age_hispanic _cons

Table

. gen age_other = age*other
. regress logccbal1 income kids educ age black hispanic other age_other if ccbal>0

Source | SS df MS
Model Residual Total
Number of obs = 8484
F( 8, 8475) =
Prob > F =
R-squared =
Adj R-squared =
Root MSE = .7203

logccbal1 | Coef. Std. Err. t P>|t| [95% Conf. Interval]
income | -8.72e e e e-09
kids educ age black hispanic other age_other _cons
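The interaction variables used in the Table 3 regressions are products of a continuous predictor and a race dummy, as in the Stata command gen age_other = age*other. A minimal Python sketch of the same construction is given below; the household records are hypothetical, the function name is introduced here for illustration, and the column naming (for example educ_black) is an assumption about how the remaining interaction variables might be named.

```python
CONTINUOUS = ["income", "kids", "educ", "age"]
DUMMIES = ["black", "hispanic", "other"]  # white is the omitted baseline


def add_interactions(rows, continuous, dummies):
    """Return copies of the rows with product columns such as 'age_black' added."""
    augmented_rows = []
    for row in rows:
        new_row = dict(row)  # copy so the input rows are left unchanged
        for predictor in continuous:
            for dummy in dummies:
                new_row[f"{predictor}_{dummy}"] = row[predictor] * row[dummy]
        augmented_rows.append(new_row)
    return augmented_rows


# Hypothetical households (illustration only, not survey records).
households = [
    {"income": 52000, "kids": 2, "educ": 16, "age": 41,
     "black": 0, "hispanic": 0, "other": 0},
    {"income": 38000, "kids": 1, "educ": 12, "age": 35,
     "black": 1, "hispanic": 0, "other": 0},
]

augmented = add_interactions(households, CONTINUOUS, DUMMIES)
new_columns = sorted(set(augmented[0]) - set(households[0]))
print(len(new_columns), new_columns)
```

Four continuous predictors crossed with three race dummies give twelve product columns, one for each of the twelve interaction regressions described in the Methods. Each product is zero for households outside the group, so its coefficient measures how the predictor's slope for that group differs from the slope for the white baseline.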

Diagnostic Graph 1: Histogram of Residuals (x-axis: Residuals; y-axis: Density)

Graph 2: Scatter Plot of Residuals vs. Fitted Values (x-axis: Fitted values; y-axis: Residuals)

Table 4: Normality test

Stata output:

. swilk logccbal1

Shapiro-Wilk W test for normal data
Variable | Obs W V z Prob>z
logccbal

. sfrancia logccbal1

Shapiro-Francia W' test for normal data
Variable | Obs W' V' z Prob>z
logccbal

Table 5: Test for multi-collinearity among explanatory variables

. corr income kids educ age
(obs=22090)

       | income kids educ age
income kids educ age

References:

Federal Reserve. (2009). Survey of Consumer Finances. [Data file]. Retrieved from

StataCorp. (2010). StataSE (Version 11) [Computer software]. College Station, TX: StataCorp LP.