Supplement 13A: Partial F Test

Purpose of the Partial F Test

For a given regression model, could some of the predictors be eliminated without sacrificing too much in the way of fit? Conversely, would it be worthwhile to add a certain set of new predictors to a given regression model? The partial F test is designed to answer questions such as these by comparing two linear models for the same response variable. The extra sum of squares measures the marginal increase in the error sum of squares when one or more predictors are deleted from a model. Conversely, it measures the marginal reduction in the error sum of squares when one or more predictors are added to a model.

Eliminating Some Predictors

We will start by showing how to assess the effect of eliminating some predictors from a model that contains k predictors. The model containing all the predictors is called the full model:

(13A.1)  Y = β0 + β1X1 + β2X2 + ... + βkXk

A model with fewer predictors is a reduced model. We estimate the linear regression for each of the two models, and then look at the error sum of squares (SSE) from the ANOVA table for each model. Assuming that m predictors were eliminated in the reduced model, we use the following notation:

  Full model SSE:     SSE_Full      df_Full = n - k - 1
  Reduced model SSE:  SSE_Reduced   df_Reduced = n - k - 1 + m
  Extra SSE:          SSE_Reduced - SSE_Full   df = (n - k - 1 + m) - (n - k - 1) = m

The partial F test statistic is the ratio of two variances. The numerator is the difference in error sums of squares (the extra sum of squares) between the two models, divided by the number of predictors eliminated. The denominator is the error sum of squares for the full model (SSE_Full) divided by its degrees of freedom, which is the mean squared error of the full model.

(13A.2)  F = [(SSE_Reduced - SSE_Full)/m] / [SSE_Full/(n - k - 1)]   if m predictors are eliminated

Degrees of freedom for this test will then be (m, n - k - 1). If only one predictor has been eliminated, then m = 1.
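The computation in formula (13A.2) is easy to script outside Excel. Here is a minimal sketch in Python using SciPy's F distribution; the function name partial_f_test and its argument names are illustrative, not part of any standard library.

```python
# A minimal sketch of the partial F test of formula (13A.2).
# partial_f_test is a hypothetical helper, not a library function.
from scipy.stats import f

def partial_f_test(sse_full, sse_reduced, m, n, k):
    """Return (F, p) for eliminating m of the k predictors, given n observations."""
    df_denom = n - k - 1
    F = ((sse_reduced - sse_full) / m) / (sse_full / df_denom)
    p = f.sf(F, m, df_denom)  # right-tail area, like Excel's =F.DIST.RT(F, m, n-k-1)
    return F, p
```

The right-tail function f.sf plays the role of Excel's =F.DIST.RT, so the same (F, p) pair falls out for any nested pair of models.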
We can calculate the p-value for the partial F test in Excel using =F.DIST.RT(F, m, n - k - 1).

Illustration: Predicting Used Car Prices (CarPrice)

Table 13A.1 shows a data set consisting of 40 observations on prices of used cars of a particular brand and model (hence controlling for an obviously important factor that would affect prices).
The response variable is Y (SellPrice) = sale price of the vehicle (in thousands of dollars). We have observations on three potential predictors: X1 (Age) = age of car in years, X2 (Mileage) = miles on odometer (in thousands of miles), and X3 (ManTran) = 1 if manual transmission, 0 otherwise. The three predictors are viewed as non-stochastic, independent variables (we can later investigate the latter assumption by looking at VIFs, if we wish).

TABLE 13A.1  Selling Price and Characteristics of 40 Used Cars (CarPrice)

  X1 (Age)   X2 (Mileage)   X3 (ManTran)   Y (SellPrice)
     13         148.599           0             0.370
      2          17.367           0            29.810
     13         174.904           0             0.390
      .             .             .              .
     10         145.886           0            11.210
      8          93.220           0            12.270
      5          75.907           0            19.260

Note: Only the first and last three observations are shown here. The units for SellPrice and Mileage have been adjusted to thousands to improve data conditioning.

Eliminating a Single Predictor

Let us first test whether the single predictor ManTran could be eliminated to achieve a more parsimonious model than using all three predictors. We are comparing two potential linear regression models:

  Full model:     SellPrice = β0 + β1 Age + β2 Mileage + β3 ManTran
  Reduced model:  SellPrice = β0 + β1 Age + β2 Mileage

Here are the ANOVA tables from these two regressions:

  Full Model ANOVA table
  Source        SS           df    MS
  Regression    2,334.5984    3    778.1995
  Error           199.1586   36      5.5322

  Reduced Model ANOVA table
  Source        SS           df    MS
  Regression    2,314.3730    2    1,157.1865
  Error           219.3840   37        5.9293

The elimination of ManTran increases the sum of squared errors, as you would expect (you have already learned that extra predictors can never decrease R², even if they are not significant). Although the predictor ManTran contributes something to the model's overall explanatory power (a reduced SSE), the question remains whether ManTran makes a statistically significant extra contribution.
The calculations are:

  Full model:     SSE_Full = 199.1586      df_Full = n - k - 1 = 40 - 3 - 1 = 36
  Reduced model:  SSE_Reduced = 219.3840   df_Reduced = n - k - 1 + m = 40 - 3 - 1 + 1 = 37
  Extra SSE:      SSE_Reduced - SSE_Full = 20.2254   df = (n - k - 1 + m) - (n - k - 1) = m = 1
Applying formula 13A.2:

  F = [(SSE_Reduced - SSE_Full)/m] / [SSE_Full/(n - k - 1)]
    = [(219.3840 - 199.1586)/1] / [199.1586/36]
    = 20.2254/5.5322 = 3.6559

From Excel, we obtain the p-value =F.DIST.RT(3.6559,1,36) = .0639. Therefore, if we are using α = .05, we would say that the extra sum of squares is not significant (i.e., ManTran does not make a significant marginal contribution). Instead of using the p-value, we could compare F = 3.6559 with the critical value F.05(1,36) =F.INV.RT(0.05,1,36) = 4.114 to reach the same conclusion. In effect, the hypotheses we are testing are:

  H0: β3 = 0
  H1: β3 ≠ 0

The test statistic is not far enough from zero to reject the hypothesis H0: β3 = 0.

You may already have realized that if we are only considering the effect of one single predictor, we could reach the same conclusion from its t-statistic in the fitted regression of the full model:

  Regression output
  Variables    Coefficients   Std. Error   t (df=36)   p-value
  Intercept      33.7261        0.9994      33.747     7.60E-29
  Age            -1.6630        0.2938      -5.660     1.98E-06
  Mileage        -0.0584        0.0224      -2.610      .0131
  ManTran        -1.6538        0.8650      -1.912      .0639

In the single-predictor case, the partial F test statistic is equal to the square of the corresponding t test statistic in the full model. The t-test uses the same degrees of freedom as the denominator of the partial F test, so the p-values will be the same as long as we use a two-tailed t-test (which eliminates the sign, so that rejection in either tail could occur):

  Predictor ManTran: t² = (-1.912)² = 3.656
  Excel's p-value: =T.DIST.2T(1.912,36) = .0639

In the case of a single predictor, then, we could get by without using the partial F test. It is shown here because it illustrates the test in a simple way and reveals the connection between the F and t distributions.
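The single-predictor calculations above, and the F = t² connection, can be checked in a few lines of Python. The SciPy functions f.sf, f.isf, and t.sf correspond to Excel's =F.DIST.RT, =F.INV.RT, and (after doubling) =T.DIST.2T.

```python
# Verifying the single-predictor partial F test numbers with SciPy.
from scipy.stats import f, t

# Partial F statistic from the two error sums of squares above
F = ((219.3840 - 199.1586) / 1) / (199.1586 / 36)  # ≈ 3.656

p_f = f.sf(F, 1, 36)       # ≈ .0639, like =F.DIST.RT(F,1,36)
crit = f.isf(0.05, 1, 36)  # ≈ 4.114, like =F.INV.RT(0.05,1,36)

# With m = 1 the partial F equals the square of the full-model t statistic,
# and the two right-tail p-values agree.
t_sq = (-1.912) ** 2           # ≈ 3.656
p_t = 2 * t.sf(1.912, 36)      # ≈ .0639, like =T.DIST.2T(1.912,36)
```

Since F is below the critical value 4.114 (equivalently, p > .05), the code reproduces the conclusion in the text.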
An advantage of the t-test is that it could also be used to test a one-sided hypothesis (e.g., H1: β3 < 0), which might be relevant in the case of this example (all our predictors seem to have an inverse relationship with a car's selling price).
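The one-sided version is a left-tail area, since the observed t statistic is negative. A short sketch, again using SciPy's t distribution as a stand-in for Excel:

```python
# One-sided vs. two-sided p-values for ManTran's t statistic (t = -1.912, df = 36).
from scipy.stats import t

p_two = 2 * t.sf(1.912, 36)  # two-tailed, like =T.DIST.2T(1.912,36), ≈ .064
p_one = t.cdf(-1.912, 36)    # left tail for H1: beta3 < 0, ≈ .032
```

By the symmetry of the t distribution, the one-tailed p-value is exactly half the two-tailed one, so here ManTran would be significant at α = .05 in a one-sided test even though it is not in the two-sided test.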
Eliminating More Than One Predictor

We now turn to the more general case of using the partial F test to assess the effect of eliminating m predictors simultaneously (where m > 1). This can be especially useful when we have a large model with many predictors that we are thinking of eliminating because their effects seem weak in the full model. To test the effect of discarding m predictors at once, the hypotheses are:

  H0: βj = 0 for all m predictors in the chosen subset of the full model
  H1: Not all of these βj = 0 (at least some of the m coefficients are non-zero)

For example, suppose we want to know whether we can eliminate both Mileage and ManTran at once. The hypotheses are:

  H0: β2 = 0 and β3 = 0
  H1: One or both coefficients are non-zero

The models to be compared are:

  Full model:     SellPrice = β0 + β1 Age + β2 Mileage + β3 ManTran
  Reduced model:  SellPrice = β0 + β1 Age

Here are the ANOVA tables from these two regressions:

  Full Model ANOVA table
  Source        SS           df    MS
  Regression    2,334.5984    3    778.1995
  Error           199.1586   36      5.5322

  Reduced Model ANOVA table
  Source        SS           df    MS
  Regression    2,269.8421    1    2,269.8421
  Error           263.9148   38        6.9451

The elimination of both Mileage and ManTran increases the sum of squared errors, as you would expect. The question is whether these two predictors are making a statistically significant extra contribution to reducing the sum of squared errors. The calculations are:

  Full model:     SSE_Full = 199.1586      df_Full = n - k - 1 = 40 - 3 - 1 = 36
  Reduced model:  SSE_Reduced = 263.9148   df_Reduced = n - k - 1 + m = 40 - 3 - 1 + 2 = 38
  Extra SSE:      SSE_Reduced - SSE_Full = 64.7562   df = (n - k - 1 + m) - (n - k - 1) = m = 2

Applying formula 13A.2 with m = 2:

  F = [(263.9148 - 199.1586)/2] / [199.1586/36] = 32.3781/5.5322 = 5.8527

From Excel, we obtain the p-value =F.DIST.RT(5.8527,2,36) = .0063. If we are using α = .05, we would say that the extra sum of squares is significant (i.e., these two predictors do make a significant marginal contribution). Alternatively, we can compare F = 5.8527 with F.05(2,36) =F.INV.RT(0.05,2,36) = 3.259 to draw the same conclusion.

Adding Predictors

We have been discussing eliminating predictors. The calculations for adding predictors to a linear model are similar if we define the full model as the big model (more predictors) and the reduced model as the small model (fewer predictors). The extra sum of squares is still the difference between the two error sums of squares:

(13A.3)  F = [(SSE for small model - SSE for big model)/(number of extra predictors)] / [(SSE for big model)/(n - k - 1)]

where n - k - 1 is the error degrees of freedom for the big model.

More Complex Models

We can use variations on these partial F tests, based on error sums of squares, for other purposes. For example, we can test whether two coefficients in a model are equal (e.g., β2 = β3), or evaluate the effect of any given predictor in the presence of other sets of predictors in the model (using the coefficient of partial determination). Such tests are ordinarily reserved for more advanced classes in statistics and may entail using more specialized software.

Full Results for Car Data (CarPrice)

To allow you to explore the car data on your own, full results are shown below for the full model based on the used car data. SellPrice is negatively affected by Age and Mileage (both highly significant) and marginally by ManTran (p-value significant at α = .10 but not at α = .05). You can also look at the data file and do your own regressions.

  Regression Analysis
  R²            0.921       n            40
  Adjusted R²   0.915       k             3
  R             0.960       Std. Error    2.352
  Dep. Var.     SellPrice

  ANOVA table
  Source        SS           df    MS         F        p-value
  Regression    2,334.5984    3    778.1995   140.67   6.17E-20
  Residual        199.1586   36      5.5322

  Regression output                                               confidence interval
  Variables    Coefficients   Std. Error   t (df=36)   p-value    95% lower   95% upper   VIF
  Intercept      33.7261        0.9994      33.747     7.60E-29    31.6993     35.7530
  Age            -1.6630        0.2938      -5.660     1.98E-06    -2.2589     -1.0671    6.384
  Mileage        -0.0584        0.0224      -2.610      .0131      -0.1038     -0.0130    6.371
  ManTran        -1.6538        0.8650      -1.912      .0639      -3.4081      0.1004    1.014
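The whole procedure — fit the nested models, take the two SSEs, form the partial F statistic of formula (13A.2) — can be sketched end to end. The data below are synthetic placeholders (the CarPrice file is not reproduced in this supplement), and the helper name sse is illustrative; only the mechanics mirror the text.

```python
# End-to-end partial F test on synthetic data (NOT the CarPrice data),
# fitting nested models by ordinary least squares with NumPy.
import numpy as np
from scipy.stats import f

rng = np.random.default_rng(0)
n, k, m = 40, 3, 2                 # observations, predictors, predictors dropped
X = rng.normal(size=(n, k))
y = 5.0 - 1.5 * X[:, 0] + rng.normal(scale=2.0, size=n)  # only X1 matters here

def sse(y, X):
    """Error sum of squares from an OLS fit with an intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return resid @ resid

sse_full = sse(y, X)               # all k predictors
sse_reduced = sse(y, X[:, :1])     # drop the last m = 2 predictors
F = ((sse_reduced - sse_full) / m) / (sse_full / (n - k - 1))
p = f.sf(F, m, n - k - 1)          # like =F.DIST.RT(F, m, n-k-1)
```

Because the reduced model's columns are a subset of the full model's, sse_reduced can never be smaller than sse_full, so the extra sum of squares (and hence F) is always non-negative — exactly the property the text relies on.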
It appears that as a car ages, it loses about $1,663 in value per year (ceteris paribus). Similarly, for each additional 1,000 miles driven, a car loses on average about $58 (recall that Mileage is measured in thousands of miles). Cars with manual transmission seem to sell for about $1,654 less than those with automatic transmission (remember, the brand and model are already controlled for). There is evidence of multicollinearity between Age and Mileage, which would be expected (as cars get older, they accumulate more miles). This would require further consideration by the analyst.

Section Exercises

13A.1  Instructions: Use α = .05 in all tests.
  (a) Perform a full linear regression to predict ColGrad% using all eight predictors in DATA SET E shown here. State the SSE and df for the full model.
  (b) Fit a reduced linear regression model by eliminating the predictor Age. State the SSE and df for the reduced model.
  (c) Calculate the partial F test statistic to see whether the predictor Age was significant.
  (d) Calculate the p-value for the partial F test. What is your conclusion?
  (e) Does your conclusion from the partial F test agree with the test using the t-statistic in the full model regression?
  (f) Fit a reduced regression model by eliminating the two predictors Age and Seast simultaneously. State the SSE and df for the reduced model.
  (g) Calculate the partial F test statistic to see whether the predictors Age and Seast can both be eliminated. State your conclusion.