Chapter 13 Robust and Resistant Regression

When the errors are normal, least squares regression is clearly best, but when the errors are nonnormal, other methods may be considered. A particular concern is long-tailed error distributions. One approach is to remove the largest residuals as outliers and still use least squares, but this may not be effective when there are several large residuals because of the leave-out-one nature of the outlier tests. Furthermore, the outlier test is an accept/reject procedure that is not smooth and may not be statistically efficient for the estimation of $\beta$. Robust regression provides an alternative.

There are several methods. M-estimates choose $\beta$ to minimize

$$\sum_{i=1}^n \rho\left(\frac{y_i - x_i^T\beta}{\sigma}\right)$$

Possible choices for $\rho$ are

1. $\rho(x) = x^2$ is just least squares.
2. $\rho(x) = |x|$ is called least absolute deviations regression (LAD). This is also called $L_1$ regression.
3. $\rho(x) = x^2/2$ if $|x| \le c$, and $c|x| - c^2/2$ otherwise, is called Huber's method and is a compromise between least squares and LAD regression. $c$ can be an estimate of $\sigma$ but not the usual one, which is not robust. Something $\propto \mathrm{median}|\hat\varepsilon_i|$, for example.

Robust regression is related to weighted least squares. The normal equations tell us that

$$X^T(y - X\hat\beta) = 0$$

With weights and in non-matrix form this becomes:

$$\sum_{i=1}^n w_i x_{ij}\left(y_i - \sum_{j=1}^p x_{ij}\beta_j\right) = 0, \qquad j = 1, \ldots, p$$

Now differentiating the M-estimate criterion with respect to $\beta_j$ and setting to zero we get

$$\sum_{i=1}^n \rho'\left(\frac{y_i - \sum_{j=1}^p x_{ij}\beta_j}{\sigma}\right) x_{ij} = 0, \qquad j = 1, \ldots, p$$
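The differentiation step relies on the derivative of the chosen criterion function. As a rough sketch, the three choices listed earlier in this section, together with the derivative of Huber's version, might be written in R as follows. The function names are purely illustrative, and the tuning constant c = 1.345 is an assumption for the example (it is the default used by rlm() in MASS), not a value given in the text.

```r
# Criterion functions for M-estimation (illustrative names, not from a package).
rho_ls  <- function(x) x^2        # least squares
rho_lad <- function(x) abs(x)     # least absolute deviations (L1)

# Huber: quadratic near zero, linear in the tails; c is the tuning constant.
rho_huber <- function(x, c = 1.345) {
  ifelse(abs(x) <= c, x^2 / 2, c * abs(x) - c^2 / 2)
}

# Derivative of the Huber criterion: x inside [-c, c], +/- c outside.
psi_huber <- function(x, c = 1.345) {
  ifelse(abs(x) <= c, x, c * sign(x))
}

# Numerical check that psi_huber really is the derivative of rho_huber,
# using a central difference on a grid that avoids the kinks at +/- c.
x <- seq(-4, 4, by = 0.5)
h <- 1e-6
num_deriv <- (rho_huber(x + h) - rho_huber(x - h)) / (2 * h)
max_err <- max(abs(num_deriv - psi_huber(x)))
max_err   # should be very close to zero
```

The bounded derivative is what limits the influence of a single large residual: beyond c, an observation pulls on the fit no harder than one sitting exactly at c.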

Now let $u_i = y_i - \sum_{j=1}^p x_{ij}\beta_j$ (taking the scale $\sigma$ as 1 to simplify the notation) to get

$$\sum_{i=1}^n \frac{\rho'(u_i)}{u_i}\, x_{ij}\left(y_i - \sum_{j=1}^p x_{ij}\beta_j\right) = 0, \qquad j = 1, \ldots, p$$

so we can make the identification $w(u) = \rho'(u)/u$, and we find for our choices of $\rho$ above:

1. LS: $w(u)$ is constant.
2. LAD: $w(u) = 1/|u|$ - note the asymptote at 0 - this makes a weighting approach difficult.
3. Huber: $w(u) = 1$ if $|u| \le c$, and $c/|u|$ otherwise.

There are many other choices that have been used. Because the weights depend on the residuals, an iteratively reweighted least squares approach to fitting must be used. We can sometimes get standard errors by $\widehat{\mathrm{var}}(\hat\beta) = \hat\sigma^2 (X^T W X)^{-1}$ (use a robust estimate of $\sigma^2$ also).

We demonstrate the methods on the Chicago insurance data, using least squares first.

    > library(faraway)   # assumption: the chicago data come from the faraway package
    > data(chicago)
    > g <- lm(involact ~ race + fire + theft + age + log(income), chicago)
    > summary(g)
    Coefficients:
                Estimate Std. Error t value Pr(>|t|)
    (Intercept)
    race
    fire                                        e-05
    theft
    age
    log(income)

    Residual standard error: on 41 degrees of freedom
    Multiple R-Squared: 0.752, Adjusted R-squared:
    F-statistic: 24.8 on 5 and 41 degrees of freedom, p-value: 2.01e-11

Least squares works well when there are normal errors but can be upset by long-tailed errors. A convenient way to apply the Huber method is to use the rlm() function, which is part of the MASS package (see the book Modern Applied Statistics in S+) and which also gives standard errors. The default is to use the Huber method but there are other choices.

    > library(MASS)
    > g <- rlm(involact ~ race + fire + theft + age + log(income), chicago)
    > summary(g)
    Coefficients:
                Value Std. Error t value
    (Intercept)
    race
    fire
    theft
    age
    log(income)

    Residual standard error: on 41 degrees of freedom

The $R^2$ and F-statistics are not given because they cannot be calculated (at least not in the same way). The numerical values of the coefficients have changed a small amount but the general significance of the variables remains the same and our substantive conclusion would not be altered. Had we seen something different, we would need to find out the cause. Perhaps some group of observations were not being fit well and the robust regression excluded these points.

Another method that can be used is Least Trimmed Squares (LTS). Here one minimizes $\sum_{i=1}^q \hat\varepsilon_{(i)}^2$ where $q$ is some number less than $n$ and $(i)$ indicates sorting. This method has a high breakdown point because it can tolerate a large number of outliers depending on how $q$ is chosen. The Huber and $L_1$ methods will still fail if some $\varepsilon_i \to \infty$. LTS is an example of a resistant regression method. Resistant methods are good at dealing with data where we expect there to be a certain number of bad observations that we want to have no weight in the analysis.

    > library(MASS)   # ltsreg() is in MASS in current R; older versions used library(lqs)
    > g <- ltsreg(involact ~ race + fire + theft + age + log(income), chicago)
    > g$coef
    (Intercept)        race        fire       theft         age log(income)
    > g <- ltsreg(involact ~ race + fire + theft + age + log(income), chicago)
    > g$coef
    (Intercept)        race        fire       theft         age log(income)

The default choice of $q$ is $\lfloor n/2 \rfloor + \lfloor (p+1)/2 \rfloor$ where $\lfloor x \rfloor$ indicates the largest integer less than or equal to $x$. I repeated the command twice and you will notice that the results are somewhat different. This is because the default genetic algorithm used to compute the coefficients is non-deterministic. An exhaustive search method can be used:

    > g <- ltsreg(involact ~ race + fire + theft + age + log(income), chicago, nsamp="exact")
    > g$coef
    (Intercept)        race        fire       theft         age log(income)

This takes about 20 minutes on a 400MHz Intel Pentium II processor. For larger datasets, it will take much longer so this method might be impractical.
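To make the LTS criterion concrete, here is a minimal sketch that evaluates it directly for a candidate coefficient vector. It uses simulated data, not the Chicago data, and the helper names lts_criterion and default_q are hypothetical, not functions from MASS. It only evaluates the criterion and does not search for the minimizing coefficients as ltsreg() does.

```r
# LTS criterion: sum of the q smallest squared residuals for a candidate beta.
# lts_criterion and default_q are illustrative names, not part of any package.
lts_criterion <- function(beta, X, y, q) {
  res2 <- sort(drop(y - X %*% beta)^2)   # squared residuals, sorted ascending
  sum(res2[1:q])
}

# Default q as described in the text: floor(n/2) + floor((p+1)/2).
default_q <- function(n, p) floor(n / 2) + floor((p + 1) / 2)

# Simulated data (an assumption for illustration): a clean line plus outliers.
set.seed(2)
n <- 20
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.1)
y[1:3] <- y[1:3] + 50                    # three gross outliers
X <- cbind(1, x)                         # design matrix with intercept column
q <- default_q(n, ncol(X))               # floor(20/2) + floor(3/2) = 11

beta_true <- c(1, 2)
beta_ls   <- coef(lm(y ~ x))             # least squares, pulled by the outliers

crit_true <- lts_criterion(beta_true, X, y, q)
crit_ls   <- lts_criterion(beta_ls, X, y, q)
c(true = crit_true, LS = crit_ls)
```

Because only the q best-fitting observations enter the sum, the three contaminated points cost nothing under the true line, whereas the least squares line fits even its best q points poorly after being dragged toward the outliers.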
The most notable difference from LS for the purposes of this data is the decrease in the race coefficient - if the same standard error applied then it would verge on insignificance. However, we don't have the standard errors for the LTS regression coefficients. We now use a general method for inference which is especially useful when such theory is lacking - the Bootstrap.

To understand how this method works, think about how we might empirically determine the distribution of an estimator. We could repeatedly generate artificial data from the true model, compute the estimate each time and gather the results to study the distribution. This technique, called simulation, is not available to us for real data because we don't know the true model. The Bootstrap emulates the simulation procedure above except instead of sampling from the true model, it samples from the observed data itself. Remarkably, this technique is often effective. It sidesteps the need for theoretical calculations that may be extremely difficult or even impossible. The Bootstrap may be the single most important innovation in Statistics in the last 20 years.

To see how the bootstrap method compares with simulation, let's spell out the steps involved. In both cases, we consider $X$ fixed.

Simulation

In general the idea is to sample from the known distribution and compute the estimate, repeating many times to find as good an estimate of the sampling distribution of the estimator as we need. For the regression case, it is easiest to start with a sample from the error distribution since these are assumed to be independent and identically distributed:

1. Generate $\varepsilon$ from the known error distribution.
2. Form $y = X\beta + \varepsilon$ from the known $\beta$.
3. Compute $\hat\beta$.

We repeat these three steps many times. We can estimate the sampling distribution of $\hat\beta$ using the empirical distribution of the generated $\hat\beta$, which we can estimate as accurately as we please by simply running the simulation for long enough. This technique is useful for a theoretical investigation of the properties of a proposed new estimator. We can see how its performance compares to other estimators. However, it is of no value for the actual data since we don't know the true error distribution and we don't know the true $\beta$.

The bootstrap method mirrors the simulation method but uses quantities we do know. Instead of sampling from the population distribution, which we do not know in practice, we resample from the data itself.

Bootstrap

1. Generate $\varepsilon^*$ by sampling with replacement from $\hat\varepsilon_1, \ldots, \hat\varepsilon_n$.
2. Form $y^* = X\hat\beta + \varepsilon^*$.
3. Compute $\hat\beta^*$ from $(X, y^*)$.

This time, we use only quantities that we know. For small $n$, it is possible to compute $\hat\beta^*$ for every possible sample from $\hat\varepsilon_1, \ldots, \hat\varepsilon_n$, but usually we can only take as many samples as we have computing power available. This number of bootstrap samples can be as small as 50 if all we want is an estimate of the variance of our estimates but needs to be larger if confidence intervals are wanted.

To implement this, we need to be able to take a sample of residuals with replacement. sample() is good for generating random samples of indices:

    > sample(10, replace=TRUE)
    [1]

and hence a random sample (with replacement) of LTS residuals is:

    > g$residuals[sample(47, replace=TRUE)]

    (rest deleted)

You will notice that there is a repeated value even in this small snippet. We now execute the bootstrap - first we make a matrix to save the results in and then repeat the bootstrap process 1000 times. (This takes about 6 minutes to run on a 400MHz Intel Pentium II processor.)

    > x <- model.matrix(~ race + fire + theft + age + log(income), chicago)[,-1]
    > bcoef <- matrix(0, 1000, 6)
    > for(i in 1:1000){
    +   newy <- g$fitted.values + g$residuals[sample(47, replace=TRUE)]
    +   brg <- ltsreg(x, newy, nsamp="best")
    +   bcoef[i,] <- brg$coef
    + }

It is not convenient to use nsamp="exact" since that would require 1000 times the 20 minutes it takes to make the original estimate. That's about two weeks, so I compromised and used the second best option of nsamp="best". This likely means that our bootstrap estimates of variability will be somewhat on the high side. This illustrates a common practical difficulty with the bootstrap - it can take a long time to compute. Fortunately, this problem recedes as processor speeds increase. It is notable that this calculation was the only one in this book that did not take a negligible amount of time. You typically do not need the latest and greatest computer to do statistics on the size of datasets encountered in this book.

To test the null hypothesis $H_0: \beta_{race} = 0$ against the alternative $H_1: \beta_{race} > 0$ we may figure what fraction of the bootstrap sampled $\beta_{race}$ were less than zero:

    > length(bcoef[bcoef[,2]<0,2])/1000
    [1] 0.019

So our p-value is 1.9% and we reject the null at the 5% level. We can also make a 95% confidence interval for this parameter by taking the empirical quantiles:

    > quantile(bcoef[,2], c(0.025,0.975))
     2.5% 97.5%

We can get a better picture of the distribution by looking at the density and marking the confidence interval:

    > plot(density(bcoef[,2]), xlab="Coefficient of Race", main="")
    > abline(v=quantile(bcoef[,2], c(0.025,0.975)))

See Figure 13.1. We see that the distribution is approximately normal with perhaps somewhat longish tails.
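The whole residual-bootstrap recipe can be tried in a few seconds on simulated data. The sketch below is an illustration under stated assumptions: the data are simulated, not the Chicago data, and ordinary least squares via lm.fit() stands in for ltsreg() purely to keep the run time negligible. The resampling, one-sided p-value and percentile interval follow the same steps as above.

```r
# Residual bootstrap for a regression slope on simulated data (illustrative).
set.seed(1)
n <- 47
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.5)

fit <- lm(y ~ x)                 # least squares stands in for ltsreg() here
X   <- model.matrix(fit)         # X is held fixed across bootstrap samples

nb    <- 1000
bcoef <- matrix(0, nb, ncol(X))  # one row of coefficients per resample
for (i in 1:nb) {
  # Steps 1-2: resample residuals with replacement and form a new response
  newy <- fitted(fit) + residuals(fit)[sample(n, replace = TRUE)]
  # Step 3: refit on (X, newy) and store the coefficients
  bcoef[i, ] <- lm.fit(X, newy)$coefficients
}

pval <- mean(bcoef[, 2] < 0)                     # fraction of slopes below zero
ci   <- quantile(bcoef[, 2], c(0.025, 0.975))    # 95% percentile interval
pval
ci
```

Swapping the lm.fit() call for a robust or resistant fitter gives the version used in the text, at a correspondingly higher computing cost.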
This picture of the bootstrap distribution would be more accurate if we took more than 1000 bootstrap resamples. The conclusion here would be that the race variable is significant but the effect is less than that estimated by least squares. Which is better? This depends on what the true model is, which we will never know, but since the QQ plot did not indicate any big problem with non-normality I would tend to prefer the LS estimates. However, this does illustrate a general problem that occurs when more than one statistical method is available for a given dataset.

Figure 13.1: Bootstrap distribution of $\hat\beta_{race}$ with 95% confidence intervals (horizontal axis: Coefficient of Race; vertical axis: Density)

Summary

1. Robust estimators provide protection against long-tailed errors but they can't overcome problems with the choice of model and its variance structure. This is unfortunate because these problems are more serious than non-normal error.
2. Robust estimates just give you $\hat\beta$ and possibly standard errors without the associated inferential methods. Software and methodology for this inference is not easy to come by. The bootstrap is a general purpose inferential method which is useful in these situations.
3. Robust methods can be used in addition to LS as a confirmatory method. You have cause to worry if the two estimates are far apart.
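Point 3 can be tried on simulated data. The sketch below fits least squares and a Huber M-estimate side by side, computing the latter by a hand-rolled iteratively reweighted least squares loop that uses the Huber weight function from this chapter. Everything here is an assumption made for illustration: the function names huber_weight and huber_irls are hypothetical, the tuning constant 1.345 and the scale estimate median|residual|/0.6745 are conventional choices, not values from the text, and in practice rlm() from MASS would be the tool to use.

```r
# Huber weights from the chapter: w(u) = 1 if |u| <= c, c/|u| otherwise.
huber_weight <- function(u, c) ifelse(abs(u) <= c, 1, c / abs(u))

# Hand-rolled IRLS for a Huber M-estimate (illustrative sketch only).
huber_irls <- function(X, y, c = 1.345, maxit = 50, tol = 1e-8) {
  beta <- lm.fit(X, y)$coefficients            # start from least squares
  for (it in seq_len(maxit)) {
    res <- drop(y - X %*% beta)
    s   <- median(abs(res)) / 0.6745           # robust estimate of sigma
    w   <- huber_weight(res / s, c)            # downweight large residuals
    beta_new <- lm.wfit(X, y, w)$coefficients  # weighted least squares step
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  beta
}

# Simulated data with a few long-tailed errors (an assumption for the demo).
set.seed(3)
n <- 50
x <- runif(n)
y <- 1 + 2 * x + rnorm(n, sd = 0.2)
y[1:5] <- y[1:5] + 10
X <- cbind(Intercept = 1, x = x)

beta_ls    <- lm.fit(X, y)$coefficients
beta_huber <- huber_irls(X, y)
rbind(LS = beta_ls, Huber = beta_huber)
```

If the two rows differ noticeably, as they should here because of the five contaminated responses, that is the signal to look more closely at the observations receiving small weights.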