
M 358K Supplement to Chapter 23: CHI-SQUARED DISTRIBUTIONS, T-DISTRIBUTIONS, AND DEGREES OF FREEDOM

To understand t-distributions, we first need to look at another family of distributions, the chi-squared distributions. These will also appear in Chapter 26 in studying categorical variables.

Notation: N(µ, σ) will stand for the normal distribution with mean µ and standard deviation σ. The symbol ~ will indicate that a random variable has a certain distribution. For example, Y ~ N(4, 3) is short for "Y has a normal distribution with mean 4 and standard deviation 3."

I. Chi-squared Distributions

Definition: The chi-squared distribution with k degrees of freedom is the distribution of a random variable that is the sum of the squares of k independent standard normal random variables. We'll call this distribution χ²(k). Thus, if Z1, ..., Zk are all standard normal random variables (i.e., each Zi ~ N(0,1)), and if they are independent, then

Z1² + ... + Zk² ~ χ²(k).

For example, if we consider taking simple random samples (with replacement) y1, ..., yk from some N(µ, σ) distribution, and let Yi denote the random variable whose value is yi, then each (Yi − µ)/σ is standard normal, and (Y1 − µ)/σ, ..., (Yk − µ)/σ are independent, so

((Y1 − µ)/σ)² + ... + ((Yk − µ)/σ)² ~ χ²(k).

Notice that the phrase "degrees of freedom" refers to the number of independent standard normal variables involved. The idea is that since these k variables are independent, we can choose them freely (i.e., independently).

The following exercise should help you assimilate the definition of a chi-squared distribution, as well as get a feel for the χ²(1) distribution.

Exercise 1: Use the definition of a χ²(1) distribution and the 68-95-99.7 rule for the standard normal distribution (and/or anything else you know about the standard normal distribution) to help sketch the graph of the probability density

function of a χ²(1) distribution. (For example, what can you conclude about the χ²(1) curve from the fact that about 68% of the area under the standard normal curve lies between -1 and 1? What can you conclude about the χ²(1) curve from the fact that about 5% of the area under the standard normal lies beyond ±2?)

For k > 1, it's harder to figure out what the χ²(k) distribution looks like just using the definition, but simulations using the definition can help. The following diagram shows histograms of four random samples of size 1000 from a N(0,1) distribution. These four samples were put in columns labeled st1, st2, st3, st4. Taking the sum of the squares of the first two of these columns then gives (using the definition of a chi-squared distribution with two degrees of freedom) a random sample of size 1000 from a χ²(2) distribution. Similarly, adding the squares of the first three columns gives a random sample from a χ²(3) distribution, and forming the column (st1)² + (st2)² + (st3)² + (st4)² yields a random sample from a χ²(4) distribution.

Histograms of these three samples from chi-squared distributions are shown below, with the sample from the χ²(2) distribution in the upper left, the sample from the χ²(3) distribution in the upper right, and the sample from the χ²(4) distribution in the lower left. The histograms show the shapes of the three distributions: the χ²(2) has a sharp peak at the left; the χ²(3) distribution has a less sharp peak not quite as far left; and the χ²(4) distribution has a still lower peak still a little further to the right. All three distributions are noticeably skewed to the right.
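The simulation just described can be sketched in a few lines. This is a minimal sketch, assuming numpy is available; the column names st1 through st4 follow the text, and the seed is an arbitrary choice.

```python
import numpy as np

# Four columns of 1000 standard normal draws, as described in the text.
rng = np.random.default_rng(0)  # seed chosen arbitrarily
st1, st2, st3, st4 = rng.standard_normal((4, 1000))

# By the definition of a chi-squared distribution, these are random
# samples of size 1000 from chi^2(2), chi^2(3), and chi^2(4) respectively.
chi2_2 = st1**2 + st2**2
chi2_3 = st1**2 + st2**2 + st3**2
chi2_4 = st1**2 + st2**2 + st3**2 + st4**2

# The sample means shift to the right as the degrees of freedom grow,
# matching the description of the histograms.
print(chi2_2.mean(), chi2_3.mean(), chi2_4.mean())
```

Plotting histograms of chi2_2, chi2_3, and chi2_4 (for example with matplotlib) reproduces the right-skewed shapes described above.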

There is a picture of a typical chi-squared distribution on p. A-113 of the text.

Thought question: As k gets bigger and bigger, what type of distribution would you expect the χ²(k) distribution to look more and more like? [Hint: A chi-squared distribution is the sum of independent random variables.]

Theorem: A χ²(1) random variable has mean 1 and variance 2.

The proof of the theorem is beyond the scope of this course. It requires using a (rather messy) formula for the probability density function of a χ²(1) variable. Some courses in mathematical statistics include the proof.

Exercise 2: Use the Theorem together with the definition of a χ²(k) distribution and properties of the mean and standard deviation to find the mean and variance of a χ²(k) distribution.

II. t Distributions

Definition: The t distribution with k degrees of freedom is the distribution of a random variable which is of the form

Z/√(U/k),

where

i. Z ~ N(0,1),
ii. U ~ χ²(k), and
iii. Z and U are independent.
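After working Exercise 2, one way to check your answer numerically is to simulate χ²(k) draws directly from the definition and compare the empirical mean and variance with what you derived. This is a sketch assuming numpy; the choice k = 5 is arbitrary.

```python
import numpy as np

# Simulate 200,000 draws from chi^2(5) using the definition: each draw
# is the sum of squares of k = 5 independent standard normals.
rng = np.random.default_rng(1)
k = 5
draws = (rng.standard_normal((200_000, k)) ** 2).sum(axis=1)

# Compare these with the mean and variance you found in Exercise 2.
print(draws.mean())  # empirical mean
print(draws.var())   # empirical variance
```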

Comment: Notice that this definition says that the notion of degrees of freedom for a t-distribution comes from the notion of degrees of freedom of a chi-squared distribution: the degrees of freedom of a t-distribution are the number of squares of independent normal random variables that go into making up the chi-squared distribution occurring under the radical in the denominator of the t random variable Z/√(U/k).

To see what a t-distribution looks like, we can use the four standard normal samples of 1000 obtained above to simulate a t distribution with 3 degrees of freedom: we use column st1 as our sample from Z and (st2)² + (st3)² + (st4)² as our sample from U to calculate a sample Z/√(U/3) from the t distribution with 3 degrees of freedom. The resulting histogram shows a distribution similar to the t-model with 2 degrees of freedom shown on p. 554 of the textbook: it's narrower in the middle than a normal curve would be, but has heavier tails (note in particular the outliers that would be very unusual in a normal distribution). The following normal probability plot of the simulated data draws attention to the outliers as well as the nonnormality. (The plot is quite typical of a normal probability plot for a distribution with heavy tails on both sides.)
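The t(3) simulation just described can be reproduced as follows. This is a sketch assuming numpy; the tail count at the end is one way to see the heavy tails without drawing a histogram.

```python
import numpy as np

# Four standard normal columns of size 1000, as in the chi-squared section.
rng = np.random.default_rng(2)  # seed chosen arbitrarily
st1, st2, st3, st4 = rng.standard_normal((4, 1000))

# st1 plays the role of Z; st2^2 + st3^2 + st4^2 plays the role of
# U ~ chi^2(3), so st1 / sqrt(U/3) is a sample from the t(3) distribution.
u = st2**2 + st3**2 + st4**2
t_sample = st1 / np.sqrt(u / 3)

# Heavy tails: a t(3) variable lands beyond +/-3 far more often than a
# standard normal does (roughly 6% of the time versus about 0.3%).
frac_beyond_3 = np.mean(np.abs(t_sample) > 3)
print(frac_beyond_3)
```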

III. Why the t-statistic introduced on p. 553 of the textbook has a t-distribution:

1. General set-up and notation: Putting together the two parts of the definition of t-statistic in the box on p. 553 gives

t = (y̅ − µ)/(s/√n),

where y̅ and s are, respectively, the mean and sample standard deviation calculated from the sample y1, y2, ..., yn. To talk about the distribution of the t-statistic, we need to consider all possible random¹ samples of size n from the population for Y. We'll use the convention of using capital letters for random variables and small letters for their values for a particular sample. In this case, we have three statistics involved: Y̅, S and T. All three have the same associated random process: choose a random sample of size n from the population for Y. Their values are as follows:

The value of Y̅ is the sample mean y̅ of the sample chosen.
The value of S is the sample standard deviation s of the sample chosen.
The value of T is the t-statistic t = (y̅ − µ)/(s/√n) calculated for the sample chosen.

The distributions of Y̅, S and T are called the sampling distributions of the mean, the sample standard deviation, and the t-statistic, respectively.
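The random process above (draw a sample, then compute y̅, s, and t for it) is easy to simulate, which gives a concrete picture of the three sampling distributions. This is a sketch assuming numpy; the values µ = 4, σ = 3, and n = 10 are arbitrary choices.

```python
import numpy as np

# Repeat the random process 100,000 times: draw a sample of size n = 10
# from N(mu, sigma), then compute ybar, s, and t for each sample.
rng = np.random.default_rng(3)
mu, sigma, n = 4.0, 3.0, 10
samples = rng.normal(mu, sigma, size=(100_000, n))

ybar = samples.mean(axis=1)               # values of the statistic Ybar
s = samples.std(axis=1, ddof=1)           # values of the statistic S
t_stats = (ybar - mu) / (s / np.sqrt(n))  # values of the statistic T

# The empirical sampling distribution of Ybar centers at mu with spread
# sigma/sqrt(n); the spread of T comes out noticeably larger than 1,
# a first hint that T is not standard normal.
print(ybar.mean(), ybar.std(), t_stats.std())
```

Histograms of ybar, s, and t_stats are empirical versions of the three sampling distributions named above.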

Note that the formula for calculating t from the data gives the formula

T = (Y̅ − µ)/(S/√n),

expressing the random variable T as a function of the random variables Y̅ and S. We'll first discuss the t-statistic in the case where our underlying random variable Y is normal, then extend to the more general situation stated in Chapter 23.

2. The case of Y normal. For Y normal, we will use the following theorem:

Theorem: If Y is normal with mean µ and standard deviation σ, and if we only consider simple random samples with replacement², of fixed size n, then

a) The (sampling) distribution of Y̅ is normal with mean µ and standard deviation σ/√n,
b) Y̅ and S are independent random variables, and
c) (n−1)S²/σ² ~ χ²(n−1).

The proof of this theorem is beyond the scope of this course, but may be found in most textbooks on mathematical statistics. Note that (a) is a special case of the Central Limit Theorem. We will give some discussion of the plausibility of parts (b) and (c) in the Comments section below.

So for now suppose Y is a normal random variable with mean µ and standard deviation σ: Y ~ N(µ, σ). By (a) of the Theorem, the sampling distribution of the sample mean Y̅ (for simple random samples with replacement, of fixed size n) is normal with mean µ and standard deviation σ/√n: Y̅ ~ N(µ, σ/√n). Standardizing Y̅ then gives

(Y̅ − µ)/(σ/√n) ~ N(0,1). (*)

But we don't know σ, so we need to approximate it by the sample standard deviation s. It would be tempting to say that since s is approximately equal to σ,

this substitution (in other words, considering (Y̅ − µ)/(s/√n)) should give us something approximately normal. Unfortunately, there are two problems with this: First, using an approximation in the denominator of a fraction can sometimes make a big difference in what you're trying to approximate. (See Footnote 3 for an example.) Second, we are using a different value of s for different samples (since s is calculated from the sample, just as the value of y̅ is). This is why we need to work with the random variable S rather than the individual sample standard deviation s. In other words, we need to work with the random variable

T = (Y̅ − µ)/(S/√n).

To use the theorem, first apply a little algebra to see that

(Y̅ − µ)/(S/√n) = [(Y̅ − µ)/(σ/√n)] / √( [(n−1)S²/σ²] / (n−1) ). (**)

Since Y is normal, the numerator on the right side of (**) is standard normal, as noted in equation (*) above. Also, by (c) of the theorem, the denominator of the right side of (**) is of the form √(U/(n−1)) where U = (n−1)S²/σ² ~ χ²(n−1). Since altering random variables by subtracting constants or dividing by constants does not affect independence, (b) of the theorem implies that the numerator and denominator of the right side of (**) are independent. Thus for Y normal, our test statistic T = (Y̅ − µ)/(S/√n) satisfies the definition of a t distribution with n−1 degrees of freedom.

3. More generally: The textbook states (pp. 555-556) assumptions and conditions that are needed to use the t-model. The heading "Independence Assumption" on p. 555 includes an Independence Assumption, a Randomization Condition, and the 10% Condition. These three essentially say that the sample is close enough to a simple random sample with replacement to make the theorem close enough to true, still assuming normality of Y. The heading "Normal Population Assumption" on p. 556 consists of the Nearly Normal Condition, which essentially says that we can also weaken normality somewhat and still have the theorem close enough to true for most practical purposes. (The rough idea here is that, by the

central limit theorem, Y̅ will still be close enough to normal to make the theorem close enough to true.) The appropriateness of these conditions as good rules of thumb has been established by a combination of mathematical theorems and simulations.

4. Comments:

i. To help convince yourself of the plausibility of Part (b) of the theorem, try a simulation as follows: Take a number of simple random samples from a normal distribution and plot the resulting values of Y̅ vs. S. Here is the result from one such simulation: The left plot shows y̅ vs. s for 1000 draws of a sample of size 25 from a standard normal distribution. The right plot shows y̅ vs. s for 1000 draws of a sample of size 25 from a skewed distribution. The left plot is elliptical in nature, which is what is expected if the two variables plotted are indeed independent. On the other hand, the right plot shows a noticeable dependence between Y̅ and S: y̅ increases as s increases, and the conditional variance of Y̅ (as indicated by the scatter) also increases as S increases.

ii. To get a little insight into (c) of the Theorem, note first that

(n−1)S² = (Y1 − Y̅)² + (Y2 − Y̅)² + ... + (Yn − Y̅)²,

which is indeed a sum of squares, but of n squares, not n−1. Moreover, the random variables being squared are not independent; the dependence arises from the relationship Y1 + Y2 + ... + Yn = nY̅. Using this relationship, it is possible to show

that (n−1)S²/σ² is indeed the sum of the squares of n−1 independent, standard normal random variables. Although the general proof is somewhat involved, the idea is fairly easy to see when n = 2: First, a little algebra shows that (for n = 2)

Y1 − Y̅ = (Y1 − Y2)/2 and Y2 − Y̅ = (Y2 − Y1)/2.

Plugging these into the formula for S² (with n = 2) then gives

(n−1)S² = (Y1 − Y̅)² + (Y2 − Y̅)² = (Y1 − Y2)²/2. (***)

Since Y1 and Y2 are independent and both are normal, Y1 − Y2 is also normal (by a theorem from probability). Since Y1 and Y2 have the same distribution,

E(Y1 − Y2) = E(Y1) − E(Y2) = 0.

Using independence of Y1 and Y2, we can also calculate

Var(Y1 − Y2) = Var(Y1) + Var(Y2) = 2σ².

Standardizing Y1 − Y2 then shows that (Y1 − Y2)/(σ√2) is standard normal, so equation (***) shows that (n−1)S²/σ² = ((Y1 − Y2)/(σ√2))² ~ χ²(1) when n = 2.

Footnotes

1. "Random" is admittedly a little vague here. In section 2, interpret it to mean "simple random sample with replacement." (See also Footnote 2.) In section 3, interpret "random" to mean "fitting the conditions and assumptions for the t-model."

2. Technically, the requirements are that the random variables Y1, Y2, ..., Yn representing the first, second, etc. values in the sample are independent and identically distributed (abbreviated as iid), which means they are independent and have the same distribution (i.e., the same probability density function).

3. Consider, for example, using 0.011 as an approximation of 0.01 when estimating 1/0.01. Although 0.011 differs from 0.01 by only 0.001, when we use the approximation in the denominator, we get 1/0.011 ≈ 90.91, which differs by more than 9 from 1/0.01 = 100, a difference more than 3 orders of magnitude greater than the difference between 0.011 and 0.01.
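Returning to the n = 2 computation in Comment (ii) above: both the identity (***) and the resulting χ²(1) distribution can be checked numerically. This is a sketch assuming numpy; the values µ = 4 and σ = 3 are arbitrary choices.

```python
import numpy as np

# 200,000 independent pairs (Y1, Y2), each coordinate drawn from N(mu, sigma).
rng = np.random.default_rng(5)
mu, sigma = 4.0, 3.0
y1, y2 = rng.normal(mu, sigma, size=(2, 200_000))

# (n-1)S^2 / sigma^2 computed from the definition of S^2 with n = 2 ...
ybar = (y1 + y2) / 2
lhs = ((y1 - ybar) ** 2 + (y2 - ybar) ** 2) / sigma ** 2
# ... and the square of the standardized difference, which by (***)
# should be exactly the same quantity, and is chi^2(1) distributed.
rhs = ((y1 - y2) / (sigma * np.sqrt(2))) ** 2

print(np.max(np.abs(lhs - rhs)))  # the identity (***) holds up to rounding
print(lhs.mean())                 # near 1, the chi^2(1) mean
```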