Chapter 7 - Samplig Distributios 1 Itroductio What is statistics? It cosist of three major areas: Data Collectio: samplig plas ad experimetal desigs Descriptive Statistics: umerical ad graphical summaries of the data collected from a sample Iferetial Statistics: estimatio, cofidece itervals ad hypothesis testig of parameters of iterest Statistical procedures are part (steps -5 below) of the Scietific Method first espoused by Sir Fracis Baco (1561-166), who wrote to lear the secrets of ature ivolves collectig data ad carryig out experimets. The moder methodology: 1. Observe some pheomeo. State a hypothesis explaiig the pheomeo 3. Collect data 4. Test: Does the data support the hypothesis? 5. Coclusio. If the test fails, go back to step. If you ecouter a scietific claim that you disagree with, scrutiize the steps of the scietific method used. Statistics do t lie, but liars do statistics. - Mark Twai. What is mathematical statistics?: The study of the theoretical foudatio of statistics. What is a statistic? Let X 1, X,..., X be a set of observable radom variables (such as a radom sample of idividuals from a populatio of iterest). A statistic T is a fuctio applied to X 1, X,..., X. POPULATION vs. SAMPLE: T = T (X 1, X,..., X ) Populatio: The etire group of idividuals (subjects or uits), that ca be either existet or coceptual, that we wat iformatio about. Sample: A part of the populatio from which data is collected. 1
PARAMETER vs. STATISTIC: Parameter: A umerical value calculated from all idividuals i the populatio. { x xp (x) if x is discrete Populatio mea: µ = xf(x)dx if x is cotiuous { Populatio variace: σ x = (x µ) P (x) if x is discrete (x µ) f(x)dx if x is cotiuous Populatio proportio: p is the true proportio of 1 s i the populatio. Populatio miimum: m = mi x x. Populatio media: φ.5 is the (ot ecessarily uique) value such that P (X φ.5 ).5 ad P (X φ.5 ).5. Statistic: A umerical value calculated from a sample X 1,..., X. Sample mea: X = T (X 1, X,..., X ) = 1 i=1 X i Sample variace: S = T (X 1, X,..., X ) = 1 1 i=1 (X i X) Sample proportio: ˆp = T (X 1, X,..., X ) = umber of 1 s is the proportio of 1 s i the sample First order statistic: X (1) = T (X 1, X,..., X ) = mi(x 1, X,..., X ) Sample media: ˆφ.5 T (X 1, X,..., X ) = { the middle value if is odd the average of the two middle values if is eve. Samplig Distributios The value of a statistic varies from sample to sample. I other words, differet samples will result i differet values of a statistic. Therefore, a statistic is a radom variable with a distributio! Samplig Distributio: The distributio of statistic values from all possible samples of size. Brute force way to costruct a samplig distributio: Take all possible samples of size from the populatio. Compute the value of the statistic for each sample. Display the distributio of statistic values as a table, graph, or equatio.
.1 Samplig Distributio of X Oe commo populatio parameter of iterest is the populatio mea µ. I iferetial statistics, it is commo to use the statistic X to estimate µ. Thus, the samplig distributio of X is of iterest. Mea ad Variace For ay sample size ad a SRS X 1, X,..., X from ay populatio distributio with mea µ x ad variace σ x: E(X) = µ x = µ x ad E( i=1 X i) = µ x Var(X) = σ x = σ x/ ad Var( i=1 X i) = σ x This result was proved i Example 5.7 usig Theorem 5.1: Let a i for i = 1,,..., k be costats ad let X i for i = 1,,..., k be radom variables. The ( k ) k E a i X i = a i E(X i ) (idepedece ot required) ad Var i=1 i=1 ( k i=1 a ix i ) = k i=1 a i Var(X i ) if X 1, X,..., X k are mutually idepedet. Samplig Distributio whe the data are ormal For ay sample size ad a SRS X 1, X,..., X from a ormal populatio distributio N(µ x, σ x) (Theorem 7.1): X N(µ x, σ x/) i=1 X i N(µ x, σ x) Examples: Suppose that adult male cholesterol levels are distributed as N(10mg/dL, (37mg/dL) ). 1. Give a iterval cetered at the mea µ which captures the middle 95% of all cholesterol values.. Give the samplig distributio of X, the sample mea of cholesterol values take from SRSs of size = 10. 3. Give a iterval cetered at the mea µ which captures the middle 95% of all sample mea cholesterol values take from SRSs of size = 10. 3
Samplig Distributio for large sample sizes For a LARGE sample size ad a SRS X 1, X,..., X from ay populatio distributio with mea µ x ad variace σ x <, the approximate samplig distributios are: ( ) X N µ x, σ x ad X i N ( µ x, σx). This last result follows from the celebrated Cetral Limit Theorem, stated i your book as Theorem 7.4: Let X 1, X,..., X be a SRS from a distributio with mea µ x ad variace σ x <. The the distributio of coverges to N(0, 1) as. We will prove this theorem later. Importat Examples: i=1 U = X µ x σ x / 1. Beroulli { trials. 1 if with probability p = Let X = 0 if with probability(1 p) = The X Ber(p) = Bi( = 1, p). (a) Draw a picture of the pdf of X. (b) Fid E(X) ad V ar(x). (c) Suppose a SRS X 1, X,..., X 40 was collected. Give the approximate samplig distributio of X (ormally deoted by ˆp = X, which idicates that X is a sample proportio).. Normal approximatio to the Biomial (sectio 7.5) I the previous example we cosidered the rv X Ber(p) = Bi( = 1, p). Suppose that a SRS X 1, X,..., X has bee collected with > 1. 4
(a) Give the distributio of Y = i X i, so that Y is the umber of successes out of trials (which is a discrete distributio you leared about i chapter 3). (b) Draw a picture of the pdf of Y = i X i. (c) Give E(Y ) ad V ar(y ). (d) I the Example #1c the Cetral Limit Theorem showed that for ay sample size, whe X Ber(p), the ˆp = X N(, ). (e) I additio to meas X, the Cetral Limit Theorem also gives the approximate samplig distributio of a sum X i. Use the Cetral Limit Theorem to give the approximate samplig distributio of Y = i X i. (f) If the true proportio of supporters of healthcare reform i the Motaa populatio is p =.53, the out of a SRS of Motaas of size = 1000, whats the probability that less the 500 will pledge support?. Samplig Distributio of S Oe commo populatio parameter of iterest is the populatio variace σ. I iferetial statistics, it is commo to use the statistic S to estimate σ. Thus, the samplig distributio of S is of iterest. χ distributio: The sum of squares of idepedet stadard ormal variables is distributed as a χ radom radom variable. More formally (Theorem 7.): 5
If Z 1,..., Z ν are idepedet ad distributed as N(0, 1), the ν i=1 Z i χ (ν). χ (ν) is called the chi-square distributio with ν degrees of freedom. For ay sample size ad a SRS X 1, X,..., X from a ormal distributio N(µ x, σx), ( ) Xi µ x χ (). i=1 σ x Use Table 6 o p. 850 of the textbook for probability calculatios. Mea ad Variace If C χ (ν), the E(C) = ν Var(C) = ν. For ay sample size > 1 ad a SRS X 1, X,..., X from ay populatio distributio with mea µ x ad variace σ x, E(S ) = σ x Var(S ) = σ4 x 1. Samplig distributio whe the data are ormal For ay sample size > 1 ad a SRS X 1, X,..., X from a ormal distributio N(µ x, σx) (Theorem 7.3): ( 1)S χ ( 1).3 Samplig Distributio of X µ S/ σ x I iferetial statistics, the test statistic X µ S/ is ofte used to determie how may stadard errors (s/ ) the sample mea X is from a hypothesized value of µ. Thus, the samplig distributio of X µ S/ is of iterest. t distributio (Defiitio 7.): If Z N(0, 1), W χ (ν), ad Z ad W are idepedet, the: T = Z t(ν). W/ν t(ν) is called the t distributio with ν degrees of freedom. 6
Mea ad Variace If T t(ν), the E(T ) = 0 for ν > 1 Var(T ) = ν for ν > ν Samplig distributio whe the data are ormal For ay sample size > 1 ad a SRS X 1, X,..., X from N(µ, σ ), the X ad S are idepedet Ad ow by Theorems 7.1 ad 7.3 ad Defiitio 7. T = (X µ)/σ ( 1)S /σ 1 = X µ S/ t( 1). Samplig Distributio for large sample sizes For a LARGE sample size ad a SRS X 1, X,..., X from ay populatio distributio with mea µ x ad variace σ x < : T = X µ x S/ t( 1). Some useful facts: The pdf of T is give by f(t) = ν+1 Γ( ) ( ) (ν+1)/ 1 Γ( ν ) 1 + t νπ ν The t distributios are symmetric about 0 ad is bell-shaped like the ormal N(0, 1) distributio but with thicker tails. As ν, the t(ν) distributio approaches the stadard ormal distributio. Use Table 5 o page 849 for probability calculatios. Examples: Suppose that adult male cholesterol levels are distributed as N(10mg/dL, σ ). 1. Give the samplig distributio of X µ S/, where the statistics X ad S are calculated from a SRS of size = 10. 7
. If S = 36.5, give a iterval cetered at the mea µ which captures the middle 95% of all sample mea cholesterol values take from SRSs of size = 10..4 Samplig Distributio of S 1/σ 1 S /σ I iferetial statistics, it is ofte of iterest to compare the variaces σ1 ad σ from two populatios, ad determie if they are differet. Based o two SRSs, oe of size 1 with sample variace S1 ad the other of size with sample variace S, the statistic S 1 /σ 1 is ofte used. S /σ Thus, the samplig distributio of S 1 /σ 1 is of iterest. S /σ F distributio (Defiitio 7.3): If W 1 χ (ν 1 ) ad W χ (ν ) are idepedet, the: F = W 1/ν 1 W /ν F (ν 1, ν ). F (ν 1, ν ) is called the F distributio with ν 1 umerator degrees of freedom ad ν deomiator degrees of freedom. Mea ad Variace If F F (ν 1, ν ), the E(F ) = ν ν for ν > Var(F ) = ν (ν 1+ν ) ν 1 (ν ) (ν 4) for ν > 4 Samplig Distributio whe the data are ormal If X 1, X,..., X 1 are a SRS from N(µ 1, σ 1) ad if Y 1, X,..., X are a idepedet SRS from N(µ, σ ), the W 1 = ( 1 1)S1 σ1 are idepedet, ad so χ ( 1 1) ad W = ( 1)S σ χ ( 1) F = W 1/( 1 1) W /( 1) = S 1/σ 1 S /σ F ( 1 1, 1). Some useful facts: 8
If X F (ν 1, ν ), the the pdf of X is f(x) = Γ ( ν 1 +ν ) Γ ( ) ( ν 1 Γ ν ) ( ν1 Use Table 7 o p. 85 for probability calculatios. ν ) ν1 / ( x (ν 1/) 1 1 + ν ) (ν1 +ν )/ 1 x. ν Example: Is the variace of female reactio times differet tha the variace of male reactio times? Jaso Paulak at the Uiversity of Ciciati ra a web based reactio experimet to aswer this questio. 1 = 398 females participated, with X 1 = 517 ms ad S 1 = 899 ms. = 469 males participated, with X = 383. ms ad S = 335.7 ms. Give the samplig distributio of F = S 1 /σ 1. S /σ If the two populatio variaces are ideed the same, σ 1 = σ, the what is the probability of observig a ratio of sample variaces that we did, or larger? 9
3 Proof of the Cetral Limit Theorem Cetral Limit Theorem Let X 1, X,..., X be a SRS from a distributio with mea µ x ad variace σ x <. The lim U X µ x = lim σ x / = U N(0, 1). Proof (sectio 7.4): The proof relies o momet geeratig fuctios (mgf) from sectio 3.9 of your textbook. We saw that every rv, as well as havig a uique pmf or pdf, also has a uique mgf (Theorem 7.5), writte as M(t). This otatio for the mgf reiforces the fact that it is a fuctio of a real umber t i some eighborhood of t = 0. The mgf for a cotiuous rv Y is defied by M(t) = E(e ty ) = e ty f(y)dy, where f(y) is the pdf of Y. To prove the Cetral Limit Theorem: 1. Use Taylor series to fid the mgf of Z i = X i µ σ.. Fid the mgf of U = X µx σ x /. 3. Show that as, the mgf of U coverges to e t /. 4. Recall that the mgf of a stadard ormal rv is e t /. Thus, by Theorem 7.5, the distributio of U = lim U is a stadard ormal! Here we go: 1. Let Z i = X i µ. This is where we eed the fiite variace assumptio: if σ is ifiite, the σ Z i is ot well defied! (a) Show that EZ = 0, V ar(z) = E(Z ) = 1. Hit: This is easy sice Z is a liear fuctio of X. 10
(b) Show that M Z (0) = 1, d dt M Z(0) = 0, ad d dt M Z (0) = 1. (c) Use Taylor s Theorem (ad the results from (b)) to expad M Z (t) about t = 0 to show that M Z ( t ) = 1 + (t/ ) + R( t! ), where R( t ) is a remaider of cubic terms t of ad higher. (d) Show that the remaider lim R( t ) (t/ ) = lim R( t ) = 0. 11
. By defiitio U = X µ x σ x / = ( i X ) i µ x = 1 Z i, σ x which shows that the mgf of U whe X 1,..., X is a simple radom sample is M U (t) = Π i=1m Zi ( t ( ) = M Z ( t ) ) (by Theorem 6. ad Exercise 3.158). 3. Show that lim M U (t) = e t. Hit: Substitute i the Taylor series approximatio for M Z ( t ); use the fact that lim ( 1 + a ) = e lim a ; use the fact that lim R( t ) (t/ ) = lim R( t ) = 0. i 4. Recall that if U N(0, 1), the M U (t) = e t / from sectio 3.9 of your textbook. To prove this, multiply the two expoetials i the itegral, the complete the square to show that E(e tu ) = e t 1 π e 1 (u t) 1 du. Now observe that π e 1 (u t) is the pdf of a N(t, 1) rv. 1