Normal and t Distributions

Bret Hanlon and Bret Larget
Department of Statistics
University of Wisconsin-Madison
October 11-13, 2011

Case Study

Body temperature varies within individuals over time (it can be higher when one is ill with a fever, or during or after physical exertion). However, if we measure the body temperature of a single healthy person when at rest, these measurements vary little from day to day, and we can associate with each person an individual resting body temperature. There is, however, variation among individuals in resting body temperature. A sample of n = 130 individuals had an average resting body temperature of 98.25 degrees Fahrenheit and a standard deviation of 0.73 degrees Fahrenheit. The next slide shows an estimated density plot from this sample.

Density Plot

[Figure: estimated density of resting body temperature (F); horizontal axis from 96 to 101, vertical axis (density) from 0.0 to 0.6]

Normal Distributions

The estimated density has these features: it is bell-shaped; it is nearly symmetric. Many (but not all) biological variables have similar shapes. One reason is a generalized central limit theorem: random variables that are formed by adding many random effects will be approximately normally distributed. Importantly for inference, even when underlying distributions are not normal, the sampling distribution of the sample mean is approximately normal.
Example: Population

A population that is skewed.

[Figure: density of the population; horizontal axis x from 0 to 600]

Example: Sampling Distribution

Sampling distribution of the sample mean when n = 130.

[Figure: density of the sampling distribution, n = 130; horizontal axis x from 80 to 130]

Case Study: Questions

How can we use the sample data to estimate with confidence the mean resting body temperature in a population?
How would we test the null hypothesis that the mean resting body temperature in the population is, in fact, equal to the well-known 98.6 degrees Fahrenheit?
How robust are the methods of inference to nonnormality in the underlying population?
How large a sample is needed to ensure that a confidence interval is no larger than some specified amount?

The Big Picture

Many inference problems with a single quantitative, continuous variable may be modeled as a large population (bucket) of individual numbers with a mean µ and standard deviation σ. A random sample of size n has a sample mean $\bar{x}$ and sample standard deviation s. Inference about µ based on sample data assumes that the sampling distribution of $\bar{x}$ is approximately normal with $E(\bar{x}) = \mu$ and $SD(\bar{x}) = \sigma/\sqrt{n}$.
To prepare to understand inference methods for single samples of quantitative data, we need to understand: the normal and related distributions; the sampling distribution of $\bar{x}$.
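The claim that the sample mean is approximately normal with mean µ and standard deviation σ/√n, even for a skewed population, can be checked by simulation. The sketch below is only an illustration: the slides do not say which skewed population was plotted, so an exponential population with mean 100 is assumed here, and the seed and number of replications are arbitrary.

```r
# Simulate the sampling distribution of the sample mean for a skewed population.
# ASSUMPTION: an exponential population with mean 100 stands in for the skewed
# population in the slides; the actual distribution used there is not stated.
set.seed(1)                      # arbitrary seed, for reproducibility only
n     <- 130                     # sample size from the case study
reps  <- 10000                   # number of simulated samples (illustrative)
mu    <- 100                     # assumed population mean
sigma <- 100                     # an exponential has SD equal to its mean

# Each column is one sample of size n; colMeans() gives one mean per sample.
samples      <- matrix(rexp(n * reps, rate = 1 / mu), nrow = n)
sample_means <- colMeans(samples)

mean(sample_means)               # should be close to mu
sd(sample_means)                 # should be close to sigma / sqrt(n)
sigma / sqrt(n)                  # theoretical SD of the sample mean
```

Calling hist(sample_means) afterwards should show a roughly bell-shaped histogram even though the assumed population is strongly right-skewed.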
Continuous Distributions

A continuous random variable has possible values over a continuum. The total probability of one is not in discrete chunks at specific locations, but rather is ground up like a very fine dust and sprinkled on the number line. We cannot represent the distribution with a table of possible values and the probability of each. Instead, we represent the distribution with a probability density function which measures the thickness of the probability dust. Probability is measured over intervals as the area under the curve.
A legal probability density f:
is never negative ($f(x) \ge 0$ for $-\infty < x < \infty$);
has a total area under the curve of one ($\int_{-\infty}^{\infty} f(x)\,dx = 1$).

The Standard Normal Density

The standard normal density is a symmetric, bell-shaped probability density with equation:

$$\phi(z) = \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}, \quad (-\infty < z < \infty)$$

[Figure: standard normal density; horizontal axis "Possible Values" from -2 to 2, vertical axis (density) from 0.0 to 0.4]

Moments

The mean of the standard normal distribution is µ = 0. This point is the center of the density and the point where the density is highest. The standard deviation of the standard normal distribution is σ = 1. Notice that the points -1 and 1, which are respectively one standard deviation below and above the mean, are at points of inflection of the normal curve. (This is useful for roughly estimating the standard deviation from a plotted density or histogram.)

Benchmarks

The area between -1 and 1 under a standard normal curve is approximately 68%.
The area between -2 and 2 under a standard normal curve is approximately 95%.
More precisely, the area between -1.96 and 1.96 is approximately 0.9500, which is why we have used 1.96 for 95% confidence intervals for proportions.
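The density properties and benchmark areas can be checked numerically in R. This is a minimal sketch using the base R functions dnorm(), pnorm(), and integrate(); the helper name area_within() is introduced here only for illustration.

```r
# Total area under the standard normal density, by numerical integration.
total_area <- integrate(dnorm, lower = -Inf, upper = Inf)$value

# ASSUMPTION: area_within() is a helper named here for illustration.
# It returns the area within k standard deviations of the mean, P(-k < Z < k).
area_within <- function(k) pnorm(k) - pnorm(-k)

total_area            # approximately 1
area_within(1)        # approximately 0.68
area_within(2)        # approximately 0.95
area_within(1.96)     # approximately 0.9500
```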
Standard Normal Density

[Figure: standard normal density over possible values from -3 to 3, annotated: area within 1 = 0.68; area within 2 = 0.95; area within 3 = 0.997]

General Areas

There is no formula to calculate general areas under the standard normal curve. (The integral of the density has no closed form solution.) We prefer to use R to find probabilities. You also need to learn to use normal tables for exams.

R

The function pnorm() calculates probabilities under the standard normal curve by finding the area to the left. For example, the area to the left of -1.57 is

> pnorm(-1.57)
[1] 0.05820756

and the area to the right of 2.12 is

> 1 - pnorm(2.12)
[1] 0.01700302

Tables

The table on pages 672-673 displays right tail probabilities for z = 0 to z = 4.09. A point on the axis rounded to two decimal places a.bc corresponds to a row for a.b and a column for c. The number in the table for this row and column is the area to the right. Symmetry of the normal curve and the fact that the total area is one are needed to find other areas. The area to the left of -1.57 is the area to the right of 1.57, which is 0.05821 in the table. The area to the right of 2.12 is 0.01700. When using the table, it is best to draw a rough sketch of the curve and shade in the desired area. This practice allows one to approximate the correct probability and catch simple errors. Find the area between z = 1.64 and z = 2.55 on the board.
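Because pnorm() always returns the area to the left, right-tail and between-two-points areas come from subtraction and symmetry, just as with the table. The sketch below illustrates this; area_between() is a helper name introduced here for illustration, and the endpoints of the board exercise are used exactly as printed.

```r
# pnorm() returns the area to the left, so other areas come from subtraction.
left_area  <- pnorm(-1.57)        # area to the left of -1.57
right_area <- 1 - pnorm(2.12)     # area to the right of 2.12

# By symmetry, the area to the left of -1.57 equals the area to the right of 1.57.
symmetric_check <- 1 - pnorm(1.57)

# ASSUMPTION: area_between() is a helper named here for illustration.
# Area between two points a < b: area left of b minus area left of a.
area_between <- function(a, b) pnorm(b) - pnorm(a)
area_between(1.64, 2.55)
```

The right-tail area can also be requested directly with pnorm(2.12, lower.tail = FALSE), which avoids the subtraction.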
R

The function qnorm() is the inverse of pnorm() and finds a quantile, or location where a given area is to the left. For example, the 0.9 quantile of the standard normal curve is

> qnorm(0.9)
[1] 1.281552

and the number z so that the area between -z and z is 0.99 is

> qnorm(0.995)
[1] 2.575829

since the area to the left of -z and to the right of z must each be (1 - 0.99)/2 = 0.005 and 1 - 0.005 = 0.995. Draw a sketch!

Tables

Finding quantiles from the normal table almost always involves some round-off error. To find the number z so that the area between -z and z is 0.99 requires finding the probability 0.00500 in the middle of the table. We see z = 2.57 has a right tail area of 0.00508 and z = 2.58 has a right tail area of 0.00494, so the value of z we seek is between 2.57 and 2.58. For exam purposes, it is okay to pick the closest, here 2.58 (0.00494 is nearer to 0.00500 than 0.00508 is). Use the table to find the 0.03 quantile as accurately as possible. Draw a sketch!

General Normal Density

The general normal density with mean µ and standard deviation σ is a symmetric, bell-shaped probability density with equation:

$$f(x) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-\frac{1}{2}\left(\frac{x-\mu}{\sigma}\right)^2}, \quad (-\infty < x < \infty)$$

Sketches of general normal curves have the same shape as standard normal curves, but have rescaled axes.

[Figure: general normal density over possible values from µ - 3σ to µ + 3σ, annotated: area within 1 SD = 0.68; area within 2 SD = 0.95; area within 3 SD = 0.997]
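The inverse relationship between qnorm() and pnorm(), and the rescaling that links the general normal density to the standard one, can both be checked directly. In this sketch central_z() is a helper name introduced for illustration, and the values of mu, sigma, and x are arbitrary.

```r
# qnorm() undoes pnorm(): qnorm(p) is the point with area p to its left.
z90 <- qnorm(0.9)
pnorm(z90)                        # recovers 0.9

# ASSUMPTION: central_z() is a helper named here for illustration.
# It returns z such that the area between -z and z equals `area`.
central_z <- function(area) qnorm(1 - (1 - area) / 2)
central_z(0.99)                   # same as qnorm(0.995)
central_z(0.95)                   # close to 1.96

# A general normal density is a rescaled standard normal density.
mu <- 100; sigma <- 2             # illustrative values
x  <- c(96, 100, 103)             # illustrative points
dnorm(x, mean = mu, sd = sigma)
dnorm((x - mu) / sigma) / sigma   # same values
```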
All Normal Curves Have the Same Shape

All normal curves have the same shape, and are simply rescaled versions of the standard normal density. Consequently, every area under a general normal curve corresponds to an area under the standard normal curve. The key standardization formula is

$$z = \frac{x - \mu}{\sigma}$$

Solving for x yields

$$x = \mu + z\sigma$$

which says algebraically that x is z standard deviations above the mean.

Normal Tail Probability

Example: If X ~ N(100, 2) (mean 100, standard deviation 2), find P(X > 97.5).
Solution:

$$P(X > 97.5) = P\left(\frac{X - 100}{2} > \frac{97.5 - 100}{2}\right) = P(Z > -1.25) = 1 - P(Z > 1.25) = 0.8944$$

Normal Quantiles

Example: If X ~ N(100, 2), find the cutoff values for the middle 70% of the distribution.
Solution: The cutoff points will be the 0.15 and 0.85 quantiles. From the table, 1.03 < z < 1.04 and z = 1.04 is closest. Thus, the cutoff points are the mean plus or minus 1.04 standard deviations.

100 - 1.04(2) = 97.92, 100 + 1.04(2) = 102.08

In R, a single call to qnorm() finds these cutoffs.

> qnorm(c(0.15, 0.85), 100, 2)
[1]  97.92713 102.07287

Case Study Example

In a population, suppose that: the mean resting body temperature is 98.25 degrees Fahrenheit; the standard deviation is 0.73 degrees Fahrenheit; resting body temperatures are normally distributed.
Let X be the resting body temperature of a randomly chosen individual. Find:
1. P(X < 98), the proportion of individuals with temperature less than 98.
2. P(98 < X < 100), the proportion of individuals with temperature between 98 and 100.
3. The 0.90 quantile of the distribution.
4. The cutoff values for the middle 50% of the distribution.
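One way to work the four case study questions in R is sketched below, combining the standardization formula with pnorm() and qnorm(). The variable names are chosen here for illustration; the mean and standard deviation are the values given in the example.

```r
mu    <- 98.25   # population mean resting body temperature (degrees F)
sigma <- 0.73    # population standard deviation (degrees F)

# 1. P(X < 98): standardize, then use the standard normal curve.
z1         <- (98 - mu) / sigma
p_below_98 <- pnorm(z1)                      # same as pnorm(98, mu, sigma)

# 2. P(98 < X < 100): area left of 100 minus area left of 98.
p_98_to_100 <- pnorm(100, mu, sigma) - pnorm(98, mu, sigma)

# 3. The 0.90 quantile: x = mu + z * sigma with z = qnorm(0.90).
q90 <- mu + qnorm(0.90) * sigma

# 4. Cutoffs for the middle 50%: the 0.25 and 0.75 quantiles.
middle50 <- qnorm(c(0.25, 0.75), mu, sigma)

round(c(p_below_98, p_98_to_100), 4)
round(c(q90, middle50), 2)
```

Passing the mean and standard deviation to pnorm() and qnorm() does the standardization internally, so steps 1 and 3 show the explicit formula only to connect with the table method.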
Answers (with R, table will be close)

1. 0.366
2. 0.6257
3. 99.19
4. 97.76 and 98.74

The χ² Distribution

The χ² distribution is used to find p-values for the test of independence and the G-test we saw earlier for contingency tables. Now that the normal distribution has been introduced, we can better motivate the χ² distribution.

Definition: If $Z_1, \ldots, Z_k$ are independent standard normal random variables, then

$$X^2 = Z_1^2 + \cdots + Z_k^2$$

has a χ² distribution with k degrees of freedom.

The functions pchisq() and qchisq() find probabilities and quantiles, respectively, from the χ² distributions. The table on pages 669-671 has the same information for limited numbers of quantiles for each χ² distribution with 100 or fewer degrees of freedom. Unlike the normal distributions, where all normal curves are just rescalings of the standard normal curve, each χ² distribution is different.

t Distribution

Definition: If Z is a standard normal random variable, $X^2$ is a χ² random variable with k degrees of freedom, and Z and $X^2$ are independent, then

$$T = \frac{Z}{\sqrt{X^2/k}}$$

has a t distribution with k degrees of freedom.

t densities are symmetric, bell-shaped, and centered at 0 just like the standard normal density, but are more spread out (higher variance). As the degrees of freedom increase, the t distributions converge to the standard normal. t distributions will be useful for statistical inference for one or more populations of quantitative variables.
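The two definitions can be tied back to the normal distribution numerically. The sketch below uses pchisq() and qchisq() from the slides, plus qt(), the base R quantile function for t distributions; the value z = 1.96 and the degrees of freedom shown are illustrative choices.

```r
# If Z is standard normal, Z^2 has a chi-square distribution with 1 df,
# so P(Z^2 <= z^2) must equal P(-z <= Z <= z).
z <- 1.96
pchisq(z^2, df = 1)
pnorm(z) - pnorm(-z)              # same value

# qchisq() is the inverse of pchisq().
qchisq(0.95, df = 1)              # equals qnorm(0.975)^2

# t quantiles shrink toward the standard normal quantile as df increases.
t_quantiles <- qt(0.975, df = c(5, 30, 1000))
t_quantiles
qnorm(0.975)
```

The t quantiles are all larger than the normal quantile, which reflects the heavier tails of the t densities.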
The Central Limit Theorem

If $X_1, \ldots, X_n$ are an independent sample from a common distribution F with mean $E(X_i) = \mu$ and variance $Var(X_i) = \sigma^2$ (which need not be normal), then

$$\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$$

is approximately normal with $E(\bar{X}) = \mu$ and $Var(\bar{X}) = \sigma^2/n$ if the sample size n is sufficiently large.
The central limit theorem (and its cousins) justifies almost all inference methods in the rest of the semester.

Mean of the Sampling Distribution of $\bar{X}$

The mean of the sampling distribution of $\bar{X}$ is found using the linearity properties of expectation.

$$
\begin{aligned}
E(\bar{X}) &= E\left(\frac{\sum_{i=1}^{n} X_i}{n}\right) \\
&= \frac{1}{n} E(X_1 + \cdots + X_n) \\
&= \frac{1}{n} \left(E(X_1) + \cdots + E(X_n)\right) \\
&= \frac{1}{n} (n\mu) = \mu
\end{aligned}
$$

Variance of the Sampling Distribution of $\bar{X}$

The variance of the sampling distribution of $\bar{X}$ is found using the properties of variances of sums of independent random variables. Also, $SE(\bar{X}) = \sigma/\sqrt{n}$.

$$
\begin{aligned}
Var(\bar{X}) &= Var\left(\frac{\sum_{i=1}^{n} X_i}{n}\right) \\
&= \frac{1}{n^2} Var(X_1 + \cdots + X_n) \\
&= \frac{1}{n^2} \left(Var(X_1) + \cdots + Var(X_n)\right) \\
&= \frac{1}{n^2} (n\sigma^2) = \frac{\sigma^2}{n}
\end{aligned}
$$

Case Study Example

In a population, suppose that: the mean resting body temperature is 98.25 degrees Fahrenheit; the standard deviation is 0.73 degrees Fahrenheit; resting body temperatures are normally distributed.
Let $X_1, \ldots, X_{40}$ be the resting body temperatures of 40 randomly chosen individuals from the population. Find:
1. $P(\bar{X} < 98)$, the probability that the sample mean is less than 98.
2. $P(98 < \bar{X} < 100)$, the probability that the sample mean is between 98 and 100.
3. The 0.90 quantile of the sampling distribution of $\bar{X}$.
4. The cutoff values for the middle 50% of the sampling distribution of $\bar{X}$.
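For the sample mean, the only change from the single-observation calculation is that the standard deviation is replaced by σ/√n. A minimal sketch of the four calculations follows, with variable names chosen here for illustration and the population values taken from the example.

```r
mu    <- 98.25                    # population mean (degrees F)
sigma <- 0.73                     # population standard deviation (degrees F)
n     <- 40                       # sample size
se    <- sigma / sqrt(n)          # standard deviation of the sample mean

# 1. P(Xbar < 98)
p_below_98 <- pnorm(98, mean = mu, sd = se)

# 2. P(98 < Xbar < 100)
p_98_to_100 <- pnorm(100, mean = mu, sd = se) - pnorm(98, mean = mu, sd = se)

# 3. The 0.90 quantile of the sampling distribution of Xbar.
q90 <- qnorm(0.90, mean = mu, sd = se)

# 4. Cutoffs for the middle 50% of the sampling distribution of Xbar.
middle50 <- qnorm(c(0.25, 0.75), mean = mu, sd = se)

round(c(p_below_98, p_98_to_100), 4)
round(c(q90, middle50), 2)
```

Because se is much smaller than sigma, the sample mean is far less likely than a single individual to fall below 98, and its middle 50% interval is much narrower.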
Answers (with R, table will be close)

1. 0.0152
2. 0.9848
3. 98.4
4. 98.17 and 98.33