Descriptive Statistics

We learned to describe data sets graphically. We can also describe a data set numerically.

Measures of Location

Definition. The sample mean is the arithmetic average of n values. We denote the sample mean by

$$\bar{y} = \frac{1}{n}\sum_{i=1}^{n} y_i = \frac{y_1 + y_2 + \cdots + y_n}{n}$$

Example (Exercise 2.15). Y = weight gain (lb) of lambs on a special diet, for n = 6 lambs. Compute the sample mean for the resulting data set.

11 13 19 2 10 1

Figure 2.27 illustrates that the sample mean can be viewed as a balancing point in the data.

The sample median is the value of the data nearest their middle. We call the median $Q_2$.

To find the median of a data set:
  Put the data in order.
  The median is
    o the middle value if n is odd
    o the average of the two middle values if n is even

Example (Exercise 2.17). Lambs on a special diet again. The ordered values are

$y_{(1)} = 1$, $y_{(2)} = 2$, $y_{(3)} = 10$, $y_{(4)} = 11$, $y_{(5)} = 13$, $y_{(6)} = 19$

Find $Q_2$.
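The two lamb exercises above can be checked numerically. The following is a minimal sketch, not part of the original notes; the function names `sample_mean` and `sample_median` are illustrative choices, and the median function simply follows the odd/even rule stated above.

```python
# Weight gain (lb) of n = 6 lambs, from Exercise 2.15.
weight_gain = [11, 13, 19, 2, 10, 1]

def sample_mean(values):
    """Arithmetic average: the sum of the values divided by n."""
    return sum(values) / len(values)

def sample_median(values):
    """Middle value if n is odd, average of the two middle values if n is even."""
    ordered = sorted(values)          # y_(1) <= y_(2) <= ... <= y_(n)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

y_bar = sample_mean(weight_gain)
q2 = sample_median(weight_gain)
print(f"mean = {y_bar:.2f}, median = {q2}")  # mean = 9.33, median = 10.5
```

Here n = 6 is even, so the median is the average of the two middle ordered values, 10 and 11.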
Notice the new notation. A lowercase letter denoting the outcome of a random variable, with parentheses in the subscript, $y_{(i)}$, denotes the $i$th observation in order from smallest to largest. $i = 1$ denotes the smallest observation (minimum) and $i = n$ denotes the largest (maximum).

Question: Which measure of location (or measure of center) do we report? Mean or median?

To answer this, explore what happens on certain data sets to the relation between the mean and median. Consider two data sets. Find the mean and the median for both.

1 2 3 4 5
1 2 3 4 20

What happened and why?

So the mean and median can indicate skewness.
  Data skewed right: mean > median
  Data skewed left: mean < median
  Data symmetric: mean ≈ median

Measures of Dispersion

The quartiles of a data set are points that separate the data into quarters (or fourths).
  $Q_1$ separates the lower quarter (25%) from the upper three quarters (75%).
  $Q_2$ separates the lower two quarters (50%) from the upper two quarters (50%).
  $Q_3$ separates the lower three quarters (75%) from the upper quarter (25%).

Notice the median is the second quartile.

One way to report the dispersion (or spread) of a data set is to report the interquartile range.

Definition. The interquartile range is $\mathrm{IQR} = Q_3 - Q_1$.

Definition. The sample range is $y_{(n)} - y_{(1)} = \max - \min$.

Definition. The five number summary is $\{y_{(1)}, Q_1, Q_2, Q_3, y_{(n)}\}$.
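The comparison of the two small data sets (1 2 3 4 5 and 1 2 3 4 20) can be run directly. This is a small sketch using Python's standard `statistics` module; the variable names are illustrative and not from the notes.

```python
from statistics import mean, median

symmetric = [1, 2, 3, 4, 5]
skewed_right = [1, 2, 3, 4, 20]   # same data, but the largest value is pulled out to 20

for label, data in [("1 2 3 4 5", symmetric), ("1 2 3 4 20", skewed_right)]:
    print(f"{label}: mean = {mean(data)}, median = {median(data)}")
```

Changing 5 to 20 leaves the middle ordered value untouched, so the median stays at 3, while the mean moves from 3 to 6. The mean is pulled toward the long right tail, which is why a mean noticeably larger than the median suggests right skew.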
Example (from Example 2.22). In a common biology experiment, radishes were grown in total darkness and the length (mm) of each radish shoot was measured at the end of three days. Find the five number summary for these data.

8 10 11 15 15 15 20 20 22 25 29 30 33 35 37

A boxplot (a.k.a. box and whisker plot) is a graphical display of the five number summary. The box spans the quartiles and the whiskers extend from the quartiles to the min / max.

Boxplots are often used for comparative purposes, as in Figure 2.32.

[Figure 2.32: Radish Length at Three Days Grown Under Three Conditions]
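The five number summary for the darkness data can be sketched in code. The notes do not spell out how to compute $Q_1$ and $Q_3$, so this sketch assumes one common textbook convention: $Q_1$ is the median of the values below the median position and $Q_3$ is the median of the values above it, with the middle value left out of both halves when n is odd. Other conventions, including the defaults of many software packages, can give slightly different quartiles. The function names are illustrative.

```python
def median_of_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (min, Q1, Q2, Q3, max); assumes at least two observations."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]              # values below the median position
    upper = ordered[half + (n % 2):]    # values above it (skip the middle value if n is odd)
    return (ordered[0], median_of_sorted(lower), median_of_sorted(ordered),
            median_of_sorted(upper), ordered[-1])

# Radish shoot length (mm) after three days in total darkness, n = 15.
radish_dark = [8, 10, 11, 15, 15, 15, 20, 20, 22, 25, 29, 30, 33, 35, 37]

summary = five_number_summary(radish_dark)
y_min, q1, q2, q3, y_max = summary
iqr = q3 - q1
sample_range = y_max - y_min
print(summary)                                   # (8, 15, 20, 30, 37)
print(f"IQR = {iqr}, range = {sample_range}")    # IQR = 15, range = 29
```

In a boxplot of these data, the box would run from 15 to 30 with a line at 20, and the whiskers would reach out to 8 and 37.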
Definition. An outlier is an observation that differs dramatically from the rest of the data. Formally, $y_i$ is an outlier if it falls below the lower fence $Q_1 - 1.5 \times \mathrm{IQR}$ or above the upper fence $Q_3 + 1.5 \times \mathrm{IQR}$.

Example 2.25. Y = radish growth in full light condition. The data are

3 5 5 7 7 8 9 10 10 10 10 14 20 21

Find any outliers.

Definition. The sample variance is

$$S^2 = \frac{1}{n-1}\sum_{i=1}^{n} (Y_i - \bar{Y})^2$$

Definition. The sample standard deviation is $S = \sqrt{S^2}$.

Example 2.28. In an experiment on chrysanthemums, a botanist measured the stem elongation (mm in 7 days) of five plants grown on the same greenhouse bench.

76 72 65 70 82

Find the sample standard deviation.
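Both examples on this page can be worked in a few lines. This sketch rests on two assumptions that the notes do not state explicitly: outliers are flagged with the usual 1.5 × IQR fence rule, and $Q_1$ and $Q_3$ are taken as the medians of the lower and upper halves of the ordered data. All function names are illustrative.

```python
import math

def median_of_sorted(ordered):
    """Median of a list that is already sorted."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def find_outliers(values, k=1.5):
    """Values outside the fences Q1 - k*IQR and Q3 + k*IQR (assumed rule, k = 1.5)."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median_of_sorted(ordered[:half])
    q3 = median_of_sorted(ordered[half + (n % 2):])
    iqr = q3 - q1
    lower_fence = q1 - k * iqr
    upper_fence = q3 + k * iqr
    return [y for y in ordered if y < lower_fence or y > upper_fence]

def sample_sd(values):
    """Square root of the sample variance, which divides by n - 1."""
    n = len(values)
    y_bar = sum(values) / n
    sum_sq_dev = sum((y - y_bar) ** 2 for y in values)
    return math.sqrt(sum_sq_dev / (n - 1))

# Example 2.25: radish growth (mm) in full light.
radish_light = [3, 5, 5, 7, 7, 8, 9, 10, 10, 10, 10, 14, 20, 21]
outliers = find_outliers(radish_light)
print(outliers)             # [20, 21]

# Example 2.28: chrysanthemum stem elongation (mm in 7 days).
stem_elongation = [76, 72, 65, 70, 82]
s = sample_sd(stem_elongation)
print(f"S = {s:.2f} mm")    # S = 6.40 mm
```

For the radish data, $Q_1 = 7$ and $Q_3 = 10$, so the IQR is 3 and the fences are 2.5 and 14.5. For the chrysanthemum data, the mean is 73, the squared deviations sum to 164, and $S^2 = 164/4 = 41$.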
Empirical Rule

For unimodal, not too skewed data sets, the empirical rule states the following:
  ~68% of the data lie between $\bar{Y} - S$ and $\bar{Y} + S$
  ~95% of the data lie between $\bar{Y} - 2S$ and $\bar{Y} + 2S$
  >99% of the data lie between $\bar{Y} - 3S$ and $\bar{Y} + 3S$

Example 2.36. Suppose Y = pulse rate after 5 minutes of exercise. For n = 28 subjects, we find $\bar{Y} = 98$ (beats/min) and $S = 13.4$ (beats/min). Thus, e.g., from the empirical rule we expect ~95% of the data to lie between

98 - (2)(13.4) = 98 - 26.8 = 71.2 beats/min

and

98 + (2)(13.4) = 98 + 26.8 = 124.8 beats/min

Population

Definition. The population is the larger group of subjects (organisms, plots, regions, ecosystems, etc.) about which we wish to draw inferences.

Definition. A parameter is a quantified population characteristic.

Definition. A statistic is a sample quantity used to estimate a population parameter.

Definition. The population proportion is the proportion of subjects exhibiting a particular trait or outcome in the population. (It generalizes to the probability that any population element will exhibit the trait.) NOTATION: $p$

Definition. The sample proportion is the number of sample elements exhibiting the trait, divided by the sample size, n. NOTATION: $\hat{p}$
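Returning to the empirical rule, the intervals for the pulse rate example (mean 98 beats/min, standard deviation 13.4 beats/min) can be tabulated with a short sketch. The function name `empirical_rule_interval` is an illustrative choice, and the coverage labels are simply the approximate percentages quoted in the rule.

```python
def empirical_rule_interval(y_bar, s, k):
    """Interval from k standard deviations below the mean to k above it."""
    return (y_bar - k * s, y_bar + k * s)

y_bar = 98.0   # sample mean pulse rate (beats/min), Example 2.36
s = 13.4       # sample standard deviation (beats/min)

# Approximate coverage quoted by the empirical rule for k = 1, 2, 3.
coverage = {1: "~68%", 2: "~95%", 3: ">99%"}

intervals = {}
for k, label in coverage.items():
    low, high = empirical_rule_interval(y_bar, s, k)
    intervals[k] = (low, high)
    print(f"{label} of the data between {low:.1f} and {high:.1f} beats/min")
```

The k = 2 row reproduces the interval from 71.2 to 124.8 beats/min worked out in the example. These percentages are only a rule of thumb for unimodal, not too skewed data.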