5/8/013
CHAPTER 3 Outline
3-1 Measures of Central Tendency
3-2 Measures of Variation
3-3 Measures of Position
3-4 Exploratory Data Analysis
Copyright 2013 The McGraw-Hill Companies, Inc.

CHAPTER 3 Objectives
1 Summarize data, using measures of central tendency, such as the mean, median, mode, and midrange.
2 Describe data, using measures of variation, such as the range, variance, and standard deviation.
3 Identify the position of a data value in a data set, using various measures of position, such as percentiles, deciles, and quartiles.
4 Use the techniques of exploratory data analysis, including boxplots and five-number summaries, to discover various aspects of data.

Introduction
Traditional Statistics: Average, Variation, Position

3-1 Measures of Central Tendency
A statistic is a characteristic or measure obtained by using the data values from a sample.
A parameter is a characteristic or measure obtained by using all the data values for a specific population.
General Rounding Rule
The basic rounding rule is that rounding should not be done until the final answer is calculated. Using parentheses on a calculator, or using a spreadsheet, helps to avoid early rounding error.

What Do We Mean by Average?
Mean, Median, Mode, Midrange, Weighted Mean

Measures of Central Tendency: Mean
The mean is the quotient of the sum of the values and the total number of values. The symbol $\bar{X}$ is used for the sample mean:
$$\bar{X} = \frac{X_1 + X_2 + X_3 + \cdots + X_n}{n} = \frac{\sum X}{n}$$
For a population, the Greek letter $\mu$ (mu) is used for the mean:
$$\mu = \frac{X_1 + X_2 + X_3 + \cdots + X_N}{N} = \frac{\sum X}{N}$$

Section 3-1 Example 3-1 Page #114
Example 3-1: Days Off per Year
The data represent the number of days off per year for a sample of individuals selected from nine different countries. Find the mean.
20, 26, 40, 36, 23, 42, 35, 24, 30
$$\bar{X} = \frac{\sum X}{n} = \frac{20 + 26 + 40 + 36 + 23 + 42 + 35 + 24 + 30}{9} = \frac{276}{9} \approx 30.7$$
The mean number of days off is 30.7 days.

Rounding Rule: Mean
The mean should be rounded to one more decimal place than occurs in the raw data. The mean, in most cases, is not an actual data value.
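A minimal Python sketch of the sample-mean formula applied to the Example 3-1 data. The helper name `mean` is an illustrative choice, not something from the text. Following the General Rounding Rule, it keeps full precision internally and rounds only the printed result.

```python
def mean(values):
    """Sample mean: the sum of the values divided by the number of values."""
    if not values:
        raise ValueError("the mean of an empty data set is undefined")
    return sum(values) / len(values)


# Example 3-1: days off per year for a sample from nine countries
days_off = [20, 26, 40, 36, 23, 42, 35, 24, 30]

x_bar = mean(days_off)  # 276 / 9, kept unrounded for any later calculations
# Raw data are whole numbers, so report one more decimal place
print(f"sum = {sum(days_off)}, mean = {x_bar:.1f}")  # sum = 276, mean = 30.7
```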
Measures of Central Tendency: Mean for Grouped Data
The mean for grouped data is calculated by multiplying each class frequency $f$ by the class midpoint $X_m$, adding the products, and dividing by the total number of values $n = \sum f$:
$$\bar{X} = \frac{\sum f \cdot X_m}{n}$$

Section 3-1 Example 3-3 Page #115
Example 3-3: Miles Run
Below is a frequency distribution of miles run per week. Find the mean.

Class boundaries   Frequency, f   Midpoint, X_m   f·X_m
5.5-10.5           1               8                8
10.5-15.5          2              13               26
15.5-20.5          3              18               54
20.5-25.5          5              23              115
25.5-30.5          4              28              112
30.5-35.5          3              33               99
35.5-40.5          2              38               76
                   Σf = 20                  Σf·X_m = 490

$$\bar{X} = \frac{\sum f \cdot X_m}{n} = \frac{490}{20} = 24.5 \text{ miles}$$

Measures of Central Tendency: Median
The median is the midpoint of the data array. The symbol for the median is MD.
The median will be one of the data values if there is an odd number of values.
The median will be the average of the two middle data values if there is an even number of values.

Section 3-1 Example 3-4 Page #118
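The grouped-data formula can be sketched in Python as follows, assuming each class is given as a (lower, upper) boundary pair; `grouped_mean` is a hypothetical helper name. Because every value in a class is represented by its midpoint, the result approximates the mean of the raw data rather than reproducing it exactly.

```python
def grouped_mean(boundaries, freqs):
    """Mean for grouped data: sum(f * X_m) / n, where n = sum(f)."""
    # Each class midpoint X_m is the average of its lower and upper boundaries
    midpoints = [(low + high) / 2 for low, high in boundaries]
    n = sum(freqs)
    total = sum(f * x_m for f, x_m in zip(freqs, midpoints))
    return total / n


# Example 3-3: miles run per week
boundaries = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
              (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
freqs = [1, 2, 3, 5, 4, 3, 2]

print(grouped_mean(boundaries, freqs))  # 24.5
```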
Example 3-4: Hotel Rooms
The numbers of rooms in the seven hotels in downtown Pittsburgh are 713, 300, 618, 595, 311, 401, and 292. Find the median.
Sort in ascending order: 292, 300, 311, 401, 595, 618, 713
Select the middle value: MD = 401
The median is 401 rooms.

Section 3-1 Example 3-6 Page #118
Example 3-6: Tornadoes in the U.S.
The numbers of tornadoes that occurred in the United States over an 8-year period follow. Find the median.
684, 764, 656, 702, 856, 1133, 1132, 1303
Sort in ascending order: 656, 684, 702, 764, 856, 1132, 1133, 1303
Find the average of the two middle values:
$$MD = \frac{764 + 856}{2} = \frac{1620}{2} = 810$$
The median number of tornadoes is 810.

Measures of Central Tendency: Mode
The mode is the value that occurs most often in a data set. It is sometimes said to be the most typical case.
There may be no mode, one mode (unimodal), two modes (bimodal), or many modes (multimodal).

Section 3-1 Example 3-9 Page #119
Example 3-9: NFL Signing Bonuses
Find the mode of the signing bonuses of eight NFL players for a specific year. The bonuses in millions of dollars are
18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10
You may find it easier to sort first: 10, 10, 10, 11.3, 12.4, 14.0, 18.0, 34.5
Select the value that occurs most often. The mode is 10 million dollars.
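A short Python sketch of the median rule for odd and even counts, checked against Examples 3-4 and 3-6. The function name `median` is an illustrative assumption; it sorts a copy of the data, then takes the middle value or averages the two middle values.

```python
def median(values):
    """Median (MD): middle value of the sorted data, or the average of the two middle values."""
    data = sorted(values)
    n = len(data)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        return data[mid]                    # odd n: MD is one of the data values
    return (data[mid - 1] + data[mid]) / 2  # even n: average of the two middle values


hotel_rooms = [713, 300, 618, 595, 311, 401, 292]        # Example 3-4
tornadoes = [684, 764, 656, 702, 856, 1133, 1132, 1303]  # Example 3-6
print(median(hotel_rooms))  # 401
print(median(tornadoes))    # 810.0
```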
Section 3-1 Example 3-10 Page #120
Example 3-10: Bank Branches
Find the mode for the number of branches that six banks have.
401, 344, 209, 201, 227, 353
Since each value occurs only once, there is no mode.
Note: Do not say that the mode is zero. That would be incorrect, because in some data, such as temperature, zero can be an actual value.

Section 3-1 Example 3-11 Page #120
Example 3-11: Licensed Nuclear Reactors
The data show the number of licensed nuclear reactors in the United States for a recent 15-year period. Find the mode.
104 104 104 104 104 107 109 109 109 110 109 111 112 111 109
104 and 109 both occur the most (five times each). The data set is said to be bimodal. The modes are 104 and 109.

Section 3-1 Example 3-12 Page #120
Example 3-12: Miles Run per Week
Find the modal class for the frequency distribution of miles that 20 runners ran in one week.

Class boundaries   Frequency
5.5-10.5           1
10.5-15.5          2
15.5-20.5          3
20.5-25.5          5
25.5-30.5          4
30.5-35.5          3
35.5-40.5          2

The modal class is 20.5-25.5, the class with the highest frequency (5). The mode, the midpoint of the modal class, is 23 miles per week.
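A minimal Python sketch covering the mode cases above: one mode, no mode, bimodal data, and the modal class of a frequency distribution. The helper names `modes` and `modal_class` are hypothetical. Returning every tied value is one reasonable reading of "two modes (bimodal), or many modes (multimodal)", and an empty list stands for "no mode" rather than a mode of zero.

```python
from collections import Counter


def modes(values):
    """Return all most-frequent values; an empty list means the data set has no mode."""
    counts = Counter(values)
    top = max(counts.values())
    if top == 1:
        return []  # every value occurs once: no mode (not "the mode is zero")
    return sorted(v for v, c in counts.items() if c == top)


def modal_class(boundaries, freqs):
    """Return the class with the highest frequency and its midpoint (the mode).

    Assumes a single modal class; if several classes tie, the first is returned.
    """
    i = max(range(len(freqs)), key=lambda j: freqs[j])
    low, high = boundaries[i]
    return (low, high), (low + high) / 2


print(modes([18.0, 14.0, 34.5, 10, 11.3, 10, 12.4, 10]))  # [10]  (Example 3-9)
print(modes([401, 344, 209, 201, 227, 353]))             # []    (Example 3-10)
reactors = [104, 104, 104, 104, 104, 107, 109, 109, 109, 110, 109, 111, 112, 111, 109]
print(modes(reactors))                                    # [104, 109]  (Example 3-11)

boundaries = [(5.5, 10.5), (10.5, 15.5), (15.5, 20.5), (20.5, 25.5),
              (25.5, 30.5), (30.5, 35.5), (35.5, 40.5)]
print(modal_class(boundaries, [1, 2, 3, 5, 4, 3, 2]))     # ((20.5, 25.5), 23.0)
```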
Measures of Central Tendency: Midrange
The midrange is the average of the lowest and highest values in a data set.
$$MR = \frac{\text{Lowest} + \text{Highest}}{2}$$

Section 3-1 Example 3-15 Page #122
Example 3-15: Water-Line Breaks
In the last two winter seasons, the city of Brownsville, Minnesota, reported these numbers of water-line breaks per month. Find the midrange.
2, 3, 6, 8, 4, 1
$$MR = \frac{1 + 8}{2} = \frac{9}{2} = 4.5$$
The midrange is 4.5.

Measures of Central Tendency: Weighted Mean
Find the weighted mean of a variable by multiplying each value by its corresponding weight and dividing the sum of the products by the sum of the weights.
$$\bar{X} = \frac{w_1 X_1 + w_2 X_2 + \cdots + w_n X_n}{w_1 + w_2 + \cdots + w_n} = \frac{\sum wX}{\sum w}$$

Section 3-1 Example 3-17 Page #123
Example 3-17: Grade Point Average
A student received the following grades. Find the corresponding GPA.

Course                        Credits, w   Grade, X
English Composition           3            A (4 points)
Introduction to Psychology    3            C (2 points)
Biology                       4            B (3 points)
Physical Education            2            D (1 point)

$$\bar{X} = \frac{\sum wX}{\sum w} = \frac{3 \cdot 4 + 3 \cdot 2 + 4 \cdot 3 + 2 \cdot 1}{3 + 3 + 4 + 2} = \frac{32}{12} \approx 2.7$$
The grade point average is 2.7.
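The midrange and weighted-mean formulas translate directly into Python; this sketch reproduces Examples 3-15 and 3-17. The function names `midrange` and `weighted_mean` are illustrative assumptions, and the letter grades are encoded as the point values given in the table.

```python
def midrange(values):
    """MR = (lowest + highest) / 2."""
    return (min(values) + max(values)) / 2


def weighted_mean(values, weights):
    """Sum of w * X divided by the sum of the weights."""
    if len(values) != len(weights):
        raise ValueError("each value needs exactly one weight")
    return sum(w * x for x, w in zip(values, weights)) / sum(weights)


breaks = [2, 3, 6, 8, 4, 1]  # Example 3-15
print(midrange(breaks))      # 4.5

# Example 3-17: grade points weighted by credits (A=4, C=2, B=3, D=1)
points = [4, 2, 3, 1]
credits = [3, 3, 4, 2]
gpa = weighted_mean(points, credits)  # 32 / 12
print(round(gpa, 1))                  # 2.7
```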
Properties of the Mean
Uses all data values.
Varies less than the median or mode from sample to sample.
Used in computing other statistics, such as the variance.
Unique, usually not one of the data values.
Cannot be used with open-ended classes.
Affected by extremely high or low values, called outliers.

Properties of the Median
Gives the midpoint.
Used when it is necessary to find out whether the data values fall into the upper half or lower half of the distribution.
Can be used for an open-ended distribution.
Affected less than the mean by extremely high or extremely low values.

Properties of the Mode
Used when the most typical case is desired.
Easiest average to compute.
Can be used with nominal data.
Not always unique, or may not exist.

Properties of the Midrange
Easy to compute.
Gives the midpoint.
Affected by extremely high or low values in a data set.

Distributions
[Figure: Distributions]

3-2 Measures of Variation
How Can We Measure Variability?
Range, Variance, Standard Deviation, Coefficient of Variation, Chebyshev's Theorem, Empirical Rule (Normal)
Range
The range is the difference between the highest and lowest values in a data set.
$$R = \text{Highest} - \text{Lowest}$$

Section 3-2 Example 3-18/19 Page #131
Example 3-18/19: Outdoor Paint
Two experimental brands of outdoor paint are tested to see how long each will last before fading. Six cans of each brand constitute a small population. The results (in months) are shown. Find the mean and range of each group.

Brand A   Brand B
10        35
60        45
50        30
30        35
40        40
20        25

Brand A: $\mu = \frac{\sum X}{N} = \frac{210}{6} = 35$, $R = 60 - 10 = 50$
Brand B: $\mu = \frac{\sum X}{N} = \frac{210}{6} = 35$, $R = 45 - 25 = 20$
The average for both brands is the same, but the range for Brand A is much greater than the range for Brand B. Which brand would you buy?

Variance & Standard Deviation
The variance is the average of the squares of the distance each value is from the mean. (For a sample, the sum of the squared distances is divided by n - 1 rather than n.)
The standard deviation is the square root of the variance.
The standard deviation is a measure of how spread out the data are.

Uses of the Variance and Standard Deviation
To determine the spread of the data.
To determine the consistency of a variable.
To determine the number of data values that fall within a specified interval in a distribution (Chebyshev's Theorem).
Used in inferential statistics.
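A small Python sketch of the range and population mean for the two paint brands. The helper is named `data_range` (a hypothetical name) so that it does not shadow Python's built-in `range`. Each brand's six cans are treated as a whole population, as the example states.

```python
def data_range(values):
    """R = highest - lowest."""
    return max(values) - min(values)


def population_mean(values):
    """mu = sum(X) / N, treating the data as a whole population."""
    return sum(values) / len(values)


# Example 3-18/19: months before fading; each brand is a small population
brands = {
    "Brand A": [10, 60, 50, 30, 40, 20],
    "Brand B": [35, 45, 30, 35, 40, 25],
}
for name, months in brands.items():
    print(name, population_mean(months), data_range(months))
# Brand A 35.0 50
# Brand B 35.0 20
```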
Variance & Standard Deviation (Population Theoretical Model)
The population variance is
$$\sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
The population standard deviation is
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}}$$

Section 3-2 Example 3-21 Page #133
Example 3-21: Outdoor Paint
Find the variance and standard deviation for the data set for Brand A paint.
10, 60, 50, 30, 40, 20

Months, X   μ    X - μ   (X - μ)²
10          35   -25     625
60          35    25     625
50          35    15     225
30          35    -5      25
40          35     5      25
20          35   -15     225
                         Σ = 1750

$$\sigma^2 = \frac{\sum (X - \mu)^2}{N} = \frac{1750}{6} \approx 291.7$$
$$\sigma = \sqrt{\frac{\sum (X - \mu)^2}{N}} = \sqrt{\frac{1750}{6}} \approx 17.1$$

Variance & Standard Deviation (Sample Theoretical Model)
The sample variance is
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1}$$
The sample standard deviation is
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$

Variance & Standard Deviation (Sample Computational Model)
Is mathematically equivalent to the theoretical formula.
Saves time when calculating by hand.
Does not use the mean.
Is more accurate when the mean has been rounded.
The sample variance is
$$s^2 = \frac{n \sum X^2 - \left(\sum X\right)^2}{n(n - 1)}$$
The sample standard deviation is
$$s = \sqrt{s^2}$$
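A minimal Python sketch of the population and sample (theoretical) variance formulas, reproducing Example 3-21. The function names are hypothetical. The final line treats the same six values as a sample purely for comparison, to show the effect of dividing by n - 1 instead of N.

```python
import math


def population_variance(values):
    """sigma^2 = sum((X - mu)^2) / N."""
    mu = sum(values) / len(values)
    return sum((x - mu) ** 2 for x in values) / len(values)


def sample_variance(values):
    """s^2 = sum((X - X_bar)^2) / (n - 1); requires at least two values."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    x_bar = sum(values) / n
    return sum((x - x_bar) ** 2 for x in values) / (n - 1)


brand_a = [10, 60, 50, 30, 40, 20]  # Example 3-21
var_pop = population_variance(brand_a)
sd_pop = math.sqrt(var_pop)
print(round(var_pop, 1), round(sd_pop, 1))  # 291.7 17.1

# Treating the same six values as a sample divides by n - 1 = 5 instead of 6
print(sample_variance(brand_a))  # 350.0
```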
Section 3-2 Example 3-23 Page #137
Example 3-23: European Auto Sales
Find the variance and standard deviation for the amount of European auto sales for a sample of 6 years. The data are in millions of dollars.
11.2, 11.9, 12.0, 12.8, 13.4, 14.3

X       X²
11.2    125.44
11.9    141.61
12.0    144.00
12.8    163.84
13.4    179.56
14.3    204.49
ΣX = 75.6   ΣX² = 958.94

$$s^2 = \frac{n \sum X^2 - \left(\sum X\right)^2}{n(n - 1)} = \frac{6(958.94) - 75.6^2}{6 \cdot 5} = \frac{5753.64 - 5715.36}{30} = \frac{38.28}{30} \approx 1.28$$
$$s = \sqrt{s^2} \approx 1.13$$

Coefficient of Variation
The coefficient of variation is the standard deviation divided by the mean, expressed as a percentage.
$$CVar = \frac{s}{\bar{X}} \cdot 100\%$$
Use CVar to compare standard deviations when the units are different.

Section 3-2 Example 3-25 Page #140
Example 3-25: Sales of Automobiles
The mean of the number of sales of cars over a 3-month period is 87, and the standard deviation is 5. The mean of the commissions is $5225, and the standard deviation is $773. Compare the variations of the two.
$$CVar_{\text{Sales}} = \frac{5}{87} \cdot 100\% \approx 5.7\%$$
$$CVar_{\text{Commissions}} = \frac{773}{5225} \cdot 100\% \approx 14.8\%$$
Commissions are more variable than sales.

Range Rule of Thumb
The Range Rule of Thumb approximates the standard deviation as
$$s \approx \frac{\text{Range}}{4}$$
when the distribution is unimodal and approximately symmetric.
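The computational formula and the coefficient of variation can be sketched in Python like this, reproducing Examples 3-23 and 3-25. Function names are illustrative assumptions. One design caution: in floating-point code the definitional (two-pass) formula is usually preferred, because subtracting two large, nearly equal sums can lose precision. The computational formula's advantage in the slides is for hand calculation.

```python
import math


def sample_variance_computational(values):
    """s^2 = (n * sum(X^2) - (sum(X))^2) / (n * (n - 1)); the mean is never formed."""
    n = len(values)
    if n < 2:
        raise ValueError("the sample variance needs at least two values")
    sum_x = sum(values)
    sum_x2 = sum(x * x for x in values)
    return (n * sum_x2 - sum_x ** 2) / (n * (n - 1))


def cvar(s, mean):
    """Coefficient of variation: s / mean, expressed as a percentage."""
    return s / mean * 100


sales = [11.2, 11.9, 12.0, 12.8, 13.4, 14.3]  # Example 3-23, millions of dollars
s2 = sample_variance_computational(sales)      # 38.28 / 30
s = math.sqrt(s2)
print(round(s2, 2), round(s, 2))                # 1.28 1.13

# Example 3-25: compare variation in car sales and in commissions
print(round(cvar(5, 87), 1), round(cvar(773, 5225), 1))  # 5.7 14.8
```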
Range Rule of Thumb
Use $\bar{X} - 2s$ to approximate the lowest value and $\bar{X} + 2s$ to approximate the highest value in a data set.
Example: $\bar{X} = 10$, Range = 12
$$s \approx \frac{12}{4} = 3$$
$$\text{Low} \approx 10 - 2(3) = 4 \qquad \text{High} \approx 10 + 2(3) = 16$$

Chebyshev's Theorem
The proportion of values from any data set that fall within k standard deviations of the mean will be at least $1 - 1/k^2$, where k is a number greater than 1 (k is not necessarily an integer).

k    Minimum proportion within k SD   Minimum percentage within k SD
2    1 - 1/4 = 3/4                    75%
3    1 - 1/9 = 8/9                    88.89%
4    1 - 1/16 = 15/16                 93.75%

[Figure: Chebyshev's Theorem]

Section 3-2 Example 3-27 Page #143
Example 3-27: Prices of Homes
The mean price of houses in a certain neighborhood is $50,000, and the standard deviation is $10,000. Find the price range for which at least 75% of the houses will sell.
Chebyshev's Theorem states that at least 75% of a data set will fall within 2 standard deviations of the mean.
50,000 - 2(10,000) = 30,000
50,000 + 2(10,000) = 70,000
At least 75% of all homes sold in the area will have a price in the range from $30,000 to $70,000.

Section 3-2 Example 3-28 Page #143
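A minimal Python sketch of the Range Rule of Thumb and the Chebyshev bound, checked against the slide's example, the table for k = 2, 3, 4, and Example 3-27. The helper names are hypothetical. The rule-of-thumb function only makes sense for unimodal, roughly symmetric data, as the slide notes.

```python
def range_rule_of_thumb(mean, data_range):
    """Approximate s as Range / 4, then estimate Low and High as mean -/+ 2s.

    Only sensible for unimodal, approximately symmetric distributions.
    """
    s = data_range / 4
    return s, mean - 2 * s, mean + 2 * s


def chebyshev_min_proportion(k):
    """Lower bound 1 - 1/k^2 on the share of any data set within k SDs of the mean."""
    if k <= 1:
        raise ValueError("Chebyshev's theorem requires k > 1")
    return 1 - 1 / k ** 2


print(range_rule_of_thumb(10, 12))  # (3.0, 4.0, 16.0)

for k in (2, 3, 4):
    print(k, f"{chebyshev_min_proportion(k):.2%}")  # 75.00%, 88.89%, 93.75%

# Example 3-27: k = 2 gives at least 75% of prices within mean -/+ 2 SD
mean_price, sd_price = 50_000, 10_000
print(mean_price - 2 * sd_price, mean_price + 2 * sd_price)  # 30000 70000
```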
Example 3-28: Travel Allowances
A survey of local companies found that the mean amount of travel allowance for executives was $0.25 per mile. The standard deviation was $0.02. Using Chebyshev's theorem, find the minimum percentage of the data values that will fall between $0.20 and $0.30.
$$k = \frac{0.30 - 0.25}{0.02} = 2.5 \qquad k = \frac{0.25 - 0.20}{0.02} = 2.5$$
$$1 - \frac{1}{k^2} = 1 - \frac{1}{2.5^2} = 1 - 0.16 = 0.84$$
At least 84% of the data values will fall between $0.20 and $0.30.

Empirical Rule (Normal)
The percentage of values from a data set that fall within k standard deviations of the mean in a normal (bell-shaped) distribution is listed below.

k    Approximate percentage within k standard deviations
1    68%
2    95%
3    99.7%

[Figure: Empirical Rule (Normal)]

3-3 Measures of Position
z-score, Percentile, Quartile, Outlier

Measures of Position: z-score
A z-score or standard score for a value is obtained by subtracting the mean from the value and dividing the result by the standard deviation.
$$z = \frac{X - \bar{X}}{s} \quad \text{(sample)} \qquad z = \frac{X - \mu}{\sigma} \quad \text{(population)}$$
A z-score represents the number of standard deviations a value is above or below the mean.

Section 3-3 Example 3-29 Page #150
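The Example 3-28 calculation can be sketched in Python, assuming a hypothetical helper `chebyshev_between`. The textbook's limits are symmetric about the mean. Using the distance to the nearer limit is an added assumption that keeps the bound valid for asymmetric limits too, because the interval mean ± k·SD then lies inside the given limits.

```python
def chebyshev_between(mean, sd, low, high):
    """Minimum proportion of any data set lying between low and high.

    k is the distance to the nearer limit in standard deviations; the interval
    mean -/+ k*sd then lies inside [low, high], so at least 1 - 1/k^2 of the
    data falls between the limits. Requires k > 1.
    """
    k = min(mean - low, high - mean) / sd
    if k <= 1:
        raise ValueError("Chebyshev's theorem needs k > 1")
    return k, 1 - 1 / k ** 2


# Example 3-28: travel allowance, mean $0.25 per mile, SD $0.02
k, p = chebyshev_between(0.25, 0.02, 0.20, 0.30)
print(round(k, 2), f"{p:.0%}")  # 2.5 84%
```

Rounding is applied only to the printed values, since expressions such as 0.30 - 0.25 are not exact in binary floating point.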
Example 3-29: Test Scores
A student scored 65 on a calculus test that had a mean of 50 and a standard deviation of 10; she scored 30 on a history test with a mean of 25 and a standard deviation of 5. Compare her relative positions on the two tests.
$$z_{\text{Calculus}} = \frac{X - \bar{X}}{s} = \frac{65 - 50}{10} = 1.5$$
$$z_{\text{History}} = \frac{X - \bar{X}}{s} = \frac{30 - 25}{5} = 1.0$$
She has a higher relative position in the calculus class.

Measures of Position: Percentiles
Percentiles separate the data set into 100 equal groups.
A percentile rank for a datum represents the percentage of data values below the datum.
$$\text{Percentile} = \frac{(\text{number of values below } X) + 0.5}{\text{total number of values}} \cdot 100\%$$
To find the value corresponding to a given percentile p in a data set of n values, compute the position
$$c = \frac{n \cdot p}{100}$$

Measures of Position: Example of a Percentile Graph
[Figure: Example of a Percentile Graph]

Section 3-3 Example 3-32 Page #155
Example 3-32: Test Scores
A teacher gives a 20-point test to 10 students. Find the percentile rank of a score of 12.
18, 15, 12, 6, 8, 2, 3, 5, 20, 10
Sort in ascending order: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20
Six values fall below 12.
$$\text{Percentile} = \frac{6 + 0.5}{10} \cdot 100\% = 65\%$$
A student whose score was 12 did better than 65% of the class.

Section 3-3 Example 3-34 Page #156
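A short Python sketch of the z-score and percentile-rank formulas, reproducing Examples 3-29 and 3-32. The function names are hypothetical. `percentile_rank` assumes, as the slide's formula does, that X is one of the data values, so the +0.5 counts half of that value itself.

```python
def z_score(x, mean, sd):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / sd


def percentile_rank(values, x):
    """(number of values below x + 0.5) / total number of values * 100."""
    below = sum(1 for v in values if v < x)
    return (below + 0.5) * 100 / len(values)


# Example 3-29: compare the two tests on a common scale
z_calculus = z_score(65, 50, 10)
z_history = z_score(30, 25, 5)
print(z_calculus, z_history)  # 1.5 1.0

# Example 3-32: percentile rank of a score of 12
scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(percentile_rank(scores, 12))  # 65.0
```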
Example 3-34: Test Scores
A teacher gives a 20-point test to 10 students. Find the value corresponding to the 25th percentile.
18, 15, 12, 6, 8, 2, 3, 5, 20, 10
Sort in ascending order: 2, 3, 5, 6, 8, 10, 12, 15, 18, 20
$$c = \frac{n \cdot p}{100} = \frac{10 \cdot 25}{100} = 2.5$$
Since c is not a whole number, round up to 3 and count to the third value from the lowest. The value 5 corresponds to the 25th percentile.

Measures of Position: Quartiles and Deciles
Deciles separate the data set into 10 equal groups: $D_1 = P_{10}$, $D_4 = P_{40}$, and so on.
Quartiles separate the data set into 4 equal groups: $Q_1 = P_{25}$, $Q_2 = MD$, $Q_3 = P_{75}$.
The Interquartile Range is $IQR = Q_3 - Q_1$.

Procedure Table: Finding Data Values Corresponding to Q1, Q2, and Q3
Step 1 Arrange the data in order from lowest to highest.
Step 2 Find the median of the data values. This is the value for Q2.
Step 3 Find the median of the data values that fall below Q2. This is the value for Q1.
Step 4 Find the median of the data values that fall above Q2. This is the value for Q3.

Section 3-3 Example 3-36 Page #158
Example 3-36: Quartiles
Find Q1, Q2, and Q3 for the data set.
15, 13, 6, 5, 12, 50, 22, 18
Sort in ascending order: 5, 6, 12, 13, 15, 18, 22, 50
$$Q_2 = \frac{13 + 15}{2} = 14 \qquad Q_1 = \frac{6 + 12}{2} = 9 \qquad Q_3 = \frac{18 + 22}{2} = 20$$

Measures of Position: Outliers
An outlier is an extremely high or low data value when compared with the rest of the data values.
A data value less than $Q_1 - 1.5(IQR)$ or greater than $Q_3 + 1.5(IQR)$ can be considered an outlier.
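A minimal Python sketch of the percentile-position rule, the quartile procedure table, and the 1.5(IQR) outlier fences. All function names are hypothetical. Applying the outlier check to the Example 3-36 data is an extra illustration, not part of the slides. Library routines such as NumPy's `percentile` interpolate by default and can return different quartiles than this median-of-halves procedure.

```python
import math


def value_at_percentile(values, p):
    """Value corresponding to the p-th percentile using c = n * p / 100.

    If c is not a whole number, round up and take the c-th value (counting from 1);
    if c is whole, average the c-th and (c + 1)-th values. Requires 0 < p < 100.
    """
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    data = sorted(values)
    c = len(data) * p / 100
    if c != int(c):
        return data[math.ceil(c) - 1]
    c = int(c)
    return (data[c - 1] + data[c]) / 2


def median_of_sorted(data):
    """Median of an already-sorted list."""
    mid = len(data) // 2
    return data[mid] if len(data) % 2 else (data[mid - 1] + data[mid]) / 2


def quartiles(values):
    """Q1, Q2, Q3 by the procedure table: medians of the lower and upper halves."""
    data = sorted(values)
    if len(data) < 2:
        raise ValueError("need at least two values")
    half = len(data) // 2
    lower = data[:half]                  # values below Q2
    upper = data[half + len(data) % 2:]  # values above Q2 (skip the middle value if n is odd)
    return median_of_sorted(lower), median_of_sorted(data), median_of_sorted(upper)


scores = [18, 15, 12, 6, 8, 2, 3, 5, 20, 10]
print(value_at_percentile(scores, 25))  # 5  (Example 3-34)

data = [15, 13, 6, 5, 12, 50, 22, 18]   # Example 3-36
q1, q2, q3 = quartiles(data)
iqr = q3 - q1
fences = (q1 - 1.5 * iqr, q3 + 1.5 * iqr)
print(q1, q2, q3, iqr)                  # 9.0 14.0 20.0 11.0
print(fences, [x for x in data if x < fences[0] or x > fences[1]])  # (-7.5, 36.5) [50]
```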
3-4 Exploratory Data Analysis
The Five-Number Summary is composed of the following numbers: Low, Q1, MD, Q3, High.
The Five-Number Summary can be graphically represented using a boxplot.

Constructing Boxplots
1. Find the five-number summary.
2. Draw a horizontal axis with a scale that includes the maximum and minimum data values.
3. Draw a box with vertical sides through Q1 and Q3, and draw a vertical line through the median.
4. Draw a line from the minimum data value to the left side of the box and a line from the maximum data value to the right side of the box.
Bluman, Chapter

Section 3-4 Example 3-38 Page #171
Example 3-38: Meteorites
The number of meteorites found in 10 U.S. states is shown. Construct a boxplot for the data.
89, 47, 164, 296, 30, 215, 138, 78, 48, 39
Sort in ascending order: 30, 39, 47, 48, 78, 89, 138, 164, 215, 296
Low = 30, Q1 = 47, MD = 83.5, Q3 = 164, High = 296
Five-Number Summary: 30-47-83.5-164-296
[Figure: boxplot with Low 30, Q1 47, MD 83.5, Q3 164, High 296]
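A minimal Python sketch of step 1 of the boxplot procedure, computing the five-number summary for Example 3-38 with the same median-of-halves quartile rule as the procedure table. The function names are hypothetical. If a plotting library is used for the remaining steps, check its defaults. For example, matplotlib's `boxplot` stops whiskers at 1.5 IQR unless `whis=(0, 100)` is passed, and it computes quartiles by interpolation, so the drawn box can differ slightly from this summary.

```python
def median_of_sorted(data):
    """Median of an already-sorted list."""
    mid = len(data) // 2
    return data[mid] if len(data) % 2 else (data[mid - 1] + data[mid]) / 2


def five_number_summary(values):
    """(Low, Q1, MD, Q3, High), with Q1/Q3 as medians of the lower/upper halves."""
    data = sorted(values)
    n = len(data)
    if n < 2:
        raise ValueError("need at least two values")
    lower = data[: n // 2]
    upper = data[n // 2 + n % 2:]
    return (data[0], median_of_sorted(lower), median_of_sorted(data),
            median_of_sorted(upper), data[-1])


meteorites = [89, 47, 164, 296, 30, 215, 138, 78, 48, 39]  # Example 3-38
summary = five_number_summary(meteorites)
print("-".join(str(v) for v in summary))  # 30-47-83.5-164-296
```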