Part I Exploring and Understanding Data

7. Sugar in cereals. The histogram displays the sugar content (as a percent of weight) of 49 brands of breakfast cereals.
[Histogram: Sugar (%), axis from 0 to 56]
a) Describe this distribution.
b) What do you think might account for this shape?

8. Singers. The display shows the heights of some of the singers in a chorus, collected so that the singers could be positioned on stage with shorter ones in front and taller ones in back.
[Histogram: Height (in.)]
a) Describe the distribution.
b) Can you account for the features you see here?

9. Wineries. The histogram shows the sizes (in acres) of 36 wineries in the Finger Lakes region of New York.
[Histogram: Size (acres), axis from 0 to 240]
a) Approximately what percentage of these wineries are under 60 acres?
b) Write a brief description of this distribution (shape, center, spread, unusual features).

10. Run times. One of the authors collected the times (in minutes) it took him to run 4 miles on various courses during the period 1986 to 1997. Here is a histogram of the times.
[Histogram: 4-Mile Time (min), axis from 28.0 to 35.0]
Describe the distribution and summarize the important features. What is it about running that might account for the shape you see?

11. Home runs. The stem-and-leaf display shows the number of home runs hit by Mark McGwire during the 1982-2001 seasons. Describe the distribution, mentioning its shape and any unusual features.
[Stem-and-leaf display: home runs per season]

12. Bird species. The Cornell Lab of Ornithology holds an annual Christmas Bird Count, in which birdwatchers at various locations around the country see how many different species of birds they can spot. Here are some of the counts reported from sites in Texas during the 1999 event.
228 183 160 178 181 160 186 206 157 162 177 156 206 175 153 166 167 153 163 162 152
a) Create a stem-and-leaf display of these data.
b) Write a brief description of the distribution. Be sure to discuss the overall shape as well as any unusual features.

13. Horsepower. Create a stem-and-leaf display for these horsepowers of autos reviewed by Consumer Reports one year, and describe the distribution.
Chapter 4 Displaying Quantitative Data

155 3 130 80 65 142 125 129 71 69 125 115 138 68 78 150 133 135 90 97 68 5 88 115 1 95 85 9 115 71 97 1 65 90 75 120 80 70

14. Population growth. Here is a "back-to-back" stem-and-leaf display that shows two data sets at once, one going to the left, one to the right. It compares the percent change in population for two regions of the United States (based on census figures for 1990 and 2000). The fastest growing states were Nevada at 66% and Arizona at 40%. Write a few sentences describing the difference in growth rates for the two regions of the United States. To show the distributions better, this display breaks each stem into two lines, putting leaves 0-4 on one stem and leaves 5-9 on the other.
[Back-to-back stem-and-leaf display: NE/MW States (left), S/W States (right)]

15. Hurricanes. The data below give the number of hurricanes that happened each year from 1944 through 2000 as reported by Science magazine.
3, 2, 1, 2, 4, 3, 7, 2, 3, 3, 2, 5, 2, 2, 4, 2, 2, 6, 0, 2, 5, 1, 3, 1, 0, 3, 2, 1, 0, 1, 2, 3, 2, 1, 2, 2, 2, 3, 1, 1, 1, 3, 0, 1, 3, 2, 1, 2, 1, 1, 0, 5, 6, 1, 3, 5, 3
a) Create a dotplot of these data.
b) Describe the distribution.

16. Hurricanes, again. A bimodal distribution usually indicates that there are actually two different behaviors present in the data. Investigating those two behaviors separately can produce important insights. Here are the data again, broken into two groups showing the number of hurricanes recorded annually before and after 1970. Create an appropriate visual display and write a few sentences comparing the two distributions.
1944-1969: 3, 2, 1, 2, 4, 3, 7, 2, 3, 3, 2, 5, 2, 2, 4, 2, 2, 6, 0, 2, 5, 1, 3, 1, 0, 3
1970-2000: 2, 1, 0, 1, 2, 3, 2, 1, 2, 2, 2, 3, 1, 1, 1, 3, 0, 1, 3, 2, 1, 2, 1, 1, 0, 5, 6, 1, 3, 5, 3

17. Acid rain. Two researchers measured the pH (a scale on which a value of 7 is neutral and values below 7 are acidic) of water collected from rain and snow over a 6-month period in Allegheny County, Pennsylvania. Describe their data with a graph and a few sentences.
4.57 5.62 4.12 5.29 4.64 4.31 4.30 4.39 4.45 5.67 4.39 4.52 4.26
4.26 4.40 5.78 4.73 4.56 5.08 4.41 4.12 5.51 4.82 4.63 4.29 4.60

18. Marijuana. In 1995 the Council of Europe published a report entitled The European School Survey Project on Alcohol and Other Drugs. Among other issues, the survey investigated the percentages of 9th graders who had used marijuana. Here are the results for 20 Western European countries. Create an appropriate graph of these data, and describe the distribution.

Austria       %     Italy         19%
Belgium     19%     Luxembourg     6%
Denmark     17%     Netherlands   31%
England     40%     No. Ireland   23%
Finland      5%     Norway         6%
France      12%     Portugal       7%
Germany     21%     Scotland      53%
Greece       2%     Spain         15%
Iceland       %     Sweden         6%
Ireland     37%     Switzerland   27%

19. Hospital stays. The U.S. National Center for Health Statistics compiles data on the length of stay by patients in short-term hospitals, and publishes its findings in Vital and Health Statistics. Data from a sample of 39 male patients and 35 female patients on length of stay (in days) are displayed in these histograms.
[Histograms: length of stay for Men and for Women]
a) What would you suggest be changed about these histograms to make them easier to compare?
b) Describe these distributions by writing a few sentences comparing the duration of hospitalization for men and women.
c) Can you suggest a reason for the peak in women's length of stay?

20. Deaths. A National Vital Statistics Report indicated that nearly 300,000 black Americans died in 1999, compared with just over 2 million white Americans. Here are calculator histograms displaying the distributions of their ages at death.
[Calculator histograms: ages at death, white Americans and black Americans]
Most of the bars in these histograms display ten-year age groups. For example, the first histogram shows that for white Americans about 19% of the deaths were of people between 65 and 74 years old. The leftmost bars represent the percentage of total deaths that were children aged 0 through 4 years and the rightmost bars people over 85. Write a brief comparison of the distributions.

21. Final grades. A professor (of something other than Statistics!) distributed the following histogram to show the distribution of grades on his 200-point final exam. Comment on the display.
[Histogram: Final Grade]

22. Final grades revisited. After receiving many complaints about his final grade histogram from students currently taking a Statistics course, the professor distributed the following revised histogram.
[Histogram: Final Grade, axis from 75.00 to 180.00]
a) Comment on this display.
b) Describe the distribution of grades.

23. Zip codes. Holes R Us, an Internet company that sells piercing jewelry, keeps transaction records on its sales. At a recent sales meeting, one of the staff presented a histogram of the zip codes of the last 500 customers so that they might understand where sales are coming from. Comment on the usefulness and appropriateness of the display.
[Histogram: Zip, axis from 15000 to 65000]

24. CEO data revisited. For each CEO, a code is listed that corresponds to the industry of the CEO's company. Here are a few of the codes and the industries to which they correspond.
Industry                      Industry Code
Financial services             1
Food/drink/tobacco             2
Health                         3
Insurance                      4
Retailing                      6
Forest products                9
Aerospace/defense             11
Energy                        12
Capital goods                 14
Computers/communications      16
Entertainment/information     17
Consumer nondurables          18
Electric utilities            19

A recently hired investment analyst has been assigned to examine the industries and the compensations of the CEOs. To start the analysis, he produces the following histogram of industry codes.
[Histogram: Industry Code, axis from 0.00 to 18.75]
a) What might account for the gaps seen in the histogram?
b) Is the histogram unimodal?
c) What advice might you give the analyst about the appropriateness of this display?

25. Productivity study. The National Center for Productivity releases information on the efficiency of workers. In a recent report, they included the following graph showing a rapid rise in productivity. What questions do you have about this display?
[Graph: productivity]

26. Productivity revisited. A second report by the National Center for Productivity analyzed the relationship between productivity and wages. Comment on the graph they used.
[Graph: Productivity and Wages]

27. Law enforcement. Some federal employees have the authority to carry firearms and make arrests. Obviously some danger is associated with these jobs, but how much? The table below summarizes the rates of assault and injury (or death) for these employees for 5 years, 1995-1999.

Agency                                                Assaults      Killed-Injured
                                                      (per 1000)    (per 1000)
Bureau of Alcohol, Tobacco, and Firearms (BATF)       31.1           2.2
Capitol Police                                         5.0           3.6
Customs Service                                        9.7           5.1
Drug Enforcement Agency (DEA)                         17.9           1.1
Federal Bureau of Investigation (FBI)                  3.9           1.2
Immigration and Naturalization Services (INS)         14.1           2.5
Internal Revenue Service (IRS)                         1.7           0.2
U.S. Marshal Service                                   9.7           3.0
National Park Service                                 38.7          15.0
Postal Service                                         5.7           2.9
Secret Service                                         9.7           3.0

a) Create a visual display of these data.
b) Describe these data (shape, center, spread, unusual features).
c) Which agencies are outliers?

28. Cholesterol. A study examining the health risks of smoking measured the cholesterol levels of people who had smoked for at least 25 years and people of similar ages who had smoked for no more than 5 years and then stopped. Create histograms for both groups and write a brief report comparing their cholesterol levels.
Smokers Ex-smokers
225 211 209 284 250 134 300 258 216 196 288 249 213 3 250 200 209 280 175 174 328 225 256 243 200 160 188 321 213 246 225 237 213 257 292 232 267 232 216 200 271 227 216 243 200 155 238 163 263 216 271 230 309 192 242 249 183 280 217 305 242 267 243 287 217 246 351 217 267 218 200 280 209 217 183 228

29. MPG. A consumer organization compared gas mileage figures for several models of cars made in the United
States with autos manufactured in other countries. The data are shown in the table.

U.S. Models Others
16.9 16.2 15.5 20.3 19.2 31.5 18.5 30.5 30.0 21.5 30.9 31.9 20.6 37.3 20.8 27.5 18.6 27.2 18.1 34.1 17.0 35.1 17.6 29.5 16.5 31.8 18.2 22.0 26.5 17.0 21.9 21.6 27.4 28.4 28.8 26.8 33.5 34.2

a) Create a back-to-back stem-and-leaf display for these data.
b) Write a few sentences comparing the distributions.

30. Baseball. American League baseball teams play their games with the designated hitter rule, meaning that pitchers do not bat. The League believes that replacing the pitcher, traditionally a weak hitter, with another player in the batting order produces more runs and generates more interest among fans. Below are the average number of runs scored in American League and National League stadiums for the first half of the 2001 season.

American National
11.1.8.8.3 14.0 11.6.4.3.3.1.0 9.5 10.2 9.5 9.5 9.5 9.4 9.3 9.2 9.2 9.5 9.1 8.8 8.4 9.0 8.3 8.3 8.2 8.1 7.9

a) Create a back-to-back stem-and-leaf display of these data.
b) Write a few sentences comparing the average number of runs scored per game in the two leagues. (Remember: shape, center, spread, unusual features!)
c) Coors Field, in Denver, stands a mile above sea level, an altitude far greater than that of any other major league ball park. Some believe that the thinner air makes it harder for pitchers to throw curve balls and easier for batters to hit the ball a long way. Do you see any evidence that the 14 runs scored per game there is unusually high? Explain.

31. Nuclear power. For a while in the 20th century many nuclear-powered electrical generating plants were built, but then growing environmental concerns and construction costs led to increasing reliance on other forms of energy. The table shows the dates of completion (in months after January 1967) and costs (in thousands of dollars per megawatt) of 12 nuclear generators.

Time of completion (months after Jan 1, 1967): 2 3 12 17 19 21 26 30 32 41 47
Construction cost ($1000/mW): 35 28 32 60 56 63 62 81 84 79 88 80

a) Create a stem-and-leaf display of the costs.
b) Describe the distribution.
c) Create a timeplot of the costs.
d) What information about the construction of nuclear plants can you see from the timeplot that is not obvious in the stem-and-leaf display?

32. Drunk driving. Accidents involving drunk drivers account for about 40% of all deaths on the nation's highways. The table tracks the number of alcohol-related fatalities for 20 years.

Year   Deaths (thousands)     Year   Deaths (thousands)
1982   25.2                   1992   17.9
1983   23.6                   1993   17.5
1984   23.8                   1994   16.6
1985   22.7                   1995   17.2
1986   24.0                   1996   17.2
1987   23.6                   1997   16.5
1988   23.6                   1998   16.0
1989   22.4                   1999   16.0
1990   22.0                   2000   16.7
1991   19.9                   2001   16.7
a) Create a stem-and-leaf display or a histogram of these data.
b) Create a timeplot.
c) Using features apparent in the stem-and-leaf display (or histogram) and the timeplot, write a few sentences about deaths caused by drunk driving.

33. Assets. Here is a histogram of the assets (in millions of dollars) of 79 companies chosen from the Forbes list of the nation's top corporations.
[Histogram: Assets, axis from 0 to 40000]
a) What aspect of this distribution makes it difficult to summarize, or to discuss, center and spread?
b) Here are the same data after re-expressions as the square root of assets and the logarithm of assets. Which re-expression do you prefer? Why?
[Histogram: square root of Assets, axis from 0 to 225]
[Histogram: Log (Assets), axis from 2.25 to 4.50]
c) In the square root re-expression, what does the value 50 actually indicate about the company's assets?
d) In the logarithm re-expression, what does the value 3 actually indicate about the company's assets?

34. Rainmakers. The table lists the amount of rainfall (in acre-feet) from 26 clouds seeded with silver iodide.
2745 1697 1656 978 703 489 430 334 302 274 274 255 242
200 198 129 119 118 115 92 40 32 31 17 7 4
a) Why is "acre-feet" a good way to measure the amount of precipitation produced by cloud seeding?
b) Plot these data, and describe the distribution.
c) Create a re-expression of these data that produces a more advantageous distribution.
d) Explain what your re-expressed scale means.
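
For readers who want to try a re-expression by computer, the following is a minimal Python sketch, not a prescribed method. It takes the 26 rainfall values from Exercise 34, counts them in equal-width bins on the original scale, and then does the same after a base-10 logarithm. The helper names `bin_counts` and `show`, and the bin widths (500 acre-feet and 0.5 log units), are illustrative assumptions rather than anything specified in the exercises.

```python
import math

# Rainfall (acre-feet) for the 26 seeded clouds listed in Exercise 34.
rainfall = [2745, 1697, 1656, 978, 703, 489, 430, 334, 302, 274, 274, 255, 242,
            200, 198, 129, 119, 118, 115, 92, 40, 32, 31, 17, 7, 4]


def bin_counts(values, bin_width, start=0):
    """Count values in equal-width bins [edge, edge + bin_width).

    Hypothetical helper: returns {lower_edge: count}, sorted by edge.
    """
    counts = {}
    for v in values:
        index = math.floor((v - start) / bin_width)
        edge = start + index * bin_width
        counts[edge] = counts.get(edge, 0) + 1
    return dict(sorted(counts.items()))


def show(counts, label):
    """Print a rough text histogram, one row of asterisks per bin."""
    print(label)
    for edge, n in counts.items():
        print(f"{edge:8.1f} | {'*' * n}")


# Original scale: bins of 500 acre-feet (an arbitrary illustrative choice).
raw_counts = bin_counts(rainfall, bin_width=500)

# Re-expressed scale: base-10 logarithm, bins of 0.5 log units.
log_values = [math.log10(v) for v in rainfall]
log_counts = bin_counts(log_values, bin_width=0.5, start=0.0)

show(raw_counts, "Rainfall (acre-feet)")
show(log_counts, "log10(Rainfall)")
```

Under these assumptions a value x on the re-expressed scale corresponds to 10**x in the original units, so reading the log-scale display always involves converting back before interpreting a bin. A square-root re-expression could be tried the same way by swapping `math.log10` for `math.sqrt` and choosing a suitable bin width.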