Name:

Answer the following questions NEATLY. Show all necessary work directly on the exam. Scratch paper will be discarded unread. One point each part unless otherwise marked.

1. Owners of an exercise gym believe that a Normal model is useful in projecting the number of clients who will exercise in their gym each week. They use a mean of 800 clients and a standard deviation of 90 clients.
(a) Draw and clearly label this model.
(b) What is the first quartile of the weekly number of clients? Answer: 739.

2. Dogs

3. (a) The SPCA collects the following data about the dogs they house. Which is categorical? (Choose the best answer)
A. age
B. number of days housed
C. veterinary costs
D. breed
E. weight
(b) Which of the variables in the previous question, collected for only German Shepherds, is most likely to be described by a Normal model? (Choose the best answer)
A. veterinary costs
B. weight
C. age
D. number of days housed
E. breed
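The answer of 739 to question 1(b) can be checked numerically: the first quartile of a Normal model is the value with 25% of the area below it. The following is a minimal sketch using Python's standard-library statistics.NormalDist; the variable names are illustrative and do not come from the exam.

```python
from statistics import NormalDist

# Normal model from question 1: mean 800 clients, standard deviation 90 clients.
weekly_clients = NormalDist(mu=800, sigma=90)

# The first quartile is the 25th percentile of the model.
first_quartile = weekly_clients.inv_cdf(0.25)

print(round(first_quartile))  # 739
```

The same value follows by hand from the z-score of about -0.67 for the 25th percentile: 800 - 0.67(90) is roughly 739.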
A statistics teacher kept track of the number of emails each student sent over the course of one term. Answer the following questions based on the histogram.
(a) Which is largest? (Choose the best answer)
A. Mean
B. Median
C. Location of the Mode
(b) Which is smallest? Circle one. (Choose the best answer)
A. Location of the Mode
B. Median
C. Mean
(c) Which measure of center is best for the data? Circle one. (Choose the best answer)
A. Median because it is closer to the higher numbers of emails
B. Mean because it is closer to the higher numbers of emails
C. Median because it is closer to the lower numbers of emails
D. Mean because it is closer to the lower numbers of emails
(d) Which measure of spread is best for the data? Circle one. (Choose the best answer)
A. Range because the histogram is unimodal
B. IQR because the data is skewed
C. Standard deviation because the data potentially has outliers
(e) Based on the histogram, create a sketch (by hand) of the corresponding box plot.
Keys: Stretches from 0 to 8, left whisker shorter than right, median line on box closer to low side
(f) Which is true of the data shown in the histogram? Circle one of A-E below.
I. The mean and median are approximately equal.
II. The data is symmetric.
III. The median and IQR summarize this data better than the mean and standard deviation.
(Choose the best answer)
A. I only
B. III only
C. I, II, and III
D. I and III
E. I and II

4. Suppose that a Normal model describes the acidity (pH) of rainwater, and that water tested after last week's storm had a z-score of 1.8. This means that the acidity of that rain (Choose the best answer)
A. had a pH 1.8 higher than that of average rainwater
B. had a pH 1.8 times that of average rainwater
C. had a pH of 1.8
D. had a pH 1.8 standard deviations higher than that of average rainwater
E. varied with a standard deviation of 1
5. 192 students in an Intro Stats course were asked to describe their politics as Liberal, Moderate, or Conservative. Here are the results:

          L    M    C
Female   35   36    6
Male     50   44   21

(a) What percent of conservative students are female? 6/27 = 0.222, or 22.2%
(b) What percent of female students are conservative? 6/77 = 0.078, or 7.8%
(c) What percent of the class is female? 77/192 = 0.401, or 40.1%
(d) What percent of all students in the class are females who consider themselves conservative? 6/192 = 0.031, or 3.1%
(e) Below is a graph of the data above, using percents. Based on this graph, can we conclude that the variables are dependent? Why or why not?
Yes. The distribution of gender is NOT the same for all political orientations.

6. A taxi company monitoring the safety of its cabs kept track of the number of miles tires had been driven (in thousands) and the depth of the tread remaining (in mm). Their data are displayed in the scatterplot. They found the equation of the least squares regression line to be tread = 36 - 0.6 miles, with r^2 = 0.74.
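The arithmetic behind this fitted line can be sketched in a few lines of Python. This is only an illustration, reading the line as tread = 36 - 0.6 × miles (miles in thousands, tread in mm); the function and constant names are hypothetical and not part of the exam, and the observed tread used for the residual is a made-up value.

```python
import math

# Fitted line from question 6, read as tread = 36 - 0.6 * miles.
INTERCEPT_MM = 36.0                 # predicted tread depth at 0 miles
SLOPE_MM_PER_THOUSAND_MILES = -0.6  # change in tread per extra thousand miles
R_SQUARED = 0.74


def predicted_tread(miles_thousands: float) -> float:
    """Predicted tread depth (mm) after the given thousands of miles."""
    return INTERCEPT_MM + SLOPE_MM_PER_THOUSAND_MILES * miles_thousands


def residual(observed_tread_mm: float, miles_thousands: float) -> float:
    """Residual = observed value minus predicted value."""
    return observed_tread_mm - predicted_tread(miles_thousands)


# The correlation has the same sign as the slope, so take the negative root.
r = math.copysign(math.sqrt(R_SQUARED), SLOPE_MM_PER_THOUSAND_MILES)

print(predicted_tread(40))   # prediction at 40 thousand miles
print(round(r, 2))           # correlation, rounded to two decimals
print(residual(10.0, 40))    # hypothetical tire with 10 mm of tread left
```

A negative residual, as in the last line, means the observed tread is below what the line predicts for that mileage.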
(a) Draw the regression line on the graph.
(b) What is the explanatory variable? Miles
(c) Explain (in context) what the y-intercept of the line means.
For a taxi that has not been driven any miles, the starting tread depth is 36 mm.
(d) Explain (in context) what the slope of the line means.
For every extra thousand miles a taxi is driven, the tire tread depth decreases by 0.6 mm.
(e) Explain (in context) what R^2 means.
74% of the variation in tread depth is explained by the linear relationship with miles driven.
(f) The correlation is r = ±sqrt(r^2) = ±sqrt(0.74) = ±0.86. Since the slope is negative, we have r = -0.86.
(g) Based on our regression, can we conclude that driving more miles will lead to a decrease in tire tread depth? Explain.
No. Although common sense might suggest this, we are not able to draw causal conclusions from a regression analysis.
(h) What is the best predicted tread depth for a car driven 40 thousand miles?
tread = 36 - 0.6(40) = 12 mm
(i) In this context, what does a negative residual mean?
A taxi has a smaller tread depth for the miles driven than was predicted by the model.

7. The boxplots show prices of used cars (in thousands of dollars) advertised for sale at three different car dealers. For each question below, choose one from Ace, BuyIt, or Carz.
(a) Which dealer has the lowest median price? BuyIt
(b) Which dealer has the smallest price range? Ace
(c) Which dealer's prices have the smallest IQR, and what is it? (2 points) Carz, 4 or 5 thousand
(d) Which dealer offers the cheapest car? Carz
(e) Which dealer generally sells cars the cheapest? BuyIt

8. Light bulbs are measured in lumens (light output), watts (energy used), and hours (life). A standard white light bulb has a mean life of 675 hours and a standard deviation of 50 hours. A soft white light bulb has a mean life of 700 hours and a standard deviation of 35 hours. At a local science competition, both light bulbs lasted 750 hours. Which light bulb's life span was better? (Choose the best answer)
A. Relative to each other, the light bulbs performed the same.
B. Relatively, the soft white light bulb performed better.
C. There is no basis for comparison, since they are different kinds of light bulbs.
D. There is not enough information for comparison.
E. Relatively, the standard white light bulb performed better.

9. Data Analysis. Data are from the US Department of Health and Human Services, National Center for Health Statistics, Third National Health and Nutrition Examination Survey, and include health data for 80 people. The variables are: GENDER is the person's gender, AGE is in years, HT is height in inches, WT is weight in pounds, WAIST is circumference in cm, PULSE is pulse rate in beats per minute, SYS is systolic blood pressure in mmHg, DIAS is diastolic blood pressure in mmHg, CHOL is cholesterol in mg, BMI is body mass index, LEG is upper leg length in cm, ELBOW is elbow breadth in cm, WRIST is wrist breadth in cm, ARM is arm circumference in cm. The data and this prompt can be found at http://www.canyons.edu/faculty/morrowa/140/mozart/.

Use Word to answer the following questions. Print your solutions when you are ready. Your final write-ups should include ONLY the graphs/statistics that are relevant.

Suggested Discussion Points:
Describe the distribution (shape, center, spread, other features).
If it is appropriate, fit an appropriate model (Normal model, linear model).
Provide evidence (appropriate graphs and statistics) for all of your findings.

3 points each

(a) Analyze/summarize the weight of individuals in the study. Variable: WT.
A correct solution will include:
Graph: Boxplot/histogram.
Description of Shape: As shown in the histogram and boxplot, the distribution of weights is approximately symmetric, although it could be said there is a slight skew to the right. The distribution is unimodal, with a mode around 160 pounds. The histogram does not show any gaps, but the boxplot shows an outlier on the high end.
Center: The mean weight is 159.39 pounds. The median is 161 pounds. Because the data is roughly symmetric, the center is best measured with the mean.
Spread: The standard deviation is 34.88, and the IQR is 44.75. Because the data is roughly symmetric, the spread is best measured by the standard deviation.
Normal?: The Normal model is appropriate, and this can be verified by the probability plot shown below.

(b) Analyze/summarize the gender of individuals in the study. Variable: GENDER.

GENDER   Percent
female     50.00
male       50.00

A correct solution will include:
Graph: Bar chart (relative or frequency is okay)
Table: Relative or frequency
Comment on Most: There were equal numbers of men and women in the study.

(c) Analyze/summarize the relationship between gender and weight of individuals in the study. Variables: WT, GENDER.
Variable  GENDER    Mean   StDev  Median    IQR
WT        female  146.22   37.62  135.80  47.17
WT        male    172.55   26.33  169.95  38.60

A correct solution will include:
Graph: Side-by-side boxplots and histograms of weight (separated by gender).
Comparison of the Shape: Weights of females are skewed to the right, whereas weights of males are more symmetric. Weights of males and females are both unimodal. The males do not show outliers in weights, but the females show 2 on the high end.
Comparison of Center: Males have a higher mean and median weight, and so generally weigh more.
Comparison of Spread: Females, however, have larger spread in weight as measured by both the standard deviation and IQR.

(d) Analyze/summarize the relationship between weight and waist size of individuals in the study. Does weight have an influence on waist size? Variables: WT, WAIST.
A correct solution will include:
Graph: Scatterplot with waist on the y and weight on the x.
Description of scatterplot: There appears to be a strong positive linear pattern.
Correlation: r = 0.908, which is quite high.
Linear Regression: WAIST = 33.2 + 0.345 WT
Analyze Fit: The residual plot shows no significant pattern, indicating that the pattern in the original scatterplot is linear. R^2 = 82.5%, which is fairly high, indicating that the linear relationship is strong.
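
The figures quoted in part (d) can be cross-checked against each other. The sketch below is illustrative only: it uses the rounded values reported above rather than the raw data, and the names are hypothetical. Squaring the rounded correlation gives a value that agrees with the reported R^2 up to rounding, and the fitted line can be evaluated at any weight, here the mean weight from part (a).

```python
# Rounded values reported in part (d); the raw data are not used here.
R = 0.908                 # correlation between WT and WAIST
REPORTED_R_SQUARED = 0.825
INTERCEPT_CM = 33.2       # fitted line: WAIST = 33.2 + 0.345 * WT
SLOPE_CM_PER_LB = 0.345


def predicted_waist_cm(weight_lb: float) -> float:
    """Predicted waist circumference (cm) for a given weight (pounds)."""
    return INTERCEPT_CM + SLOPE_CM_PER_LB * weight_lb


# For simple linear regression, R^2 is the square of the correlation.
r_squared = R ** 2
print(f"r squared from rounded r: {r_squared:.3f}")
print(f"difference from reported R^2: {abs(r_squared - REPORTED_R_SQUARED):.4f}")

# Prediction at the mean weight reported in part (a), 159.39 pounds.
print(f"predicted waist at mean weight: {predicted_waist_cm(159.39):.1f} cm")
```

The small gap between the squared correlation and 82.5% comes from r having been rounded to three decimals before squaring.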