Final Exam Practice Problem Answers

The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows:

Brand: The brand name of the cereal
Calories: The number of calories per serving
Protein: The number of grams of protein per serving
Fat: The number of grams of fat per serving
Fiber: The number of grams of fiber per serving
Sodium: The number of milligrams (mg) of sodium per serving
Carbo: The number of grams of carbohydrates per serving
Sugars: The number of grams of sugars per serving
Vitamins: The percentage of the recommended daily allowance (RDA) of vitamins per serving
Shelf: 1 indicates that the cereal appears on the lowest shelf in the store; 0 indicates that the cereal does not appear on the lowest shelf in the store
rating: An overall healthiness rating for the cereal. The higher the rating, the healthier the cereal.

Some observations from the data set follow:

name           calories  protein  fat  sodium  fiber  carbo  sugars  vitamins  Shelf  rating
Product_19          100        3    0     320      1     20       3       100      0  41.504
Cheerios            110        6    2     290      2     17       1        25      0  50.765
Corn_Flakes         100        2    0     290      1     21       2        25      0  45.863
Rice_Krispies       110        2    0     290      0     22       3        25      0  40.560
Corn_Chex           110        2    0     280      0     22       3        25      0  41.445

The Excel output below gives information about the sodium content in the 77 cereals. Use this to answer the following questions.

sodium
Mean                       159.6753
Standard Error             9.553577
Median                     180
Mode                       0
Standard Deviation         83.8323
Sample Variance            7027.854
Kurtosis                   -.34524
Skewness                   -.57571
Range                      320
Minimum                    0
Maximum                    320
Sum                        12295
Count                      77
Confidence Level(90.0%)    15.90814

sodium (boxplot summary)
Min        0
Q1         130
Median     180
Q3         210
Max        320
Outliers   0
1. Describe the shape of the distribution of sodium contents in the 77 breakfast cereals.

The distribution is slightly skewed to the left and contains 9 outliers. These outliers all appear as one point on the boxplot because each of the 9 outlying cereals contains 0 mg of sodium per serving.

2. What is the median sodium content in the cereals? What does this value represent?

The median sodium content in the cereals is 180 mg. This implies that 50% of the cereals in the sample have less than 180 mg of sodium per serving. Likewise, 50% of the cereals in the sample have more than 180 mg of sodium per serving.

3. The 25% of the cereals that contain the most sodium contain at least how much sodium per serving?

This value is the 75th percentile, or the 3rd quartile. The 25% of the cereals with the most sodium contain at least 210 mg per serving.

4. What is the standard deviation of the sodium contents? What does this value represent?

The standard deviation of the sodium contents is 83.83 mg. This is a measure of variability in the sample. Specifically, it measures the spread of the observations around the sample mean.

5. Assume that this represents a random sample of 77 cereals from the population of all breakfast cereals. Conduct a hypothesis test to determine if the mean sodium content in all cereals is greater than 140 mg per serving. State the null and alternative hypothesis, the test statistic, p-value or an approximate p-value, and the decision and conclusion. Use α = .01.

Ho: µ = 140
Ha: µ > 140

Test statistic:

\[ t = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} = \frac{159.6753 - 140}{83.8323/\sqrt{77}} = 2.06 \]

Degrees of freedom: n - 1 = 76

p-value: use approximate degrees of freedom of 80 on the t-table. Note that the computed test statistic falls between the critical values of 1.990 and 2.088 on the t-table. This implies that the p-value falls in the range .02 < p-value < .025.

Decision: Since the p-value is greater than α, we will not reject the null hypothesis.
There is not sufficient evidence at the 1% level of significance to conclude that the mean sodium content in all cereals is greater than 140 mg per serving.

6. What is the IQR of the sample? What does this value represent?

The IQR gives the range of the middle 50% of the sample. It is the difference between the third and first quartiles and is given by Q3 - Q1 = 210 - 130 = 80 mg.

The following Excel output gives information about the healthiness ratings of cereals that appear on the low shelf in the store compared to the ratings of cereals that do not appear on the low shelf in the store. The output was generated using α = .05. Use this output to answer the following questions. Assume that the data represent random samples from the populations of all cereals on the low shelf and those not on the low shelf in the store.
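The sodium answers to questions 1, 5 and 6 can be checked with a few lines of arithmetic. The sketch below is only a check, not part of the original key: it takes the hypothesized mean as 140 mg and the quartiles as 130 mg and 210 mg, and it assumes the boxplot flags outliers with the common 1.5 × IQR rule, which the Excel output does not state.

```python
import math

# Summary values for sodium (mg per serving) taken from the Excel output.
n = 77
mean = 159.6753
sd = 83.8323
q1, q3 = 130, 210          # assumed quartile values

# Question 5: one-sample t statistic for Ho: mu = 140 against Ha: mu > 140.
mu0 = 140                  # assumed hypothesized mean
std_error = sd / math.sqrt(n)
t_stat = (mean - mu0) / std_error

# Question 6: interquartile range.
iqr = q3 - q1

# Question 1: boxplot fences under the 1.5 * IQR convention (an assumption).
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"standard error = {std_error:.4f}")
print(f"t = {t_stat:.2f} on {n - 1} degrees of freedom")
print(f"IQR = {iqr}, fences = ({lower_fence}, {upper_fence})")
```

Under these assumptions the lower fence is above 0 mg, which is consistent with the cereals containing no sodium being drawn as outliers.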
7. What is the sample variance of the healthiness rating of cereals that do not appear on the low shelf?

s² = 170.85

8. Suppose you wish to conduct a hypothesis test to determine if cereals on the low shelf have a lower average healthiness rating than those appearing on higher shelves. State the null and alternative hypothesis to test this claim.

Ho: µ_low = µ_hi
Ha: µ_low < µ_hi

9. State the test statistic, p-value, decision, and conclusion to the hypothesis test in the previous question. Use α = .05.

Test statistic: -3.014
p-value: .002
Decision: Since the p-value is less than α, reject Ho. There is sufficient evidence to conclude that cereals on the low shelf have lower average healthiness ratings than those that do not appear on the low shelf.

10. Compute and interpret a 95% confidence interval to estimate the difference in the population mean healthiness ratings between cereals that appear on the lower shelf and those on higher shelves.

\[ (\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = -10.578 \pm 2.032 \sqrt{\frac{194.685}{21} + \frac{170.85}{56}} = -10.578 \pm 2.032(3.51) = -10.578 \pm 7.132 \]

With 95% confidence, on average cereals on the low shelf in the grocery store have a rating of between 3.45 and 17.71 points lower than cereals on higher shelves.
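The interval in question 10 and the test statistic in question 9 come from the same standard error, so both can be reproduced together. This is a sketch under stated assumptions: the sample variances (194.685 and 170.85), the difference in means (-10.578) and the critical value (2.032) are the values read from the worked answer, not recomputed from raw data.

```python
import math

# Inputs assumed from the worked answer to question 10.
n_low, n_high = 21, 56
var_low, var_high = 194.685, 170.85    # sample variances
mean_diff = -10.578                     # mean(low shelf) - mean(higher shelves)
t_star = 2.032                          # critical value used in the answer

# Unpooled standard error of the difference in sample means.
std_error = math.sqrt(var_low / n_low + var_high / n_high)

# Question 9: test statistic for Ho: mu_low = mu_hi.
t_stat = mean_diff / std_error

# Questions 10 and 11: margin of error and interval endpoints.
margin = t_star * std_error
lower, upper = mean_diff - margin, mean_diff + margin

print(f"standard error = {std_error:.3f}")
print(f"t = {t_stat:.3f}")
print(f"margin of error = {margin:.3f}")
print(f"interval = ({lower:.2f}, {upper:.2f})")
```

Small differences in the last digit relative to the worked answer are expected, because the answer rounds the standard error to 3.51 before multiplying.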
11. What is the margin of error for the confidence interval computed in the previous question?

The margin of error for the interval computed above is 7.132.

Suppose that the 77 cereals represent a random sample of all breakfast cereals. 21 of the cereals contain more than 10 grams of sugar per serving. Use this information to answer the following questions.

12. Compute a 99% confidence interval to estimate the true proportion of breakfast cereals that contain more than 10 grams of sugar per serving. Interpret the interval.

\[ \tilde{p} = \frac{x + 2}{n + 4} = \frac{21 + 2}{77 + 4} = .284 \]

\[ \tilde{p} \pm z^* \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}} = .284 \pm 2.576 \sqrt{\frac{.284(1 - .284)}{77 + 4}} = .284 \pm 2.576(.0501) = .284 \pm .1291 = (.155, .413) \]

We are 99% confident that the true population proportion of all breakfast cereals that contain more than 10 grams of sugar per serving is between 16% and 41%.

13. A consumer health advocacy group states that more than one quarter of all breakfast cereals contain more than 10 grams of sugar per serving. State the null and alternative hypothesis to test this claim.

Ho: p = .25
Ha: p > .25

14. For the test in the previous question, state the test statistic, p-value, decision and conclusion. Use α = .01.

\[ \hat{p} = \frac{x}{n} = \frac{21}{77} = .2727 \]

Test statistic:

\[ z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} = \frac{.2727 - .25}{\sqrt{\frac{.25(1 - .25)}{77}}} = \frac{.0227}{.04935} = .46 \]

p-value: .3228
Decision: Since the p-value is greater than α, do not reject Ho. There is not enough evidence at the 1% level of significance to conclude that more than one quarter of all breakfast cereals contain more than 10 grams of sugar.
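The two proportion calculations in questions 12 and 14 can be reproduced as follows. This is a minimal sketch: it uses the plus-four estimate for the interval, as the worked answer does, and Python's `statistics.NormalDist` for the normal tail area, so the p-value differs slightly from the table value, which is based on z rounded to .46.

```python
import math
from statistics import NormalDist

n, x = 77, 21                 # sample size and number of high-sugar cereals

# Question 12: plus-four 99% confidence interval.
p_tilde = (x + 2) / (n + 4)
se_tilde = math.sqrt(p_tilde * (1 - p_tilde) / (n + 4))
z_star = 2.576                # critical value used in the answer
margin = z_star * se_tilde
ci = (p_tilde - margin, p_tilde + margin)

# Question 14: one-proportion z test of Ho: p = .25 against Ha: p > .25.
p0 = 0.25
p_hat = x / n
z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
p_value = 1 - NormalDist().cdf(z)   # upper-tail area

print(f"plus-four estimate = {p_tilde:.3f}, interval = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```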
The following table gives a breakdown of the shelf on which the cereal appears (shelf = 1 indicates the low shelf, shelf = 0 indicates a higher shelf), and the manufacturer of the cereal.

                 Shelf = 1   Shelf = 0   Row totals
General Mills        7          15          22
Kellogg              7          16          23
Nabisco              2           4           6
Quaker               3           5           8
Other                2          16          18
Column totals       21          56          77

15. Use this table information to test for the independence between the two categorical variables, shelf and manufacturer. State the null and alternative hypothesis, compute the test statistic, and give an approximate p-value for the test. State your decision and conclusion based on α = .05.

Ho: The shelf on which a cereal appears is independent of the manufacturer.
Ha: The shelf on which a cereal appears depends on the manufacturer.

Table of expected cell counts:

                 Shelf = 1   Shelf = 0   Row totals
General Mills       6          16          22
Kellogg             6.27       16.73       23
Nabisco             1.64        4.36        6
Quaker              2.18        5.82        8
Other               4.91       13.09       18
Column totals      21          56          77

Table of (actual - expected)² / expected:

                 Shelf = 1   Shelf = 0
General Mills     .166667     .0625
Kellogg           .084321     .031621
Nabisco           .080808     .030303
Quaker            .306818     .115057
Other            1.723906     .646465
Total                                   3.2484652

Test statistic: 3.248
Degrees of freedom: (5 - 1)(2 - 1) = 4
p-value: The closest critical value on the chi-square table with 4 degrees of freedom is 5.39, which has a tail probability of .25. Our computed test statistic is 3.248, which gives an upper tail probability that is larger than .25. Thus, our p-value is larger than .25.
Decision: Since p-value > α, we do not reject Ho. There is not enough evidence at the 5% level of significance to conclude that the shelf on which a cereal appears is dependent upon the manufacturer.

16. Of those cereals on the low shelf, what percentage is made by Nabisco?

2/21 = .095 = 9.5%
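The expected counts and the chi-square statistic in question 15 follow mechanically from the observed table: each expected count is (row total × column total) / grand total, and the statistic sums (observed - expected)² / expected over all ten cells. The sketch below reproduces that arithmetic from the observed counts only. The variable names are illustrative.

```python
# Observed counts: (shelf = 1, shelf = 0) for each manufacturer.
observed = {
    "General Mills": (7, 15),
    "Kellogg": (7, 16),
    "Nabisco": (2, 4),
    "Quaker": (3, 5),
    "Other": (2, 16),
}

# Column totals and grand total.
col_totals = [sum(row[j] for row in observed.values()) for j in range(2)]
grand_total = sum(col_totals)

expected = {}
chi_sq = 0.0
for maker, row in observed.items():
    row_total = sum(row)
    # Expected count under independence: row total * column total / grand total.
    exp_row = [row_total * c / grand_total for c in col_totals]
    expected[maker] = exp_row
    # Add this row's (observed - expected)^2 / expected contributions.
    chi_sq += sum((o - e) ** 2 / e for o, e in zip(row, exp_row))

df = (len(observed) - 1) * (len(col_totals) - 1)

for maker, exp_row in expected.items():
    print(f"{maker:14s} {exp_row[0]:6.2f} {exp_row[1]:6.2f}")
print(f"chi-square = {chi_sq:.3f} on {df} degrees of freedom")

# Question 16: share of low-shelf cereals made by Nabisco.
nabisco_share = observed["Nabisco"][0] / col_totals[0]
print(f"Nabisco share of low shelf = {nabisco_share:.3f}")
```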
Use the multiple regression output below to answer the following questions. The output reflects the regression of the healthiness rating (Y) on the number of calories, fat, and fiber grams per serving as well as the shelf on which the cereal appears.

SUMMARY OUTPUT: Regression using PredInt.xls

Regression Statistics
Multiple R           .8284
R Square             .6863
Adjusted R Square    .6689
Standard Error       8.0834
Observations         77

ANOVA
              df   SS           MS          F         Significance (p-value) for F
Regression     4   10292.23     2573.058    39.3788   .000
Residual      72   4704.56765   65.34121
Total         76   14996.80     197.3263

Dependent (Criterion) Variable: rating

            Coefficients   Standard Error   t Stat    P-value (2-tails)   Lower 95%   Upper 95%   X Values for Prediction
Intercept      77.760          6.263        12.416         .000            65.276      90.245
calories        -.337           .059        -5.753         .000             -.454       -.220            120
fat            -2.571          1.084        -2.372         .020            -4.732       -.410              1
fiber           2.324           .436         5.328         .000             1.455       3.194              5
Shelf          -5.414          2.185        -2.477         .016            -9.771      -1.058              0

Prediction Interval for a Single Observation of rating, with the X Values that you enter in the yellow boxes.
Confidence Level   .95
Predicted          46.376
Standard Error     8.299
Lower 95%          29.833
Upper 95%          62.919

Confidence Interval for Expected rating while holding X constant at the values that you enter in the yellow boxes.
Fit                46.376
Standard Error     1.878
Lower 95%          42.632
Upper 95%          50.120

17. What is R²? What does this value mean?

.6863. This means that 68.63% of the observed variation in the healthiness ratings can be explained by the calories, fat, and fiber per serving in addition to the shelf on which the cereal appears.

18. Estimate the healthiness rating of a cereal with 100 calories, 2 grams of fat, and 0 grams of fiber per serving that appears on the low shelf.

\[ \hat{y} = 77.76 - .337(100) - 2.571(2) + 2.324(0) - 5.414(1) = 33.504 \]

19. Test to determine if the number of fat grams per serving is a significant linear predictor of the healthiness rating. State the null and alternative hypothesis, test statistic, p-value, decision and conclusion. Use α = .05.

Ho: β = 0
Ha: β ≠ 0
Test statistic: -2.372
p-value: .020
Decision: Since p-value < α, reject Ho. There is enough evidence at the 5% level of significance to conclude that the number of fat grams is a significant linear predictor of the healthiness rating of breakfast cereals.

20. State and interpret the 95% confidence interval for estimating the population slope coefficient of the variable fiber.

The 95% confidence interval is given by (1.455, 3.194). We are 95% confident that a one gram increase in fiber per serving gives an increase in the population average cereal rating of between 1.455 and 3.194 points when comparing cereals with the same number of calories and fat grams per serving that appear on the same shelf.

21. State and interpret the 95% confidence interval for estimating the population slope coefficient of the variable shelf.

The 95% confidence interval is given by (-9.771, -1.058). We are 95% confident that, when comparing cereals with the same number of calories, fat, and fiber per serving, cereals on the low shelf have a population average rating of between 1.058 and 9.771 points lower than cereals on higher shelves.

22. Interpret the slope coefficient for the variable calories.

For each additional calorie per serving contained in a breakfast cereal, the predicted average rating decreases by .337 points when comparing cereals with the same amount of fat and fiber per serving that appear on the same shelf in the grocery store.
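The regression answers can be checked directly from the coefficient table. The sketch below is illustrative only: `predict` is a hypothetical helper, the inputs for question 18 are taken as 100 calories, 2 g fat, 0 g fiber on the low shelf, the spreadsheet's prediction inputs are taken as 120 calories, 1 g fat, 5 g fiber on a higher shelf, and the standard error for fat is taken as 1.084.

```python
# Fitted coefficients from the regression output.
coef = {
    "intercept": 77.76,
    "calories": -0.337,
    "fat": -2.571,
    "fiber": 2.324,
    "shelf": -5.414,
}

def predict(calories, fat, fiber, shelf):
    """Predicted rating from the fitted equation (hypothetical helper)."""
    return (coef["intercept"]
            + coef["calories"] * calories
            + coef["fat"] * fat
            + coef["fiber"] * fiber
            + coef["shelf"] * shelf)

# Question 18: assumed inputs of 100 calories, 2 g fat, 0 g fiber, low shelf.
rating_q18 = predict(100, 2, 0, 1)

# Spreadsheet prediction: assumed inputs of 120 calories, 1 g fat, 5 g fiber,
# higher shelf. The output reports 46.376; rounded coefficients give a value
# that differs slightly in the third decimal place.
rating_sheet = predict(120, 1, 5, 0)

# Question 19: t statistic for fat is the coefficient over its standard error.
se_fat = 1.084             # assumed standard error for fat
t_fat = coef["fat"] / se_fat

print(f"question 18 prediction = {rating_q18:.3f}")
print(f"spreadsheet prediction = {rating_sheet:.3f}")
print(f"t statistic for fat = {t_fat:.3f}")
```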