1 Final Review

2 Review

2.1 CI 1-propZint

Scenario 1
A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years of operation. To test the validity of this claim, a government testing agency selected a random sample of 100 sets and found that 14 sets required some repair within the first two years of operation.

1. What is the critical value for this 95% confidence interval?
   $CV = z_{.025}$ = invnorm(0.025) = $\pm 1.96$
2. What is the standard error of this confidence interval?
   $SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{.14(1-.14)}{100}} = 0.0347$
3. What is the margin of error?
   $ME = CV \times SE = 1.96 \times 0.0347 = 0.068$
4. Set up a 95% confidence interval estimate of the population proportion of TV sets that need repair in the first two years of operation.
   (0.07199, 0.20801)
5. What conclusion can we draw from this confidence interval?
   Since 0.1 is within the confidence interval, the sample is consistent with the company's brochure; we cannot conclude that the claim is wrong.
6. Interpret the 95% confidence interval.
   We are 95% confident that the true population proportion of TV sets needing repair in the first two years is between 7.2 and 20.8 percent.
7. What sample size should be taken if the agency wants 95% confidence when the margin of error is 0.05?
   $n = \left(\frac{CV}{ME}\right)^2 \hat{p}(1-\hat{p}) = \left(\frac{1.96}{0.05}\right)^2 (.14)(1-.14) = 185.01$, which rounds up to 186.
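The steps in Scenario 1 can be checked numerically. The following is a minimal Python sketch using only the standard library; the function names `proportion_ci` and `sample_size_for_proportion` are illustrative and do not come from the slides, which use calculator functions instead.

```python
from math import ceil, sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Return (cv, se, me, lower, upper) for a one-proportion z interval."""
    p_hat = successes / n
    # Upper-tail critical value, e.g. z_.025 = 1.96 for 95% confidence.
    cv = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)
    me = cv * se
    return cv, se, me, p_hat - me, p_hat + me


def sample_size_for_proportion(p_hat, margin, cv=1.96):
    """Smallest n whose margin of error is at most `margin` (always round up)."""
    return ceil((cv / margin) ** 2 * p_hat * (1 - p_hat))


cv, se, me, lower, upper = proportion_ci(14, 100)
n_needed = sample_size_for_proportion(0.14, 0.05)
print(round(cv, 2), round(se, 4), round(me, 3))
print(round(lower, 5), round(upper, 5), n_needed)
```

The sample size is rounded up rather than to the nearest integer because rounding down would give a margin of error slightly larger than requested.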
2.2 CI 2-independent samples

Scenario 2
The purchasing director for an industrial factory is investigating the possibility of purchasing a new milling machine. She determines that the new machine will be purchased if there is evidence that the parts it produces have a higher breaking strength than those from the old machine. The sample standard deviation of the breaking strength for the old machine is 10 kilograms and for the new machine is 9 kilograms. A sample of 25 parts taken from the old machine indicated a sample mean of 65 kilograms, whereas a similar sample of 25 from the new machine indicated a sample mean of 72 kilograms.

1. What are the degrees of freedom?
   $DF = n_1 + n_2 - 2 = 25 + 25 - 2 = 48$
2. What is the critical value for this 95% confidence interval?
   $CV = t_{0.025,48}$ = invt(.025, 48) = $\pm 2.0106$
3. What is the standard error of this confidence interval?
   Since $ME = CV \times SE$, we can solve for $SE = \frac{ME}{CV} = \frac{5.41}{2.0106} = 2.6907$, using the margin of error from question 4.
4. What is the margin of error?
   $ME = \frac{-1.5899 - (-12.41)}{2} = 5.41005$, which is half the width of the interval in question 5.
5. Set up a 95% confidence interval of the population difference between the two means (old minus new).
   (-12.41, -1.5899)
6. What conclusion can we draw from this confidence interval?
   Since zero is not within the interval and both endpoints are negative, we can conclude that the new machine has a higher mean breaking strength than the old machine. The purchasing director should purchase the new machine.
7. Interpret the 95% confidence interval.
   We are 95% confident that the true mean difference in breaking strength (old minus new) is between -12.4 and -1.6 kilograms.
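The standard error in Scenario 2 can also be computed directly instead of being backed out of the interval. The following is a minimal sketch, assuming the pooled-variance formula (the one that goes with $DF = n_1 + n_2 - 2$); the function name is illustrative, and the critical value 2.0106 is taken from the slide rather than computed, because the Python standard library has no t quantile function.

```python
from math import sqrt


def pooled_two_sample_ci(mean1, sd1, n1, mean2, sd2, n2, t_crit):
    """Return (df, se, me, lower, upper) for mean1 - mean2, pooled variance."""
    df = n1 + n2 - 2
    # Pooled variance: weighted average of the two sample variances.
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    me = t_crit * se
    diff = mean1 - mean2
    return df, se, me, diff - me, diff + me


# Old machine first, new machine second, so the difference is old minus new.
df, se, me, lower, upper = pooled_two_sample_ci(65, 10, 25, 72, 9, 25, 2.0106)
print(df, round(se, 4), round(me, 2), round(lower, 2), round(upper, 2))
```

Under these assumptions the result agrees with the slide's values to rounding: SE is about 2.6907 and the interval is about (-12.41, -1.59).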
2.3 CI 1 sample T

Scenario 3
Suppose an independent testing agency has been contracted to determine whether the contracting company should use a gasoline additive to increase the gasoline mileage of its vehicles. The current gasoline mileage for its vehicles is 18.5 mpg. A random sample of 30 vehicles from the company's fleet produced a sample average of 19.34 mpg and a sample standard deviation of 5.2 mpg.

1. What are the degrees of freedom?
   $DF = n - 1 = 30 - 1 = 29$
2. What is the critical value for this 95% confidence interval?
   $CV = t_{.025,29}$ = invt(.025, 29) = $\pm 2.0452$
3. What is the standard error of this confidence interval?
   $SE = \frac{SD}{\sqrt{n}} = \frac{5.2}{\sqrt{30}} = 0.9494$
4. What is the margin of error?
   $ME = CV \times SE = 2.0452 \times 0.9494 = 1.9417$
5. Set up a 95% confidence interval of the population average MPG with the gasoline additive.
   (17.398, 21.282)
6. What conclusion can we draw from this confidence interval?
   Since the current mileage of 18.5 mpg is within the interval, the MPG does not change significantly when the additive is placed in the gasoline.
7. Interpret the 95% confidence interval.
   We are 95% confident that the true mean MPG with the additive is between 17.4 and 21.3.
8. What sample size should be taken if the agency wants 95% confidence when the margin of error is 1.5?
   $n = \left(\frac{CV \times SD}{ME}\right)^2 = \left(\frac{1.96 \times 5.2}{1.5}\right)^2 = 46.17$, which rounds up to 47.
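The one-sample t interval and the sample-size calculation can be sketched in a few lines of Python. This is an illustrative sketch only: the function names are hypothetical, and the t critical value 2.0452 is passed in from the slide rather than computed, because the standard library has no t quantile function.

```python
from math import ceil, sqrt


def one_sample_t_ci(mean, sd, n, t_crit):
    """Return (df, se, me, lower, upper) for a one-sample t interval."""
    df = n - 1
    se = sd / sqrt(n)
    me = t_crit * se
    return df, se, me, mean - me, mean + me


def sample_size_for_mean(sd, margin, cv=1.96):
    """Smallest n with margin of error at most `margin`, using a z critical value."""
    return ceil((cv * sd / margin) ** 2)


df, se, me, lower, upper = one_sample_t_ci(19.34, 5.2, 30, 2.0452)
n_needed = sample_size_for_mean(5.2, 1.5)
print(df, round(se, 4), round(me, 4), round(lower, 3), round(upper, 3), n_needed)
```

The sample-size formula uses 1.96 rather than a t value, as the slide does, because the degrees of freedom are not known until n has been chosen.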
2.4 CI paired t

Scenario 4
Suppose a shoe company wants to test material for the soles of shoes. For each pair of shoes the new material is placed on one shoe and the old material is placed on the other shoe. After a given period of time a random sample of 10 pairs of shoes is selected. The wear is measured on a 10 point scale (higher is better) with the following results. The average of the differences is 0.3 and its standard deviation is 1.767.

1. What are the degrees of freedom?
   $DF = n - 1 = 10 - 1 = 9$
2. What is the critical value for this 95% confidence interval?
   $CV = t_{.025,9}$ = invt(.025, 9) = $\pm 2.2622$
3. What is the standard error of this confidence interval?
   $SE = \frac{SD}{\sqrt{n}} = \frac{1.767}{\sqrt{10}} = 0.5588$
4. What is the margin of error?
   $ME = CV \times SE = 2.2622 \times 0.5588 = 1.2641$
5. Set up a 95% confidence interval of the population difference of paired observations of shoe soles.
   (-0.964, 1.564)
6. What conclusion can we draw from this confidence interval?
   Since zero is within the confidence interval, there is no evidence of a difference between the new material and the old material.
7. Interpret the 95% confidence interval.
   We are 95% confident that the true average difference is between -0.96 and 1.56.
8. What sample size should be taken if the agency wants 95% confidence when the margin of error is 0.6?
   $n = \left(\frac{CV \times SD}{ME}\right)^2 = \left(\frac{1.96 \times 1.767}{0.6}\right)^2 = 33.3$, which rounds up to 34.
2.5 hypotheses test 1-propZint

Scenario 1
A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years of operation. To test the validity of this claim, a government testing agency selected a random sample of 100 sets and found that 14 sets required some repair within the first two years of operation. The company uses a 5% level of significance.

1. How many tails does this test have?
   One; it is an upper-tail test.
2. What are the hypotheses?
   $H_0: p \le 0.1$ vs. $H_1: p > 0.1$
3. What is the standard error of the proportion?
   $SE = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{.1(1-.1)}{100}} = 0.03$
4. What is the test statistic?
   $z = \frac{\hat{p} - p}{SE} = \frac{0.14 - 0.10}{0.03} = 1.3333$
5. What is the p-value?
   p-value = 0.0912; since 0.0912 > 0.05, do not reject $H_0$.
6. What conclusion can we draw from this test?
   There is not enough evidence to reject the company's claim.
7. What is the critical value?
   $z_{.05}$ = invnorm(.95) = 1.645
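The test in this section can be reproduced with the Python standard library. This is a minimal sketch with an illustrative function name. Note that the standard error uses the hypothesized proportion 0.1, not the sample proportion 0.14 used for the confidence interval in section 2.1.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_upper_tail_test(successes, n, p0):
    """Return (se, z, p_value) for H0: p <= p0 vs. H1: p > p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)  # standard error under the null hypothesis
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)  # area in the upper tail
    return se, z, p_value


se, z, p_value = one_proportion_upper_tail_test(14, 100, 0.10)
critical_value = NormalDist().inv_cdf(0.95)
reject = z > critical_value
print(round(se, 2), round(z, 4), round(p_value, 4), round(critical_value, 3), reject)
```

The p-value approach (0.0912 > 0.05) and the critical-value approach (1.3333 < 1.645) lead to the same decision, as they always do for the same significance level.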
2.6 hypotheses test 2-independent samples

Scenario 2
The purchasing director for an industrial factory is investigating the possibility of purchasing a new milling machine. She determines that the new machine will be purchased if there is evidence that the parts it produces have a higher breaking strength than those from the old machine. The sample standard deviation of the breaking strength for the old machine is 10 kilograms and for the new machine is 9 kilograms. A sample of 25 parts taken from the old machine indicated a sample mean of 65 kilograms, whereas a similar sample of 25 from the new machine indicated a sample mean of 72 kilograms. The director uses a 5% level of significance.

1. How many tails does this test have?
   One; it is a lower-tail test when the difference is taken as old minus new.
2. What are the hypotheses?
   $H_0: \mu_o \ge \mu_n$ vs. $H_1: \mu_o < \mu_n$
3. What is the test statistic?
   $t = \frac{65 - 72}{2.6907} = -2.6015$
4. What are the degrees of freedom?
   $DF = n_1 + n_2 - 2 = 25 + 25 - 2 = 48$
5. What is the p-value?
   p-value = 0.0062
6. Should you reject the null hypothesis (decision)?
   Yes, since 0.0062 < 0.05.
7. What conclusion can we draw from this test?
   There is evidence that the mean breaking strength of the new machine is greater than that of the old machine.
8. What is the critical value?
   $CV = t_{.05,48}$ = invt(.05, 48) = -1.6772
2.7 Hypotheses testing 1 sample T

Scenario 3
Suppose an independent testing agency has been contracted to determine whether the contracting company should use a gasoline additive. The current gasoline mileage for its vehicles is 18.5 mpg. A random sample of 30 vehicles from the company's fleet produced a sample average of 19.34 mpg and a sample standard deviation of 5.2 mpg. Is there evidence that putting an additive into the gasoline of the company vehicles will improve their performance (i.e., MPG)? The company uses a 5% level of significance.

1. How many tails does this test have?
   One; it is an upper-tail test.
2. What are the hypotheses?
   $H_0: \mu \le 18.5$ vs. $H_1: \mu > 18.5$
3. What is the test statistic?
   $t = 0.8848$
4. What are the degrees of freedom?
   $DF = n - 1 = 30 - 1 = 29$
5. What is the p-value?
   p-value = 0.1918
6. Should you reject the null hypothesis (decision)?
   Do not reject $H_0$, since 0.1918 > 0.05.
7. What conclusion can we draw from this test?
   There is no evidence that the additive actually improved gasoline mileage.
8. What is the critical value?
   $CV = t_{.05,29}$ = invt(.95, 29) = 1.6991
2.8 Hypotheses test paired t

Scenario 4
Suppose a shoe company wants to test material for the soles of shoes. For each pair of shoes the new material is placed on one shoe and the old material is placed on the other shoe. After a given period of time a random sample of 10 pairs of shoes is selected. The wear is measured on a 10 point scale (higher is better) with the following results. The average of the differences is 0.3 and its standard deviation is 1.767. Is there evidence the new sole material is different from the current sole material?

1. How many tails does this test have?
   This is a two-tailed test.
2. What are the hypotheses?
   $H_0: \mu_d = 0$ vs. $H_1: \mu_d \ne 0$
3. What is the test statistic?
   $t = 0.5369$
4. What are the degrees of freedom?
   $DF = n - 1 = 10 - 1 = 9$
5. What is the p-value?
   p-value = 0.6044
6. Should you reject the null hypothesis (decision)?
   Do not reject $H_0$, since 0.6044 > 0.05.
7. What conclusion can we draw from this test?
   There is no evidence that the new sole material is different from the current sole material.
8. What is the critical value?
   $CV = t_{.025,9}$ = invt(.025, 9) = $\pm 2.2622$
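The test statistics in sections 2.7 and 2.8 are stated without the arithmetic. Both follow from the same formula, $t = \frac{\bar{x} - \mu_0}{SD/\sqrt{n}}$, because a paired t test is a one-sample t test on the differences. The following is a minimal sketch with an illustrative function name; the critical values are taken from the slides rather than computed.

```python
from math import sqrt


def t_statistic(sample_mean, hypothesized_mean, sd, n):
    """One-sample t statistic; for paired data, pass the mean and SD of the differences."""
    se = sd / sqrt(n)
    return (sample_mean - hypothesized_mean) / se


# Scenario 3: upper-tail test, reject H0 if t > 1.6991.
t_mpg = t_statistic(19.34, 18.5, 5.2, 30)
reject_mpg = t_mpg > 1.6991

# Scenario 4: two-tailed test on the paired differences, reject H0 if |t| > 2.2622.
t_soles = t_statistic(0.3, 0.0, 1.767, 10)
reject_soles = abs(t_soles) > 2.2622

print(round(t_mpg, 4), reject_mpg, round(t_soles, 4), reject_soles)
```

Neither statistic reaches its critical value, which matches the decisions reached above from the p-values.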
2.9 $\chi^2$-test

Scenario 5
Suppose the head of the HR division of a mid-sized company wants to determine if she should let the Red Cross hold a give-blood day in the company cafeteria. She takes a random sample of size 49. The following contingency table is constructed.

Blood Donor Status
         Yes   No   Total
Men        5   17      22
Women      7   20      27
Total     12   37      49

1. What are the hypotheses?
   $H_0: p_y = p_n$ vs. $H_1: p_y \ne p_n$
2. What is the test statistic?
   $\chi^2 = 0.0671$
3. What are the degrees of freedom?
   $DF = (\#r - 1)(\#c - 1) = (2 - 1)(2 - 1) = 1$
4. What is the p-value?
   p-value = 0.7957
5. Should you reject the null hypothesis (decision)?
   Is p-value < $\alpha$? No; do not reject $H_0$.
6. What conclusion can we draw from this test?
   There is no evidence that donor status and gender are related; the data are consistent with independence.
7. What is the expected value for the cell in row 2, column 2?
   $E_{2,2} = \frac{27 \times 37}{49} = 20.388$
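The expected counts and the test statistic can be computed from the table with the usual rule: expected = (row total × column total) / grand total. The following is a minimal standard-library sketch with illustrative function names. The p-value helper is an assumption-laden shortcut that is valid only for 1 degree of freedom, where the chi-square upper tail equals `erfc(sqrt(x / 2))`.

```python
from math import erfc, sqrt


def chi_square_independence(table):
    """Return (expected counts, chi-square statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return expected, stat, df


def chi_square_p_value_df1(stat):
    """Upper-tail p-value; valid for 1 degree of freedom only."""
    return erfc(sqrt(stat / 2))


observed = [[5, 17], [7, 20]]  # rows: men, women; columns: yes, no
expected, stat, df = chi_square_independence(observed)
p_value = chi_square_p_value_df1(stat)
print(round(expected[1][1], 3), round(stat, 4), df, round(p_value, 4))
```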
2.10 SLR

Scenario 6
A statistician for an American automobile manufacturer would like to develop a statistical model for predicting delivery time (the days between initiating the order and the actual delivery of the new car) of custom-ordered new automobiles. The statistician believes there is a linear relationship between the number of options ordered on a car and the delivery time. A random sample of 16 cars is selected with the following results.

Regression Statistics
Multiple R       0.9785
R square         0.9575
Adj R sq         0.9545
Standard error   3.0446
Observations     16

[Figures: scatter plot "Options Ordered vs Delivery Time" (x-axis: Options Ordered, y-axis: Delivery Time); plot "Residuals vs Fitted" (x-axis: Fitted values, y-axis: Residuals) for lm(time ~ Options)]

ANOVA
             df   SS        MS        F       Significance F
Regression    1   2927.23   2927.23   315.8   0
Residual     14    129.77      9.27
Total        15   3057.00

Coefficients
                 Coefficient   Std error   t Stat    p-value   Low 95%   Up 95%
intercept        21.9254       1.5908      13.7823   0.0       18.51     25.34
optionsordered    2.0687       0.1164      17.7707   0.0       1.819     2.3184

1. Identify which variable is the X, independent, or explanatory variable.
   Options is the independent variable.
2. Identify which variable is the Y, dependent, or response variable.
   Time is the dependent variable.
3. Describe the pattern of points as they appear on the graph.
   As options increases, time increases.
4. What kind of relationship do you see?
   The relationship is positive and linear.
5. Are there any outliers?
   There are no apparent outliers.
6. Describe the strength and direction of the correlation.
   The correlation is strong (r = .98) and its direction is positive.
7. Compare this relationship with the pattern of points on the scatter diagram between the two variables.
   They are in agreement.
8. Write the specific estimated regression equation for this problem.
   $\widehat{time} = b_0 + b_1(options) = 21.9254 + 2.0687 \times options$
9. Using the estimated regression equation, predict the average delivery time for a car with 16 options ordered.
   $\widehat{time} = 21.9254 + 2.0687 \times 16 = 55.02$ days
10. Is the previous prediction extrapolation?
    No, since 16 lies between the minimum of 3 options and the maximum of 25 options.
11. Interpret the slope estimate, that is, explain what it means in terms of this problem.
    As options increases by one, the predicted delivery time increases by 2.07 days (i.e., the value of the slope).
12. Determine the coefficient of determination or how much variation in delivery time is accounted for by this regression model. Express your answer as a percent. What measure did you use to answer this question?
    Coefficient of determination = $r^2$ = 95.75%.
13. What is the standard error of the estimated regression line? Include the unit of measurement in your answer.
    s = 3.0446 days.
14. Using a 5% level of significance, is there evidence of a linear relationship between delivery time and options ordered? Be sure to state the hypotheses, test statistic, p-value, and the conclusion.
    $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \ne 0$
    t = 17.7707
    p-value $\approx$ 0
    Since the p-value is less than 0.05, reject $H_0$. There is evidence that the slope is not zero.
15. Give a 95% confidence interval for the true (i.e., population) slope.
    (1.819, 2.3184) is a 95% confidence interval.
16. If the original correlation coefficient between these two variables were not known, how could it be calculated using the statistics in the regression output? How do you determine the sign of the correlation coefficient?
    $r = \pm\sqrt{r^2}$. The sign of r is determined by the sign of the slope.
17. Describe what you see on the residual plot.
    There appears to be a slight pattern.
18. For the data set, look at the 9th pair of observations (Options, Time) or (12, 44). Calculate the residual, i.e., $e_i = Y_i - \hat{Y}_i$.
    $e_9 = 44 - (21.9254 + 2.0687 \times 12) = 44 - 46.7498 = -2.7498$
19. Is the model a good fit for the data? Be sure to state your decision and give the reasons that support your decision.
    Consider the following:
    $r^2 = .9575$
    s = 3.0446 days
    Rejected $H_0$ for the slope.
    Review the scatter plot.
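The prediction, residual, and correlation calculations for Scenario 6 can be sketched as follows. The coefficients, $r^2$, and the option range are copied from the regression output and answers above. The function names are illustrative and not part of the slides.

```python
from math import copysign, sqrt

B0, B1 = 21.9254, 2.0687            # intercept and slope from the Coefficients table
R_SQUARED = 0.9575                  # from Regression Statistics
MIN_OPTIONS, MAX_OPTIONS = 3, 25    # observed range of the x variable


def predict_time(options):
    """Predicted delivery time in days from the fitted line."""
    return B0 + B1 * options


def is_extrapolation(options):
    """True when the x value lies outside the observed range."""
    return not (MIN_OPTIONS <= options <= MAX_OPTIONS)


def residual(options, observed_time):
    """Observed minus predicted: e_i = Y_i - Y_hat_i."""
    return observed_time - predict_time(options)


# r takes the sign of the slope and the magnitude sqrt(r^2).
r = copysign(sqrt(R_SQUARED), B1)

print(round(predict_time(16), 2), is_extrapolation(16))
print(round(residual(12, 44), 4), round(r, 4))
```

A negative residual means the observed delivery time was shorter than the fitted line predicts for that number of options.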
2.11 MLR

Scenario 7
Suppose a consumer organization wanted to develop a model to predict gasoline mileage as measured by miles per gallon (MPG) based on the horsepower of the car's engine and the weight of the car. A sample of 50 recent car models was selected, with the results summarized below.

Regression Statistics
Multiple R       0.8657
R square         0.7494
Adj R sq         0.7388
Standard error   4.1766
Observations     50

Correlation Coefficient
       MPG       HP       WT
MPG     1
HP     -0.7882   1
WT     -0.8248   0.7419   1

Descriptive Statistics
           MPG      Horsepower   Weight
Mean       28.5     90.8         2756.5
Std Err    1.16     3.85         89.81
Std Dev    8.17     27.26        635.05
Variance   66.77    743.04       403289.76
Minimum    15.5     48           1755
Maximum    46.6     165          4360
Sum        1427.1   4542         137826
Count      50       50           50

Min - Max
x-variable   Min    Max
HP           48     165
WT           1755   4360

ANOVA
             df   SS        MS        F         Significance F
Regression    2   2451.97   1225.99   70.2813   0
Residual     47    819.87     17.44
Total        49   3271.84

Coefficients
             Coefficient   Std error   t Stat    p-value   Low 95%   Up 95%
intercept    58.1508       2.6582      21.8780   0.0       52.81     63.50
Horsepower   -0.1175       0.0326      -3.6003   0.0008    -0.1832   -0.0519
Weight       -0.0069       0.0014      -4.9035   0.0       -0.0097   -0.0041

1. Identify which variables are the X, independent, or explanatory variables.
   Horsepower (HP) and weight (WT) are the explanatory variables.
2. Identify which variable is the Y, dependent, or response variable.
   Miles per gallon (MPG) is the response variable.
3. Describe the strength and direction of the correlation.
   The correlation coefficient between MPG and HP is -.7882.
   The correlation coefficient between MPG and WT is -.8248.
   The correlation coefficient between WT and HP is .7419.
4. Write the specific estimated regression equation for this problem.
   $\widehat{MPG} = 58.1508 - 0.1175 \times HP - 0.0069 \times WT$
5. Using the estimated regression equation, predict the average MPG for a car that has 60 HP and weighs 2000 lbs.
   $\widehat{MPG} = 58.1508 - 0.1175 \times 60 - 0.0069 \times 2000 = 37.3$ mpg
6. Is the previous prediction extrapolation?
   No, since HP = 60 is between 48 and 165 and WT = 2000 is between 1755 and 4360.
7. Interpret the slope estimates, that is, explain what they mean in terms of this problem.
   Holding WT constant, as HP increases by one, the predicted MPG decreases by .1175.
   Holding HP constant, as WT increases by one pound, the predicted MPG decreases by .0069.
8. Determine the coefficient of multiple determination or how much variation in MPG is accounted for by this regression model. Express your answer as a percent. What measure did you use to answer this question?
   $r^2$ = 74.9%
9. What is the standard error of the estimated regression line? Include the unit of measurement in your answer.
   s = 4.1766 mpg.
10. Using a 5% level of significance, is there evidence of a linear relationship between MPG and the explanatory variables? Be sure to state the hypotheses, test statistic, p-value, and the conclusion.
    $H_0: \beta_1 = \beta_2 = 0$ vs. $H_1$: at least one $\beta_i \ne 0$, where i = 1, 2
    From the ANOVA table, F = 70.2813 and the p-value (Significance F) is approximately 0.
    Since the p-value is less than 0.05, reject $H_0$. There is evidence that at least one explanatory variable is linearly related to MPG.
11. Give a 95% confidence interval for the true (i.e., population) slope of MPG and HP.
    A 95% confidence interval for the slope of MPG on HP is (-.1832, -.0519).
12. For the data set, look at the 1st set of observations (MPG, HP, WT) or (43.1, 48, 1985). Calculate the residual, i.e., $e_i = Y_i - \hat{Y}_i$.
    $e_1 = 43.1 - (58.1508 - .1175 \times 48 - .0069 \times 1985) = 43.1 - 38.8143 = 4.2857$
13. Is the model a good fit for the data? Be sure to state your decision and give the reasons that support your decision.
    $r^2 = .7494$
    s = 4.1766 mpg
    Rejected $H_0$.
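The multiple-regression prediction and residual for Scenario 7 follow the same pattern as in simple regression, with one term per explanatory variable. The following is a minimal sketch. The coefficients and ranges are copied from the output tables in this section, and the function names are illustrative only.

```python
B0, B_HP, B_WT = 58.1508, -0.1175, -0.0069     # from the Coefficients table
HP_RANGE, WT_RANGE = (48, 165), (1755, 4360)   # from the Min - Max table


def predict_mpg(hp, wt):
    """Predicted MPG from the fitted two-variable regression equation."""
    return B0 + B_HP * hp + B_WT * wt


def is_extrapolation(hp, wt):
    """True when either explanatory variable is outside its observed range."""
    hp_ok = HP_RANGE[0] <= hp <= HP_RANGE[1]
    wt_ok = WT_RANGE[0] <= wt <= WT_RANGE[1]
    return not (hp_ok and wt_ok)


def residual(observed_mpg, hp, wt):
    """Observed minus predicted: e_i = Y_i - Y_hat_i."""
    return observed_mpg - predict_mpg(hp, wt)


print(round(predict_mpg(60, 2000), 1), is_extrapolation(60, 2000))
print(round(residual(43.1, 48, 1985), 4))
```

Checking each variable against its own range is a simplification: a combination of HP and WT can still be far from the observed data even when each value is individually in range.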