1 Final Review

2 Review

2.1 CI 1-propZint

Scenario 1

A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years of operation. To test the validity of this claim, a government testing agency selected a random sample of 100 sets and found that 14 sets required some repair within the first two years of operation.

1. What is the critical value for this 95% confidence interval? $CV = z_{.025} = \text{invNorm}(0.975) = 1.96$
2. What is the standard error of this confidence interval? $SE = \sqrt{\frac{\hat{p}(1-\hat{p})}{n}} = \sqrt{\frac{.14(1-.14)}{100}} = 0.0347$
3. What is the margin of error? $ME = CV \times SE = 1.96 \times 0.0347 = 0.068$
4. Set up a 95% confidence interval estimate of the population proportion of TV sets that need repair in the first two years of operation. $\hat{p} \pm ME = 0.14 \pm 0.068$, or (0.07199, 0.20801)
5. What conclusion can we draw from this confidence interval? Since 0.1 is within the confidence interval, the data are consistent with the company's claim; there is no evidence that the brochure is incorrect.
6. Interpret the 95% confidence interval. We are 95% confident that the true population proportion is between 7.2 and 20.8 percent.
7. What sample size should be taken if the agency wants 95% confidence when the margin of error is 0.05? $n = \left(\frac{CV}{ME}\right)^2 \hat{p}(1-\hat{p}) = \left(\frac{1.96}{0.05}\right)^2 (.14)(1-.14) = 185.01$, which rounds up to 186.
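The steps in Scenario 1 can be checked with a short script. This is a minimal sketch using only the Python standard library; the function names `proportion_ci` and `sample_size_for_proportion` are illustrative and not part of the original review, and `NormalDist().inv_cdf` stands in for the calculator's invNorm.

```python
from math import ceil, sqrt
from statistics import NormalDist


def proportion_ci(successes, n, confidence=0.95):
    """Return (SE, ME, (low, high)) for a one-proportion z interval."""
    p_hat = successes / n
    # Upper-tail critical value, e.g. about 1.96 for 95% confidence.
    cv = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    se = sqrt(p_hat * (1 - p_hat) / n)
    me = cv * se
    return se, me, (p_hat - me, p_hat + me)


def sample_size_for_proportion(p_hat, margin, confidence=0.95):
    """Smallest n whose margin of error is at most `margin`."""
    cv = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return ceil((cv / margin) ** 2 * p_hat * (1 - p_hat))


# Scenario 1: 14 of 100 sets needed repair.
se, me, (low, high) = proportion_ci(14, 100)
n_needed = sample_size_for_proportion(0.14, 0.05)
print(round(se, 4), round(me, 3), round(low, 5), round(high, 5), n_needed)
```

The sample size is rounded up rather than to the nearest integer, because rounding down would leave the margin of error slightly above the target.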

2.2 CI 2-independent samples

Scenario 2

The purchasing director for an industrial factory is investigating the possibility of purchasing a new milling machine. She determines that the new machine will be purchased if there is evidence that the parts it produces have a higher breaking strength than those from the old machine. The sample standard deviation of the breaking strength for the old machine is 10 kilograms and for the new machine is 9 kilograms. A sample of 25 parts taken from the old machine indicated a sample mean of 65 kilograms, whereas a similar sample of 25 from the new machine indicated a sample mean of 72 kilograms.

1. What are the degrees of freedom? $DF = n_1 + n_2 - 2 = 25 + 25 - 2 = 48$
2. What is the critical value for this 95% confidence interval? $CV = \pm t_{.025,48} = \pm 2.0106$, from $\text{invT}(.025, 48) = -2.0106$
3. What is the standard error of this confidence interval? With the pooled variance $s_p^2 = \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2} = \frac{24(10^2) + 24(9^2)}{48} = 90.5$, we get $SE = \sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)} = \sqrt{90.5\left(\frac{2}{25}\right)} = 2.6907$. Equivalently, since $ME = CV \times SE$, we can solve for $SE = \frac{ME}{CV} = \frac{5.41}{2.0106} = 2.6907$.
4. What is the margin of error? $ME = CV \times SE = 2.0106 \times 2.6907 = 5.41$; equivalently, half the width of the interval: $ME = \frac{-1.5899 + 12.41}{2} = 5.41005$
5. Set up a 95% confidence interval of the population difference between the two means (old minus new). $(65 - 72) \pm 5.41$, or (-12.41, -1.5899)
6. What conclusion can we draw from this confidence interval? Since zero is not within the interval and both limits are negative (old minus new), we can conclude that the new machine has a higher mean breaking strength than the old machine. The purchasing director should purchase the new machine.
7. Interpret the 95% confidence interval. We are 95% confident that the true mean difference (old minus new) is between -12.4 and -1.6 kilograms.
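The pooled standard error and the interval in Scenario 2 can be reproduced directly from the summary statistics. The sketch below is only illustrative: the function name `pooled_two_sample_ci` is hypothetical, and the critical value 2.0106 is taken from the answer above rather than computed, since the Python standard library has no t distribution.

```python
from math import sqrt


def pooled_two_sample_ci(mean1, sd1, n1, mean2, sd2, n2, cv):
    """Pooled-variance CI for mean1 - mean2, given a t critical value."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    me = cv * se
    diff = mean1 - mean2
    return df, se, me, (diff - me, diff + me)


# Scenario 2: old machine first, new machine second.
# cv=2.0106 is the t critical value quoted in the review (assumed, not computed).
df, se, me, (low, high) = pooled_two_sample_ci(65, 10, 25, 72, 9, 25, cv=2.0106)
print(df, round(se, 4), round(me, 2), round(low, 2), round(high, 2))
```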

2.3 CI 1 sample T

Scenario 3

Suppose an independent testing agency has been contracted to determine whether the contracting company should use a gasoline additive to increase the gasoline mileage of its vehicles. The current gasoline mileage for its vehicles is 18.5 mpg. A random sample of 30 vehicles from the company's fleet produced a sample average of 19.34 mpg and a sample standard deviation of 5.2 mpg.

1. What are the degrees of freedom? $DF = n - 1 = 30 - 1 = 29$
2. What is the critical value for this 95% confidence interval? $CV = \pm t_{.025,29} = \pm 2.0452$, from $\text{invT}(.025, 29) = -2.0452$
3. What is the standard error of this confidence interval? $SE = \frac{SD}{\sqrt{n}} = \frac{5.2}{\sqrt{30}} = 0.9494$
4. What is the margin of error? $ME = CV \times SE = 2.0452 \times 0.9494 = 1.9417$
5. Set up a 95% confidence interval of the population average MPG with the gasoline additive. $\bar{x} \pm ME = 19.34 \pm 1.9417$, or (17.398, 21.282)
6. What conclusion can we draw from this confidence interval? Since the current mileage of 18.5 mpg is within the interval, the MPG does not change significantly when the additive is placed in the gasoline.
7. Interpret the 95% confidence interval. We are 95% confident that the true mean is between 17.4 and 21.3 mpg.
8. What sample size should be taken if the agency wants 95% confidence when the margin of error is 1.5? $n = \left(\frac{CV \times SD}{ME}\right)^2 = \left(\frac{1.96 \times 5.2}{1.5}\right)^2 = 46.17$, which rounds up to 47.
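The same arithmetic applies to any one-sample t interval, including the paired case later in the review. A minimal sketch, assuming the critical values quoted above (2.0452 for the interval, 1.96 for the sample-size formula); the helper names are made up for illustration.

```python
from math import ceil, sqrt


def mean_ci(mean, sd, n, cv):
    """Return (SE, ME, (low, high)) for a one-sample interval on the mean."""
    se = sd / sqrt(n)
    me = cv * se
    return se, me, (mean - me, mean + me)


def sample_size_for_mean(sd, margin, cv):
    """Smallest n with cv * sd / sqrt(n) <= margin."""
    return ceil((cv * sd / margin) ** 2)


# Scenario 3: 30 vehicles, mean 19.34 mpg, SD 5.2 mpg.
se, me, (low, high) = mean_ci(19.34, 5.2, 30, cv=2.0452)  # cv from the review
n_needed = sample_size_for_mean(5.2, 1.5, cv=1.96)        # z value, as in the review
print(round(se, 4), round(me, 4), round(low, 3), round(high, 3), n_needed)
```

The review uses the z value 1.96 in the sample-size step because the t critical value depends on the unknown n; this sketch follows that convention.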

2.4 CI paired t

Scenario 4

Suppose a shoe company wants to test material for the soles of shoes. For each pair of shoes the new material is placed on one shoe and the old material is placed on the other shoe. After a given period of time a random sample of 10 pairs of shoes is selected. The wear is measured on a 10 point scale (higher is better) with the following results: the average of the differences is 0.3 and its standard deviation is 1.767.

1. What are the degrees of freedom? $DF = n - 1 = 10 - 1 = 9$
2. What is the critical value for this 95% confidence interval? $CV = \pm t_{.025,9} = \pm 2.2622$, from $\text{invT}(.025, 9) = -2.2622$
3. What is the standard error of this confidence interval? $SE = \frac{SD}{\sqrt{n}} = \frac{1.767}{\sqrt{10}} = 0.5588$
4. What is the margin of error? $ME = CV \times SE = 2.2622 \times 0.5588 = 1.2641$
5. Set up a 95% confidence interval of the population difference of paired observations of shoe soles. $0.3 \pm 1.2641$, or (-0.964, 1.564)
6. What conclusion can we draw from this confidence interval? Since zero is within the confidence interval, there is no evidence of a difference between the new material and the old material.
7. Interpret the 95% confidence interval. We are 95% confident that the true average difference is between -0.96 and 1.56.
8. What sample size should be taken if the agency wants 95% confidence when the margin of error is 0.6? $n = \left(\frac{CV \times SD}{ME}\right)^2 = \left(\frac{1.96 \times 1.767}{0.6}\right)^2 = 33.3$, which rounds up to 34.

2.5 hypotheses test 1-propZint

Scenario 1

A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years of operation. To test the validity of this claim, a government testing agency selected a random sample of 100 sets and found that 14 sets required some repair within the first two years of operation. The company uses a 5% level of significance.

1. How many tails does this test have? One tail (upper-tailed).
2. What are the hypotheses? $H_0: p \le 0.1$ vs. $H_1: p > 0.1$
3. What is the standard error of the proportion? $SE = \sqrt{\frac{p(1-p)}{n}} = \sqrt{\frac{.1(1-.1)}{100}} = 0.03$
4. What is the test statistic? $z = \frac{\hat{p} - p}{SE} = \frac{0.14 - 0.1}{0.03} = 1.3333$
5. What is the p-value? p-value = 0.0912; since 0.0912 > 0.05, do not reject $H_0$.
6. What conclusion can we draw from this test? There is no evidence to reject the company's claim.
7. What is the critical value? $z_{.05} = \text{invNorm}(.95) = 1.645$
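The test statistic and p-value for Scenario 1 can be verified with the standard library's normal distribution. This is a sketch under the same setup as the answers above (null proportion 0.1, upper-tailed test); the function name `one_proportion_z_test` is illustrative.

```python
from math import sqrt
from statistics import NormalDist


def one_proportion_z_test(successes, n, p0):
    """Upper-tailed z test of H0: p <= p0 against H1: p > p0."""
    p_hat = successes / n
    se = sqrt(p0 * (1 - p0) / n)          # SE uses the null value p0, not p_hat
    z = (p_hat - p0) / se
    p_value = 1 - NormalDist().cdf(z)     # area to the right of z
    return se, z, p_value


se, z, p_value = one_proportion_z_test(14, 100, 0.1)
reject = p_value < 0.05
print(round(se, 4), round(z, 4), round(p_value, 4), reject)
```

Note that the standard error here is 0.03, not the 0.0347 used for the confidence interval, because the test evaluates variability under the null hypothesis.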

2.6 hypotheses test 2-independent samples

Scenario 2

The purchasing director for an industrial factory is investigating the possibility of purchasing a new milling machine. She determines that the new machine will be purchased if there is evidence that the parts it produces have a higher breaking strength than those from the old machine. The sample standard deviation of the breaking strength for the old machine is 10 kilograms and for the new machine is 9 kilograms. A sample of 25 parts taken from the old machine indicated a sample mean of 65 kilograms, whereas a similar sample of 25 from the new machine indicated a sample mean of 72 kilograms. The director uses a 5% level of significance.

1. How many tails does this test have? One tail (lower-tailed, with the difference taken as old minus new).
2. What are the hypotheses? $H_0: \mu_o \ge \mu_n$ vs. $H_1: \mu_o < \mu_n$
3. What is the test statistic? $t = \frac{\bar{x}_o - \bar{x}_n}{SE} = \frac{65 - 72}{2.6907} = -2.6015$
4. What are the degrees of freedom? $DF = n_1 + n_2 - 2 = 25 + 25 - 2 = 48$
5. What is the p-value? p-value = 0.0062
6. Should you reject the null hypothesis (decision)? Yes, since 0.0062 < 0.05.
7. What conclusion can we draw from this test? There is evidence that the mean breaking strength of parts from the new machine is greater than that of parts from the old machine.
8. What is the critical value? $CV = -t_{.05,48} = \text{invT}(.05, 48) = -1.6772$; the test statistic -2.6015 falls below it.
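The pooled t statistic can be recomputed from the summary statistics alone. A hedged sketch: `pooled_t_statistic` is a hypothetical helper, and the critical value -1.6772 is copied from the answer above rather than derived, since the standard library offers no t quantile function.

```python
from math import sqrt


def pooled_t_statistic(mean1, sd1, n1, mean2, sd2, n2):
    """Equal-variance two-sample t statistic for mean1 - mean2."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / df
    se = sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (mean1 - mean2) / se, df


# Scenario 2: old machine minus new machine, lower-tailed test.
t_stat, df = pooled_t_statistic(65, 10, 25, 72, 9, 25)
critical_value = -1.6772            # quoted in the review (assumed, not computed)
reject = t_stat < critical_value
print(round(t_stat, 4), df, reject)
```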

2.7 Hypotheses testing 1 sample T

Scenario 3

Suppose an independent testing agency has been contracted to determine whether the contracting company should use a gasoline additive. The current gasoline mileage for its vehicles is 18.5 mpg. A random sample of 30 vehicles from the company's fleet produced a sample average of 19.34 mpg and a sample standard deviation of 5.2 mpg. Is there evidence that putting an additive into the gasoline of the company vehicles will improve the performance (i.e., MPG) of the company vehicles? The company uses a 5% level of significance.

1. How many tails does this test have? One tail (upper-tailed).
2. What are the hypotheses? $H_0: \mu \le 18.5$ vs. $H_1: \mu > 18.5$
3. What is the test statistic? $t = \frac{\bar{x} - \mu_0}{SD/\sqrt{n}} = \frac{19.34 - 18.5}{0.9494} = 0.8848$
4. What are the degrees of freedom? $DF = n - 1 = 30 - 1 = 29$
5. What is the p-value? p-value = 0.1918
6. Should you reject the null hypothesis (decision)? Do not reject $H_0$, since 0.1918 > 0.05.
7. What conclusion can we draw from this test? There is no evidence that the additive actually improved gasoline mileage.
8. What is the critical value? $CV = t_{.05,29} = \text{invT}(.95, 29) = 1.6991$

2.8 Hypotheses test paired t

Scenario 4

Suppose a shoe company wants to test material for the soles of shoes. For each pair of shoes the new material is placed on one shoe and the old material is placed on the other shoe. After a given period of time a random sample of 10 pairs of shoes is selected. The wear is measured on a 10 point scale (higher is better) with the following results: the average of the differences is 0.3 and its standard deviation is 1.767. Is there evidence the new sole material is different from the current sole material?

1. How many tails does this test have? This is a two-tailed test.
2. What are the hypotheses? $H_0: \mu_d = 0$ vs. $H_1: \mu_d \ne 0$
3. What is the test statistic? $t = \frac{\bar{d} - 0}{SD/\sqrt{n}} = \frac{0.3}{0.5588} = 0.5369$
4. What are the degrees of freedom? $DF = n - 1 = 10 - 1 = 9$
5. What is the p-value? p-value = 0.6044
6. Should you reject the null hypothesis (decision)? Do not reject $H_0$.
7. What conclusion can we draw from this test? There is no evidence that the new sole material is different from the current sole material.
8. What is the critical value? $CV = \pm t_{.025,9} = \pm 2.2622$, from $\text{invT}(.025, 9) = -2.2622$
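The one-sample and paired t statistics share one formula, since a paired test is a one-sample test on the differences. The sketch below recomputes the statistics for Scenarios 3 and 4; `one_sample_t` is an illustrative name, and the critical values are the ones quoted in the review, not computed here.

```python
from math import sqrt


def one_sample_t(mean, sd, n, mu0):
    """Return (t statistic, degrees of freedom) for H0: mu = mu0."""
    se = sd / sqrt(n)
    return (mean - mu0) / se, n - 1


# Scenario 3: upper-tailed test against 18.5 mpg.
t_mpg, df_mpg = one_sample_t(19.34, 5.2, 30, 18.5)
reject_mpg = t_mpg > 1.6991                 # critical value from the review

# Scenario 4: two-tailed paired test on the differences.
t_paired, df_paired = one_sample_t(0.3, 1.767, 10, 0)
reject_paired = abs(t_paired) > 2.2622      # critical value from the review

print(round(t_mpg, 4), df_mpg, reject_mpg)
print(round(t_paired, 4), df_paired, reject_paired)
```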

2.9 $\chi^2$-test

Scenario 5

Suppose the head of the HR division of a mid-sized company wants to determine if she should let the Red Cross have a give blood day in the company cafeteria. She takes a random sample of size 49. The following contingency table is constructed.

| | Blood donor: Yes | Blood donor: No | Total |
|---|---|---|---|
| Men | 5 | 17 | 22 |
| Women | 7 | 20 | 27 |
| Total | 12 | 37 | 49 |

1. What are the hypotheses? $H_0$: blood donor status and gender are independent ($p_y = p_n$) vs. $H_1$: they are not independent ($p_y \ne p_n$)
2. What is the test statistic? $\chi^2 = 0.0671$
3. What are the degrees of freedom? $DF = (\#r - 1)(\#c - 1) = (2 - 1)(2 - 1) = 1$
4. What is the p-value? p-value = 0.7957
5. Should you reject the null hypothesis (decision)? Is p-value < $\alpha$? No; do not reject $H_0$.
6. What conclusion can we draw from this test? There is no evidence that blood donor status and gender are dependent; the data are consistent with independence.
7. What is the expected value for the cell in row 2, column 2? $E_{2,2} = \frac{(\text{row 2 total})(\text{column 2 total})}{n} = \frac{27 \times 37}{49} = 20.388$
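The expected counts and the test statistic follow mechanically from the table margins. This is a minimal sketch with a hypothetical helper `chi_square_independence`; the p-value line relies on the fact that a chi-square variable with 1 degree of freedom is a squared standard normal, so it is valid only for 2 by 2 tables like this one.

```python
from math import erfc, sqrt


def chi_square_independence(table):
    """Return (chi2, df, expected counts) for a table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    # Expected count = row total * column total / grand total.
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    df = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, df, expected


observed = [[5, 17], [7, 20]]   # rows: men, women; columns: yes, no
chi2, df, expected = chi_square_independence(observed)

# Valid for df = 1 only: P(Z^2 > x) = erfc(sqrt(x / 2)).
p_value = erfc(sqrt(chi2 / 2))
print(round(chi2, 4), df, round(expected[1][1], 3), round(p_value, 4))
```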

2.10 SLR

Scenario 6

A statistician for an American automobile manufacturer would like to develop a statistical model for predicting delivery time (the days between initiating the order and the actual delivery of the new car) of custom-ordered new automobiles. The statistician believes there is a linear relationship between the number of options ordered on a car and the delivery time. A random sample of 16 cars is selected with the following results.

[Figures: scatter plot "Options Ordered vs Delivery Time" (Delivery Time against Options Ordered); "Residuals vs Fitted" plot (Residuals against Fitted values) for lm(time ~ Options)]

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.9785 |
| R square | 0.9575 |
| Adj R sq | 0.9545 |
| Standard error | 3.0446 |
| Observations | 16 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 1 | 2927.23 | 2927.23 | 315.8 | 0 |
| Residual | 14 | 129.77 | 9.27 | | |
| Total | 15 | 3057.00 | | | |

Coefficients

| | Coefficient | Std error | t Stat | p-value | Low 95% | Up 95% |
|---|---|---|---|---|---|---|
| intercept | 21.9254 | 1.5908 | 13.7823 | 0.0 | 18.51 | 25.34 |
| optionsordered | 2.0687 | 0.1164 | 17.7707 | 0.0 | 1.819 | 2.3184 |

1. Identify which variable is the X, independent, or explanatory variable. Options is the independent variable.
2. Identify which variable is the Y, dependent, or response variable. Time is the dependent variable.
3. Describe the pattern of points as they appear on the graph. As options increases, time increases.
4. What kind of relationship do you see? The relationship is positive and linear.

5. Are there any outliers? There are no apparent outliers.
6. Describe the strength and direction of the correlation. The correlation is strong (r = .98) and the direction is positive.
7. Compare this relationship with the pattern of points on the scatter diagram between the two variables. They are in agreement.
8. Write the specific estimated regression equation for this problem. $\widehat{time} = b_0 + b_1(options) = 21.9254 + 2.0687 \times options$
9. Using the estimated regression equation, predict the average delivery time for the average car with 16 options ordered. $\widehat{time} = 21.9254 + 2.0687 \times 16 = 55.02$ days
10. Is the previous prediction extrapolation? No, since the minimum options is 3 and the maximum options is 25, and 16 lies within that range.
11. Interpret the slope estimate, that is, explain what it means in terms of this problem. As options increases by one, the average delivery time increases by 2.07 days (i.e., the value of the slope).
12. Determine the coefficient of determination or how much variation in delivery time is accounted for by this regression model. Express your answer as a percent. What measure did you use to answer this question? Coefficient of determination = $r^2$ = 95.75%.
13. What is the standard error of the estimated regression line? Include the unit of measurement in your answer. s = 3.0446 days.
14. Using a 5% level of significance, is there evidence of a linear relationship between delivery time and options ordered? Be sure to state the hypotheses, test statistic, p-value, and the conclusion. $H_0: \beta = 0$ vs. $H_1: \beta \ne 0$; t = 17.7707; p-value ≈ 0. Reject $H_0$: there is evidence that the slope is not zero.

15. Give a 95% confidence interval for the true (i.e., population) slope. (1.819, 2.3184) is a 95% confidence interval.
16. If the original correlation coefficient between these two variables were not known, how could it be calculated using the statistics in the regression output? How do you determine the sign of the correlation coefficient? $r = \pm\sqrt{r^2}$. The sign of r is determined by the sign of the slope, so here $r = +\sqrt{.9575} = .9785$.
17. Describe what you see on the residual plot. There appears to be a slight pattern.
18. For the data set, look at the 9th pair of observations (Options, Time) or (12, 44). Calculate the residual, i.e., $e_i = Y_i - \hat{Y}_i$. $e_9 = 44 - (21.9254 + 2.0687 \times 12) = 44 - 46.7498 = -2.7498$
19. Is the model a good fit for the data? Be sure to state your decision and give the reasons that support your decision. Consider the following:
   - $r^2 = .9575$
   - s = 3.0446 days
   - Rejected $H_0$ of the slope.
   - Review the scatter plot.
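The prediction, extrapolation check, residual, and correlation in Scenario 6 all follow from the fitted coefficients. A small sketch, assuming the coefficients and the options range (3 to 25) reported above; the function and constant names are invented for illustration.

```python
from math import copysign, sqrt

B0, B1 = 21.9254, 2.0687            # intercept and slope from the output
OPTIONS_MIN, OPTIONS_MAX = 3, 25    # observed range of options, per the review
R_SQUARED = 0.9575


def predict_delivery_time(options):
    """Predicted delivery time in days."""
    return B0 + B1 * options


def is_extrapolation(options):
    """True when the x value lies outside the observed range."""
    return not (OPTIONS_MIN <= options <= OPTIONS_MAX)


def residual(options, observed_time):
    """Observed minus fitted value."""
    return observed_time - predict_delivery_time(options)


predicted_16 = predict_delivery_time(16)
e9 = residual(12, 44)
# r takes the sign of the slope.
r = copysign(sqrt(R_SQUARED), B1)
print(round(predicted_16, 2), is_extrapolation(16), round(e9, 4), round(r, 4))
```

A negative residual, as for the 9th observation, means the car was delivered sooner than the line predicts.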

2.11 MLR

Scenario 7

Suppose a consumer organization wanted to develop a model to predict gasoline mileage as measured by miles per gallon (MPG) based on the horsepower of the car's engine and the weight of the car. A sample of 50 recent car models was selected, with the results summarized below.

Regression Statistics

| Statistic | Value |
|---|---|
| Multiple R | 0.8657 |
| R square | 0.7494 |
| Adj R sq | 0.7388 |
| Standard error | 4.1766 |
| Observations | 50 |

Correlation Coefficient

| | MPG | HP | WT |
|---|---|---|---|
| MPG | 1 | | |
| HP | -0.7882 | 1 | |
| WT | -0.8248 | 0.7419 | 1 |

Descriptive Statistics

| | MPG | Horsepower | Weight |
|---|---|---|---|
| Mean | 28.5 | 90.8 | 2756.5 |
| Std Err | 1.16 | 3.85 | 89.81 |
| Std Dev | 8.17 | 27.26 | 635.05 |
| Variance | 66.77 | 743.04 | 403289.76 |
| Minimum | 15.5 | 48 | 1755 |
| Maximum | 46.6 | 165 | 4360 |
| Sum | 1427.1 | 4542 | 137826 |
| Count | 50 | 50 | 50 |

Min - Max

| x-variable | Min | Max |
|---|---|---|
| HP | 48 | 165 |
| WT | 1755 | 4360 |

ANOVA

| | df | SS | MS | F | Significance F |
|---|---|---|---|---|---|
| Regression | 2 | 2451.97 | 1225.99 | 70.2813 | 0 |
| Residual | 47 | 819.87 | 17.44 | | |
| Total | 49 | 3271.84 | | | |

Coefficients

| | Coefficient | Std error | t Stat | p-value | Low 95% | Up 95% |
|---|---|---|---|---|---|---|
| intercept | 58.1508 | 2.6582 | 21.8780 | 0.0 | 52.81 | 63.50 |
| Horsepower | -0.1175 | 0.0326 | -3.6003 | 0.0008 | -0.1832 | -0.0519 |
| Weight | -0.0069 | 0.0014 | -4.9035 | 0.0 | -0.0097 | -0.0041 |

1. Identify which variables are the X, independent, or explanatory variables. Horsepower (HP) and weight (WT) are the explanatory variables.
2. Identify which variable is the Y, dependent, or response variable. Miles per gallon (MPG) is the response variable.

3. Describe the strength and direction of the correlation. The correlation coefficient between MPG and HP is -.7882 (strong, negative). The correlation coefficient between MPG and WT is -.8248 (strong, negative). The correlation coefficient between WT and HP is .7419 (strong, positive).
4. Write the specific estimated regression equation for this problem. $\widehat{MPG} = 58.1508 - 0.1175 \times HP - 0.0069 \times WT$
5. Using the estimated regression equation, predict the average MPG for a car that has 60 HP and weighs 2000 lbs. $\widehat{MPG} = 58.1508 - 0.1175 \times 60 - 0.0069 \times 2000 = 37.3$ mpg
6. Is the previous prediction extrapolation? No, since HP = 60 is between 48 and 165 and WT = 2000 is between 1755 and 4360.
7. Interpret the slope estimates, that is, explain what they mean in terms of this problem. Holding WT constant, as HP increases by one, MPG decreases by .1175. Holding HP constant, as WT increases by one, MPG decreases by .0069.
8. Determine the coefficient of multiple determination or how much variation in MPG is accounted for by this regression model. Express your answer as a percent. What measure did you use to answer this question? $r^2$ = 74.9%
9. What is the standard error of the estimated regression line? Include the unit of measurement in your answer. s = 4.1766 mpg.
10. Using a 5% level of significance, is there evidence of a linear relationship between MPG and the explanatory variables? Be sure to state the hypotheses, test statistic, p-value, and the conclusion. $H_0: \beta_1 = \beta_2 = 0$ vs. $H_1$: at least one $\beta_i \ne 0$, where i = 1, 2. From the ANOVA table, F = 70.2813 with p-value ≈ 0, so reject $H_0$: there is evidence that at least one slope is not zero.
11. Give a 95% confidence interval for the true (i.e., population) slope of MPG and HP. A 95% confidence interval for the slope of MPG on HP is (-.1832, -.0519).

12. For the data set, look at the 1st set of observations (MPG, HP, WT) or (43.1, 48, 1985). Calculate the residual, i.e., $e_i = Y_i - \hat{Y}_i$. $e_1 = 43.1 - (58.1508 - .1175 \times 48 - .0069 \times 1985) = 43.1 - 38.8143 = 4.2857$
13. Is the model a good fit for the data? Be sure to state your decision and give the reasons that support your decision. Consider the following:
   - $r^2 = .7494$
   - s = 4.1766 mpg
   - Rejected $H_0$.
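Several of the Scenario 7 answers can be recovered from the coefficient and ANOVA tables alone. The sketch below is illustrative only: the constant and function names are invented, and the inputs are the rounded values printed in the output, so results agree with the review up to rounding.

```python
from math import sqrt

B0, B_HP, B_WT = 58.1508, -0.1175, -0.0069   # coefficients from the output
SSR, SSE = 2451.97, 819.87                   # regression and residual sums of squares
DF_REG, DF_RES = 2, 47


def predict_mpg(hp, wt):
    """Predicted miles per gallon for a given horsepower and weight."""
    return B0 + B_HP * hp + B_WT * wt


predicted = predict_mpg(60, 2000)
e1 = 43.1 - predict_mpg(48, 1985)            # residual for the 1st observation

# ANOVA quantities: F = MSR / MSE, R^2 = SSR / SST, s = sqrt(MSE).
msr = SSR / DF_REG
mse = SSE / DF_RES
f_stat = msr / mse
r_squared = SSR / (SSR + SSE)
s = sqrt(mse)
print(round(predicted, 1), round(e1, 4), round(f_stat, 2), round(r_squared, 4), round(s, 4))
```

Seeing F, $r^2$, and s derived from the same two sums of squares makes it clear why they tend to agree about the quality of the fit.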