Final Exam Practice Problem Answers




The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows:

Brand: The brand name of the cereal
Calories: The number of calories per serving
Protein: The number of grams of protein per serving
Fat: The number of grams of fat per serving
Fiber: The number of grams of fiber per serving
Sodium: The number of milligrams (mg) of sodium per serving
Carbo: The number of grams of carbohydrates per serving
Sugars: The number of grams of sugars per serving
Vitamins: The percentage of the recommended daily allowance (RDA) of vitamins per serving
Shelf: 1 indicates that the cereal appears on the lowest shelf in the store; 0 indicates that the cereal does not appear on the lowest shelf in the store
Rating: An overall healthiness rating for the cereal. The higher the rating, the healthier the cereal.

Some observations from the data set follow:

name           calories  protein  fat  sodium  fiber  carbo  sugars  vitamins  Shelf  rating
Product_19          100        3    0     320      1     20       3       100      0  41.504
Cheerios            110        6    2     290      2     17       1        25      0  50.765
Corn_Flakes         100        2    0     290      1     21       2        25      0  45.863
Rice_Krispies       110        2    0     290      0     22       3        25      0  40.560
Corn_Chex           110        2    0     280      0     22       3        25      0  41.445

The Excel output below gives information about the sodium content in the 77 cereals. Use this to answer the following questions.

sodium
Mean                      159.6753
Standard Error            9.553577
Median                    180
Mode                      0
Standard Deviation        83.8323
Sample Variance           7027.854
Kurtosis                  -.34524
Skewness                  -.57571
Range                     320
Minimum                   0
Maximum                   320
Sum                       12295
Count                     77
Confidence Level(90.0%)   15.90814

Boxplot summary for sodium:

Min        0
Q1         130
Median     180
Q3         210
Max        320
Outliers   0
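The summary statistics above are tied together by a few identities (mean = sum / n, standard error = s / sqrt(n), variance = s squared), and the boxplot outlier rule uses fences 1.5 IQR beyond the quartiles. The sketch below is only a consistency check under the assumption that Q1 = 130 and Q3 = 210 are the intended quartiles; the variable names are illustrative and not part of the original answer key.

```python
import math

# Values taken from the Excel summary for sodium (assumed readings).
n = 77
total = 12295
sd = 83.8323
q1, q3 = 130, 210  # assumed quartiles from the boxplot summary

mean = total / n                 # should match the reported mean
std_error = sd / math.sqrt(n)    # should match the reported standard error
variance = sd ** 2               # should match the reported sample variance

# 1.5 * IQR rule used by a standard boxplot to flag outliers.
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

print(f"mean={mean:.4f}  SE={std_error:.6f}  variance={variance:.3f}")
print(f"IQR={iqr}  fences=({lower_fence}, {upper_fence})")
```

Under these assumptions any cereal with less than 10 mg of sodium falls below the lower fence, which is consistent with the cereals at 0 mg being flagged as outliers.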

1. Describe the shape of the distribution of sodium contents in the 77 breakfast cereals.

The distribution is slightly skewed to the left and contains 9 outliers. These outliers all appear as one point on the boxplot because each of the 9 outlying cereals contains 0 mg of sodium per serving.

2. What is the median sodium content in the cereals? What does this value represent?

The median sodium content in the cereals is 180 mg. This implies that 50% of the cereals in the sample have less than 180 mg of sodium per serving. Likewise, 50% of the cereals in the sample have more than 180 mg of sodium per serving.

3. The 25% of the cereals that contain the most sodium contain at least how much sodium per serving?

This value is the 75th percentile, or the 3rd quartile. The 25% of the cereals with the most sodium contain at least 210 mg per serving.

4. What is the standard deviation of the sodium contents? What does this value represent?

The standard deviation of the sodium contents is 83.83 mg. This is a measure of variability in the sample. Specifically, it measures the spread of the observations around the sample mean.

5. Assume that this represents a random sample of 77 cereals from the population of all breakfast cereals. Conduct a hypothesis test to determine if the mean sodium content in all cereals is greater than 140 mg per serving. State the null and alternative hypothesis, the test statistic, p-value or an approximate p-value, and the decision and conclusion. Use α = .01.

Ho: µ = 140
Ha: µ > 140

Test statistic:

$$t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{159.6753 - 140}{83.8323 / \sqrt{77}} = 2.06$$

Degrees of freedom: n - 1 = 76

p-value: use approximate degrees of freedom of 80 on the t-table. The computed test statistic falls between the critical values of 1.990 and 2.088 on the t-table. This implies that the p-value falls in the range .02 < p-value < .025.

Decision: Since the p-value is greater than α, we will not reject the null hypothesis. There is not sufficient evidence at the 1% level of significance to conclude that the mean sodium content in all cereals is greater than 140 mg per serving.

6. What is the IQR of the sample? What does this value represent?

The IQR gives the range of the middle 50% of the sample. It is the difference between the third and first quartiles and is given by Q3 - Q1 = 210 - 130 = 80 mg.

The following Excel output gives information about the healthiness ratings of cereals that appear on the low shelf in the store compared to the ratings of cereals that do not appear on the low shelf in the store. The output was generated using α = .05. Use this output to answer the following questions. Assume that the data represent random samples from the populations of all cereals on the low shelf and those not on the low shelf in the store.
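As a check on the one-sample t test in question 5, the test statistic can be recomputed from the summary statistics alone. This is a minimal sketch, assuming a hypothesized mean of 140 mg and α = .01, and using the t-table critical values for 80 degrees of freedom quoted in the answer (1.990 and 2.088) instead of an exact p-value; the variable names are illustrative.

```python
import math

# Summary statistics for sodium and the assumed null value / significance level.
xbar, s, n = 159.6753, 83.8323, 77
mu0 = 140      # assumed hypothesized mean (mg per serving)
alpha = 0.01   # assumed significance level

# One-sample t statistic: (sample mean - null mean) / standard error.
t_stat = (xbar - mu0) / (s / math.sqrt(n))

# Table critical values for 80 df with upper-tail areas .025 and .02.
t_crit_025, t_crit_020 = 1.990, 2.088
p_low, p_high = 0.02, 0.025
bracketed = t_crit_025 < t_stat < t_crit_020

# Reject only if even the upper bound on the p-value is below alpha.
reject_h0 = bracketed and p_high < alpha

print(f"t = {t_stat:.3f}, {p_low} < p-value < {p_high}: {bracketed}")
print("reject Ho" if reject_h0 else "do not reject Ho")
```

Bracketing the statistic between table values mirrors how the answer reads the t-table; software with a t distribution function would give a single p-value inside the same range.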

7. What is the sample variance of the healthiness rating of cereals that do not appear on the low shelf?

s² = 170.85

8. Suppose you wish to conduct a hypothesis test to determine if cereals on the low shelf have a lower average healthiness rating than those appearing on higher shelves. State the null and alternative hypothesis to test this claim.

Ho: µ_low = µ_hi
Ha: µ_low < µ_hi

9. State the test statistic, p-value, decision, and conclusion to the hypothesis test in the previous question. Use α = .05.

Test statistic: -3.014
p-value: .002

Decision: Since the p-value is less than α, reject Ho. There is sufficient evidence to conclude that cereals on the low shelf have lower average healthiness ratings than those that do not appear on the low shelf.

10. Compute and interpret a 95% confidence interval to estimate the difference in the population mean healthiness ratings between cereals that appear on the lower shelf and those on higher shelves.

$$(\bar{x}_1 - \bar{x}_2) \pm t^* \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}} = -10.578 \pm 2.032 \sqrt{\frac{194.685}{21} + \frac{170.85}{56}} = -10.578 \pm 2.032(3.510) = -10.578 \pm 7.132$$

With 95% confidence, on average cereals on the low shelf in the grocery store have a rating of between 3.45 and 17.71 points lower than cereals on higher shelves.
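The interval in question 10 is the unequal-variance (Welch) form, so its pieces can be recomputed from the two sample variances and sample sizes. The sketch below assumes a mean difference of -10.578, variances of 194.685 (n = 21) and 170.85 (n = 56), and the critical value t* = 2.032 used in the answer; the Welch-Satterthwaite degrees of freedom are included only to show where a critical value near 2.03 comes from.

```python
import math

# Assumed summary values for the low-shelf (1) and higher-shelf (2) groups.
n1, var1 = 21, 194.685
n2, var2 = 56, 170.85
diff = -10.578   # assumed difference in sample means (low minus high)
t_star = 2.032   # critical value used in the answer

# Standard error of the difference without pooling the variances.
v1, v2 = var1 / n1, var2 / n2
se = math.sqrt(v1 + v2)

# Test statistic for Ho: mu_low = mu_hi.
t_stat = diff / se

# Welch-Satterthwaite approximation to the degrees of freedom.
df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))

margin = t_star * se
ci = (diff - margin, diff + margin)

print(f"SE = {se:.3f}, t = {t_stat:.3f}, df = {df:.1f}")
print(f"95% CI: ({ci[0]:.2f}, {ci[1]:.2f}), margin = {margin:.3f}")
```

Because the whole interval lies below zero, it agrees with the one-sided test in question 9 that low-shelf cereals rate lower on average.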

11. What is the margin of error for the confidence interval computed in the previous question?

The margin of error for the interval computed above is 7.132.

Suppose that the 77 cereals represent a random sample of all breakfast cereals. 21 of the cereals contain more than 10 grams of sugar per serving. Use this information to answer the following questions.

12. Compute a 99% confidence interval to estimate the true proportion of breakfast cereals that contain more than 10 grams of sugar per serving. Interpret the interval.

$$\tilde{p} = \frac{x + 2}{n + 4} = \frac{21 + 2}{77 + 4} = .284$$

$$\tilde{p} \pm z^* \sqrt{\frac{\tilde{p}(1 - \tilde{p})}{n + 4}} = .284 \pm 2.576 \sqrt{\frac{.284(1 - .284)}{77 + 4}} = .284 \pm 2.576(.0501) = .284 \pm .1291 = (.155, .413)$$

We are 99% confident that the true population proportion of all breakfast cereals that contain more than 10 grams of sugar per serving is between 16% and 41%.

13. A consumer health advocacy group states that more than one quarter of all breakfast cereals contain more than 10 grams of sugar per serving. State the null and alternative hypothesis to test this claim.

Ho: p = .25
Ha: p > .25

14. For the test in the previous question, state the test statistic, p-value, decision and conclusion. Use α = .01.

$$\hat{p} = \frac{x}{n} = \frac{21}{77} = .2727$$

Test statistic:

$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} = \frac{.2727 - .25}{\sqrt{\frac{.25(1 - .25)}{77}}} = \frac{.0227}{.04935} = .46$$

p-value: .3228

Decision: Since the p-value is greater than α, do not reject Ho. There is not enough evidence at the 1% level of significance to conclude that more than one quarter of all breakfast cereals contain more than 10 grams of sugar.
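Questions 12 and 14 use two different proportion estimates: the plus-four estimate (x + 2)/(n + 4) for the interval and the ordinary sample proportion x/n for the test. A minimal sketch of both calculations follows, using only the Python standard library; x = 21, n = 77 and p0 = .25 come from the problem, while the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

x, n = 21, 77
std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1

# Question 12: plus-four 99% confidence interval.
p_tilde = (x + 2) / (n + 4)
z_star = std_normal.inv_cdf(1 - 0.01 / 2)   # about 2.576
margin = z_star * sqrt(p_tilde * (1 - p_tilde) / (n + 4))
ci = (p_tilde - margin, p_tilde + margin)

# Question 14: one-sided z test of Ho: p = .25 against Ha: p > .25.
p0 = 0.25
p_hat = x / n
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)  # standard error uses p0 under Ho
p_value = 1 - std_normal.cdf(z)             # upper-tail area

print(f"p~ = {p_tilde:.3f}, 99% CI = ({ci[0]:.3f}, {ci[1]:.3f})")
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
```

The p-value printed here uses the unrounded z, so it differs slightly in the fourth decimal from the table lookup at z = .46.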

The following table gives a breakdown of the shelf on which the cereal appears (shelf = 1 indicates the low shelf, shelf = 0 indicates a higher shelf), and the manufacturer of the cereal.

                 Shelf = 1   Shelf = 0   Row totals
General Mills        7          15          22
Kellogg              7          16          23
Nabisco              2           4           6
Quaker               3           5           8
Other                2          16          18
Column totals       21          56          77

15. Use this table information to test for the independence between the two categorical variables, shelf and manufacturer. State the null and alternative hypothesis, compute the test statistic, and give an approximate p-value for the test. State your decision and conclusion based on α = .05.

Ho: The shelf on which a cereal appears is independent of the manufacturer.
Ha: The shelf on which a cereal appears depends on the manufacturer.

Table of expected cell counts:

                 Shelf = 1   Shelf = 0   Row totals
General Mills       6          16          22
Kellogg             6.27       16.73       23
Nabisco             1.64        4.36        6
Quaker              2.18        5.82        8
Other               4.91       13.09       18
Column totals      21          56          77

Table of (actual - expected)² / expected:

                 Shelf = 1   Shelf = 0
General Mills     .166667     .0625
Kellogg           .084321     .031621
Nabisco           .080808     .030303
Quaker            .306818     .115057
Other            1.723906     .646465
Total                                    3.2484652

Test statistic: 3.248

Degrees of freedom: (5 - 1)(2 - 1) = 4

p-value: The closest critical value on the chi-square table with 4 degrees of freedom is 5.39, which has a tail probability of .25. Our computed test statistic is 3.248, which gives an upper tail probability that is larger than .25. Thus, our p-value is larger than .25.

Decision: Since p-value > α, we do not reject Ho. There is not enough evidence at the 5% level of significance to conclude that the shelf on which a cereal appears is dependent upon the manufacturer.

16. Of those cereals on the low shelf, what percentage is made by Nabisco?

2/21 = .095 = 9.5%
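The chi-square calculation in question 15 follows a fixed recipe: each expected count is (row total × column total) / grand total, and the statistic sums (observed - expected)² / expected over all cells. The sketch below reproduces that recipe from the observed table; the dictionary layout and names are illustrative choices, not part of the original answer.

```python
# Observed counts: manufacturer -> (Shelf = 1, Shelf = 0).
observed = {
    "General Mills": (7, 15),
    "Kellogg": (7, 16),
    "Nabisco": (2, 4),
    "Quaker": (3, 5),
    "Other": (2, 16),
}

row_totals = {name: sum(counts) for name, counts in observed.items()}
col_totals = [sum(counts[j] for counts in observed.values()) for j in range(2)]
grand_total = sum(col_totals)

# Expected count under independence: row total * column total / grand total.
expected = {
    name: tuple(row_totals[name] * col_totals[j] / grand_total for j in range(2))
    for name in observed
}

# Chi-square statistic: sum of (observed - expected)^2 / expected over all cells.
chi_sq = sum(
    (observed[name][j] - expected[name][j]) ** 2 / expected[name][j]
    for name in observed
    for j in range(2)
)
df = (len(observed) - 1) * (len(col_totals) - 1)

print(f"chi-square = {chi_sq:.4f} with {df} degrees of freedom")
```

Note that several expected counts are below 5 (Nabisco, Quaker, and Other on the low shelf), so the chi-square approximation should be read with some caution for a table this sparse.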

Use the multiple regression output below to answer the following questions. The output reflects the regression of the healthiness rating (Y) on the number of calories, fat, and fiber grams per serving as well as the shelf on which the cereal appears.

SUMMARY OUTPUT: Regression using PredInt.xls

Regression Statistics
Multiple R           .8284
R Square             .6863
Adjusted R Square    .6689
Standard Error       8.0834
Observations         77

ANOVA
              df    SS          MS          F         Significance (p-value) for F
Regression     4    10292.23    2573.058    39.3788   .000
Residual      72     4704.567     65.34121
Total         76    14996.80     197.3263

Dependent (Criterion) Variable: rating

             Coefficients   Standard Error   t Stat    P-value (2-tails)   Lower 95%   Upper 95%   X Values for Prediction
Intercept       77.760          6.263        12.416        .000             65.276      90.245
calories         -.337           .059        -5.753        .000              -.454       -.220       120
fat            -2.571           1.084        -2.372        .020             -4.732       -.410         1
fiber           2.324            .436         5.328        .000              1.455       3.194         5
Shelf          -5.414           2.185        -2.477        .016             -9.771      -1.058         0

Confidence Level: .95

Prediction Interval for a Single Observation of rating, with the X Values that you enter in the yellow boxes.
Predicted         46.376
Standard Error     8.299
Lower 95%         29.833
Upper 95%         62.919

Confidence Interval for Expected rating while holding X constant at the values that you enter in the yellow boxes.
Fit               46.376
Standard Error     1.878
Lower 95%         42.632
Upper 95%         50.120

17. What is R²? What does this value mean?

.6863. This means that 68.63% of the observed variation in the healthiness ratings can be explained by the calories, fat, and fiber per serving in addition to the shelf on which the cereal appears.

18. Estimate the healthiness rating of a cereal with 100 calories, 2 grams of fat, and 0 grams of fiber per serving that appears on the low shelf.

$$\hat{y} = 77.760 - .337(100) - 2.571(2) + 2.324(0) - 5.414(1) = 33.504$$

19. Test to determine if the number of fat grams per serving is a significant linear predictor of the healthiness rating. State the null and alternative hypothesis, test statistic, p-value, decision and conclusion. Use α = .05.

Ho: β_fat = 0
Ha: β_fat ≠ 0

Test statistic: -2.372
p-value: .020

Decision: Since p-value < α, reject Ho. There is enough evidence at the 5% level of significance to conclude that the number of fat grams is a significant linear predictor of the healthiness rating of breakfast cereals.

20. State and interpret the 95% confidence interval for estimating the population slope coefficient of the variable fiber.

The 95% confidence interval is given by (1.455, 3.194). We are 95% confident that a one gram increase in fiber per serving gives an increase in the population average cereal rating of between 1.455 and 3.194 points when comparing cereals with the same number of calories and fat grams per serving that appear on the same shelf.

21. State and interpret the 95% confidence interval for estimating the population slope coefficient of the variable shelf.

The 95% confidence interval is given by (-9.771, -1.058). When comparing cereals with the same number of calories, fat, and fiber per serving, cereals on the low shelf have a population average rating of between 1.058 and 9.771 points lower than cereals on higher shelves.

22. Interpret the slope coefficient for the variable calories.

For each additional calorie per serving contained in a breakfast cereal, the predicted average rating decreases by .337 points when comparing cereals with the same amount of fat and fiber per serving that appear on the same shelf in the grocery store.
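The fitted equation from the regression output can be wrapped in a small function to reproduce the point estimate in question 18 and the prediction row of the Excel sheet. This is a sketch under the assumption that the coefficients are 77.760, -.337, -2.571, 2.324 and -5.414 and that the ANOVA sums of squares are 10292.23 and 14996.80; `predict_rating` is a hypothetical helper name, not something from the original workbook.

```python
# Assumed coefficients from the regression output (rounded to three decimals).
COEF = {
    "intercept": 77.760,
    "calories": -0.337,
    "fat": -2.571,
    "fiber": 2.324,
    "shelf": -5.414,
}


def predict_rating(calories, fat, fiber, shelf):
    """Predicted healthiness rating from the fitted linear equation."""
    return (
        COEF["intercept"]
        + COEF["calories"] * calories
        + COEF["fat"] * fat
        + COEF["fiber"] * fiber
        + COEF["shelf"] * shelf
    )


# Question 18: 100 calories, 2 g fat, 0 g fiber, on the low shelf.
q18 = predict_rating(100, 2, 0, 1)

# X values assumed for the spreadsheet's prediction row:
# 120 calories, 1 g fat, 5 g fiber, not on the low shelf.
sheet_fit = predict_rating(120, 1, 5, 0)

# ANOVA identities: R^2 = SSR / SST and F = MSR / MSE.
ss_regression, ss_total = 10292.23, 14996.80
ms_regression, ms_residual = 2573.058, 65.34121
r_squared = ss_regression / ss_total
f_stat = ms_regression / ms_residual

print(f"Question 18 estimate: {q18:.3f}")
print(f"Spreadsheet fit: {sheet_fit:.3f}")
print(f"R^2 = {r_squared:.4f}, F = {f_stat:.4f}")
```

The spreadsheet reports a fit of 46.376, while the rounded coefficients give a value a few thousandths lower; the gap is rounding in the displayed coefficients, not a different model.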