AP Statistics Solutions to Packet 14


Inference for Regression
Inference about the Model
Predictions and Conditions

HW #,, 6, 7

4. AN EXTINCT BEAST, I

Archaeopteryx is an extinct beast having feathers like a bird but teeth and a long bony tail like a reptile. Here are the lengths in centimeters of the femur (a leg bone) and the humerus (a bone in the upper arm) for the five fossil specimens that preserve both bones:

Femur:
Humerus:

The strong linear relationship between the lengths of the two bones helped persuade scientists that all five specimens belong to the same species.

(a) Examine the data. Make a scatterplot with femur length as the explanatory variable. Use your calculator to obtain the correlation r and the equation of the least-squares regression line. Do you think the femur length will allow good prediction of humerus length?

The correlation is r = 0.994, and linear regression gives ŷ = -3.660 + 1.1969x. The scatterplot shows a strong, positive, linear relationship, which is confirmed by r.

(b) Explain in words what the slope β of the true regression line says about Archaeopteryx. What is the estimate of β from the data? What is your estimate of the intercept α of the true regression line?

β represents how much we can expect the humerus length to increase when femur length increases by 1 cm. The estimate of β is b = 1.1969, and the estimate of α is a = -3.660.

(c) Calculate the residuals for the five data points. Check that their sum is 0 (up to roundoff error). Use the residuals to estimate the standard deviation σ in the regression model. You have now estimated all three parameters in the model.

The residuals are -0.8226, -0.3668, 3.0425, -0.9420, and -0.9110; their sum is 0 up to roundoff error (carrying a different number of digits might change this). Squaring and summing the residuals gives 11.79, so that s = √(11.79/3) = 1.98.
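The residual arithmetic in part (c) can be checked with a short sketch. The femur and humerus values below are an assumption: they are the figures usually printed with this textbook exercise, and the measurements themselves do not appear in the transcription above. The helper name `least_squares` is illustrative only.

```python
import math

# Assumed data (cm): not shown in the text above, taken from the
# commonly reproduced version of this exercise.
femur = [38, 56, 59, 64, 74]
humerus = [41, 63, 70, 72, 84]


def least_squares(x, y):
    """Return (intercept a, slope b) of the least-squares line y = a + b*x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = y_bar - b * x_bar
    return a, b


a, b = least_squares(femur, humerus)

# Residual = observed humerus length minus predicted humerus length.
residuals = [yi - (a + b * xi) for xi, yi in zip(femur, humerus)]

# s estimates sigma: sum of squared residuals divided by n - 2, then square-rooted.
sse = sum(e ** 2 for e in residuals)
s = math.sqrt(sse / (len(femur) - 2))

print(f"a = {a:.4f}, b = {b:.4f}")
print("residuals:", [round(e, 4) for e in residuals])
print(f"sum of residuals = {sum(residuals):.6f}, s = {s:.3f}")
```

Dividing by n - 2 rather than n reflects the two parameters (α and β) already estimated from the same five points.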

4. BACKPACKS

Body weights and backpack weights were collected for eight students.

Weight (lbs):
Backpack weight (lbs):

These data were entered into a statistics package and least-squares regression of backpack weight on body weight was requested. Here are the results:

Predictor    Coef    Stdev    t-ratio    P
Constant
BodyWT
s = 2.270    R-sq = 63.2%    R-sq(adj) = 57.0%

(a) What is the equation of the least-squares line? (Hint: Look for the column Coef.) What is the intercept? What is the slope?

backpack weight = 16.265 + 0.0908(body weight). The intercept is 16.265 and the slope is 0.0908.

(b) The model for regression inference has three parameters, which we call α, β, and σ. Can you determine the estimates for α and β from the computer output? What are they?

The estimate for α is the intercept of the least-squares line, that is, 16.265. The estimate for β is the slope of the least-squares line, that is, 0.0908.

(c) The computer output reports that s = 2.270. This is an estimate of the parameter σ. Use the formula for s to verify the computer's value of s.

The estimate for σ is s = √(Σ resid² / (n - 2)) = √(Σ resid² / 6) = 2.2695, which rounds to the reported 2.270.

4.6 AN EXTINCT BEAST, II

Refer to exercise 4. (An Extinct Beast, I). Below is part of the output from the S-PLUS statistical software when we regress the length y of the humerus on the length x of the femur.

Coefficients:
               Value    Std. Error    t value    Pr(>|t|)
(Intercept)
Femur

(a) What is the equation of the least-squares regression line?

ŷ = -3.660 + 1.1969x

(b) We left out the t statistic for testing H0: β = 0 and its P-value. Use the output to find t.

t = b / SE_b = 1.1969 / 0.0751 = 15.94

(c) How many degrees of freedom does t have? Use Table C to approximate the P-value of t against the one-sided alternative Ha: β > 0.

df = 3; since t > 12.92, we know that P < 0.0005.

(d) Write a sentence to describe your conclusions about the slope of the true regression line.

There is very strong evidence that β > 0, that is, that the line is useful for predicting the length of the humerus given the length of the femur.

(e) Determine a 99% confidence interval for the true slope of the regression line. (Show your calculation.) Interpret the interval.

For df = 3, the critical value for a 99% confidence interval is t* = 5.841. The interval is 1.1969 ± (5.841)(0.0751), or 1.1969 ± 0.439, that is, 0.758 to 1.636. We are 99% confident that the true slope of the LSRL of the length of humerus on the length of femur is between 0.758 and 1.636.
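Parts (b) and (e) both follow from two summary numbers, the slope estimate and its standard error. A minimal sketch, assuming b = 1.1969, SE_b = 0.0751, and the Table C critical value t* = 5.841 for 99% confidence with df = 3; the function names are hypothetical and not part of any calculator or software output.

```python
def slope_t_statistic(b, se_b):
    """t statistic for testing H0: beta = 0."""
    return b / se_b


def slope_confidence_interval(b, se_b, t_star):
    """Confidence interval b +/- t* x SE_b for the true slope."""
    margin = t_star * se_b
    return b - margin, b + margin


# Assumed summary values for the humerus-on-femur regression.
b, se_b = 1.1969, 0.0751
t_star_99 = 5.841  # Table C, df = 3, 99% confidence

t = slope_t_statistic(b, se_b)
low, high = slope_confidence_interval(b, se_b, t_star_99)

print(f"t = {t:.2f}")
print(f"99% CI for slope: {low:.3f} to {high:.3f}")
```

The same two functions apply to every slope interval in this packet; only b, SE_b, and the critical value change with the data set and confidence level.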

4.7 JET SKIS, I

Data for the number of jet skis in use and number of fatalities for the years 1987 to 2000 are given below.

Year    Number in use    Accidents    Fatalities
,756 6,88 78,50 4,76 05,95 7,8 454, , , , ,6,5,650,6,00 4,08 4,00

(a) Formulate null and alternative hypotheses about the slope of the true regression line. State a one-sided alternative hypothesis.

H0: β = 0 (there is no association between number of jet skis in use and number of fatalities).
Ha: β > 0 (there is a positive association between number of jet skis in use and number of fatalities).

(b) What conditions or assumptions are necessary in order to perform a linear regression test of significance? Are these reasonable assumptions in this situation?

The y responses are independent: not given, so proceed with caution.
The true relationship is linear: yes.
σ is constant: yes.
y varies normally: yes.

(c) Perform a linear regression t test. Report the t statistic, the degrees of freedom, and the P-value. Write your conclusion in plain language.

LinRegTTest (TI-84) reports that t = 7.6 with df = 8. A t statistic this large gives a one-sided P-value below 0.0005 (Table C, df = 8). With the earlier caveat, there is sufficient evidence to reject H0 and conclude that there is an association between number of jet skis in use and number of fatalities. As the number of jet skis in use increases, the number of fatalities increases.

(d) Determine a 98% confidence interval for the true slope of the regression line. (Show your calculation.) Write your conclusion in plain language.

The confidence interval takes the form b ± t* SE_b. With t* = .84, and SE_b = , the 98% confidence interval is approximately ( , ). We are 98% confident that the true slope of the LSRL of fatalities on number of jet skis in use in thousands is between 0.04 and

HW # 9,

4. DOES FAST DRIVING WASTE FUEL?

The table below gives data on the fuel consumption of a small car at various speeds from 10 to 150 kilometers per hour. Is there evidence of straight-line dependence between speed and fuel use? Make a scatterplot and use it to explain the result of your test.

Speed (km/h)    Fuel used (liters/100 km)

Regression of fuel consumption on speed gives b = -0.01466, SE_b = 0.02334, and t = -0.63. With df = 13, we see that P > 2(0.25) = 0.50 (software reports 0.541), so we have no evidence to suggest a straight-line relationship. While the relationship between these two variables is very strong, it is definitely not linear.

4.0

The table below presents data on the relationship between the speed of runners (x, in feet per second) and the number of steps y that they take in a second.

Speed (ft/s):
Steps per second:

Here is part of the Data Desk regression output for these data:

R-squared = 99.8%
s = 0.0091 with 7 - 2 = 5 degrees of freedom

Variable    Coefficient    s.e. of coefficient    t-ratio    Prob
Constant
Speed

(a) How can you tell from this output, even without the scatterplot, that there is a very strong straight-line relationship between running speed and steps per second?

r² is very close to 1, which means that nearly all the variation in steps per second is accounted for by foot speed. Also, the P-value for β is small.

(b) What parameter in the regression model gives the rate at which steps per second increase as running speed increases? Find and interpret a 99% confidence interval for this rate.

β (the slope) is this rate; the estimate is listed as the coefficient of Speed, 0.0803. Using a t(5) distribution, the confidence interval is 0.0803 ± (4.032)(0.0016) = 0.0738 to 0.0867. We are 99% confident that the true slope of the LSRL of steps per second on running speed is between 0.074 and 0.087.

4. THE LEANING TOWER OF PISA

The Leaning Tower of Pisa leans more as time passes. Here are measurements of the lean of the tower for the years 1975 to 1987. The lean is the distance between where a point on the tower would be if the tower were straight and where it actually is. The distances are tenths of a millimeter in excess of 2.9 meters. For example, the 1975 lean, which was 2.9642 meters, appears in the table as 642. We use only the last two digits of the year as our time variable.

Year:
Lean:

Here is part of the output from the Data Desk regression procedure with year as the explanatory variable and lean as the response variable:

Variable    Coefficient    s.e. of coefficient    t-ratio    prob
Constant
year

(a) Plot the data. Briefly describe the shape, strength, and direction of the relationship.

The tower is tilting at a steady rate. The plot shows a strong positive linear relationship.

(b) The main purpose of the study is to estimate how fast the tower is tilting. What parameter in the regression model gives the rate at which the tilt is increasing, in tenths of a millimeter per year?

β (the slope) is this rate; the estimate is listed as the coefficient of year: 9.3187.

(c) We want a 95% confidence interval for this rate. How many degrees of freedom does t have? Find the critical value t* and the confidence interval. Interpret the interval.

df = 11; t* = 2.201; 9.3187 ± (2.201)(0.3099) = 8.6366 to 10.0008. We are 95% confident that the true slope of the LSRL of tilt on year is between 8.64 and 10.00 tenths of a millimeter per year.
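The standard error used in part (c) comes from the formula SE_b = s / √Σ(x - x̄)². The sketch below recomputes the slope, s, and SE_b from raw data and then forms the 95% interval. The lean values are an assumption: they are the figures usually published with this exercise and are not shown in the transcription above. The critical value 2.201 is the Table C entry for df = 11.

```python
import math

# Assumed data: last two digits of the year, and lean in tenths of a
# millimeter in excess of 2.9 m (not shown in the text above).
year = list(range(75, 88))
lean = [642, 644, 656, 667, 673, 688, 696, 698, 713, 717, 725, 742, 757]

n = len(year)
x_bar = sum(year) / n
y_bar = sum(lean) / n

sxx = sum((x - x_bar) ** 2 for x in year)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(year, lean))

b = sxy / sxx          # slope: tilt per year
a = y_bar - b * x_bar  # intercept

# s estimates sigma from the residuals, using n - 2 degrees of freedom.
sse = sum((y - (a + b * x)) ** 2 for x, y in zip(year, lean))
s = math.sqrt(sse / (n - 2))

# Standard error of the slope.
se_b = s / math.sqrt(sxx)

t_star = 2.201  # Table C, df = 11, 95% confidence
low, high = b - t_star * se_b, b + t_star * se_b

print(f"b = {b:.4f}, SE_b = {se_b:.4f}")
print(f"95% CI for slope: {low:.2f} to {high:.2f}")
```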

4. THE GENTLE MANATEE

The relationship between the number of powerboats registered and the number of manatees killed each year was explored in an earlier chapter. We will revisit the data below:

Year    Powerboat registrations (1000)    Manatees killed

We conducted inference on the manatee data earlier, but was this prudent? Check the conditions, and report your interpretations.

The major difficulty is that the observations are not independent. The number of powerboat registrations for any year is related to the number of registrations for the previous year. The other conditions can be assumed to be satisfied:

The true relationship is linear.
The standard deviation of the response about the true line is the same everywhere.
The response varies normally about the true regression line.

PISA, PISA!

In Exercise 4. (The Leaning Tower of Pisa) we regressed the lean of the Leaning Tower of Pisa on year to estimate the rate at which the tower is tilting. Here are the residuals from that regression, in order by years across the rows:

Use the residuals to check the regression conditions, and describe your findings. Is the regression in that exercise trustworthy?

The number of points is so small that it is hard to judge much from the stemplot. The scatterplot of residuals vs. year does not suggest any problems. The regression should be fairly reliable.

4.5 DO HEAVIER PEOPLE BURN MORE ENERGY?

Metabolic rate, the rate at which the body consumes energy, is important in studies of weight gain, dieting, and exercise. Lean body mass is an important influence on metabolic rate. Men and women show a similar pattern, so we will ignore gender. Here are the data on mass (in kilograms) and metabolic rate (in calories):

Mass:
Rate:

Use your calculator or software to analyze these data. Make a scatterplot and find the least-squares line. Give a 90% confidence interval for the slope β and explain clearly what your interval says about the relationship between lean body mass and metabolic rate. Find the residuals and examine them. Are the conditions for regression inference met?

The scatterplot shows a positive association. The regression line is ŷ = 113.2 + 26.88x; the linear relationship with body mass accounts for r² = 74.8% of the variation in metabolic rate. Minitab output (on the next page) reports b = 26.88 and SE_b = 3.786; with df = 17, the critical value is t* = 1.740, so the 90% confidence interval for β is 26.88 ± (1.740)(3.786) = 20.29 to 33.47 cal/kg. For each additional kilogram of mass, metabolic rate increases by about 20 to 33 calories.

The residuals are listed on the next page (in order, down the columns). A stemplot (on the next page) suggests that the distribution of residuals is right-skewed, and the largest residual may be an outlier. A scatterplot (on the next page) of the residuals against the explanatory variable gives some hint that the variation about the line is not constant (in violation of the regression assumptions). However, the three highest residuals account for most of that impression (as well as the skewness of the distribution), so these three individuals may need to be examined further.


HW #4 9,

4.9 BEAVERS AND BEETLES

Ecologists sometimes find rather strange relationships in our environment. One study seems to show that beavers benefit beetles. The researchers laid out circular plots, each four meters in diameter, in an area where beavers were cutting down cottonwood trees. In each plot, they measured the number of stumps from trees cut by beavers and the number of clusters of beetle larvae. Here are the data:

Stumps:
Beetle larvae:

(a) Make a scatterplot that shows how the number of beaver-caused stumps influences the number of beetle larvae clusters. What does your plot show?

Stumps (the explanatory variable) should be on the horizontal axis; the plot shows a positive linear association.

(b) Here is the Minitab regression output for these data:

Predictor    Coef    Stdev    T    P
Constant             2.853
Stumps               1.136
s = 6.419    R-sq = 83.9%

Find the least-squares regression line and draw it on your plot. What percent of the observed variation in beetle larvae counts can be explained by straight-line dependence on beaver stump counts?

The regression line is ŷ = -1.286 + 11.894x. Regression on stump counts explains 83.9% of the variation in the number of beetle larvae.

(c) Is there strong evidence that beaver stumps help explain beetle larvae counts? Give appropriate statistical evidence to support your conclusion.

Our hypotheses are H0: β = 0 versus Ha: β ≠ 0, and the test statistic is t = 10.47 (df = 21). The output shows P = 0.000, so we know that P < 0.0005; we have strong evidence that beaver stump counts help explain beetle larvae counts.
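In simple linear regression the slope's t statistic and R-sq carry the same information, because r² = t² / (t² + df). A small sketch of that identity, assuming t = 10.47 and df = 21 for the beetle regression; the function name is illustrative.

```python
def r_squared_from_t(t, df):
    """R-squared implied by the slope t statistic in simple linear regression."""
    return t ** 2 / (t ** 2 + df)


# Assumed values for the beetle larvae regression: t = 10.47 with df = 21.
r2 = r_squared_from_t(10.47, 21)
print(f"R-sq = {100 * r2:.1f}%")
```

This gives a quick consistency check between the answers to parts (b) and (c): a large t with few degrees of freedom must go with a large share of explained variation.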

4. WEEDS AMONG THE CORN

Lamb's-quarter is a common weed that interferes with the growth of corn. An agriculture researcher planted corn at the same rate in 16 small plots of ground, then weeded the plots by hand to allow a fixed number of lamb's-quarter plants to grow in each meter of corn row. No other weeds were allowed to grow. Here are the yields of corn (bushels per acre) in each of the plots:

Weeds per meter    Corn yield

Use your calculator or software to analyze these data.

(a) Make a scatterplot and find the least-squares line. What percent of the observed variation in corn yield can be explained by a linear relationship between yield and weeds per meter?

Regression gives ŷ = 166.483 - 1.0987x; the linear relationship explains about r² = 20.9% of the variation in yield.

(b) Is there good evidence that more weeds reduce corn yield?

The t statistic for testing H0: β = 0 vs. Ha: β < 0 is t = -1.92, with df = 14. We have some evidence that weeds influence corn yields, but it is not strong enough to meet the usual standards of statistical significance.

(c) Explain from your findings in (a) and (b) why you expect predictions based on this regression to be quite imprecise. Predict the mean corn yield under these experimental conditions when there are 6 weeds per meter of row.

The small value of r² and the lack of significance of the t test indicate that this regression has little predictive use. When x = 6, ŷ = 159.9 bu/acre; the 95% confidence interval, with t* = 2.145 and SE_μ̂ = 2.54, is 159.9 ± (2.145)(2.54). The width of this interval is another indication that the model has little practical use.
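The standard error of the estimated mean response in part (c) follows the formula SE_μ̂ = s √(1/n + (x* - x̄)² / Σ(x - x̄)²). The sketch below evaluates it at x* = 6 under two assumptions that are not stated in the text above: the 16 plots were weeded to 0, 1, 3, or 9 weeds per meter (four plots each), and the regression standard error is s = 7.977. The function name is hypothetical.

```python
import math


def se_mean_response(s, x_values, x_star):
    """Standard error of the estimated mean response at x = x_star."""
    n = len(x_values)
    x_bar = sum(x_values) / n
    sxx = sum((x - x_bar) ** 2 for x in x_values)
    return s * math.sqrt(1 / n + (x_star - x_bar) ** 2 / sxx)


# Assumed design: four plots at each of 0, 1, 3, and 9 weeds per meter.
weeds = [0] * 4 + [1] * 4 + [3] * 4 + [9] * 4
s = 7.977        # assumed regression standard error
y_hat = 159.9    # predicted mean yield at 6 weeds per meter (bu/acre)
t_star = 2.145   # Table C, df = 14, 95% confidence

se = se_mean_response(s, weeds, 6)
low, high = y_hat - t_star * se, y_hat + t_star * se

print(f"SE of mean response = {se:.2f}")
print(f"95% CI for mean yield: {low:.1f} to {high:.1f} bu/acre")
```

The interval spans roughly 11 bushels per acre, which is wide relative to the spread of the predicted means across the whole range of weed counts.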


More information

Homework 8 Solutions

Homework 8 Solutions Math 17, Section 2 Spring 2011 Homework 8 Solutions Assignment Chapter 7: 7.36, 7.40 Chapter 8: 8.14, 8.16, 8.28, 8.36 (a-d), 8.38, 8.62 Chapter 9: 9.4, 9.14 Chapter 7 7.36] a) A scatterplot is given below.

More information

International Statistical Institute, 56th Session, 2007: Phil Everson

International Statistical Institute, 56th Session, 2007: Phil Everson Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: peverso1@swarthmore.edu 1. Introduction

More information

Regression step-by-step using Microsoft Excel

Regression step-by-step using Microsoft Excel Step 1: Regression step-by-step using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression

More information

AP Statistics. Chapter 4 Review

AP Statistics. Chapter 4 Review Name AP Statistics Chapter 4 Review 1. In a study of the link between high blood pressure and cardiovascular disease, a group of white males aged 35 to 64 was followed for 5 years. At the beginning of

More information

Stat 412/512 CASE INFLUENCE STATISTICS. Charlotte Wickham. stat512.cwick.co.nz. Feb 2 2015

Stat 412/512 CASE INFLUENCE STATISTICS. Charlotte Wickham. stat512.cwick.co.nz. Feb 2 2015 Stat 412/512 CASE INFLUENCE STATISTICS Feb 2 2015 Charlotte Wickham stat512.cwick.co.nz Regression in your field See website. You may complete this assignment in pairs. Find a journal article in your field

More information

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,

More information

Analysis of Variance ANOVA

Analysis of Variance ANOVA Analysis of Variance ANOVA Overview We ve used the t -test to compare the means from two independent groups. Now we ve come to the final topic of the course: how to compare means from more than two populations.

More information

An analysis method for a quantitative outcome and two categorical explanatory variables.

An analysis method for a quantitative outcome and two categorical explanatory variables. Chapter 11 Two-Way ANOVA An analysis method for a quantitative outcome and two categorical explanatory variables. If an experiment has a quantitative outcome and two categorical explanatory variables that

More information

Example G Cost of construction of nuclear power plants

Example G Cost of construction of nuclear power plants 1 Example G Cost of construction of nuclear power plants Description of data Table G.1 gives data, reproduced by permission of the Rand Corporation, from a report (Mooz, 1978) on 32 light water reactor

More information

Name: Date: Use the following to answer questions 2-3:

Name: Date: Use the following to answer questions 2-3: Name: Date: 1. A study is conducted on students taking a statistics class. Several variables are recorded in the survey. Identify each variable as categorical or quantitative. A) Type of car the student

More information

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements

More information

AP Physics 1 and 2 Lab Investigations

AP Physics 1 and 2 Lab Investigations AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks

More information

Recall this chart that showed how most of our course would be organized:

Recall this chart that showed how most of our course would be organized: Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

More information

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries

Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association

More information

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

More information

Fairfield Public Schools

Fairfield Public Schools Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

More information

Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation

Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation Display and Summarize Correlation for Direction and Strength Properties of Correlation Regression Line Cengage

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

More information

Chapter 7 Section 1 Homework Set A

Chapter 7 Section 1 Homework Set A Chapter 7 Section 1 Homework Set A 7.15 Finding the critical value t *. What critical value t * from Table D (use software, go to the web and type t distribution applet) should be used to calculate the

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

Math 108 Exam 3 Solutions Spring 00

Math 108 Exam 3 Solutions Spring 00 Math 108 Exam 3 Solutions Spring 00 1. An ecologist studying acid rain takes measurements of the ph in 12 randomly selected Adirondack lakes. The results are as follows: 3.0 6.5 5.0 4.2 5.5 4.7 3.4 6.8

More information

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

 Y. Notation and Equations for Regression Lecture 11/4. Notation: Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

More information

2 Sample t-test (unequal sample sizes and unequal variances)

2 Sample t-test (unequal sample sizes and unequal variances) Variations of the t-test: Sample tail Sample t-test (unequal sample sizes and unequal variances) Like the last example, below we have ceramic sherd thickness measurements (in cm) of two samples representing

More information

Introduction. Chapter 14: Nonparametric Tests

Introduction. Chapter 14: Nonparametric Tests 2 Chapter 14: Nonparametric Tests Introduction robustness outliers transforming data other standard distributions nonparametric methods rank tests The most commonly used methods for inference about the

More information

Copyright 2013 by Laura Schultz. All rights reserved. Page 1 of 7

Copyright 2013 by Laura Schultz. All rights reserved. Page 1 of 7 Using Your TI-83/84/89 Calculator: Linear Correlation and Regression Dr. Laura Schultz Statistics I This handout describes how to use your calculator for various linear correlation and regression applications.

More information

STAT 145 (Notes) Al Nosedal anosedal@unm.edu Department of Mathematics and Statistics University of New Mexico. Fall 2013

STAT 145 (Notes) Al Nosedal anosedal@unm.edu Department of Mathematics and Statistics University of New Mexico. Fall 2013 STAT 145 (Notes) Al Nosedal anosedal@unm.edu Department of Mathematics and Statistics University of New Mexico Fall 2013 CHAPTER 18 INFERENCE ABOUT A POPULATION MEAN. Conditions for Inference about mean

More information

ch12 practice test SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question.

ch12 practice test SHORT ANSWER. Write the word or phrase that best completes each statement or answers the question. ch12 practice test 1) The null hypothesis that x and y are is H0: = 0. 1) 2) When a two-sided significance test about a population slope has a P-value below 0.05, the 95% confidence interval for A) does

More information

3.4 Statistical inference for 2 populations based on two samples

3.4 Statistical inference for 2 populations based on two samples 3.4 Statistical inference for 2 populations based on two samples Tests for a difference between two population means The first sample will be denoted as X 1, X 2,..., X m. The second sample will be denoted

More information

GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial

More information

You have data! What s next?

You have data! What s next? You have data! What s next? Data Analysis, Your Research Questions, and Proposal Writing Zoo 511 Spring 2014 Part 1:! Research Questions Part 1:! Research Questions Write down > 2 things you thought were

More information

a) Find the five point summary for the home runs of the National League teams. b) What is the mean number of home runs by the American League teams?

a) Find the five point summary for the home runs of the National League teams. b) What is the mean number of home runs by the American League teams? 1. Phone surveys are sometimes used to rate TV shows. Such a survey records several variables listed below. Which ones of them are categorical and which are quantitative? - the number of people watching

More information

Doing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices:

Doing Multiple Regression with SPSS. In this case, we are interested in the Analyze options so we choose that menu. If gives us a number of choices: Doing Multiple Regression with SPSS Multiple Regression for Data Already in Data Editor Next we want to specify a multiple regression analysis for these data. The menu bar for SPSS offers several options:

More information

Factors affecting online sales

Factors affecting online sales Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4

More information

Linear Regression. Chapter 5. Prediction via Regression Line Number of new birds and Percent returning. Least Squares

Linear Regression. Chapter 5. Prediction via Regression Line Number of new birds and Percent returning. Least Squares Linear Regression Chapter 5 Regression Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). We can then predict the average response for all subjects

More information

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference)

Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Chapter 45 Two-Sample T-Tests Allowing Unequal Variance (Enter Difference) Introduction This procedure provides sample size and power calculations for one- or two-sided two-sample t-tests when no assumption

More information

Hypothesis testing - Steps

Hypothesis testing - Steps Hypothesis testing - Steps Steps to do a two-tailed test of the hypothesis that β 1 0: 1. Set up the hypotheses: H 0 : β 1 = 0 H a : β 1 0. 2. Compute the test statistic: t = b 1 0 Std. error of b 1 =

More information

Pearson s Correlation

Pearson s Correlation Pearson s Correlation Correlation the degree to which two variables are associated (co-vary). Covariance may be either positive or negative. Its magnitude depends on the units of measurement. Assumes the

More information

Statistics 2014 Scoring Guidelines

Statistics 2014 Scoring Guidelines AP Statistics 2014 Scoring Guidelines College Board, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks of the College Board. AP Central is the official online home

More information

Relationships Between Two Variables: Scatterplots and Correlation

Relationships Between Two Variables: Scatterplots and Correlation Relationships Between Two Variables: Scatterplots and Correlation Example: Consider the population of cars manufactured in the U.S. What is the relationship (1) between engine size and horsepower? (2)

More information

How Far is too Far? Statistical Outlier Detection

How Far is too Far? Statistical Outlier Detection How Far is too Far? Statistical Outlier Detection Steven Walfish President, Statistical Outsourcing Services steven@statisticaloutsourcingservices.com 30-325-329 Outline What is an Outlier, and Why are

More information

2. Here is a small part of a data set that describes the fuel economy (in miles per gallon) of 2006 model motor vehicles.

2. Here is a small part of a data set that describes the fuel economy (in miles per gallon) of 2006 model motor vehicles. Math 1530-017 Exam 1 February 19, 2009 Name Student Number E There are five possible responses to each of the following multiple choice questions. There is only on BEST answer. Be sure to read all possible

More information

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information