Chapter 23. Inferences for Regression

Topics covered in this chapter:
Simple Linear Regression

Simple Linear Regression

Example 23.1: Crying and IQ

The Problem: Infants who cry easily may be more easily stimulated than others. This may be a sign of higher IQ. Child development researchers explored the relationship between the crying of infants four to ten days old and their later IQ test scores. A snap of a rubber band on the sole of the foot caused the infants to cry. The researchers recorded the crying and measured its intensity by the number of peaks in the most active 20 seconds. They later measured the children's IQ at age three years using the Stanford-Binet IQ test. Do children with higher crying counts tend to have higher IQ?

1. Create a scatterplot:
a. Open data set ta23-01.por.
b. Click Graphs, scroll to Legacy Dialogs, then Scatter/Dot.
c. Click Simple Scatter, then click Define.
d. Move IQ into the Y Axis box, since IQ is the response variable.
e. Move Crycount into the X Axis box, since Crycount is the explanatory variable.

f. Click OK. The scatterplot will appear in the output window.

2. Find the least-squares regression line:
a. Click Analyze, scroll to Regression, then Linear.
b. Move IQ to the Dependent box.
c. Move Crycount to the Independent box.
d. Click OK.

The least-squares regression line is $\hat{y} = 91.268 + 1.493x$. The slope, 1.493, appears in the Coefficients table in the B column of the Crycount row. The y-intercept, 91.268, appears in the same table in the B column of the (Constant) row.

Example 23.7: Beer and blood alcohol

The Problem: The EESEE story "Blood Alcohol Content" describes a study in which 16 student volunteers at the Ohio State University drank a randomly assigned number of cans of beer. Thirty minutes later, a police officer measured their blood alcohol content (BAC) in grams of alcohol per deciliter of blood. The students were equally divided between men and women and differed in weight and usual drinking habits. Because of this variation, many students don't believe that the number of drinks predicts blood alcohol well. Steve thinks he can drive legally 30 minutes after he finishes 5 beers. The legal limit for driving is a BAC of 0.08 in all states. We want to predict Steve's blood alcohol content using no information except that he drinks 5 beers.

1. Regress BAC on number of beers:
a. Open the data set eg23-07.por.
b. Click Analyze, scroll down to Regression, then click Linear.
c. Move BAC to the Dependent box.
d. Move Beers to the Independent box.

2. Display predicted values and residuals:
a. Click the Save button at the right of the window.
b. Under Predicted Values, select Unstandardized.
c. Under Residuals, select Unstandardized.
d. Click Continue.
3. Request 95% confidence intervals for the regression coefficients:
a. Click the Statistics button.
b. Under Regression Coefficients, select Confidence Intervals.

c. Click Continue.
d. Click OK. A new window will pop up with the output.

e. The predicted values and residuals appear in the data sheet as two new columns, PRE_1 and RES_1. Notice that the predicted BAC for a person who drinks 5 beers is 0.07712.
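The sketch below shows what SPSS computes when it fills in the B column and saves PRE_1 and RES_1. It is a minimal Python version of ordinary least squares for one explanatory variable, using the usual formulas $b = \sum(x-\bar{x})(y-\bar{y}) / \sum(x-\bar{x})^2$ and $a = \bar{y} - b\bar{x}$. The function name `least_squares` and the small data set are hypothetical. The data were chosen only so the arithmetic is easy to follow; they are not the ta23-01 or eg23-07 data.

```python
from statistics import fmean


def least_squares(x, y):
    """Fit y = a + b*x by ordinary least squares.

    Returns (a, b, fitted, residuals). The fitted values and residuals play
    the role of the PRE_1 and RES_1 columns that the Save button adds.
    """
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length, with at least 2 points")
    x_bar, y_bar = fmean(x), fmean(y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sxy / sxx                # slope: B in the explanatory variable's row
    a = y_bar - b * x_bar        # intercept: B in the (Constant) row
    fitted = [a + b * xi for xi in x]
    residuals = [yi - fi for yi, fi in zip(y, fitted)]
    return a, b, fitted, residuals


# Hypothetical toy data, chosen only to make the arithmetic easy to follow.
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
a, b, pre, res = least_squares(x, y)
print(f"y-hat = {a:.3f} + {b:.3f}x")
print("prediction at x = 5:", round(a + b * 5, 3))
print("sum of residuals:", round(sum(res), 10))
```

The same step, a + b times 5, is what places the predicted BAC for 5 beers in the PRE_1 column. Least-squares residuals from a model with an intercept always sum to zero.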

Chapter 23 Exercises
23.1 Ebola and gorillas.
23.3 Great Arctic rivers.
23.5 Great Arctic rivers: testing.
23.7 Ebola and gorillas: testing correlation.
23.9 Ebola and gorillas: estimating slope.
23.11 Great Arctic rivers: estimating slope.
23.29 Manatees: conditions for inference.
23.33 Predicting tropical storms.
23.41 Squirrels and their food supply.
23.43 Beavers and beetles.

Chapter 23 SPSS Solutions

23.1 Use Graphs, Legacy Dialogs, Scatter/Dot to create a plot of the data. The plot is strongly linear and increasing. We could use Analyze, Correlate, Bivariate to find the correlation, but we also want the regression equation. So use Analyze, Regression, Linear to compute it, and take the square root of $r^2$ as the correlation. This works here because the slope is positive. To have SPSS find the residuals for us, click Save and check the box for Unstandardized residuals.

Model Summary(b)
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .962(a) | .926 | .908 | 4.903
a. Predictors: (Constant), Distance
b. Dependent Variable: Days

Coefficients(a)
Model | Unstandardized B | Std. Error | Standardized Beta | t | Sig.
1 (Constant) | -8.088 | 5.917 | | -1.367 | .243
Distance | 11.263 | 1.591 | .962 | 7.080 | .002
a. Dependent Variable: Days

The correlation is r = 0.962, a very strong relationship. The estimated slope of b = 11.263 says the virus takes about 11.263 days for each additional home range it must travel. The estimated intercept is a = -8.088. The standard deviation around the regression line is s = 4.90345, labeled Std. Error of the Estimate.

To sum the residuals (saved as variable RES_1), or find their mean, use Analyze, Descriptive Statistics, Descriptives. With a mean of 0.0000000, the sum must be 0.

Descriptive Statistics
 | N | Minimum | Maximum | Mean | Std. Deviation
Unstandardized Residual | 6 | -4.70175 | 6.03509 | .0000000 | 4.38578245
Valid N (listwise) | 6 | | | |
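The two standard deviations in this solution are linked. Descriptives divides the sum of squared residuals by n - 1, while the regression standard error s divides it by n - 2. The sketch below uses only the values printed above (n = 6, residual standard deviation 4.38578245, R Square .926) to recover s and r. The helper names are hypothetical.

```python
import math


def regression_se_from_residual_sd(resid_sd, n):
    """Convert a residual SD (divisor n - 1) into s = sqrt(SSE / (n - 2))."""
    sse = resid_sd ** 2 * (n - 1)    # undo the n - 1 divisor used by Descriptives
    return math.sqrt(sse / (n - 2))


def r_from_r_squared(r_squared, slope):
    """r has the sign of the slope; sqrt(R^2) alone would lose that sign."""
    return math.copysign(math.sqrt(r_squared), slope)


s = regression_se_from_residual_sd(4.38578245, n=6)
r = r_from_r_squared(0.926, slope=11.263)
print(f"s = {s:.5f}")   # compare with Std. Error of the Estimate, 4.903
print(f"r = {r:.3f}")   # compare with R, .962
```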

23.3 Create a scatterplot of the data in ta23-02. To add the regression line to the graph, double-click it to open the Chart Editor, then click Elements, Fit line at total. The graph shows an increasing trend with lots of scatter. SPSS gives $r^2 = 0.112$, so the relationship is fairly weak. Only 11% of the variation in discharge is explained by time (year); there certainly are other factors involved. Use Analyze, Regression, Linear to fit the regression. Looking ahead, we have also asked for confidence intervals for the coefficients using the Statistics button.

Coefficients(a)
Model | Unstandardized B | Std. Error | Standardized Beta | t | Sig. | 95% CI for B: Lower Bound | Upper Bound
1 (Constant) | -2056.769 | 1384.687 | | -1.485 | .143 | -4824.720 | 711.181
Year | 1.966 | .704 | .334 | 2.794 | .007 | .559 | 3.373
a. Dependent Variable:

Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .334(a) | .112 | .097 | 104.003
a. Predictors: (Constant),

The regression equation is Discharge = -2057 + 1.97*Year. The regression standard error is s = 104.003.

23.5 From the SPSS output in Exercise 23.3, the test statistic is t = 2.794 with two-sided P-value 0.007, so the one-sided P-value is 0.0035. This P-value is less than any commonly used α. We therefore reject the null hypothesis of no relationship and conclude that these data do show an increase in Arctic river discharge, supporting the global warming hypothesis.

23.7 Refer to the solution for Exercise 23.1. In the SPSS results, we were given t = 7.08 and P = 0.002. This is a two-sided P-value, so divide it by 2: the one-sided P-value is 0.001. Minitab gives the same (two-sided) P-value for the correlation if you use Stat, Basic Statistics, Correlation. In SPSS, Analyze, Correlate, Bivariate with the one-tailed option gives the same result.

Correlations
 | | Distance | Days
Distance | Pearson Correlation | 1.000 | .962**
 | Sig. (1-tailed) | | .001
 | N | 6.000 | 6
Days | Pearson Correlation | .962** | 1.000
 | Sig. (1-tailed) | .001 |
 | N | 6 | 6.000
**. Correlation is significant at the 0.01 level (1-tailed).
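The t statistics and the halved P-values in 23.5 and 23.7 are simple arithmetic on the Coefficients table. Here is a minimal sketch. It assumes the alternative hypothesis points the same way as the estimated slope; when it points the other way, the one-sided P-value is 1 minus half the two-sided value. The inputs are the rounded values printed above, so the recomputed t differs slightly from SPSS's. The function names are hypothetical.

```python
def t_statistic(b, se):
    """t for H0: slope = 0 is the estimate divided by its standard error."""
    return b / se


def one_sided_p(two_sided_p, t, alternative_positive=True):
    """Convert SPSS's two-sided Sig. into a one-sided P-value."""
    if (t > 0) == alternative_positive:
        return two_sided_p / 2       # the estimate lies on the side of Ha
    return 1 - two_sided_p / 2       # the estimate lies on the opposite side


# Arctic rivers (23.3/23.5) and Ebola (23.1/23.7), values from the output above.
t_rivers = t_statistic(1.966, 0.704)
t_ebola = t_statistic(11.263, 1.591)
print(f"rivers: t = {t_rivers:.3f}, one-sided P = {one_sided_p(0.007, t_rivers):.4f}")
print(f"ebola:  t = {t_ebola:.3f}, one-sided P = {one_sided_p(0.002, t_ebola):.4f}")
```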

23.9 SPSS will find the 95% confidence interval if you redo the regression, click Statistics, and check the box for confidence intervals for the coefficients.

Coefficients(a)
Model | Unstandardized B | Std. Error | Standardized Beta | t | Sig. | 95% CI for B: Lower Bound | Upper Bound
1 (Constant) | -8.088 | 5.917 | | -1.367 | .243 | -24.516 | 8.341
Distance | 11.263 | 1.591 | .962 | 7.080 | .002 | 6.846 | 15.680
a. Dependent Variable: Days

This is the same output as in Exercise 23.1, with the confidence bounds added at the right. Based on these data, Ebola takes between 6.85 and 15.68 days to travel one home range, with 95% confidence.

However, the problem asks for 90% confidence. For this, use Transform, Compute Variable to find t* with n - 2 = 4 degrees of freedom, then compute the interval by hand: 11.263 ± 2.132*1.591 = (7.871, 14.655). Based on these data, Ebola takes between 7.87 and 14.66 days to travel one home range, with 90% confidence.

23.11 SPSS gives only 95% confidence intervals for regression parameters. We saw in Exercise 23.3 that a 95% confidence interval for the slope runs from 0.559 to 3.373.
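In Transform, Compute Variable, t* comes from SPSS's inverse-t function IDF.T(p, df). For example, IDF.T(0.95, 4) gives t* for a 90% interval with 4 degrees of freedom. The sketch below cross-checks that quantile outside SPSS using only Python's standard library. It integrates the t density numerically with Simpson's rule and inverts the CDF by bisection. This numerical method is our own illustration, not how SPSS computes IDF.T. Its accuracy, several decimal places here, is ample for a table lookup. The function names are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c) * (1 + t * t / df) ** (-(df + 1) / 2)


def t_cdf(t, df, steps=2000):
    """P(T <= t), integrating the density over [0, |t|] with Simpson's rule."""
    if t == 0:
        return 0.5
    h = abs(t) / steps                       # steps is even, as Simpson's rule needs
    total = t_pdf(0.0, df) + t_pdf(abs(t), df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3                     # area between 0 and |t|
    return 0.5 + area if t > 0 else 0.5 - area


def t_quantile(p, df, hi=100.0, tol=1e-8):
    """Return t* with P(T <= t*) = p, for 0.5 <= p < 1, by bisection."""
    if not 0.5 <= p < 1:
        raise ValueError("p must be in [0.5, 1)")
    if t_cdf(hi, df) < p:
        raise ValueError("t* lies above hi; pass a larger hi")
    lo = 0.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# Exercise 23.9: n = 6 points, so df = n - 2 = 4; 90% confidence uses p = 0.95.
b, se = 11.263, 1.591
t_star = t_quantile(0.95, df=4)
low, high = b - t_star * se, b + t_star * se
print(f"t* = {t_star:.3f}; 90% CI for the slope = ({low:.3f}, {high:.3f})")
```

Passing p = 0.975 instead reproduces the t* behind SPSS's 95% bounds (6.846, 15.680).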

Exercise 23.11, however, asks for a 90% confidence interval. Use Transform, Compute Variable to find t* (the degrees of freedom are n - 2), then compute the interval by hand. The confidence interval is $b \pm t^* \mathrm{SE}_b$, giving 1.9662 ± 1.670*0.7037, or (0.791, 3.141). We are 90% confident that Arctic river discharge increases by between 0.791 and 3.141 cubic kilometers per year. Since the low end is positive, we're convinced that discharge is increasing over time.

23.29 To make the stemplot of the residuals, use Analyze, Descriptive Statistics, Explore.

Stem-and-Leaf Plot
Frequency    Stem &  Leaf
     1.00      -1 .  5
     1.00      -1 .  0
     6.00      -0 .  566899
     7.00      -0 .  0111223
     8.00       0 .  01112244
     5.00       0 .  55788
     1.00       1 .  3
     1.00       1 .  7
Stem width: 10.00
Each leaf: 1 case(s)

This plot is fairly symmetric and bell-shaped with no outliers, so the Normal assumption is reasonable for these residuals.

To make the scatterplot, use Graphs, Legacy Dialogs, Scatter/Dot, with Residual on the y axis and Boats on the x axis. To add the residual = 0 line, double-click the graph to open the Chart Editor, then click Options, Y axis reference line. Close the Properties window and the Chart Editor. The plot shows random scatter with no discernible pattern, so the regression model is reasonable.

Pollution may have caused some manatee deaths, but these data count only manatees killed by boats, so pollution would not explain these deaths.
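The Explore stemplot above uses split stems. With stem width 10 and a leaf unit of 1, each stem appears twice: once for leaves 0 to 4 and once for leaves 5 to 9. Negative stems are listed from most negative upward, so "-1 . 5" (about -15) comes before "-1 . 0". Below is a minimal Python sketch of that layout. It assumes values are truncated toward zero to whole leaf units; this is our own choice, and SPSS's rule may differ. The function name and the residual values are hypothetical; they are not the manatee data.

```python
from collections import defaultdict


def split_stemplot(values, stem_width=10):
    """Return (frequency, stem label, leaves) rows for a split-stem stemplot.

    Leaf unit = stem_width / 10. Values are truncated toward zero to whole
    leaf units (an assumption of this sketch, not necessarily SPSS's rule).
    """
    leaf_unit = stem_width / 10
    rows = defaultdict(list)
    for v in values:
        m = int(abs(v) / leaf_unit + 1e-9)   # tiny guard against float error
        stem, leaf = divmod(m, 10)
        half = 0 if leaf < 5 else 1          # 0: leaves 0-4, 1: leaves 5-9
        pos = 2 * stem + half
        # Negative values get negative sort keys, so "-0" lines sort below "0".
        key = pos if v >= 0 else -pos - 1
        label = f"{stem}" if v >= 0 else f"-{stem}"
        rows[key].append((label, leaf))
    out = []
    for key in sorted(rows):
        label = rows[key][0][0]
        leaves = "".join(str(leaf) for leaf in sorted(lf for _, lf in rows[key]))
        out.append((len(leaves), label, leaves))
    return out


# Hypothetical residuals, for illustration only.
resid = [-15.2, -6.1, -3.4, -0.8, 0.2, 1.9, 4.4, 5.5, 13.0]
for freq, stem, leaves in split_stemplot(resid):
    print(f"{freq:5.2f} {stem:>3} . {leaves}")
```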

23.33 Create a scatterplot of Dr. Gray's predictions against actual storms and compute the regression.

The graph shows a positive relationship. However, in a couple of years Dr. Gray predicted a large number of storms and the actual number was much smaller.

Part (b) asks for a 95% confidence interval for the mean number of storms when Dr. Gray predicts 16 storms. First, add a Forecast value of 16 to the data sheet; SPSS creates prediction and confidence intervals only for values in the data sheet. Then, in the Analyze, Regression, Linear dialog box, click Save and check the box for Mean Prediction Intervals.

Model Summary(b)
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .529(a) | .280 | .247 | 4.086
a. Predictors: (Constant),
b. Dependent Variable:

Coefficients(a)
Model | Unstandardized B | Std. Error | Standardized Beta | t | Sig.
1 (Constant) | 1.803 | 3.587 | | .503 | .620
Forecast | .903 | .309 | .529 | 2.923 | .008
a. Dependent Variable:

The regression equation is ActualStorms = 1.80 + 0.90*Forecast. The t statistic is 2.923 with two-sided P-value 0.008, so the one-sided P-value is 0.004 and the relationship is significantly positive.

Return to the data sheet. SPSS has created 95% confidence intervals for each value of Forecast; the values for the interval of interest are in the bottom row.
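The Mean and Individual prediction intervals that the Save dialog adds come from two standard errors at a chosen $x^*$. For the mean response, $\mathrm{SE}_{\hat{\mu}} = s\sqrt{1/n + (x^*-\bar{x})^2/\sum(x-\bar{x})^2}$. For a single new observation, $\mathrm{SE}_{\hat{y}} = s\sqrt{1 + 1/n + (x^*-\bar{x})^2/\sum(x-\bar{x})^2}$. The sketch below computes both from raw data. The function name is hypothetical, and t* must be supplied, for example from a t table with n - 2 degrees of freedom. The data in the usage lines are made up, not Dr. Gray's forecasts. The last line checks the 23.33 output: the fitted value at Forecast = 16 should sit at the center of the reported interval (12.78, 19.73).

```python
import math
from statistics import fmean


def regression_intervals(x, y, x_star, t_star):
    """Return (y_hat, mean-response CI, individual prediction interval) at x_star."""
    n = len(x)
    x_bar, y_bar = fmean(x), fmean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    b = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    a = y_bar - b * x_bar
    sse = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))              # regression standard error
    y_hat = a + b * x_star
    lever = 1 / n + (x_star - x_bar) ** 2 / sxx
    se_mean = s * math.sqrt(lever)            # SE for the mean response
    se_indiv = s * math.sqrt(1 + lever)       # SE for one new observation
    ci = (y_hat - t_star * se_mean, y_hat + t_star * se_mean)
    pi = (y_hat - t_star * se_indiv, y_hat + t_star * se_indiv)
    return y_hat, ci, pi


# Hypothetical data: 5 points, so df = 3; 3.182 is the 97.5th percentile of t(3).
x = [1, 2, 3, 4, 5]
y = [2.0, 4.1, 5.9, 8.2, 9.8]
y_hat, ci, pi = regression_intervals(x, y, x_star=3, t_star=3.182)
print(f"y-hat = {y_hat:.3f}, 95% CI {ci}, 95% PI {pi}")

# 23.33 check: fitted value at Forecast = 16 versus the interval's midpoint.
print(round(1.803 + 0.903 * 16, 3), round((12.78 + 19.73) / 2, 3))
```

The individual interval is always wider than the mean interval. Use the individual interval when you want a range for a single year.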

We predict that the mean number of actual storms in years when Professor Gray predicts 16 will be between 12.78 and 19.73, with 95% confidence. For an interval for a particular year, check the box for Individual Prediction Intervals in the Save dialog box instead.

23.41 Use Graphs, Legacy Dialogs, Scatter/Dot to create a scatterplot with Cones as the x-axis variable and Offspring as the y-axis variable. The pattern is roughly linear, with a fair amount of scatter, and increasing: more cones seem to be associated with more offspring.

Use Analyze, Regression, Linear to find the regression line and measures of association. We want to examine the residuals to check the adequacy of the regression, so click Save and check the box for Unstandardized Residuals. In the Plots box, you can also ask for a histogram of the standardized residuals (these are z-scores) and a Normal plot of them.

The regression equation is Offspring = 1.415 + 0.44*Cones. The relationship is fairly strong (r = 0.756), and the cone index explains $r^2$ = 57.2% of the variation in offspring.

Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .756(a) | .572 | .542 | .60031
a. Predictors: (Constant), Cones

Coefficients(a)
Model | Unstandardized B | Std. Error | Standardized Beta | t | Sig.
1 (Constant) | 1.415 | .252 | | 5.619 | .000
Cones | .440 | .102 | .756 | 4.328 | .001
a. Dependent Variable: Offspring

The relationship is indeed statistically significant: t = 4.328 with two-sided P = 0.001, so the one-sided P-value is 0.0005.

The histogram has some gaps, but compared with the superimposed density curve, the Normal assumption seems reasonable. Note that the mean of these standardized residuals is (essentially) 0. Use Graphs, Legacy Dialogs, Scatter/Dot to create a scatterplot of the saved residuals against the cone index. This plot shows no definite pattern, so our inference is reliable.
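Standardized residuals behave like z-scores because, as we understand SPSS's definition, they are the ordinary residuals divided by the regression standard error s. That is why their mean is essentially 0. Below is a minimal sketch of that rescaling. It assumes the residuals come from a least-squares fit with an intercept, so they sum to zero. The residual values and function name are hypothetical. These are not studentized residuals, which also adjust for each point's leverage, and their standard deviation comes out slightly below 1.

```python
import math
from statistics import fmean, stdev


def standardized_residuals(residuals):
    """Divide each residual by s = sqrt(SSE / (n - 2)) from a simple regression."""
    n = len(residuals)
    s = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    return [e / s for e in residuals]


# Hypothetical residuals from a fit with an intercept (they sum to zero).
res = [0.3, -0.5, 0.1, 0.4, -0.3]
z = standardized_residuals(res)
print("mean:", round(fmean(z), 10))
print("sd:", round(stdev(z), 4))   # equals sqrt((n - 2) / (n - 1)) here
```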

23.43 Open data file ex05-51. First, use Graphs, Legacy Dialogs, Scatter/Dot to create a scatterplot of the data, with Stumps as the X variable and Larvae as the Y variable. Give your graph an appropriate title using Titles. These data indicate that there are more beetle larvae where there are more stumps.

Use Analyze, Regression, Linear to fit the line, with Stumps as the Independent variable and Larvae as the Dependent variable. We'd like a 95% confidence interval for the slope, which tells us how many more larva clusters accompany each additional stump. So click Statistics and check the box for Confidence Intervals.

Model Summary
Model | R | R Square | Adjusted R Square | Std. Error of the Estimate
1 | .918(a) | .843 | .835 | 6.455
a. Predictors: (Constant), Stumps

Coefficients(a)
Model | Unstandardized B | Std. Error | Standardized Beta | t | Sig. | 95% CI for B: Lower Bound | Upper Bound
1 (Constant) | -1.286 | 2.853 | | -.451 | .657 | -7.220 | 4.647
Stumps | 11.894 | 1.136 | .916 | 10.467 | .000 | 9.531 | 14.257
a. Dependent Variable:

The regression equation is Larvae = -1.29 + 11.89*Stumps. The relationship is strong: the regression model explains 84.3% of the variability in larvae, and the correlation is $r = \sqrt{r^2} = \sqrt{0.843} = 0.918$. We are confident that more stumps lead to more larvae because the 95% confidence interval for the slope, 9.53 to 14.26, lies well above 0. Our scatterplot of the residuals against Stumps (the predictor variable) shows no discernible pattern, so this regression model is reasonable.
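The last paragraph relies on two quick checks. First, r is the square root of R Square, carrying the sign of the slope. Second, a slope interval entirely above 0 is consistent with rejecting $H_0: \beta = 0$ in favor of a positive slope at the matching significance level. Here is a short sketch using the printed values for Exercise 23.43. The helper name is hypothetical. The t* it backs out of the SPSS bounds is just the interval's half-width divided by the standard error.

```python
import math


def summarize_slope(b, se, r_squared, lower, upper):
    """Recover r and the implied t* from SPSS output; report whether the CI excludes 0."""
    r = math.copysign(math.sqrt(r_squared), b)   # r shares the slope's sign
    t_star = (upper - b) / se                    # half-width divided by SE
    excludes_zero = lower > 0 or upper < 0
    return r, t_star, excludes_zero


r, t_star, excludes_zero = summarize_slope(
    b=11.894, se=1.136, r_squared=0.843, lower=9.531, upper=14.257)
print(f"r = {r:.3f}, implied t* = {t_star:.3f}, CI excludes 0: {excludes_zero}")
```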