Directions: Answer the following questions on another sheet of paper


Module 3 Review

Directions: Answer the following questions on another sheet of paper.

Questions 1-16 refer to the following situation: Is there a relationship between the crime rate and the unemployment rate among young men? The data below are for several US cities, collected by the FBI's Uniform Crime Report and other government agencies. The first column is the unemployment rate (as a percentage) for men aged 16-24. The second column is the crime rate (number of offenses reported to police per million population).

City          Unemployment rate (%)   Crime rate (offenses per million)
Allentown     23.1                    195
Busytown       9.4                    105
Charleston    14.3                    187
Daisyville    12.5                    165
Easyville     11.2                    124

1. What is the explanatory variable?
2. What is the response variable?
3. Make a scatterplot, then describe it.
4. Compute (with a calculator) the mean and standard deviation of the unemployment rate.
5. Compute (with a calculator) the mean and standard deviation of the crime rate.
6. Compute the correlation (r), then explain what the result means.
7. Compute the least-squares regression line.
8. In the least-squares regression line (from problem 7), which number represents the slope? Explain the meaning of this number in the context of the problem. Include units in your answer.
9. In the least-squares regression line (from problem 7), which number represents the y-intercept? Explain the meaning of this number in the context of the problem. Include units in your answer.
10. Suppose the unemployment rate for young men of another city, Fairview, is 13.6. What would you predict the crime rate to be, according to the least-squares regression line?
11. How confident are you of your prediction in question 10? Explain your answer.
12. Suppose Fairview's actual crime rate is 154 incidents per million people. What is the residual? Explain what a negative residual would mean. Explain what a positive residual would mean.
13. If you had to compute the SSE by hand, show what the first two rows of the table would look like.

14. Does the scatterplot, regression line, etc. indicate that the unemployment rate among young males *causes* crime? Why or why not? If it doesn't, what does the scatterplot tell you about the relationship between unemployment and crime?
15. What are some confounding variables in this situation?
16. Give an example of extrapolation in this situation. Explain why the extrapolation is not necessarily valid.
17. (multiple choice) Measurements on young children in Mumbai, India, found this least-squares regression line for predicting the height y from armspan x: y = 6.4 + 0.93x. All measurements are in centimeters. How much, on average, does height increase for each additional centimeter of armspan?
a) 0.15 cm
b) 6.40 cm
c) 2.00 cm
d) 0.93 cm
e) 7.33 cm

Questions 18-19 refer to the following scatterplot:

[Scatterplot: vertical axis from 0 to 100, horizontal axis from 0 to 15]

18. Describe the scatterplot.
19. In the scatterplot from the previous question, suppose r = 0.7. If an outlier were added to the scatterplot at the point (10, 10), would r increase, decrease, or stay about the same?
20. A researcher wanted to find the relationship between the weight and height of middle-aged women. Suppose the mean height was 168 cm and the standard deviation of the height was 4.5 cm. Suppose the mean weight was 58 kg and the standard deviation of the weight was 5.1 kg. Suppose r = 0.6153.
a) Find the equation of the best-fit line. Let x = height and y = weight.
b) If a woman was 174 cm tall, use the best-fit line from part (a) to predict her weight.
c) How reliable is your prediction from part (b)? Explain using concepts about correlation.

21. Match the following graphs to the following correlations (one value of r will not be used).
a) r = 0.97
b) r = -0.52
c) r = 0.76
d) r = 0.04
e) r = -0.96

[Four scatterplots, labeled 1 to 4]

22. (multiple choice) The points on a scatterplot lie very close to the line y = 4 - 3x. The correlation between x and y is close to...
a) -3
b) -1
c) 0
d) 1
e) 4
23. Below is a scatterplot.
a) Use the scatterplot below to estimate the slope of the least-squares regression line.
b) Use the scatterplot below to estimate the y-intercept.
c) Use parts (a) and (b) to write out the equation of the least-squares regression line.

Answers

1. Unemployment rate. This is the variable being used to try to explain and/or predict the crime rate.
2. Crime rate. This is the variable that responds to the unemployment rate.
3. The direction is positive. There is a moderate amount of scatter. The form looks curvilinear, but there are only 5 points, so it's hard to establish any definite trends.
4. Mean unemployment rate: x̄ = 14.1 percent
SD of unemployment rate: s_x = 5.34 percentage points
5. Mean of crime rate: ȳ = 155.2 incidents per million people
SD of crime rate: s_y = 39.32 incidents per million people
6. r = 0.804 (rounded off). This means the correlation is fairly strong and the direction is positive. Below are the calculations:

unemployment rate (x)   crime rate (y)   x - x̄    y - ȳ    (x - x̄)(y - ȳ)
23.1                    195               9.0      39.8     358.2
 9.4                    105              -4.7     -50.2     235.94
14.3                    187               0.2      31.8       6.36
12.5                    165              -1.6       9.8     -15.68
11.2                    124              -2.9     -31.2      90.48
total                                                       675.3

Divide the total by s_x = 5.34, then by s_y = 39.32, then by n - 1 = 4:
r = 675.3 / (5.34 × 39.32 × 4) = 0.804
7. b = r(s_y / s_x) = 0.804(39.32 / 5.34) = 5.92
a = ȳ - b·x̄ = 155.2 - (5.92)(14.1) = 71.7
The least-squares regression line is (using symbols) ŷ = 71.7 + 5.92x
Or (using words) predicted crime rate = 71.7 + 5.92(unemployment rate)
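The hand calculations in answers 4 to 7 can be checked with a short script. This is only a sketch in plain Python, not part of the original worksheet; the variable names are made up for illustration, and the data values are copied from the table in the question.

```python
from math import sqrt

# Data copied from the question (five cities).
unemployment = [23.1, 9.4, 14.3, 12.5, 11.2]  # x, percent
crime = [195, 105, 187, 165, 124]             # y, offenses per million

n = len(unemployment)
x_bar = sum(unemployment) / n
y_bar = sum(crime) / n

# Sample standard deviations (divide by n - 1, as a calculator's "Sx" does).
s_x = sqrt(sum((x - x_bar) ** 2 for x in unemployment) / (n - 1))
s_y = sqrt(sum((y - y_bar) ** 2 for y in crime) / (n - 1))

# Correlation: sum of the deviation products divided by (n - 1) * s_x * s_y.
sum_products = sum((x - x_bar) * (y - y_bar)
                   for x, y in zip(unemployment, crime))
r = sum_products / ((n - 1) * s_x * s_y)

# Least-squares slope and intercept from r and the summary statistics.
slope = r * s_y / s_x
intercept = y_bar - slope * x_bar

print(f"means: {x_bar:.1f}, {y_bar:.1f}")
print(f"SDs:   {s_x:.2f}, {s_y:.2f}")
print(f"r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.1f}")
```

Because the script carries unrounded values through every step, its results should agree with the hand-worked values only after rounding.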

8. The slope is 5.92. It means that each time the unemployment rate increases by one percentage point, the predicted crime rate increases by 5.92 incidents per million people.
9. The y-intercept is 71.7. This means that if the unemployment rate were zero, the predicted crime rate would be 71.7 incidents per million people.
10. ŷ = 71.7 + 5.92x = 71.7 + (5.92)(13.6) = 152.2
The predicted crime rate would be 152.2 incidents per million people.
11. Looking at the scatterplot, there is a moderate amount of scatter, so the prediction would be only somewhat accurate. However (this is just a side comment), the sample size of only 5 cities is not very large. With a larger sample we could have a more confident prediction.
12. Residual = y - ŷ = 154.0 - 152.2 = 1.8 incidents per million people. The residual is positive, which means the actual value is 1.8 units higher than the predicted value. A negative residual would mean the actual value is lower than the predicted value.
13.
x       y      ŷ        residual   residual squared
23.1    195    208.5    -13.5      182.25
 9.4    105    127.4    -22.4      501.76
14. No, because correlation does not imply causation. There could be other variables that cause both the unemployment rate and the crime rate to increase together. However, the scatterplot DOES tell us that there is some kind of relationship between unemployment rate and crime rate.
15. There are many possible answers. Here are some examples: level of education of the young men, drug use, disabilities. Anything that increases both the unemployment rate and the crime rate would be a confounding variable.
16. Any example where the unemployment rate is outside the range of the given data (below 9.4% or above 23.1%) would be extrapolation. This is not necessarily valid because there is no data to show the crime rate will continue to increase in the same way.
17. d
18. Direction is positive, strength is moderate to strong, form is linear.
19. It would decrease because the outlier would create a greater amount of scatter and a weaker correlation. r is sensitive to outliers.
20. a) b = r(s_y / s_x) = 0.6153(5.1 / 4.5) = 0.69734
a = ȳ - b·x̄ = 58 - 0.69734(168) = -59.15
ŷ = -59.15 + 0.69734x
b) ŷ = -59.15 + 0.69734(174) ≈ 62.2 kg
c) This is not too reliable because r = 0.6153 is only moderate: r² ≈ 0.38, so height accounts for only about 38% of the variation in weight.
21. 1. e 2. d 3. a 4. c
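The prediction, residual, and SSE rows from answers 10, 12, and 13 follow the same pattern: plug x into the line, subtract the prediction from the observed y, and square the result. A minimal Python sketch, assuming the rounded line ŷ = 71.7 + 5.92x; the function name `predict_crime_rate` is hypothetical.

```python
def predict_crime_rate(unemployment_rate, intercept=71.7, slope=5.92):
    """Predicted crime rate from the rounded least-squares line."""
    return intercept + slope * unemployment_rate

# Answers 10 and 12: Fairview, x = 13.6, observed y = 154.
fairview_predicted = predict_crime_rate(13.6)
fairview_residual = 154 - fairview_predicted  # observed minus predicted

# Answer 13: first two rows of the SSE table (Allentown, Busytown).
sse_rows = []
for x, y in [(23.1, 195), (9.4, 105)]:
    y_hat = predict_crime_rate(x)
    residual = y - y_hat
    sse_rows.append((x, y, y_hat, residual, residual ** 2))

print(f"Fairview: predicted {fairview_predicted:.1f}, "
      f"residual {fairview_residual:.1f}")
for x, y, y_hat, residual, squared in sse_rows:
    print(f"{x:5.1f} {y:4d} {y_hat:7.1f} {residual:7.1f} {squared:8.2f}")
```

Since the sketch starts from the rounded slope and intercept, the last digit of a predicted value or squared residual can differ slightly from a table built with unrounded coefficients.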

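22. b (the line has a negative slope and the points lie very close to it, so r is close to -1)
23. a) The slope is about -0.8.
b) The y-intercept is about 85.
c) ŷ = 85 - 0.8x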
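Answer 20 builds the best-fit line from summary statistics alone, with no raw data. The sketch below repeats that arithmetic in Python so the intermediate values can be inspected; the variable names are illustrative, and the inputs are the values given in question 20.

```python
# Summary statistics given in question 20 (x = height in cm, y = weight in kg).
mean_height, sd_height = 168.0, 4.5
mean_weight, sd_weight = 58.0, 5.1
r = 0.6153

# Slope and intercept of the least-squares line from r and the summaries.
slope = r * sd_weight / sd_height
intercept = mean_weight - slope * mean_height

# Part (b): predicted weight for a woman 174 cm tall.
predicted_weight = intercept + slope * 174

# Part (c): r squared is the fraction of the variation in weight
# that the linear relationship with height accounts for.
r_squared = r ** 2

print(f"slope = {slope:.5f}, intercept = {intercept:.2f}")
print(f"predicted weight at 174 cm = {predicted_weight:.2f} kg")
print(f"r squared = {r_squared:.3f}")
```

Carrying the unrounded intercept through part (b) changes the prediction only in the second decimal place compared with using the rounded intercept.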