# Regression


## Transcription

1 AMS 5 REGRESSION

2 Regression The idea behind the calculation of the coefficient of correlation is that the scatter plot of the data corresponds to a cloud that follows a straight line. This idea can be formalized by regression methods. In this class we will:

- Consider the definition of simple linear regression
- Find a method to predict an individual value
- Use the normal curve to estimate percentile ranks
- Describe the regression effect
- Compute the regression errors and their RMS
- Study the behavior of regression errors

3 Regression The regression method describes how one variable depends on another. The Northern California temperature data have an average altitude of 3,524 feet and an SD of 1,839 feet; an average temperature of 70.3 degrees and an SD of 6.5 degrees. The correlation between temperature and altitude is r = -0.76.


5 Regression The cloud of points shows a mild negative association between the two variables, as does the value of r. Can we use the values of altitude to estimate the average values of temperature?

6 Regression How does the regression line work? Associated with an increase of one SD in x there is an increase of r SDs in y, on average. Clearly, if the correlation coefficient is negative, then the average value of y decreases as x increases. In the temperature and altitude example, with r = -0.76, an increase in altitude of 1,839 feet is associated with a change of $-0.76 \times 6.5 \approx -4.9$ degrees in the average temperature, that is, a decrease of about 4.9 degrees.

7 Regression How do we use the method to predict an individual value? If we consider two variables x and y and we want to predict the value of y for a specific value of x, we use the average value of y that corresponds to that value of x according to the regression method. Example: The first-year GPAs and the Math SAT scores for the students of a university produce the following data:

- average SAT score = 550, SD = 80
- average 1st-year GPA = 2.6, SD = 0.6
- r = 0.40

We want to predict the 1st-year GPA of a student with an SAT score of 650.

8 Regression The student's SAT score in standard units is $(650 - 550)/80 = 1.25$, so the score is 1.25 SDs above average. An increase of one SD above the average SAT score is associated with an increase of $0.4 \times 0.6 = 0.24$ GPA points. This implies that our student is predicted to be $1.25 \times 0.24 = 0.3$ GPA points above average. Since the average GPA is 2.6, the predicted GPA is $2.6 + 0.3 = 2.9$. This is the average GPA that we expect for students with SAT scores around 650.
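The GPA calculation above can be sketched in a few lines of Python. The function name `predict_y` and its parameter names are illustrative choices, not part of the slides; the numbers are the ones from the SAT/GPA example.

```python
def predict_y(x, mean_x, sd_x, mean_y, sd_y, r):
    """Regression-method prediction of y for a given x (illustrative helper)."""
    z_x = (x - mean_x) / sd_x   # x in standard units
    z_y = r * z_x               # predicted y in standard units
    return mean_y + z_y * sd_y  # back to the original units of y


# SAT/GPA example: average SAT 550 (SD 80), average GPA 2.6 (SD 0.6), r = 0.40
predicted_gpa = predict_y(650, mean_x=550, sd_x=80, mean_y=2.6, sd_y=0.6, r=0.40)
print(round(predicted_gpa, 2))  # 2.9
```

Working in standard units first makes the role of r explicit: the prediction moves only r SDs in y for each SD in x.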

9 Regression WARNING: You can use the regression method on new subjects provided that they are similar to the ones that were used to produce the averages, SDs and r used in the regression method. In the previous example the method will not be valid for students of a different institution.

10 Regression We can use the regression method and the normal curve to produce estimates of the percentile ranks. Example: In the previous example suppose a student has a percentile rank of 90% for the SAT scores. That is, only 10% of the scores are higher than his. What is the predicted percentile rank for the 1st-year GPA of this student? Using the normal curve, a 90% probability corresponds to a z score of about 1.3. This means that the student's SAT score is 1.3 SDs above average. This corresponds to being $0.40 \times 1.3 \approx 0.5$ SDs above the average GPA, which corresponds to an accumulated probability, under the normal curve, of approximately 69%.

11 Regression So the percentile rank on 1st-year GPA of a student with a percentile rank on SAT score of 90% is predicted to be 69%. Notice that the student with an SAT percentile rank of 90% was "pulled down" to only 69% by the regression method. Why is that? Suppose the correlation was perfect, r = 1; then 90% would convert to 90%. The other extreme is that there is no correlation, so, in the absence of any information, the best guess is the median, or 50th percentile. The regression method produces a rank that is somewhere between these two extremes.
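As a rough check of the percentile-rank argument, the following sketch uses `statistics.NormalDist` from the Python standard library. The function name `predicted_percentile` is an illustrative assumption. Because it uses unrounded z scores rather than table values, it lands near 70% instead of the slide's 69%.

```python
from statistics import NormalDist


def predicted_percentile(percentile_x, r):
    """Predicted percentile rank of y given the percentile rank of x.

    Assumes both variables follow the normal curve (illustrative helper).
    """
    std_normal = NormalDist()               # mean 0, SD 1
    z_x = std_normal.inv_cdf(percentile_x)  # percentile rank -> standard units
    z_y = r * z_x                           # regression method in standard units
    return std_normal.cdf(z_y)              # standard units -> percentile rank


rank = predicted_percentile(0.90, r=0.40)
print(f"{rank:.1%}")
```

Setting `r=1.0` returns the input rank unchanged, and `r=0.0` returns 50%, matching the two extremes described above.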

12 Example The shoe sizes and the heights of 14 men are recorded. The shoe size average is … with an SD of 1.21. The average height is … inches with an SD of 2.45 inches. The correlation is 0.93. What is the average height of a man who wears shoes of size 11.5? We convert 11.5 to standard units and find that the shoe size is … SDs above average. This means that the height will be 1.95 inches above average. So the average height of a man with shoe size 11.5 will be … inches.

13 Regression effect Galton, a British statistician, studied the relationship between the heights of fathers and sons in 1,078 families. He noticed that tall fathers tended to have sons who were shorter than themselves, and short fathers tended to have sons who were taller than themselves. He termed this fact regression to mediocrity. This is where the term regression comes from. Example: Children are tested for IQ before and after taking a preschool program. In both cases the scores average 100 and the SD is 15. So, on average, there seems to be no effect. Nevertheless, children below average in the first test had an average gain of 5 IQ points and those above average had an average loss of 5 IQ points. This is the regression effect.

14 Regression effect A model for the test-retest situation is

observed test score = true score + chance error

Suppose that the chance error can be either positive or negative. Suppose that the true scores in the population follow the normal curve with an average of 100 and an SD of 15. Consider the children who scored 140 on the first test. There are two possibilities:

- true score below 140, with a positive chance error
- true score above 140, with a negative chance error

Which one is more likely? According to the normal curve, the first possibility is more likely, since the mean is 100 and so the interval above 140 has less probability than the one below 140. Under this scenario, the second test is more likely to produce a value below 140.

15 Regression effect A symmetric situation is valid for those scoring, say, 80 IQ. It is likely that the true score is above 80 with a negative chance error, and so the second score is likely to be above the first. In other words, if a student scores above average in the first test, it is likely that the true score is lower than the observed one. If the student takes the test again, chances are that the second score will be lower than the first. A symmetric situation is true for a person scoring below average in the first test. This explains the regression effect.
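The test-retest model can be explored with a small simulation. This is only a sketch: the chance-error SD of 5 points, the sample size, and the random seed are assumptions made for illustration and do not come from the slides.

```python
import random


def simulate_test_retest(n=100_000, error_sd=5.0, seed=0):
    """Simulate two test scores per child: true score + independent chance error.

    error_sd, n and seed are illustrative assumptions.
    """
    rng = random.Random(seed)
    first, second = [], []
    for _ in range(n):
        true_score = rng.gauss(100, 15)  # true scores follow the normal curve
        first.append(true_score + rng.gauss(0, error_sd))
        second.append(true_score + rng.gauss(0, error_sd))
    return first, second


def mean(values):
    return sum(values) / len(values)


first, second = simulate_test_retest()

# Children who scored high (or low) on the FIRST test, paired with their retest.
high = [(a, b) for a, b in zip(first, second) if a >= 140]
low = [(a, b) for a, b in zip(first, second) if a <= 80]

print("high group: first", mean([a for a, _ in high]), "second", mean([b for _, b in high]))
print("low group:  first", mean([a for a, _ in low]), "second", mean([b for _, b in low]))
```

In the simulation nobody's true score changes between tests, yet the high group's average drops and the low group's average rises on the retest, which is the regression effect.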

16 Regression errors The regression method can be used to predict y from x. But actual values differ from predictions. These differences are the regression errors:

error = actual value of y - predicted value of y

Some of the errors defined in this way are positive and some are negative, reflecting the fact that some observations are above and some are below the regression line. How do we measure the error in a regression? The overall size of the error is measured using the root-mean-square (RMS), as we did to obtain the SD. This is equal to

$$\sqrt{\frac{e_1^2 + e_2^2 + \cdots + e_N^2}{N}}$$

where $e_1, \ldots, e_N$ are the errors and N is the number of points in the scatter diagram.

17 Regression errors What if we ignore the values of x? Then our prediction for y is the average of y. In this case the RMS error coincides with the SD of y.

18 Computing the RMS error We saw that the error that corresponds to a prediction where the values of x are ignored corresponds to the SD of y. The overall size of the error for a regression using x has to be smaller than the SD. How much smaller?

$$\text{RMS error} = \sqrt{1 - r^2} \times \text{SD of } y$$

We observe the following features:

- The units of the RMS error are the same as the units of the variable being predicted.
- Perfect correlation corresponds to zero RMS error.
- Zero correlation corresponds to maximum RMS error (equal to the SD of y).

19 Computing the RMS error Example 1: In the California temperature example we had that the SD of y is 6.5 degrees and the correlation is -0.76, then

$$\sqrt{1 - (-0.76)^2} \times 6.5 \text{ degrees} \approx 4.22 \text{ degrees}$$

So, in this case, knowing the altitude reduces the SD from 6.5 to 4.22 degrees. Example 2: In the shoe size example we had that the SD of y is 2.45 inches and the correlation is 0.93, then

$$\sqrt{1 - 0.93^2} \times 2.45 \text{ inches} \approx 0.90 \text{ inches}$$

So we observe that knowing the shoe size produces a dramatic reduction of the SD, from 2.45 to 0.90 inches.
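Both examples follow from the same one-line formula, sketched below. The function name `rms_error` is an illustrative choice.

```python
import math


def rms_error(r, sd_y):
    """RMS error of the regression line: sqrt(1 - r^2) times the SD of y."""
    return math.sqrt(1 - r**2) * sd_y


print(round(rms_error(-0.76, 6.5), 2))  # 4.22 (temperature example, degrees)
print(round(rms_error(0.93, 2.45), 2))  # 0.9  (shoe size example, inches)
```

Only the size of r matters, not its sign, since r is squared.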

20 Plotting the residuals Prediction errors are usually called residuals. It is important to explore the graphical properties of residuals to find out about the goodness of the fit by the regression line. In a residual plot the x coordinates are the same as for the original data. The y coordinates correspond to the values of the residuals. So there is one point for each point in the original scatter diagram.

21 Plotting the residuals Thus, if everything is OK with the regression line, we expect to see a cloud of points around the zero line in the y axis.

- We expect to see no trends or clusters in the residuals
- There should be about the same number of positive as negative residuals
- A histogram of the residuals should look symmetric around zero
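A minimal sketch of how residuals could be computed is shown below. The data points are hypothetical, invented purely for illustration, and the helper names (`mean`, `sd`, `correlation`) are not from the slides. The sketch also checks numerically that the RMS of the residuals agrees with the formula $\sqrt{1 - r^2} \times$ SD of y.

```python
import math

# Hypothetical (x, y) data, invented purely for illustration.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.1, 6.9]


def mean(values):
    return sum(values) / len(values)


def sd(values):
    """SD computed as the RMS deviation from the average (divides by N)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def correlation(xs, ys):
    """Average of the products of the two variables in standard units."""
    mx, my, sx, sy = mean(xs), mean(ys), sd(xs), sd(ys)
    return mean([((x - mx) / sx) * ((y - my) / sy) for x, y in zip(xs, ys)])


r = correlation(xs, ys)
slope = r * sd(ys) / sd(xs)
intercept = mean(ys) - slope * mean(xs)

# One residual per data point: actual y minus predicted y.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
rms = math.sqrt(mean([e**2 for e in residuals]))

print("residuals:", [round(e, 3) for e in residuals])
print("RMS of residuals:", rms)
print("sqrt(1 - r^2) * SD of y:", math.sqrt(1 - r**2) * sd(ys))
```

Plotting `xs` against `residuals` with any plotting tool would give the residual plot described above.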

22 Problem The following results are taken from a study of about 1,000 families:

- average height of husband = 68 inches, SD = 2.7 inches
- average height of wife = 63 inches, SD = 2.5 inches
- r = 0.25

Predict the height of a wife when the height of her husband is:

- 72 inches. The husband is 4 inches above average height. This is $4/2.7 \approx 1.5$ SDs above the average. So the wife is predicted to be $r \times 1.5 = 0.25 \times 1.5 \approx 0.38$ SDs above average; this corresponds to $0.38 \times 2.5 \approx 1$ inch, for a predicted height of $63 + 1 = 64$ inches.
- 68 inches. The husband is right on the average, so the wife is predicted to be right on the average as well, at 63 inches.

23 Prediction for data in a vertical strip Example: A law school finds the following relationship between LSAT scores and first-year scores:

- average LSAT score = 162, SD = 6
- average first-year score = 68, SD = 10
- r = 0.60

Q: About what percentage of the students had first-year scores over 75?

A: We use the normal curve approximation. Converting to standard units, $(75 - 68)/10 = 0.7$; this corresponds to a right-hand tail of about 24% under the normal curve.

24 Prediction for data in a vertical strip Q: Of the students who scored 165 on the LSAT, about what percentage had first-year scores over 75?

A: We first convert to standard units for the x variable: $(165 - 162)/6 = 0.5$. Then we convert to standard units for the y variable: $r \times 0.5 = 0.6 \times 0.5 = 0.3$, which corresponds to $0.3 \times 10 = 3$ points above average, or $68 + 3 = 71$. Since the data corresponding to a strip are a smaller and more homogeneous sample, the corresponding SD will be smaller. How much smaller?


26 Prediction for data in a vertical strip We expect the dispersion in the y variable to be about the same for each vertical strip. This is given by the RMS error, thus the new SD is

$$\sqrt{1 - r^2} \times \text{SD of } y = \sqrt{1 - 0.6^2} \times 10 = 8 \text{ points}$$

This new SD can be used to convert to standard units, $(75 - 71)/8 = 0.5$, and, using the normal curve, we obtain an area of 31% above 0.5. This is the percentage of students scoring more than 75 in the first year among those who scored 165 on the LSAT. Notice that this percentage is higher than the percentage for all students (about 24%, the area above 0.7 under the normal curve). This is because we have focused on students with above-average LSAT scores, whose first-year scores center on 71 rather than 68, with a smaller SD of 8 points.

27 Prediction for data in a vertical strip In summary, when considering data for a vertical strip:

1. Convert to standard units in the x variable.
2. Obtain the predicted value of the y variable.
3. Calculate the SD for the y variable in the strip using the RMS error.
4. Convert to standard units in the y variable and use the normal curve.
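The four steps of the summary can be sketched as a single Python function, here applied to the law school example. The function name `share_above_in_strip` and its parameters are illustrative assumptions; the normal-curve area comes from `statistics.NormalDist` in the standard library.

```python
import math
from statistics import NormalDist


def share_above_in_strip(x, threshold, mean_x, sd_x, mean_y, sd_y, r):
    """Fraction of cases in the vertical strip at x whose y exceeds threshold.

    Assumes the y values within the strip follow the normal curve.
    """
    z_x = (x - mean_x) / sd_x                  # step 1: x in standard units
    predicted_y = mean_y + r * z_x * sd_y      # step 2: predicted y for the strip
    strip_sd = math.sqrt(1 - r**2) * sd_y      # step 3: SD in the strip (RMS error)
    z = (threshold - predicted_y) / strip_sd   # step 4: threshold in standard units
    return 1 - NormalDist().cdf(z)             # area to the right under the normal curve


share = share_above_in_strip(165, 75, mean_x=162, sd_x=6, mean_y=68, sd_y=10, r=0.60)
print(f"{share:.0%}")  # 31%
```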

28 Slope and intercept All lines can be determined by a slope and an intercept. The intercept is the height of the line when x = 0. The slope is the rate at which y increases, per unit increase in x. If the slope is negative then y decreases as x increases.

29 Slope and intercept How do you get the slope of a regression line? Example: A sample of 555 California men age … in 1993 was surveyed to find out about education and income. The data are summarized by

- average education = 12.5 years, SD = 4 years
- average income = \$21,500, SD = \$16,000
- r = 0.35

This means that, for every increase of one SD in education, there is an increase of r SDs in income, on average. Thus, 4 extra years of education are worth an extra 0.35 × \$16,000 = \$5,600 of income. So each extra year is worth \$5,600 / 4 = \$1,400; this is the slope of the regression line.

30 Slope and intercept The intercept of the regression line is the predicted value of y when x = 0. A man with no education is 12.5 years below average in education. Since each year is worth \$1,400, a man with no education should have an income that is below average by 12.5 years × \$1,400 per year = \$17,500. Since the average income is \$21,500, the predicted income of a man with no education is \$21,500 - \$17,500 = \$4,000. This is the intercept of the regression line. The slope, in contrast, corresponds to the change in y associated with a one-unit increase in x.

31 Slope and intercept In general, the intercept is given by

intercept = average of y - slope × average of x

The equation for the regression line is called the regression equation and can be written as

y = slope × x + intercept

So, for our example, we have that

predicted income = \$1,400 per year × education + \$4,000

32 Slope and intercept Q: What is the predicted income of a man with an education of 15 years?

A: Using the regression equation we have y = \$1,400 × 15 + \$4,000 = \$25,000. We can plug in any value of education and obtain the expected income for that level of education.

Warning: It is usually a bad idea to use the regression line for extrapolations.
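The slope and intercept calculations can be reproduced from the five summary statistics alone. The sketch below uses the education and income example; the function name `regression_line` is an illustrative choice.

```python
def regression_line(mean_x, sd_x, mean_y, sd_y, r):
    """Slope and intercept of the regression line from summary statistics."""
    slope = r * sd_y / sd_x                # change in y per unit increase in x
    intercept = mean_y - slope * mean_x    # predicted y when x = 0
    return slope, intercept


# Education (years) vs. income (dollars)
slope, intercept = regression_line(mean_x=12.5, sd_x=4, mean_y=21_500, sd_y=16_000, r=0.35)
predicted_income = slope * 15 + intercept

print(round(slope), round(intercept), round(predicted_income))  # 1400 4000 25000
```

The same function applies to any pair of variables summarized by their averages, SDs, and r.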

33 Example Back to our shoe size example. The shoe sizes and the heights of 14 men are recorded. The shoe size average is … with an SD of 1.21. The average height is … inches with an SD of 2.45 inches. The correlation is 0.93. The slope of the regression line is

$$\frac{r \times \text{SD of height}}{\text{SD of shoe size}} = \frac{0.93 \times 2.45}{1.21} = 1.88$$

To obtain the intercept we consider a shoe size of zero. This is … units below average and so will correspond to a height that is … inches below average. So it corresponds to a height of … inches. The regression line is height = 1.88 × shoe size + … inches.

Q: What is the predicted height of a man with a shoe size of 9?

A: Using the regression equation we have 1.88 × 9 + … inches = … inches

34 Least Squares Consider a cloud of points produced by obtaining the scatter diagram of observations corresponding to two variables x and y. There are many lines that we can draw through the cloud. Which is the straight line that fits the points best? The regression line is a solution to this problem: among all straight lines, it has the smallest RMS error, that is, the smallest sum of squared errors. This is the reason why the regression line is called the least squares line.

35 Least Squares Example: Let b be the length of a spring with no load. If a load x is attached to the spring, the stretch is proportional to x. Thus the length of the spring is y = mx + b, where m and b are constants that depend on the spring. An experiment is run to determine the constants for a given spring; the data are shown in the table. The correlation coefficient is r = 0.999, so the points are very close to a straight line. But they are not exactly on a straight line. This is probably due to measurement error. The regression line for these data produces estimates of b and m, given, respectively, by the intercept and the slope of the line. The values are m 0.5c per kg, and b cm. These are the least squares estimates of m and b.
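To illustrate the least squares idea, the sketch below fits a line to spring-style measurements and compares its RMS error with that of slightly shifted or tilted lines. The load and length values are hypothetical, invented for illustration; they are not the data from the slide's table, and the helper names are assumptions.

```python
import math

# Hypothetical loads (kg) and spring lengths (cm), invented for illustration.
loads = [0.0, 2.0, 4.0, 6.0, 8.0, 10.0]
lengths = [30.0, 30.9, 32.1, 33.0, 33.9, 35.1]


def mean(values):
    return sum(values) / len(values)


def least_squares_line(xs, ys):
    """Slope m and intercept b that minimize the sum of squared errors."""
    mx, my = mean(xs), mean(ys)
    m = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    b = my - m * mx
    return m, b


def rms_error_of_line(m, b, xs, ys):
    """RMS of the vertical distances between the points and the line y = m*x + b."""
    return math.sqrt(mean([(y - (m * x + b)) ** 2 for x, y in zip(xs, ys)]))


m, b = least_squares_line(loads, lengths)
best = rms_error_of_line(m, b, loads, lengths)
print("m =", round(m, 3), "cm per kg, b =", round(b, 3), "cm, RMS error =", round(best, 4))

# Any other line, e.g. a slightly tilted or shifted one, has a larger RMS error.
for dm, db in [(0.05, 0.0), (-0.05, 0.0), (0.0, 0.2), (0.0, -0.2)]:
    other = rms_error_of_line(m + dm, b + db, loads, lengths)
    print("slope", round(m + dm, 3), "intercept", round(b + db, 3), "RMS error", round(other, 4))
```

The slope computed this way equals r × SD of y / SD of x, so it is the same line the regression method produces.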

36 Problem Find the regression equation for predicting final score from midterm score, based on the following information:

- average midterm score = 70, SD = 10
- average final score = 55, SD = 20
- r = 0.60

The slope of the line can be obtained as

$$\frac{r \times \text{SD of final}}{\text{SD of midterm}} = \frac{0.60 \times 20}{10} = 1.2$$

A score of 0 in the midterm will correspond to a final score that is $1.2 \times 70 = 84$ units below average. So the intercept is $55 - 84 = -29$ units of the final score. Thus, the regression equation is

final score = 1.2 × midterm score - 29

### . 58 58 60 62 64 66 68 70 72 74 76 78 Father s height (inches)

PEARSON S FATHER-SON DATA The following scatter diagram shows the heights of 1,0 fathers and their full-grown sons, in England, circa 1900 There is one dot for each father-son pair Heights of fathers and

### Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements

### Descriptive statistics; Correlation and regression

Descriptive statistics; and regression Patrick Breheny September 16 Patrick Breheny STA 580: Biostatistics I 1/59 Tables and figures Descriptive statistics Histograms Numerical summaries Percentiles Human

### Correlation & Regression, II. Residual Plots. What we like to see: no pattern. Steps in regression analysis (so far)

Steps in regression analysis (so far) Correlation & Regression, II 9.07 4/6/2004 Plot a scatter plot Find the parameters of the best fit regression line, y =a+bx Plot the regression line on the scatter

### Exercise 1.12 (Pg. 22-23)

Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

### 17.0 Linear Regression

17.0 Linear Regression 1 Answer Questions Lines Correlation Regression 17.1 Lines The algebraic equation for a line is Y = β 0 + β 1 X 2 The use of coordinate axes to show functional relationships was

### Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation

Lecture 11: Chapter 5, Section 3 Relationships between Two Quantitative Variables; Correlation Display and Summarize Correlation for Direction and Strength Properties of Correlation Regression Line Cengage

### Chapter 11: r.m.s. error for regression

Chapter 11: r.m.s. error for regression Context................................................................... 2 Prediction error 3 r.m.s. error for the regression line...............................................

### Chapter 5: The normal approximation for data

Chapter 5: The normal approximation for data Context................................................................... 2 Normal curve 3 Normal curve.............................................................

### c. Construct a boxplot for the data. Write a one sentence interpretation of your graph.

MBA/MIB 5315 Sample Test Problems Page 1 of 1 1. An English survey of 3000 medical records showed that smokers are more inclined to get depressed than non-smokers. Does this imply that smoking causes depression?

### The Correlation Coefficient

The Correlation Coefficient Lelys Bravo de Guenni April 22nd, 2015 Outline The Correlation coefficient Positive Correlation Negative Correlation Properties of the Correlation Coefficient Non-linear association

### 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

### Chapter 10 - Practice Problems 1

Chapter 10 - Practice Problems 1 1. A researcher is interested in determining if one could predict the score on a statistics exam from the amount of time spent studying for the exam. In this study, the

### AP Statistics Semester Exam Review Chapters 1-3

AP Statistics Semester Exam Review Chapters 1-3 1. Here are the IQ test scores of 10 randomly chosen fifth-grade students: 145 139 126 122 125 130 96 110 118 118 To make a stemplot of these scores, you

### Correlation key concepts:

CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)

### The correlation coefficient

The correlation coefficient Clinical Biostatistics The correlation coefficient Martin Bland Correlation coefficients are used to measure the of the relationship or association between two quantitative

### Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

### Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

### Applied Data Analysis. Fall 2015

Applied Data Analysis Fall 2015 Course information: Labs Anna Walsdorff anna.walsdorff@rochester.edu Tues. 9-11 AM Mary Clare Roche maryclare.roche@rochester.edu Mon. 2-4 PM Lecture outline 1. Practice

### Elementary Statistics. Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination

Scatter Plot, Regression Line, Linear Correlation Coefficient, and Coefficient of Determination What is a Scatter Plot? A Scatter Plot is a plot of ordered pairs (x, y) where the horizontal axis is used

### DESCRIPTIVE STATISTICS. The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses.

DESCRIPTIVE STATISTICS The purpose of statistics is to condense raw data to make it easier to answer specific questions; test hypotheses. DESCRIPTIVE VS. INFERENTIAL STATISTICS Descriptive To organize,

### 4. Describing Bivariate Data

4. Describing Bivariate Data A. Introduction to Bivariate Data B. Values of the Pearson Correlation C. Properties of Pearson's r D. Computing Pearson's r E. Variance Sum Law II F. Exercises A dataset with

### Statistiek II. John Nerbonne. March 24, 2010. Information Science, Groningen Slides improved a lot by Harmut Fitz, Groningen!

Information Science, Groningen j.nerbonne@rug.nl Slides improved a lot by Harmut Fitz, Groningen! March 24, 2010 Correlation and regression We often wish to compare two different variables Examples: compare

### Pie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.

Graphical Representations of Data, Mean, Median and Standard Deviation In this class we will consider graphical representations of the distribution of a set of data. The goal is to identify the range of

### Name: Date: Use the following to answer questions 2-3:

Name: Date: 1. A study is conducted on students taking a statistics class. Several variables are recorded in the survey. Identify each variable as categorical or quantitative. A) Type of car the student

### AP Statistics Solutions to Packet 2

AP Statistics Solutions to Packet 2 The Normal Distributions Density Curves and the Normal Distribution Standard Normal Calculations HW #9 1, 2, 4, 6-8 2.1 DENSITY CURVES (a) Sketch a density curve that

### MEASURES OF VARIATION

NORMAL DISTRIBTIONS MEASURES OF VARIATION In statistics, it is important to measure the spread of data. A simple way to measure spread is to find the range. But statisticians want to know if the data are

### Lecture 13/Chapter 10 Relationships between Measurement (Quantitative) Variables

Lecture 13/Chapter 10 Relationships between Measurement (Quantitative) Variables Scatterplot; Roles of Variables 3 Features of Relationship Correlation Regression Definition Scatterplot displays relationship

### Diagrams and Graphs of Statistical Data

Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in

### Density Curve. A density curve is the graph of a continuous probability distribution. It must satisfy the following properties:

Density Curve A density curve is the graph of a continuous probability distribution. It must satisfy the following properties: 1. The total area under the curve must equal 1. 2. Every point on the curve

### Homework 8 Solutions

Math 17, Section 2 Spring 2011 Homework 8 Solutions Assignment Chapter 7: 7.36, 7.40 Chapter 8: 8.14, 8.16, 8.28, 8.36 (a-d), 8.38, 8.62 Chapter 9: 9.4, 9.14 Chapter 7 7.36] a) A scatterplot is given below.

### 2. Here is a small part of a data set that describes the fuel economy (in miles per gallon) of 2006 model motor vehicles.

Math 1530-017 Exam 1 February 19, 2009 Name Student Number E There are five possible responses to each of the following multiple choice questions. There is only on BEST answer. Be sure to read all possible

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### Unit 7: Normal Curves

Unit 7: Normal Curves Summary of Video Histograms of completely unrelated data often exhibit similar shapes. To focus on the overall shape of a distribution and to avoid being distracted by the irregularities

### CURVE FITTING LEAST SQUARES APPROXIMATION

CURVE FITTING LEAST SQUARES APPROXIMATION Data analysis and curve fitting: Imagine that we are studying a physical system involving two quantities: x and y Also suppose that we expect a linear relationship

### Mind on Statistics. Chapter 3

Mind on Statistics Chapter 3 Section 3.1 1. Which one of the following is not appropriate for studying the relationship between two quantitative variables? A. Scatterplot B. Bar chart C. Correlation D.

### Chapter 3: Data Description Numerical Methods

Chapter 3: Data Description Numerical Methods Learning Objectives Upon successful completion of Chapter 3, you will be able to: Summarize data using measures of central tendency, such as the mean, median,

### Correlational Research. Correlational Research. Stephen E. Brock, Ph.D., NCSP EDS 250. Descriptive Research 1. Correlational Research: Scatter Plots

Correlational Research Stephen E. Brock, Ph.D., NCSP California State University, Sacramento 1 Correlational Research A quantitative methodology used to determine whether, and to what degree, a relationship

### , has mean A) 0.3. B) the smaller of 0.8 and 0.5. C) 0.15. D) which cannot be determined without knowing the sample results.

BA 275 Review Problems - Week 9 (11/20/06-11/24/06) CD Lessons: 69, 70, 16-20 Textbook: pp. 520-528, 111-124, 133-141 An SRS of size 100 is taken from a population having proportion 0.8 of successes. An

### Lesson Lesson Outline Outline

Lesson 15 Linear Regression Lesson 15 Outline Review correlation analysis Dependent and Independent variables Least Squares Regression line Calculating l the slope Calculating the Intercept Residuals and

### Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

### Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

### DATA INTERPRETATION AND STATISTICS

PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

### Sect The Slope-Intercept Form

Concepts # and # Sect. - The Slope-Intercept Form Slope-Intercept Form of a line Recall the following definition from the beginning of the chapter: Let a, b, and c be real numbers where a and b are not

### CORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there

CORRELATIONAL ANALYSIS: PEARSON S r Purpose of correlational analysis The purpose of performing a correlational analysis: To discover whether there is a relationship between variables, To find out the

### Linear Regression. Chapter 5. Prediction via Regression Line Number of new birds and Percent returning. Least Squares

Linear Regression Chapter 5 Regression Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). We can then predict the average response for all subjects

### CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

### Statistics 151 Practice Midterm 1 Mike Kowalski

Statistics 151 Practice Midterm 1 Mike Kowalski Statistics 151 Practice Midterm 1 Multiple Choice (50 minutes) Instructions: 1. This is a closed book exam. 2. You may use the STAT 151 formula sheets and

### Module 3: Correlation and Covariance

Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

### Correlation. What Is Correlation? Perfect Correlation. Perfect Correlation. Greg C Elvers

Correlation Greg C Elvers What Is Correlation? Correlation is a descriptive statistic that tells you if two variables are related to each other E.g. Is your related to how much you study? When two variables

### Statistics E100 Fall 2013 Practice Midterm I - A Solutions

STATISTICS E100 FALL 2013 PRACTICE MIDTERM I - A SOLUTIONS PAGE 1 OF 5 Statistics E100 Fall 2013 Practice Midterm I - A Solutions 1. (16 points total) Below is the histogram for the number of medals won

### STAT 155 Introductory Statistics. Lecture 5: Density Curves and Normal Distributions (I)

The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL STAT 155 Introductory Statistics Lecture 5: Density Curves and Normal Distributions (I) 9/12/06 Lecture 5 1 A problem about Standard Deviation A variable

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### Worksheet A5: Slope Intercept Form

Name Date Worksheet A5: Slope Intercept Form Find the Slope of each line below 1 3 Y - - - - - - - - - - Graph the lines containing the point below, then find their slopes from counting on the graph!.

### Descriptive Statistics and Measurement Scales

Descriptive Statistics 1 Descriptive Statistics and Measurement Scales Descriptive statistics are used to describe the basic features of the data in a study. They provide simple summaries about the sample

### Unit 8: Normal Calculations

Unit 8: Normal Calculations Summary of Video In this video, we continue the discussion of normal curves that was begun in Unit 7. Recall that a normal curve is bell-shaped and completely characterized

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 13 Student Lecture Notes Chapter 13 Introduction to Linear Regression and Correlation Analysis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaying

### Linear Approximations ACADEMIC RESOURCE CENTER

Linear Approximations ACADEMIC RESOURCE CENTER Table of Contents Linear Function Linear Function or Not Real World Uses for Linear Equations Why Do We Use Linear Equations? Estimation with Linear Approximations

### Regents Exam Questions A2.S.8: Correlation Coefficient

A2.S.8: Correlation Coefficient: Interpret within the linear regression model the value of the correlation coefficient as a measure of the strength of the relationship 1 Which statement regarding correlation

### Rescaling and shifting

Rescaling and shifting A fancy way of changing one variable to another Main concepts involve: Adding or subtracting a number (shifting) Multiplying or dividing by a number (rescaling) Where have you seen

### Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

### Appendix E: Graphing Data

You will often make scatter diagrams and line graphs to illustrate the data that you collect. Scatter diagrams are often used to show the relationship between two variables. For example, in an absorbance

### Unit 11: Fitting Lines to Data

Unit 11: Fitting Lines to Data Summary of Video Scatterplots are a great way to visualize the relationship between two quantitative variables. For example, the scatterplot of temperatures and coral reef

### Chapter 8 Graphs and Functions:

Chapter 8 Graphs and Functions: Cartesian axes, coordinates and points 8.1 Pictorially we plot points and graphs in a plane (flat space) using a set of Cartesian axes traditionally called the x and y axes

### Logo Symmetry Learning Task. Unit 5

Logo Symmetry Learning Task Unit 5 Course Mathematics I: Algebra, Geometry, Statistics Overview The Logo Symmetry Learning Task explores graph symmetry and odd and even functions. Students are asked to

### e = random error, assumed to be normally distributed with mean 0 and standard deviation σ

1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.

### The Normal Distribution

Chapter 6 The Normal Distribution 6.1 The Normal Distribution 1 6.1.1 Student Learning Objectives By the end of this chapter, the student should be able to: Recognize the normal probability distribution

### The aspect of the data that we want to describe/measure is the degree of linear relationship between and The statistic r describes/measures the degree

PS 511: Advanced Statistics for Psychological and Behavioral Research 1 Both examine linear (straight line) relationships Correlation works with a pair of scores One score on each of two variables ( and

### Simple Regression Theory I 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY I 1 Simple Regression Theory I 2010 Samuel L. Baker Regression analysis lets you use data to explain and predict. A simple regression line drawn through data points In Assignment

### Infinite Algebra 1 supports the teaching of the Common Core State Standards listed below.

Infinite Algebra 1 Kuta Software LLC Common Core Alignment Software version 2.05 Last revised July 2015 Infinite Algebra 1 supports the teaching of the Common Core State Standards listed below. High School

### The Cartesian Plane The Cartesian Plane. Performance Criteria 3. Pre-Test 5. Coordinates 7. Graphs of linear functions 9. The gradient of a line 13

6 The Cartesian Plane The Cartesian Plane Performance Criteria 3 Pre-Test 5 Coordinates 7 Graphs of linear functions 9 The gradient of a line 13 Linear equations 19 Empirical Data 24 Lines of best fit

### Section 3 Part 1. Relationships between two numerical variables

Section 3 Part 1 Relationships between two numerical variables 1 Relationship between two variables The summary statistics covered in the previous lessons are appropriate for describing a single variable.

### 3.1 Measures of Central Tendency: Mean, Median and Mode

3.1 Measures of Central Tendency: Mean, Median and Mode a. mean, $\bar{x} = \frac{\sum x}{n} = \frac{\text{sum of the entries}}{\text{number of entries}}$ Example 1 Find the mean of 26, 18, 12, 31,

### Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

SPSS Regressions Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. SSRL@American.edu Course Objective This course is designed

### Describing Relationships between Two Variables

Describing Relationships between Two Variables Up until now, we have dealt, for the most part, with just one variable at a time. This variable, when measured on many different subjects or objects, took

### DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi - 110 012 seema@iasri.res.in 1. Descriptive Statistics Statistics

### 6-3 The Standard Normal Distribution

Chapter 6 The Normal Distribution Figure 6-5 Areas Under a Normal Distribution Curve (about 68% of the area lies within 1 standard deviation of the mean, about 95% within 2, and about 99.7% within 3) 6-3 The Standard Normal Distribution Since

### II. DISTRIBUTIONS distribution normal distribution. standard scores

Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

### Outline. Correlation & Regression, III. Review. Relationship between r and regression

Outline Correlation & Regression, III 9.07 4/6/004 Relationship between correlation and regression, along with notes on the correlation coefficient Effect size, and the meaning of r Other kinds of correlation

### Chapter 4: Average and standard deviation

Chapter 4: Average and standard deviation Contents: Context, Average vs. median, Average

### Univariate Regression

Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

### Scatter Plots with Error Bars

Chapter 165 Scatter Plots with Error Bars Introduction The procedure extends the capability of the basic scatter plot by allowing you to plot the variability in Y and X corresponding to each point. Each

### WEB APPENDIX. Calculating Beta Coefficients

WEB APPENDIX 8A Calculating Beta Coefficients The CAPM is an ex ante model, which means that all of the variables represent before-the-fact, expected values. In particular, the beta coefficient used in

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

### Hints for Success on the AP Statistics Exam. (Compiled by Zack Bigner)

Hints for Success on the AP Statistics Exam. (Compiled by Zack Bigner) The Exam The AP Stat exam has 2 sections that take 90 minutes each. The first section is 40 multiple choice questions, and the second

### Lesson 4 Measures of Central Tendency

Outline Measures of a distribution s shape -modality and skewness -the normal distribution Measures of central tendency -mean, median, and mode Skewness and Central Tendency Lesson 4 Measures of Central

### AP Statistics 2001 Solutions and Scoring Guidelines

AP Statistics 2001 Solutions and Scoring Guidelines The materials included in these files are intended for non-commercial use by AP teachers for course and exam preparation; permission for any other use

### Statistics. Measurement. Scales of Measurement 7/18/2012

Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors. A variable is something that varies (eye color), a constant does

### a) perfect linear correlation (r = 1) b) no correlation (r = 0) c) positive correlation (0 < r < 1)

CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

### Elementary Statistics

Elementary Statistics Chapter 1 M. Ghamsary, Ph.D. Statistics: Statistics is the science of collecting,

### Numerical Summarization of Data OPRE 6301

Numerical Summarization of Data OPRE 6301 Motivation... In the previous session, we used graphical techniques to describe data. For example: While this histogram provides useful insight, other interesting

### Descriptive Statistics

Descriptive Statistics Descriptive statistics consist of methods for organizing and summarizing data. They include the construction of graphs, charts and tables, as well as various descriptive measures such

### Teaching & Learning Plans. The Correlation Coefficient. Leaving Certificate Syllabus

Teaching & Learning Plans The Correlation Coefficient Leaving Certificate Syllabus The Teaching & Learning Plans are structured as follows: Aims outline what the lesson, or series of lessons, hopes to

TOPIC 12 Techniques and tools for data analysis Preview Introduction In chapter 3 of Statistics In A Day different combinations of numbers and types of variables are presented. We go through these

### Chapter 15 Multiple Choice Questions (The answers are provided after the last question.)

Chapter 15 Multiple Choice Questions (The answers are provided after the last question.) 1. What is the median of the following set of scores? 18, 6, 12, 10, 14? a. 10 b. 14 c. 18 d. 12 2. Approximately

### with functions, expressions and equations which follow in units 3 and 4.

Grade 8 Overview View unit yearlong overview here The unit design was created in line with the areas of focus for grade 8 Mathematics as identified by the Common Core State Standards and the PARCC Model

### Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2

Lesson 4 Part 1 Relationships between two numerical variables 1 Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables