Regression. In this class we will:

1 AMS 5 REGRESSION

2 Regression The idea behind the calculation of the coefficient of correlation is that the scatter plot of the data corresponds to a cloud that follows a straight line. This idea can be formalized by regression methods. In this class we will:
- Consider the definition of simple linear regression
- Find a method to predict an individual value
- Use the normal curve to estimate percentile ranks
- Describe the regression effect
- Compute the regression errors and their RMS
- Study the behavior of regression errors

3 Regression The regression method describes how one variable depends on another. The Northern California temperature data have an average altitude of 3,524 feet with an SD of 1,839 feet, and an average temperature of 70.3 degrees with an SD of 6.5 degrees. The correlation between temperature and altitude is -0.76.

5 Regression The cloud of points shows a mild negative association between the two variables, as does the value of r. Can we use the values of altitude to estimate the average values of temperature?

6 Regression How does the regression line work? Associated with an increase of one SD in x there is an increase of r SDs in y, on average. Clearly, if the correlation coefficient is negative, then the average value of y decreases as x increases. In the temperature and altitude example, an increase in altitude of 1,839 feet (one SD) is associated with a change of \(-0.76 \times 6.5 \approx -4.9\) degrees in the average temperature, that is, a decrease of about 4.9 degrees.

7 Regression How do we use the method to predict an individual value? If we consider two variables x and y and we want to predict the value of y for a specific value of x, we use the average value of y that corresponds to that value of x according to the regression method. Example: The first-year GPAs and the Math SAT scores for the students of a university produce the following data:
- average SAT score = 550, SD = 80
- average 1st-year GPA = 2.6, SD = 0.6
- r = 0.40
We want to predict the 1st-year GPA of a student with an SAT score of 650.

8 Regression The student's SAT score in standard units is \((650 - 550)/80 = 1.25\), so the score is 1.25 SDs above average. An increase of one SD above the average SAT score is associated with an increase of \(0.4 \times 0.6 = 0.24\) GPA points. This implies that our student is predicted to be \(1.25 \times 0.24 = 0.3\) GPA points above average. Since the average GPA is 2.6, the predicted GPA is \(2.6 + 0.3 = 2.9\). This is the average GPA that we expect for students with SAT scores around 650.
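The calculation above can be sketched in a few lines of Python. The function name `predict_by_regression` and its parameter names are illustrative assumptions, not part of the course material; the numbers are the ones from the SAT/GPA example.

```python
def predict_by_regression(x, avg_x, sd_x, avg_y, sd_y, r):
    """Predict y for a given x with the regression method (hypothetical helper)."""
    z_x = (x - avg_x) / sd_x      # x in standard units
    z_y = r * z_x                 # on average, y moves r SDs per SD of x
    return avg_y + z_y * sd_y     # convert back to the units of y


# SAT/GPA example: average SAT 550 (SD 80), average GPA 2.6 (SD 0.6), r = 0.40.
predicted_gpa = predict_by_regression(650, 550, 80, 2.6, 0.6, 0.40)
print(round(predicted_gpa, 2))  # 2.9
```

Working in standard units keeps the three steps of the slide visible: standardize x, multiply by r, and convert back to the units of y.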

9 Regression WARNING: You can use the regression method on new subjects provided that they are similar to the ones that were used to produce the averages, SDs and r used in the regression method. In the previous example the method will not be valid for students of a different institution.

10 Regression We can use the regression method and the normal curve to produce estimates of percentile ranks. Example: In the previous example suppose a student has a percentile rank of 90% for the SAT scores. That is, only 10% of the scores are higher than his. What is the predicted percentile rank for the 1st-year GPA of this student? Using the normal curve, a 90% probability corresponds to a z score of about 1.3. This means that the student's SAT score is 1.3 SDs above average. This corresponds to being \(0.4 \times 1.3 \approx 0.5\) SDs above the average GPA, which corresponds to an accumulated probability, under the normal curve, of approximately 69%.

11 Regression So the percentile rank on 1st-year GPA of a student with a percentile rank on SAT score of 90% is predicted to be 69%. Notice that the student with an SAT percentile rank of 90% was "pulled down" to only 69% by the regression method. Why is that? Suppose the correlation were perfect, r = 1; then 90% would convert to 90%. The other extreme is that there is no correlation, so, in the absence of any information, the best guess is the median, or 50th percentile. The regression method produces a rank that is somewhere between these two extremes.
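The percentile calculation can be sketched with the standard library's `statistics.NormalDist`. The function name `predict_percentile` is a hypothetical helper, and the sketch assumes, as the slides do, that both variables follow the normal curve. Because it uses the exact z score rather than the rounded values 1.3 and 0.5, the result differs slightly from the 69% obtained by hand.

```python
from statistics import NormalDist


def predict_percentile(x_percentile, r):
    """Predicted percentile rank of y, given the percentile rank of x (hypothetical helper)."""
    std_normal = NormalDist()                       # normal curve: mean 0, SD 1
    z_x = std_normal.inv_cdf(x_percentile / 100)    # x in standard units
    z_y = r * z_x                                   # regression method
    return 100 * std_normal.cdf(z_y)                # back to a percentile rank


print(f"{predict_percentile(90, 0.40):.1f}")  # about 69.6
print(f"{predict_percentile(90, 1.0):.1f}")   # perfect correlation: stays at 90
print(f"{predict_percentile(90, 0.0):.1f}")   # no correlation: the median, 50
```

The last two calls reproduce the two extremes described above: r = 1 leaves the rank unchanged and r = 0 pulls it all the way to the median.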

12 Example The shoe sizes and heights of 14 men are recorded. The average shoe size is [...] with an SD of 1.21. The average height is [...] inches with an SD of 2.45 inches. The correlation is 0.93. What is the average height of a man who wears shoes of size 11.5? We convert 11.5 to standard units, \(z = (11.5 - \text{average shoe size})/1.21\). This means that the height will be \(0.93 \times z \times 2.45 = 1.95\) inches above average. So the average height of a man with shoe size 11.5 will be the average height plus 1.95 inches.

13 Regression effect Galton, a British statistician, studied the relationship between the heights of fathers and sons in 1,078 families. He noticed that tall fathers tended to have sons shorter than themselves and short fathers tended to have sons taller than themselves. He termed this fact regression to mediocrity. This is where the term regression comes from. Example: Children are tested for IQ before and after taking a preschool program. In both cases the scores average 100 and the SD is 15. So, on average, there seems to be no effect. Nevertheless, children below average in the first test had an average gain of 5 IQ points and those above average had an average loss of 5 IQ points. This is the regression effect.

14 Regression effect A model for the test-retest situation is
observed test score = true score + chance error
Suppose that the chance error can be either positive or negative. Suppose that the true scores in the population follow the normal curve with an average of 100 and an SD of 15. Consider the children who scored 140 on the first test. There are two possibilities:
- true score below 140, with a positive chance error
- true score above 140, with a negative chance error
Which one is more likely? According to the normal curve, the first possibility is more likely, since the mean is 100 and so the interval above 140 has less probability than the one below 140. Under this scenario, the second test is more likely to produce a value below 140.

15 Regression effect A symmetric situation holds for those scoring, say, 80 IQ. It is likely that the true score is above 80 with a negative chance error, and so the second score is likely to be above the first. In other words, if a student scores above average in the first test, it is likely that the true score is lower than the observed one. If the student takes the test again, chances are that the second score will be lower than the first. A symmetric situation is true for a person scoring below average in the first test. This explains the regression effect.
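The test-retest model can be explored with a small simulation. This is only a sketch: the size of the chance error (`ERROR_SD = 5`), the cut-offs of 130 and 70, and the number of simulated children are assumptions chosen for illustration and do not come from the slides.

```python
import random
from statistics import mean

random.seed(0)  # fixed seed so the run is repeatable

N = 100_000
TRUE_AVG, TRUE_SD = 100, 15
ERROR_SD = 5  # assumed size of the chance error (not given in the slides)

first_high, second_high = [], []
first_low, second_low = [], []
for _ in range(N):
    true_score = random.gauss(TRUE_AVG, TRUE_SD)
    first = true_score + random.gauss(0, ERROR_SD)    # observed = true + chance error
    second = true_score + random.gauss(0, ERROR_SD)   # retest: new chance error
    if first >= 130:
        first_high.append(first)
        second_high.append(second)
    elif first <= 70:
        first_low.append(first)
        second_low.append(second)

print("high scorers: first", round(mean(first_high), 1),
      "second", round(mean(second_high), 1))
print("low scorers:  first", round(mean(first_low), 1),
      "second", round(mean(second_low), 1))
```

Nothing in the simulation changes a child's true score between the two tests, yet the high group drops on average and the low group rises, which is the regression effect.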

16 Regression errors The regression method can be used to predict y from x. But actual values differ from predictions. These differences are the regression errors:
error = actual value of y - predicted value of y
Some of the errors defined in this way are positive and some are negative, reflecting the fact that some observations are above and some are below the regression line. How do we measure the error in a regression? The overall size of the error is measured using the root-mean-square (RMS), as we did to obtain the SD. This is equal to
\(\sqrt{\dfrac{(\text{error}_1)^2 + (\text{error}_2)^2 + \cdots + (\text{error}_N)^2}{N}}\)
where N is the number of points in the scatter diagram.

17 Regression errors What if we ignore the values of x? Then our prediction for y is the average of y. In this case the RMS error coincides with the SD of y.

18 Computing the RMS error We saw that when the values of x are ignored, the RMS error of the prediction equals the SD of y. The overall size of the error for a regression using x cannot be larger than the SD of y. How much smaller is it?
RMS error = \(\sqrt{1 - r^2} \times \text{SD of } y\)
We observe the following features:
- The units of the RMS error are the same as the units of the variable being predicted.
- Perfect correlation (\(r = \pm 1\)) corresponds to zero RMS error.
- Zero correlation corresponds to the maximum RMS error (equal to the SD of y).

19 Computing the RMS error Example 1: In the California temperature example the SD of y is 6.5 degrees and the correlation is -0.76, so the RMS error is \(\sqrt{1 - (-0.76)^2} \times 6.5 \approx 4.22\) degrees. So, in this case, knowing the altitude reduces the SD from 6.5 to 4.22 degrees. Example 2: In the shoe size example the SD of y is 2.45 inches and the correlation is 0.93, so the RMS error is \(\sqrt{1 - 0.93^2} \times 2.45 \approx 0.90\) inches. So we observe that knowing the shoe size produces a dramatic reduction of the SD, from 2.45 to 0.90 inches.
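The two examples can be checked with a one-line function. The name `rms_error` is a hypothetical helper introduced here for illustration.

```python
import math


def rms_error(r, sd_y):
    """RMS error of the regression line: sqrt(1 - r^2) times the SD of y."""
    return math.sqrt(1 - r ** 2) * sd_y


print(round(rms_error(-0.76, 6.5), 2))   # temperature example: 4.22 degrees
print(round(rms_error(0.93, 2.45), 2))   # shoe size example: 0.9 inches
print(rms_error(1.0, 6.5))               # perfect correlation: 0.0
print(rms_error(0.0, 6.5))               # zero correlation: 6.5, the SD of y
```

Only the square of r enters the formula, so the sign of the correlation does not affect the size of the error.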

20 Plotting the residuals Prediction errors are usually called residuals. It is important to explore the graphical properties of residuals to find out about the goodness of the fit by the regression line. In a residual plot the x coordinates are the same as for the original data. The y coordinates correspond to the values of the residuals. So there is one point for each point in the original scatter diagram.

21 Plotting the residuals If everything is OK with the regression line, we expect to see a cloud of points around the zero line on the y axis:
- We expect to see no trends or clusters in the residuals
- There should be about the same number of positive as negative residuals
- A histogram of the residuals should look symmetric around zero
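As a minimal sketch of how residuals are obtained, the code below fits the regression line to a small data set and computes one residual per point. The data are made up for illustration, and the SD is assumed to divide by N, as in the RMS formula above. The residuals average out to zero, and their RMS agrees with \(\sqrt{1 - r^2} \times \text{SD of } y\).

```python
import math

# Hypothetical data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.3, 6.9]


def average(values):
    return sum(values) / len(values)


def sd(values):
    """SD computed by dividing by N (assumed convention)."""
    m = average(values)
    return math.sqrt(average([(v - m) ** 2 for v in values]))


avg_x, avg_y, sd_x, sd_y = average(xs), average(ys), sd(xs), sd(ys)
# r is the average of the products of the standard units.
r = average([((x - avg_x) / sd_x) * ((y - avg_y) / sd_y) for x, y in zip(xs, ys)])

slope = r * sd_y / sd_x
intercept = avg_y - slope * avg_x

# One residual per point: actual y minus predicted y.
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
rms_of_residuals = math.sqrt(average([e ** 2 for e in residuals]))

print([round(e, 3) for e in residuals])
print(round(rms_of_residuals, 4), round(math.sqrt(1 - r ** 2) * sd_y, 4))
```

Plotting `xs` against `residuals` gives the residual plot described above.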

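22 Problem The following results are taken from a study of about 1,000 families:
- average height of husband = 68 inches, SD = 2.7 inches
- average height of wife = 63 inches, SD = 2.5 inches
- r = 0.25
Predict the height of a wife when the height of her husband is 72 inches. The husband is 4 inches above average height. This is \(4/2.7 \approx 1.5\) SDs above the average. So the wife is predicted to be \(r \times 1.5 = 0.25 \times 1.5 = 0.375\) SDs above average; this corresponds to \(0.375 \times 2.5 \approx 1\) inch, for a predicted height of \(63 + 1 = 64\) inches. Predict the height of a wife when the height of her husband is 68 inches. This husband is right on the average, so the wife is predicted to be right on the average as well, at 63 inches.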

23 Prediction for data in a vertical strip Example: A law school finds the following relationship between LSAT scores and first-year scores:
- average LSAT score = 162, SD = 6
- average first-year score = 68, SD = 10
- r = 0.60
Q: About what percentage of the students had first-year scores over 75? A: We use the normal curve approximation. Converting to standard units, \((75 - 68)/10 = 0.7\); this corresponds to a right-hand tail of about 24% under the normal curve.

24 Prediction for data in a vertical strip Q: Of the students who scored 165 on the LSAT, about what percentage had first-year scores over 75? A: We first convert to standard units for the x variable, \((165 - 162)/6 = 0.5\), then convert to standard units for the y variable, \(r \times 0.5 = 0.60 \times 0.5 = 0.3\), which corresponds to \(0.3 \times 10 = 3\) points above average, or \(68 + 3 = 71\). This is the new average for the strip. Since the data corresponding to a strip are a smaller and more homogeneous sample, the corresponding SD will be smaller. How much smaller?

26 Prediction for data in a vertical strip We expect the dispersion in the y variable to be about the same for each vertical strip. This is given by the RMS error, thus the new SD is \(\sqrt{1 - r^2} \times \text{SD of } y = \sqrt{1 - 0.6^2} \times 10 = 8\) points. This new SD can be used to convert to standard units, \((75 - 71)/8 = 0.5\), and, using the normal curve, we obtain an area of 31% above 0.5. This is the percentage of students scoring more than 75 in the first year among those who scored 165 on the LSAT. Notice that this percentage is higher than the one we obtained for the whole class. This is because we have focused on a smaller portion of the sample, whose average (71) is closer to 75 and whose SD is smaller.

27 Prediction for data in a vertical strip In summary, when considering data for a vertical strip:
1. Convert to standard units in the x variable.
2. Obtain the predicted value of the y variable.
3. Calculate the SD for the y variable in the strip using the RMS error.
4. Convert to standard units in the y variable and use the normal curve.
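The four steps of the summary can be sketched as one function. The name `percent_above_in_strip` and its parameters are hypothetical, and the sketch assumes the normal curve is a reasonable approximation within the strip.

```python
import math
from statistics import NormalDist


def percent_above_in_strip(x, threshold, avg_x, sd_x, avg_y, sd_y, r):
    """Percentage of y values above `threshold` among cases with the given x."""
    z_x = (x - avg_x) / sd_x                      # step 1: x in standard units
    strip_avg = avg_y + r * z_x * sd_y            # step 2: predicted y (new average)
    strip_sd = math.sqrt(1 - r ** 2) * sd_y       # step 3: new SD = RMS error
    z_y = (threshold - strip_avg) / strip_sd      # step 4: standard units in the strip
    return 100 * (1 - NormalDist().cdf(z_y))      # right-hand tail of the normal curve


# LSAT example: students who scored 165, first-year scores over 75.
pct = percent_above_in_strip(165, 75, 162, 6, 68, 10, 0.60)
print(round(pct))  # 31
```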

28 Slope and intercept All lines can be determined by a slope and an intercept. The intercept is the height of the line when x = 0. The slope is the rate at which y increases, per unit increase in x. If the slope is negative then y decreases as x increases.

29 Slope and intercept How do you get the slope of a regression line? Example: A sample of 555 California men age [...] in 1993 was surveyed to find out about education and income. The data are summarized by
- average education = 12.5 years, SD = 4 years
- average income = $21,500, SD = $16,000
- r = 0.35
This means that, for every increase of one SD in education, there is an increase of r SDs in income, on average. Thus, 4 extra years of education are worth an extra \(0.35 \times \$16{,}000 = \$5{,}600\) of income. So, each extra year is worth \(\$5{,}600 / 4 = \$1{,}400\); this is the slope of the regression line. In general, slope = \(r \times \text{SD of } y / \text{SD of } x\).

30 Slope and intercept The intercept of the regression line is given by the value of y when x = 0. This is 12.5 years below average in education. Since each year is worth $1,400, a man with no education should have an income which is below average by \(12.5 \text{ years} \times \$1{,}400 \text{ per year} = \$17{,}500\). Since the average income is $21,500, the predicted income of a man with no education is \(\$21{,}500 - \$17{,}500 = \$4{,}000\). This is the intercept of the regression line. The slope, in contrast, corresponds to the change in y associated with a one-unit increase in x.

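31 Slope and intercept In general, the intercept is given by
\(\text{intercept} = \text{average of } y - \text{slope} \times \text{average of } x\)
The equation for the regression line is called the regression equation and can be written as
\(y = \text{slope} \times x + \text{intercept}\)
So, for our example, we have that
\(\text{predicted income} = \$1{,}400 \text{ per year} \times \text{education} + \$4{,}000\)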

32 Slope and intercept Q: What is the predicted income of a man with an education of 15 years? A: Using the regression equation we have \(y = \$1{,}400 \times 15 + \$4{,}000 = \$25{,}000\). We can plug in any value of education and obtain the expected income for that level of education. Warning: It is usually a bad idea to use the regression line for extrapolations, that is, for values of x outside the range of the data.
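The slope and intercept of the education and income example can be computed from the five summary statistics. The function name `regression_line` is a hypothetical helper introduced for illustration.

```python
def regression_line(avg_x, sd_x, avg_y, sd_y, r):
    """Return (slope, intercept) of the regression line for predicting y from x."""
    slope = r * sd_y / sd_x                # r SDs of y per SD of x, per unit of x
    intercept = avg_y - slope * avg_x      # predicted y when x = 0
    return slope, intercept


# Education (years) and income (dollars) example.
slope, intercept = regression_line(12.5, 4, 21_500, 16_000, 0.35)
print(round(slope), round(intercept))      # 1400 4000

# Predicted income for 15 years of education.
print(round(slope * 15 + intercept))       # 25000
```

By construction the line passes through the point of averages: plugging in 12.5 years gives back the average income of $21,500.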

33 Example Back to our shoe size example. The shoe sizes and heights of 14 men are recorded. The average shoe size is [...] with an SD of 1.21. The average height is [...] inches with an SD of 2.45 inches. The correlation is 0.93. The slope of the regression line is \(r \times \text{SD of height} / \text{SD of shoe size} = 0.93 \times 2.45 / 1.21 \approx 1.88\) inches per shoe size. To obtain the intercept we consider a shoe size of zero. This is below average by the full average shoe size, and so corresponds to a height that is \(1.88 \times \text{average shoe size}\) inches below average. The intercept is therefore \(\text{average height} - 1.88 \times \text{average shoe size}\) inches. The regression line is \(\text{height} = 1.88 \times \text{shoe size} + \text{intercept}\). Q: What is the predicted height of a man with a shoe size of 9? A: Using the regression equation we have \(1.88 \times 9 + \text{intercept}\) inches.

34 Least squares Consider a cloud of points produced by obtaining the scatter diagram of observations corresponding to two variables x and y. There are many lines that we can draw through the cloud. Which is the straight line that fits the points best? The regression line is a solution to this problem: among all straight lines, it has the smallest RMS error, that is, the smallest sum of squared errors. This is the reason why the regression line is called the least squares line.
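The least squares property can be checked numerically. The sketch below uses made-up data and a hypothetical helper, `rms_error_of_line`, to compare the RMS error of the regression line with that of a few slightly shifted lines; the size of the shifts is arbitrary.

```python
import math

# Hypothetical data, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [2.1, 2.9, 4.2, 4.8, 6.3, 6.9]


def rms_error_of_line(slope, intercept):
    """RMS of the vertical distances from the points to the given line."""
    squared = [(y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))


avg_x = sum(xs) / len(xs)
avg_y = sum(ys) / len(ys)
# Least squares slope; algebraically the same as r * SD of y / SD of x.
best_slope = (sum((x - avg_x) * (y - avg_y) for x, y in zip(xs, ys))
              / sum((x - avg_x) ** 2 for x in xs))
best_intercept = avg_y - best_slope * avg_x
best_rms = rms_error_of_line(best_slope, best_intercept)

# Try a few other lines: change the slope or the intercept a little.
shifts = [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.3), (0.0, -0.3)]
other_rms = [rms_error_of_line(best_slope + ds, best_intercept + di)
             for ds, di in shifts]

for value in other_rms:
    print(f"other line RMS {value:.3f} > regression line RMS {best_rms:.3f}")
```

Every shifted line has a larger RMS error than the regression line, as the least squares property predicts.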

35 Least squares Example: Let b be the length of a spring with no load. If a load x is attached to the spring, the stretch is proportional to x. Thus the length of the spring is y = mx + b, where m and b are constants that depend on the spring. An experiment is run to determine the constants for a given spring; the data are shown in the table. The correlation coefficient is r = 0.999, so the points are very close to a straight line. But they are not exactly on a straight line. This is probably due to measurement error. The regression line for these data produces estimates of b and m, given, respectively, by the intercept and the slope of the line. The values are m ≈ 0.5 cm per kg and b ≈ [...] cm. These are the least squares estimates of m and b.

36 Problem Find the regression equation for predicting final score from midterm score, based on the following information:
- average midterm score = 70, SD = 10
- average final score = 55, SD = 20
- r = 0.60
The slope of the line can be obtained as \(r \times \text{SD of final} / \text{SD of midterm} = 0.60 \times 20 / 10 = 1.2\). A score of 0 in the midterm is 70 points below average and corresponds to a final score that is \(1.2 \times 70 = 84\) points below average. So the intercept is \(55 - 84 = -29\) points of the final score. Thus, the regression equation is
\(\text{final score} = 1.2 \times \text{midterm score} - 29\)


X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1) CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

More information

with functions, expressions and equations which follow in units 3 and 4.

with functions, expressions and equations which follow in units 3 and 4. Grade 8 Overview View unit yearlong overview here The unit design was created in line with the areas of focus for grade 8 Mathematics as identified by the Common Core State Standards and the PARCC Model

More information

Chapter 4: Average and standard deviation

Chapter 4: Average and standard deviation Chapter 4: Average and standard deviation Context................................................................... 2 Average vs. median 3 Average.................................................................

More information

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS.

Course Objective This course is designed to give you a basic understanding of how to run regressions in SPSS. SPSS Regressions Social Science Research Lab American University, Washington, D.C. Web. www.american.edu/provost/ctrl/pclabs.cfm Tel. x3862 Email. [email protected] Course Objective This course is designed

More information

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS

DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS DESCRIPTIVE STATISTICS AND EXPLORATORY DATA ANALYSIS SEEMA JAGGI Indian Agricultural Statistics Research Institute Library Avenue, New Delhi - 110 012 [email protected] 1. Descriptive Statistics Statistics

More information

Statistics. Measurement. Scales of Measurement 7/18/2012

Statistics. Measurement. Scales of Measurement 7/18/2012 Statistics Measurement Measurement is defined as a set of rules for assigning numbers to represent objects, traits, attributes, or behaviors A variableis something that varies (eye color), a constant does

More information

6 3 The Standard Normal Distribution

6 3 The Standard Normal Distribution 290 Chapter 6 The Normal Distribution Figure 6 5 Areas Under a Normal Distribution Curve 34.13% 34.13% 2.28% 13.59% 13.59% 2.28% 3 2 1 + 1 + 2 + 3 About 68% About 95% About 99.7% 6 3 The Distribution Since

More information

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

More information

WEB APPENDIX. Calculating Beta Coefficients. b Beta Rise Run Y 7.1 1 8.92 X 10.0 0.0 16.0 10.0 1.6

WEB APPENDIX. Calculating Beta Coefficients. b Beta Rise Run Y 7.1 1 8.92 X 10.0 0.0 16.0 10.0 1.6 WEB APPENDIX 8A Calculating Beta Coefficients The CAPM is an ex ante model, which means that all of the variables represent before-thefact, expected values. In particular, the beta coefficient used in

More information

CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

More information

Univariate Regression

Univariate Regression Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

More information

Pennsylvania System of School Assessment

Pennsylvania System of School Assessment Pennsylvania System of School Assessment The Assessment Anchors, as defined by the Eligible Content, are organized into cohesive blueprints, each structured with a common labeling system that can be read

More information

Section 1.3 Exercises (Solutions)

Section 1.3 Exercises (Solutions) Section 1.3 Exercises (s) 1.109, 1.110, 1.111, 1.114*, 1.115, 1.119*, 1.122, 1.125, 1.127*, 1.128*, 1.131*, 1.133*, 1.135*, 1.137*, 1.139*, 1.145*, 1.146-148. 1.109 Sketch some normal curves. (a) Sketch

More information

Elementary Statistics

Elementary Statistics Elementary Statistics Chapter 1 Dr. Ghamsary Page 1 Elementary Statistics M. Ghamsary, Ph.D. Chap 01 1 Elementary Statistics Chapter 1 Dr. Ghamsary Page 2 Statistics: Statistics is the science of collecting,

More information

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2

Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2 Lesson 4 Part 1 Relationships between two numerical variables 1 Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables

More information

T O P I C 1 2 Techniques and tools for data analysis Preview Introduction In chapter 3 of Statistics In A Day different combinations of numbers and types of variables are presented. We go through these

More information

A Determination of g, the Acceleration Due to Gravity, from Newton's Laws of Motion

A Determination of g, the Acceleration Due to Gravity, from Newton's Laws of Motion A Determination of g, the Acceleration Due to Gravity, from Newton's Laws of Motion Objective In the experiment you will determine the cart acceleration, a, and the friction force, f, experimentally for

More information

Test Bias. As we have seen, psychological tests can be well-conceived and well-constructed, but

Test Bias. As we have seen, psychological tests can be well-conceived and well-constructed, but Test Bias As we have seen, psychological tests can be well-conceived and well-constructed, but none are perfect. The reliability of test scores can be compromised by random measurement error (unsystematic

More information

Descriptive Statistics

Descriptive Statistics Y520 Robert S Michael Goal: Learn to calculate indicators and construct graphs that summarize and describe a large quantity of values. Using the textbook readings and other resources listed on the web

More information

1. Suppose that a score on a final exam depends upon attendance and unobserved factors that affect exam performance (such as student ability).

1. Suppose that a score on a final exam depends upon attendance and unobserved factors that affect exam performance (such as student ability). Examples of Questions on Regression Analysis: 1. Suppose that a score on a final exam depends upon attendance and unobserved factors that affect exam performance (such as student ability). Then,. When

More information

Solving Quadratic Equations

Solving Quadratic Equations 9.3 Solving Quadratic Equations by Using the Quadratic Formula 9.3 OBJECTIVES 1. Solve a quadratic equation by using the quadratic formula 2. Determine the nature of the solutions of a quadratic equation

More information

What Does the Normal Distribution Sound Like?

What Does the Normal Distribution Sound Like? What Does the Normal Distribution Sound Like? Ananda Jayawardhana Pittsburg State University [email protected] Published: June 2013 Overview of Lesson In this activity, students conduct an investigation

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

Descriptive Statistics

Descriptive Statistics Descriptive Statistics Descriptive statistics consist of methods for organizing and summarizing data. It includes the construction of graphs, charts and tables, as well various descriptive measures such

More information

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

 Y. Notation and Equations for Regression Lecture 11/4. Notation: Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

More information

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs

Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Types of Variables Chapter 1: Looking at Data Section 1.1: Displaying Distributions with Graphs Quantitative (numerical)variables: take numerical values for which arithmetic operations make sense (addition/averaging)

More information

Mathematics (Project Maths Phase 1)

Mathematics (Project Maths Phase 1) 2012. M128 S Coimisiún na Scrúduithe Stáit State Examinations Commission Leaving Certificate Examination, 2012 Sample Paper Mathematics (Project Maths Phase 1) Paper 2 Ordinary Level Time: 2 hours, 30

More information

First Midterm Exam (MATH1070 Spring 2012)

First Midterm Exam (MATH1070 Spring 2012) First Midterm Exam (MATH1070 Spring 2012) Instructions: This is a one hour exam. You can use a notecard. Calculators are allowed, but other electronics are prohibited. 1. [40pts] Multiple Choice Problems

More information

Scatter Plot, Correlation, and Regression on the TI-83/84

Scatter Plot, Correlation, and Regression on the TI-83/84 Scatter Plot, Correlation, and Regression on the TI-83/84 Summary: When you have a set of (x,y) data points and want to find the best equation to describe them, you are performing a regression. This page

More information

ACTIVITY 6: Falling Objects

ACTIVITY 6: Falling Objects UNIT FM Developing Ideas ACTIVITY 6: Falling Objects Purpose and Key Question You developed your ideas about how the motion of an object is related to the forces acting on it using objects that move horizontally.

More information

Bill Burton Albert Einstein College of Medicine [email protected] April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1

Bill Burton Albert Einstein College of Medicine william.burton@einstein.yu.edu April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1 Bill Burton Albert Einstein College of Medicine [email protected] April 28, 2014 EERS: Managing the Tension Between Rigor and Resources 1 Calculate counts, means, and standard deviations Produce

More information

DesCartes (Combined) Subject: Mathematics Goal: Statistics and Probability

DesCartes (Combined) Subject: Mathematics Goal: Statistics and Probability DesCartes (Combined) Subject: Mathematics Goal: Statistics and Probability RIT Score Range: Below 171 Below 171 Data Analysis and Statistics Solves simple problems based on data from tables* Compares

More information

Chapter 9 Descriptive Statistics for Bivariate Data

Chapter 9 Descriptive Statistics for Bivariate Data 9.1 Introduction 215 Chapter 9 Descriptive Statistics for Bivariate Data 9.1 Introduction We discussed univariate data description (methods used to eplore the distribution of the values of a single variable)

More information

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE

LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE LAGUARDIA COMMUNITY COLLEGE CITY UNIVERSITY OF NEW YORK DEPARTMENT OF MATHEMATICS, ENGINEERING, AND COMPUTER SCIENCE MAT 119 STATISTICS AND ELEMENTARY ALGEBRA 5 Lecture Hours, 2 Lab Hours, 3 Credits Pre-

More information

Linear Models in STATA and ANOVA

Linear Models in STATA and ANOVA Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples

More information

Elasticity. I. What is Elasticity?

Elasticity. I. What is Elasticity? Elasticity I. What is Elasticity? The purpose of this section is to develop some general rules about elasticity, which may them be applied to the four different specific types of elasticity discussed in

More information

The importance of graphing the data: Anscombe s regression examples

The importance of graphing the data: Anscombe s regression examples The importance of graphing the data: Anscombe s regression examples Bruce Weaver Northern Health Research Conference Nipissing University, North Bay May 30-31, 2008 B. Weaver, NHRC 2008 1 The Objective

More information

Formula for linear models. Prediction, extrapolation, significance test against zero slope.

Formula for linear models. Prediction, extrapolation, significance test against zero slope. Formula for linear models. Prediction, extrapolation, significance test against zero slope. Last time, we looked the linear regression formula. It s the line that fits the data best. The Pearson correlation

More information

Chapter 27: Taxation. 27.1: Introduction. 27.2: The Two Prices with a Tax. 27.2: The Pre-Tax Position

Chapter 27: Taxation. 27.1: Introduction. 27.2: The Two Prices with a Tax. 27.2: The Pre-Tax Position Chapter 27: Taxation 27.1: Introduction We consider the effect of taxation on some good on the market for that good. We ask the questions: who pays the tax? what effect does it have on the equilibrium

More information

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions.

This unit will lay the groundwork for later units where the students will extend this knowledge to quadratic and exponential functions. Algebra I Overview View unit yearlong overview here Many of the concepts presented in Algebra I are progressions of concepts that were introduced in grades 6 through 8. The content presented in this course

More information

Premaster Statistics Tutorial 4 Full solutions

Premaster Statistics Tutorial 4 Full solutions Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for

More information

Introduction to Linear Regression

Introduction to Linear Regression 14. Regression A. Introduction to Simple Linear Regression B. Partitioning Sums of Squares C. Standard Error of the Estimate D. Inferential Statistics for b and r E. Influential Observations F. Regression

More information

Association Between Variables

Association Between Variables Contents 11 Association Between Variables 767 11.1 Introduction............................ 767 11.1.1 Measure of Association................. 768 11.1.2 Chapter Summary.................... 769 11.2 Chi

More information

Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1.

Def: The standard normal distribution is a normal probability distribution that has a mean of 0 and a standard deviation of 1. Lecture 6: Chapter 6: Normal Probability Distributions A normal distribution is a continuous probability distribution for a random variable x. The graph of a normal distribution is called the normal curve.

More information

The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces

The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces The Effect of Dropping a Ball from Different Heights on the Number of Times the Ball Bounces Or: How I Learned to Stop Worrying and Love the Ball Comment [DP1]: Titles, headings, and figure/table captions

More information

Describing, Exploring, and Comparing Data

Describing, Exploring, and Comparing Data 24 Chapter 2. Describing, Exploring, and Comparing Data Chapter 2. Describing, Exploring, and Comparing Data There are many tools used in Statistics to visualize, summarize, and describe data. This chapter

More information

COMMON CORE STATE STANDARDS FOR

COMMON CORE STATE STANDARDS FOR COMMON CORE STATE STANDARDS FOR Mathematics (CCSSM) High School Statistics and Probability Mathematics High School Statistics and Probability Decisions or predictions are often based on data numbers in

More information

Slope-Intercept Equation. Example

Slope-Intercept Equation. Example 1.4 Equations of Lines and Modeling Find the slope and the y intercept of a line given the equation y = mx + b, or f(x) = mx + b. Graph a linear equation using the slope and the y-intercept. Determine

More information

PLOTTING DATA AND INTERPRETING GRAPHS

PLOTTING DATA AND INTERPRETING GRAPHS PLOTTING DATA AND INTERPRETING GRAPHS Fundamentals of Graphing One of the most important sets of skills in science and mathematics is the ability to construct graphs and to interpret the information they

More information