Sociology 6Z03 Topic 6: Least-Squares Regression
John Fox
McMaster University
Fall 2016

Outline: Least-Squares Regression

- Introduction
- Review of the Equation of a Straight Line
- The Least-Squares Regression Line
- Regression vs. Correlation
- Detecting Problems in Least-Squares Linear Regression
- Interpreting Correlation and Regression

Introduction

When the relationship between a response variable (y) and an explanatory variable (x) is linear, it is reasonable to summarize the relationship with a straight line. This lecture describes the most common method for fitting a straight line to a scatter of points, called linear least-squares regression.

Review of the Equation of a Straight Line

A straight line can be represented by the equation

y = a + bx

where:

- a, called the y-intercept of the line, represents the y-value corresponding to an x-value of 0.
- b, called the slope of the line, indicates how much y changes when x is increased by 1.

If b is positive, then the value of y increases as x increases; if b is negative, then the value of y decreases as x increases; if b = 0, then the line is horizontal, and the value of y does not change as x changes.

Review of the Equation of a Straight Line

[Figure: two panels, "Positive Slope (b > 0)" and "Negative Slope (b < 0)", each showing the line y = a + bx with intercept a and a change of b in y for a one-unit increase in x.]

The Least-Squares Regression Line: How Should We Fit a Line to a Scatterplot?

Unless the linear relationship between y and x is perfect, which is never the case for real data, no line will go through all of the points in a scatterplot.

When the linear relationship between y and x is very strong, it is easy to fit a line by eye to the scatterplot of the data. This is not the case when the relationship between the variables is weaker, as is usually true for data in the social sciences.

We therefore need a method of fitting a line to the scatter of points that doesn't depend upon subjective judgment. We want a line that comes as close to the points as possible. A line that comes close to the data allows us to predict values of y for specific values of x.

The Least-Squares Regression Line: How Should We Fit a Line to a Scatterplot?

Consider, for example, the relationship between prestige and education for the Canadian occupational prestige data.

[Figure: scatterplot of Prestige (vertical axis) against Education (horizontal axis) with a fitted line, marking the predicted prestige for an occupation with 11 years of education.]

To find the predicted or fitted value of y for an occupation with 11 years of education, go up to the line above x = 11, and then over to the y-axis to find the corresponding value of y (that is, ŷ ≈ 48).

The Least-Squares Regression Line: Thought Question

What is the approximate predicted value of prestige ŷ for an occupation with x = 15 years of education?

A. 25
B. 50
C. 70
D. 90
E. I don't know.

The Least-Squares Regression Line: How Should We Fit a Line to a Scatterplot?

The predicted value of y is represented by ŷ (called "y-hat"), because the predicted and observed y-values will generally differ. In the Canadian prestige data, for example, there are a few occupations with about 11 years of education. Some have observed prestige values a bit above the line, and some have observed values a bit below the line.

For each observation, the difference between the observed and predicted y-value, representing the error in prediction for that observation, is called the residual (literally, what is left over):

residual = observed value − predicted value = y − ŷ

Notice that the residuals are the vertical distances between the points and the line. We want a line that makes the residuals as small as possible.

[Figure: scatterplot of Prestige against Education with the fitted line, showing for one point the observed y, the predicted ŷ, and the residual y − ŷ as the vertical distance between them.]

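As a small illustration of the residual definition above, the following sketch computes fitted values and residuals for a few made-up (education, prestige) pairs and an assumed line ŷ = −10 + 5x. The data, the coefficients, and the function names are hypothetical and chosen only to keep the arithmetic easy to follow; they are not the Canadian prestige data or its fitted line.

```python
def predict(a, b, x):
    """Fitted value y-hat = a + b*x for a line with intercept a and slope b."""
    return a + b * x


def residuals(a, b, xs, ys):
    """Residual = observed y minus predicted y-hat, one per observation."""
    return [y - predict(a, b, x) for x, y in zip(xs, ys)]


# Hypothetical data (not the Canadian prestige data).
education = [8, 10, 11, 11, 14]
prestige = [30, 42, 52, 45, 66]

# Assumed line, for illustration only.
a, b = -10.0, 5.0

res = residuals(a, b, education, prestige)
for x, y, e in zip(education, prestige, res):
    print(f"x={x:>2}  y={y:>2}  y-hat={predict(a, b, x):5.1f}  residual={e:+5.1f}")
```

Note that the two made-up occupations with 11 years of education share the same fitted value but have different residuals, which is the situation the slide describes.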
The Least-Squares Regression Line: The Method of Least Squares (LS)

Making the residuals small would be easy if there were just two points: we could simply pass a line through the two points. When there are many points, there are several different ways to proceed.

The most common method of fitting a line is called the method of least squares (developed independently by Gauss and the French mathematician Legendre at the end of the 18th century), which finds the line with the smallest possible sum of squared residuals:

Choose a and b to minimize

$$\sum \text{residual}_i^2$$

The residuals are squared before adding them up to prevent positive residuals (corresponding to points above the line) from canceling out negative residuals (points below the line). Squaring makes all of the nonzero residuals positive.

The Least-Squares Regression Line: Least-Absolute-Values (LAV) Regression

Another way to make all of the residuals positive is to take their absolute values:

Choose a and b to minimize

$$\sum |\text{residual}_i|$$

This approach has its strong points (for example, it produces values of a and b that are more resistant to outliers than those produced by least-squares regression), but it is more difficult mathematically.

Finding a and b to minimize the sum of squared residuals is analogous to using the mean to represent the centre of a distribution, while finding a and b to minimize the sum of absolute residuals is analogous to using the median.

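The mean/median analogy can be checked numerically. The sketch below, which uses a made-up batch of five numbers containing one outlier, searches a grid of candidate centres and reports which candidate minimizes the sum of squared deviations and which minimizes the sum of absolute deviations. The data and the grid search are assumptions for illustration, not a method from the lecture.

```python
import statistics

# Hypothetical batch of values with one outlier (36).
values = [2, 3, 4, 5, 36]


def sum_sq(c):
    """Sum of squared deviations of the values from a candidate centre c."""
    return sum((v - c) ** 2 for v in values)


def sum_abs(c):
    """Sum of absolute deviations of the values from a candidate centre c."""
    return sum(abs(v - c) for v in values)


# Candidate centres 0.0, 0.1, ..., 40.0.
candidates = [i / 10 for i in range(0, 401)]

best_sq = min(candidates, key=sum_sq)    # least-squares centre
best_abs = min(candidates, key=sum_abs)  # least-absolute-values centre

print("mean   =", statistics.mean(values), " least-squares centre =", best_sq)
print("median =", statistics.median(values), " least-absolute centre =", best_abs)
```

The least-squares centre coincides with the mean and is pulled toward the outlier, while the least-absolute-values centre coincides with the median and is not, which mirrors the resistance property claimed for LAV regression.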
The Least-Squares Regression Line: Finding the LS Coefficients

The least-squares line has the equation ŷ = a + bx, with slope and intercept

$$b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2} = r\,\frac{s_y}{s_x}$$

$$a = \bar{y} - b\bar{x}$$

where r is the correlation between y and x; s_y is the standard deviation of the response variable y; s_x is the standard deviation of the explanatory variable x; and ȳ and x̄ are the means of the two variables.

Calculating the least-squares coefficients a and b according to these formulas is a lot of work, even when the number of observations n is not very large. But we can leave the work to the computer.

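To make "leave the work to the computer" concrete, here is a minimal sketch that applies the two slope formulas above to a small made-up dataset and confirms that they agree. The data and the helper names (`mean`, `sd`, `correlation`, `ls_coefficients`) are hypothetical; the standard deviation uses the n − 1 divisor, although the ratio s_y / s_x is the same with either divisor.

```python
import math


def mean(v):
    return sum(v) / len(v)


def sd(v):
    """Sample standard deviation (divisor n - 1)."""
    m = mean(v)
    return math.sqrt(sum((vi - m) ** 2 for vi in v) / (len(v) - 1))


def correlation(x, y):
    """Pearson correlation r between x and y."""
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


def ls_coefficients(x, y):
    """Intercept a and slope b from the sums-of-products formula."""
    mx, my = mean(x), mean(y)
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    a = my - b * mx
    return a, b


# Hypothetical data (not the Canadian prestige data).
x = [6, 8, 10, 12, 14, 16]
y = [25, 30, 45, 50, 68, 80]

a, b = ls_coefficients(x, y)
b_alt = correlation(x, y) * sd(y) / sd(x)  # b = r * s_y / s_x

print(f"a = {a:.3f}, b = {b:.3f}, r * s_y / s_x = {b_alt:.3f}")
```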
The Least-Squares Regression Line: Finding the LS Coefficients

Here, for example, is the calculation of the least-squares line for the regression of prestige (y) on education (x). Starting with the correlation, standard deviations, and means of the two variables, we get the slope and intercept from

$$b = r\,\frac{s_y}{s_x}, \qquad a = \bar{y} - b\bar{x}$$

[Most of the numerical values on this slide were lost in transcription. The values that survive elsewhere in the lecture are r = .85018, x̄ = 10.738, and b = 5.361.]

The fitted regression equation is therefore ŷ = a + 5.361x [numerical value of a lost in transcription]. Note that the origin (0, 0) does not appear in the scatterplot, and that we cannot see the intercept a in the graph.

[Figure: scatterplot of Prestige against Education with the fitted least-squares line.]

The Least-Squares Regression Line: Interpreting the LS Intercept

The intercept a is the predicted prestige score for an occupation with 0 years of average education. In this instance, we should not interpret the value of a literally, because

1. none of the 102 occupations in the dataset has less than 6 years of average education; and
2. prestige scores cannot be negative.

As mentioned, because the axes in the scatterplot do not start at the origin [the point (0, 0)], the intercept does not appear on this graph (but see the next slide).

In general, even when a linear regression does a good job of summarizing the relationship between y and x within the observed range of the data, it is dangerous to extrapolate this relationship beyond the range of the data.

The regression intercept a extrapolates the least-squares line far below the range of the data on education.

[Figure: scatterplot of Prestige against Education with axes extended to the origin, showing the least-squares line extrapolated to x = 0.]

The Least-Squares Regression Line: Interpreting the LS Slope

b = 5.361: Each additional year of education is accompanied on average by an increase of a bit more than 5 prestige points.

This is a descriptive statement about the association between prestige and education. We may or may not be willing to give the slope coefficient a causal interpretation ("increasing average education by one year causes the prestige of the occupation to rise by more than 5 points").

Because it tells us how y changes with x, we are usually more interested in the slope b than in the intercept a.

The Least-Squares Regression Line: Interpreting the LS Intercept and Slope (Thought Question)

Imagine that in a least-squares regression of individuals' annual income in dollars on their years of education, we obtain the following regression equation:

income = 10,000 + 5000 × education

Suppose that the regression equation is a reasonable summary of the relationship between income and education, and that we have data on individuals with 0 to 20 years of education.

The Least-Squares Regression Line: Interpreting the LS Intercept and Slope (Thought Question)

Which of the following statements is correct?

A. The predicted value of income for an individual with 0 years of education is $10,000.
B. Each additional year of education is associated on average with an increase of $5000 in annual income.
C. Both of the above.
D. Neither of the above.
E. I don't know.

The Least-Squares Regression Line: Graphing the LS Line

To plot the regression line on the scatterplot, find two points on the line. Any two points will do, but we can plot the line more accurately if the points are widely separated.

For example, for the regression of prestige on education, we can find the ŷ values corresponding to x-values of 6 and 16:

for x = 6: ŷ = a + b × 6
for x = 16: ŷ = a + b × 16

Connecting the two resulting points locates the least-squares line (as shown on the next slide).

Two points that are always on the least-squares line are (0, a) and (x̄, ȳ).

[The numerical values of these points for the example regression were lost in transcription, apart from x̄ = 10.738.]

The Least-Squares Regression Line: Graphing the LS Line

[Figure: scatterplot of Prestige against Education, with the least-squares line drawn by connecting the fitted points at x = 6 and x = 16.]

Regression vs. Correlation

The slope b of the least-squares regression line and the correlation r are related by the equation

$$b = r\,\frac{s_y}{s_x}$$

The correlation and slope are similar in certain respects and different in others:

- When r = 0, indicating that there is no linear relationship between y and x, then b = 0 as well.
- If x and y are standardized variables (so that s_x = s_y = 1), then b = r.

Regression vs. Correlation: The Two LS Lines

The correlation coefficient r doesn't depend upon which variable is treated as the response and which as the explanatory variable.

The slope b does depend upon which variable is treated as the response. If x is regressed on y rather than vice-versa (i.e., if x is treated as the response variable), then

$$b_{x \text{ on } y} = r\,\frac{s_x}{s_y}$$

which is usually different from the slope for the regression of y on x.

There are two least-squares regression lines: one for the regression of y on x, and the other for the regression of x on y. Unless r = ±1, these two regression lines are different.

For example, for prestige and education:

[Figure: scatterplot of Prestige (y) against Education (x) showing both least-squares lines, the regression of y on x and the regression of x on y.]

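The difference between the two least-squares lines can be seen in a short sketch. Using a made-up dataset, it computes the slope for y on x and the slope for x on y, and checks the relationship that follows from the two slope formulas: their product equals r². The data and function names are hypothetical.

```python
import math


def mean(v):
    return sum(v) / len(v)


def slope(explanatory, response):
    """Least-squares slope for regressing `response` on `explanatory`."""
    me, mr = mean(explanatory), mean(response)
    num = sum((e - me) * (r - mr) for e, r in zip(explanatory, response))
    den = sum((e - me) ** 2 for e in explanatory)
    return num / den


def correlation(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical data.
x = [6, 8, 10, 12, 14, 16]
y = [25, 30, 45, 50, 68, 80]

b_y_on_x = slope(x, y)  # y treated as the response
b_x_on_y = slope(y, x)  # x treated as the response
r = correlation(x, y)

print(f"b (y on x) = {b_y_on_x:.4f}")
print(f"b (x on y) = {b_x_on_y:.4f}")
print(f"product = {b_y_on_x * b_x_on_y:.4f}, r^2 = {r ** 2:.4f}")
```

Drawn on the same axes (y vertical), the x-on-y line has slope 1 / b_x_on_y, which matches b_y_on_x only when r² = 1.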
Regression vs. Correlation: Interpreting the Correlation Coefficient

The square of the correlation coefficient (r²) has a special interpretation in least-squares regression.

Recall the regression residuals, which give the differences between observed and predicted response values:

$$\text{residual}_i = y_i - \hat{y}_i$$

The sum of squared residuals represents the variation of y around the regression line:

$$\sum \text{residual}_i^2 = \sum (y_i - \hat{y}_i)^2$$

The total variation of y around its mean (ignoring the regression line) is

$$\sum (y_i - \bar{y})^2$$

The difference between the two measures of variation is the amount of variation accounted for by the regression of y on x:

$$\text{explained variation} = \text{total variation} - \text{residual variation} = \sum (y_i - \bar{y})^2 - \sum \text{residual}_i^2 = \sum (\hat{y}_i - \bar{y})^2$$

The squared correlation expresses the explained variation as a fraction of the total variation of y:

$$r^2 = \frac{\text{explained variation}}{\text{total variation}} = \frac{\sum (\hat{y}_i - \bar{y})^2}{\sum (y_i - \bar{y})^2} = 1 - \frac{\sum \text{residual}_i^2}{\sum (y_i - \bar{y})^2}$$

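The decomposition of variation above can be verified numerically. The following sketch fits a least-squares line to a made-up dataset, computes the total, residual, and explained sums of squares, and compares explained / total with the squared correlation. The data and variable names are hypothetical.

```python
import math


def mean(v):
    return sum(v) / len(v)


# Hypothetical data.
x = [6, 8, 10, 12, 14, 16]
y = [25, 30, 45, 50, 68, 80]

mx, my = mean(x), mean(y)
sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
sxx = sum((xi - mx) ** 2 for xi in x)
syy = sum((yi - my) ** 2 for yi in y)

# Least-squares coefficients and fitted values.
b = sxy / sxx
a = my - b * mx
fitted = [a + b * xi for xi in x]

total = syy                                                   # sum (y_i - ybar)^2
residual = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # sum residual_i^2
explained = sum((fi - my) ** 2 for fi in fitted)              # sum (yhat_i - ybar)^2

r = sxy / math.sqrt(sxx * syy)

print(f"total = {total:.3f}, residual = {residual:.3f}, explained = {explained:.3f}")
print(f"explained / total = {explained / total:.4f}, r^2 = {r ** 2:.4f}")
```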
Regression vs. Correlation: Interpreting the Correlation Coefficient

When there is a perfect linear relationship between y and x, the residuals are all zero; the sum of squared residuals is zero; and r² = 1.

When there is no linear relationship between y and x, the explained variation is zero, and r² = 0.

For the regression of occupational prestige on education, r = .85018, and thus the regression accounts for r² = .85018² = .7228, or about 72 percent of the variation in prestige scores.

Detecting Problems in Least-Squares Linear Regression: Influential Data

The least-squares line is a good summary of the relationship between y and x when the relationship is in fact linear and when the data are well behaved. But the least-squares line can sometimes be markedly affected by outlying data.

In regression analysis, an outlier is a point far away from the general pattern of the data. It is a point whose y-value is unusual compared to other points with similar x-values.

Points with unusual x-values, when they are out of line with the rest of the data, can be influential, in the sense that their inclusion in the dataset can markedly alter the regression line.

Like the mean, standard deviation, and correlation, therefore, the least-squares regression line is not resistant to unusual data.

Detecting Problems in Least-Squares Linear Regression: Influential Data

The following scatterplot, showing reported and measured weight in kg, is for 101 women engaged in regular exercise. The data were collected by Caroline Davis, a psychologist at York University who studies eating disorders.

If the women are unbiased reporters of their weight, then the regression line should be approximately ŷ = x (that is, an intercept of 0 and a slope of 1).

When the outlying point at the right is omitted, the least-squares line is close to the line of unbiased reporting (the lighter solid line). In this case, the influential outlier represents an error in recording the data.

[Figure: scatterplot of Reported Weight (kg) against Measured Weight (kg), with least-squares lines fitted with and without the outlying point.]

Important Point

Outliers, influential data, and other problems in regression analysis can be detected in the scatterplot of y against x. It is therefore important always to plot regression data.

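A minimal sketch of influence, assuming a tiny contrived dataset rather than Davis's data: five made-up women report weights close to their measured weights, and a sixth record has an erroneous, very large measured weight. The sketch compares the least-squares slope with and without that record. All numbers and names are hypothetical.

```python
def ls_fit(x, y):
    """Return (intercept, slope) of the least-squares line of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b


# Contrived data: measured weight (x) and reported weight (y), in kg.
measured = [50, 55, 60, 65, 70]
reported = [51, 54, 60, 66, 69]

# One contrived recording error with an unusual x-value.
measured_all = measured + [166]
reported_all = reported + [56]

a_clean, b_clean = ls_fit(measured, reported)
a_all, b_all = ls_fit(measured_all, reported_all)

print(f"without the outlying point: y-hat = {a_clean:.2f} + {b_clean:.3f} x")
print(f"with the outlying point:    y-hat = {a_all:.2f} + {b_all:.3f} x")
```

In this contrived case a single point is enough to move the slope from close to 1 to close to 0, which is the kind of distortion the scatterplot is meant to reveal.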
Detecting Problems in Least-Squares Linear Regression: Nonlinearity

Sometimes problems appear even more clearly in plots of residuals against x.

In the following graph there is a nonlinear relationship between y and x. Note that the average LS residual is 0, and that the residuals and x-values are uncorrelated.

[Figure: two panels, a scatterplot of y against x showing a nonlinear relationship, and the corresponding plot of residuals against x.]

Detecting Problems in Least-Squares Linear Regression: Changing Spread

In the following graph, the spread of y around the regression line (the spread of the residuals) increases with x.

- Predictions at large values of x will be less accurate than at small values of x.
- Least-squares regression may not be the best method for fitting a line to the scatterplot.

[Figure: two panels, a scatterplot of y against x with spread increasing in x, and the corresponding plot of residuals against x.]

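The two properties of least-squares residuals noted on the nonlinearity slide (they average 0 and are uncorrelated with x) hold even when the line is a poor description of the data. The sketch below checks this on a deliberately nonlinear made-up dataset, y = x², and prints the residuals so that their curved pattern is visible. The data are an assumption for illustration.

```python
# Deliberately nonlinear, made-up data: y = x^2 for x = 1, ..., 10.
x = list(range(1, 11))
y = [xi ** 2 for xi in x]

n = len(x)
mx, my = sum(x) / n, sum(y) / n

# Least-squares line of y on x.
b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
     / sum((xi - mx) ** 2 for xi in x))
a = my - b * mx

res = [yi - (a + b * xi) for xi, yi in zip(x, y)]

mean_res = sum(res) / n
# Sum of cross-products of x and the residuals (zero means uncorrelated).
cross = sum((xi - mx) * (ei - mean_res) for xi, ei in zip(x, res))

print(f"fitted line: y-hat = {a:.1f} + {b:.1f} x")
print("residuals:", res)
print(f"mean residual = {mean_res:.6f}, cross-product with x = {cross:.6f}")
```

The residuals are positive at both ends and negative in the middle: a clear pattern in the residual plot, even though their mean and their correlation with x are both zero.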
Detecting Problems in Least-Squares Linear Regression

What we want to see in a residual plot are unpatterned residuals, unrelated to x:

[Figure: two panels, a scatterplot of y against x with a linear relationship and constant spread, and the corresponding plot of residuals against x.]

Detecting Problems in Least-Squares Linear Regression: Anscombe's Quartet

The following examples (due to Anscombe, and called "Anscombe's Quartet" by Tufte) are particularly instructive and cautionary:

[Figure: the numerical data (x and y columns) for Dataset 1, Dataset 2, Dataset 3, and Dataset 4.]

Detecting Problems in Least-Squares Linear Regression: Anscombe's Quartet

Anscombe's four datasets are cleverly constructed to have exactly the same regression of y on x and the same correlation:

ŷ = 3.0 + 0.5x
r = .82

As well, x̄, ȳ, s_x, and s_y are all the same in the four datasets.

[Figure: four scatterplots of Y against X, panels (a), (b), (c), and (d), one for each of Anscombe's datasets.]

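The claim that the four datasets share the same fitted line and correlation can be checked directly. The values below are Anscombe's published data as commonly distributed (for example, as a built-in dataset in statistical software); they are not reproduced in these slides, so treat them as an outside addition. The helper name `summarize` is hypothetical.

```python
import math

# Anscombe's quartet (values as commonly distributed; not from the slides).
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
x4 = [8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8]
y1 = [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]
y2 = [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]
y3 = [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]
y4 = [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]

datasets = {
    "Dataset 1": (x123, y1),
    "Dataset 2": (x123, y2),
    "Dataset 3": (x123, y3),
    "Dataset 4": (x4, y4),
}


def summarize(x, y):
    """Return (intercept, slope, correlation) for the regression of y on x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    b = sxy / sxx
    return my - b * mx, b, sxy / math.sqrt(sxx * syy)


results = {name: summarize(x, y) for name, (x, y) in datasets.items()}

for name, (a, b, r) in results.items():
    print(f"{name}: y-hat = {a:.2f} + {b:.3f} x,  r = {r:.3f}")
```

The numerical summaries agree to about two decimal places across the four datasets; only a plot shows how differently the datasets behave.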
Detecting Problems in Least-Squares Linear Regression: Anscombe's Quartet

The linear least-squares regression is a good summary of the relationship between x and y only for the first dataset.

- In the second dataset, the relationship is nonlinear.
- In the third dataset, there is an outlier.
- In the fourth dataset, the least-squares line "chases" the influential observation.

None of these problems is clear from the fitted regression equation and correlation, and none (but the last) is clear from looking at the numerical data.

Interpreting Correlation and Regression: Cautions (Extrapolation, Lurking Variables)

Extrapolation: It is not safe to use a regression line for prediction outside of the range of x-values observed in the data.

Lurking variables: A lurking variable is an explanatory variable that has been omitted from the analysis and that has an important effect on the relationship between x and y.

Imagine, for example, that x is education and y is income, measured for each of a number of individuals (see the following graph, with contrived data). The filled dots represent men and the hollow dots represent women.

If y is regressed on x using the data both for women and for men, the relationship between income and education appears to be very weak, with r = .03. But when y is regressed on x separately for women and men, the relationships are much stronger; r = .94 for each group. Here, the lurking variable is gender.

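The lurking-variable effect described above can be reproduced with a few invented numbers. In the sketch below, two groups each show a strong positive relationship between education and income, but the groups are offset from one another so that the pooled correlation is close to zero. These values are made up for illustration and are not the contrived data plotted in the lecture, so the correlations differ from the .03 and .94 quoted above.

```python
import math


def correlation(x, y):
    """Pearson correlation r between x and y."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)


# Invented data: education in years, income in $1000s.
educ_women, income_women = [12, 14, 16, 18], [20, 25, 27, 32]
educ_men, income_men = [8, 10, 12, 14], [30, 35, 37, 42]

r_women = correlation(educ_women, income_women)
r_men = correlation(educ_men, income_men)

# Pooling the two groups ignores the lurking variable (gender).
r_pooled = correlation(educ_women + educ_men, income_women + income_men)

print(f"women:  r = {r_women:.3f}")
print(f"men:    r = {r_men:.3f}")
print(f"pooled: r = {r_pooled:.3f}")
```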
Interpreting Correlation and Regression: Further Cautions (Lurking Variables)

[Figure: scatterplot of Income ($1000s) against Education (years), with filled dots for men and hollow dots for women.]

The opposite effect can also occur: an apparent relationship between two variables can be induced by the omission of an important third variable.

Freedman was interested in the relationship between the population density of cities and their crime rates. He found that the association between these two variables was due to other factors that are related both to density and to crime:

- For example, large cities tend to be denser and to have higher crime rates.
- If we look separately at cities of similar size, density and crime are not related.

Interpreting Correlation and Regression: Further Cautions (Association vs. Causation)

Important Point

Association is not causation: Because an observed association can be due to a lurking variable, mere statistical association between variables does not imply that one variable causes the other.

Causal inferences are much more certain in experimental research than in observational research. In a randomized experiment, the values of the explanatory variable are assigned at random to individuals and therefore cannot (except by very bad luck) be related to lurking variables.

Most interesting sociological research questions are not amenable to experimental investigation, however.

Thought Question

Among the many contributions to statistics of the great British statistician Sir R. A. Fisher was his invention of the randomized comparative experiment. In the 1950s, Fisher maintained that there was no convincing evidence that smoking causes lung cancer, because the association between these two variables was at the time based solely on observational data.

Fisher's argument implies that

A. there may be one or more lurking variables that are related both to smoking and to lung cancer.
B. lung cancer causes smoking rather than vice-versa.
C. there is no observed relationship between smoking and lung cancer.
D. Fisher's argument just doesn't make sense; it is simply a stupid argument.
More informationSession 7 Bivariate Data and Analysis
Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares
More informationWhy should we learn this? One real-world connection is to find the rate of change in an airplane s altitude. The Slope of a Line VOCABULARY
Wh should we learn this? The Slope of a Line Objectives: To find slope of a line given two points, and to graph a line using the slope and the -intercept. One real-world connection is to find the rate
More informationWe are often interested in the relationship between two variables. Do people with more years of full-time education earn higher salaries?
Statistics: Correlation Richard Buxton. 2008. 1 Introduction We are often interested in the relationship between two variables. Do people with more years of full-time education earn higher salaries? Do
More informationHomework 8 Solutions
Math 17, Section 2 Spring 2011 Homework 8 Solutions Assignment Chapter 7: 7.36, 7.40 Chapter 8: 8.14, 8.16, 8.28, 8.36 (a-d), 8.38, 8.62 Chapter 9: 9.4, 9.14 Chapter 7 7.36] a) A scatterplot is given below.
More informationStat 412/512 CASE INFLUENCE STATISTICS. Charlotte Wickham. stat512.cwick.co.nz. Feb 2 2015
Stat 412/512 CASE INFLUENCE STATISTICS Feb 2 2015 Charlotte Wickham stat512.cwick.co.nz Regression in your field See website. You may complete this assignment in pairs. Find a journal article in your field
More informationThe Slope-Intercept Form
7.1 The Slope-Intercept Form 7.1 OBJECTIVES 1. Find the slope and intercept from the equation of a line. Given the slope and intercept, write the equation of a line. Use the slope and intercept to graph
More informationCHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression
Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the
More information7.7 Solving Rational Equations
Section 7.7 Solving Rational Equations 7 7.7 Solving Rational Equations When simplifying comple fractions in the previous section, we saw that multiplying both numerator and denominator by the appropriate
More informationMULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question.
Review MULTIPLE CHOICE. Choose the one alternative that best completes the statement or answers the question. 1) All but one of these statements contain a mistake. Which could be true? A) There is a correlation
More information1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number
1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression
More information1 Determine whether an. 2 Solve systems of linear. 3 Solve systems of linear. 4 Solve systems of linear. 5 Select the most efficient
Section 3.1 Systems of Linear Equations in Two Variables 163 SECTION 3.1 SYSTEMS OF LINEAR EQUATIONS IN TWO VARIABLES Objectives 1 Determine whether an ordered pair is a solution of a system of linear
More informationFive 5. Rational Expressions and Equations C H A P T E R
Five C H A P T E R Rational Epressions and Equations. Rational Epressions and Functions. Multiplication and Division of Rational Epressions. Addition and Subtraction of Rational Epressions.4 Comple Fractions.
More informationExample: Boats and Manatees
Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant
More informationINTRODUCTION TO ERRORS AND ERROR ANALYSIS
INTRODUCTION TO ERRORS AND ERROR ANALYSIS To many students and to the public in general, an error is something they have done wrong. However, in science, the word error means the uncertainty which accompanies
More informationthe Median-Medi Graphing bivariate data in a scatter plot
the Median-Medi Students use movie sales data to estimate and draw lines of best fit, bridging technology and mathematical understanding. david c. Wilson Graphing bivariate data in a scatter plot and drawing
More informationStatistics 151 Practice Midterm 1 Mike Kowalski
Statistics 151 Practice Midterm 1 Mike Kowalski Statistics 151 Practice Midterm 1 Multiple Choice (50 minutes) Instructions: 1. This is a closed book exam. 2. You may use the STAT 151 formula sheets and
More informationFormula for linear models. Prediction, extrapolation, significance test against zero slope.
Formula for linear models. Prediction, extrapolation, significance test against zero slope. Last time, we looked the linear regression formula. It s the line that fits the data best. The Pearson correlation
More informationPolynomial and Synthetic Division. Long Division of Polynomials. Example 1. 6x 2 7x 2 x 2) 19x 2 16x 4 6x3 12x 2 7x 2 16x 7x 2 14x. 2x 4.
_.qd /7/5 9: AM Page 5 Section.. Polynomial and Synthetic Division 5 Polynomial and Synthetic Division What you should learn Use long division to divide polynomials by other polynomials. Use synthetic
More information11. Analysis of Case-control Studies Logistic Regression
Research methods II 113 11. Analysis of Case-control Studies Logistic Regression This chapter builds upon and further develops the concepts and strategies described in Ch.6 of Mother and Child Health:
More informationDescribing Relationships between Two Variables
Describing Relationships between Two Variables Up until now, we have dealt, for the most part, with just one variable at a time. This variable, when measured on many different subjects or objects, took
More informationSection 3-7. Marginal Analysis in Business and Economics. Marginal Cost, Revenue, and Profit. 202 Chapter 3 The Derivative
202 Chapter 3 The Derivative Section 3-7 Marginal Analysis in Business and Economics Marginal Cost, Revenue, and Profit Application Marginal Average Cost, Revenue, and Profit Marginal Cost, Revenue, and
More informationCommon Core Unit Summary Grades 6 to 8
Common Core Unit Summary Grades 6 to 8 Grade 8: Unit 1: Congruence and Similarity- 8G1-8G5 rotations reflections and translations,( RRT=congruence) understand congruence of 2 d figures after RRT Dilations
More informationAlgebra 1 Course Information
Course Information Course Description: Students will study patterns, relations, and functions, and focus on the use of mathematical models to understand and analyze quantitative relationships. Through
More informationPOLYNOMIAL FUNCTIONS
POLYNOMIAL FUNCTIONS Polynomial Division.. 314 The Rational Zero Test.....317 Descarte s Rule of Signs... 319 The Remainder Theorem.....31 Finding all Zeros of a Polynomial Function.......33 Writing a
More informationLINEAR INEQUALITIES. less than, < 2x + 5 x 3 less than or equal to, greater than, > 3x 2 x 6 greater than or equal to,
LINEAR INEQUALITIES When we use the equal sign in an equation we are stating that both sides of the equation are equal to each other. In an inequality, we are stating that both sides of the equation are
More informationEQUATIONS OF LINES IN SLOPE- INTERCEPT AND STANDARD FORM
. Equations of Lines in Slope-Intercept and Standard Form ( ) 8 In this Slope-Intercept Form Standard Form section Using Slope-Intercept Form for Graphing Writing the Equation for a Line Applications (0,
More informationAP STATISTICS REVIEW (YMS Chapters 1-8)
AP STATISTICS REVIEW (YMS Chapters 1-8) Exploring Data (Chapter 1) Categorical Data nominal scale, names e.g. male/female or eye color or breeds of dogs Quantitative Data rational scale (can +,,, with
More informationSlope-Intercept Equation. Example
1.4 Equations of Lines and Modeling Find the slope and the y intercept of a line given the equation y = mx + b, or f(x) = mx + b. Graph a linear equation using the slope and the y-intercept. Determine
More informationFairfield Public Schools
Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity
More informationDescriptive statistics; Correlation and regression
Descriptive statistics; and regression Patrick Breheny September 16 Patrick Breheny STA 580: Biostatistics I 1/59 Tables and figures Descriptive statistics Histograms Numerical summaries Percentiles Human
More informationMEASURES OF VARIATION
NORMAL DISTRIBTIONS MEASURES OF VARIATION In statistics, it is important to measure the spread of data. A simple way to measure spread is to find the range. But statisticians want to know if the data are
More informationIntroduction to Statistics for Psychology. Quantitative Methods for Human Sciences
Introduction to Statistics for Psychology and Quantitative Methods for Human Sciences Jonathan Marchini Course Information There is website devoted to the course at http://www.stats.ox.ac.uk/ marchini/phs.html
More informationDiagrams and Graphs of Statistical Data
Diagrams and Graphs of Statistical Data One of the most effective and interesting alternative way in which a statistical data may be presented is through diagrams and graphs. There are several ways in
More informationModule 5: Multiple Regression Analysis
Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College
More informationMeasurement with Ratios
Grade 6 Mathematics, Quarter 2, Unit 2.1 Measurement with Ratios Overview Number of instructional days: 15 (1 day = 45 minutes) Content to be learned Use ratio reasoning to solve real-world and mathematical
More information1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96
1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years
More informationNEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS
NEW YORK STATE TEACHER CERTIFICATION EXAMINATIONS TEST DESIGN AND FRAMEWORK September 2014 Authorized for Distribution by the New York State Education Department This test design and framework document
More information5.1. A Formula for Slope. Investigation: Points and Slope CONDENSED
CONDENSED L E S S O N 5.1 A Formula for Slope In this lesson ou will learn how to calculate the slope of a line given two points on the line determine whether a point lies on the same line as two given
More informationA Quick Algebra Review
1. Simplifying Epressions. Solving Equations 3. Problem Solving 4. Inequalities 5. Absolute Values 6. Linear Equations 7. Systems of Equations 8. Laws of Eponents 9. Quadratics 10. Rationals 11. Radicals
More informationEconometrics Simple Linear Regression
Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight
More informationAlgebra I Vocabulary Cards
Algebra I Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Absolute Value Order of Operations Expression
More informationUnit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
More informationThe Method of Least Squares
Hervé Abdi 1 1 Introduction The least square methods (LSM) is probably the most popular technique in statistics. This is due to several factors. First, most common estimators can be casted within this
More informationThe correlation coefficient
The correlation coefficient Clinical Biostatistics The correlation coefficient Martin Bland Correlation coefficients are used to measure the of the relationship or association between two quantitative
More informationIntegrating algebraic fractions
Integrating algebraic fractions Sometimes the integral of an algebraic fraction can be found by first epressing the algebraic fraction as the sum of its partial fractions. In this unit we will illustrate
More information2013 MBA Jump Start Program. Statistics Module Part 3
2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just
More informationGetting Correct Results from PROC REG
Getting Correct Results from PROC REG Nathaniel Derby, Statis Pro Data Analytics, Seattle, WA ABSTRACT PROC REG, SAS s implementation of linear regression, is often used to fit a line without checking
More informationRegression and Correlation
Regression and Correlation Topics Covered: Dependent and independent variables. Scatter diagram. Correlation coefficient. Linear Regression line. by Dr.I.Namestnikova 1 Introduction Regression analysis
More informationPie Charts. proportion of ice-cream flavors sold annually by a given brand. AMS-5: Statistics. Cherry. Cherry. Blueberry. Blueberry. Apple.
Graphical Representations of Data, Mean, Median and Standard Deviation In this class we will consider graphical representations of the distribution of a set of data. The goal is to identify the range of
More informationLESSON EIII.E EXPONENTS AND LOGARITHMS
LESSON EIII.E EXPONENTS AND LOGARITHMS LESSON EIII.E EXPONENTS AND LOGARITHMS OVERVIEW Here s what ou ll learn in this lesson: Eponential Functions a. Graphing eponential functions b. Applications of eponential
More informationUsing Excel for inferential statistics
FACT SHEET Using Excel for inferential statistics Introduction When you collect data, you expect a certain amount of variation, just caused by chance. A wide variety of statistical tests can be applied
More information17. SIMPLE LINEAR REGRESSION II
17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.
More informationName Partners Date. Energy Diagrams I
Name Partners Date Visual Quantum Mechanics The Next Generation Energy Diagrams I Goal Changes in energy are a good way to describe an object s motion. Here you will construct energy diagrams for a toy
More informationtable to see that the probability is 0.8413. (b) What is the probability that x is between 16 and 60? The z-scores for 16 and 60 are: 60 38 = 1.
Review Problems for Exam 3 Math 1040 1 1. Find the probability that a standard normal random variable is less than 2.37. Looking up 2.37 on the normal table, we see that the probability is 0.9911. 2. Find
More informationElements of a graph. Click on the links below to jump directly to the relevant section
Click on the links below to jump directly to the relevant section Elements of a graph Linear equations and their graphs What is slope? Slope and y-intercept in the equation of a line Comparing lines on
More informationWEB APPENDIX. Calculating Beta Coefficients. b Beta Rise Run Y 7.1 1 8.92 X 10.0 0.0 16.0 10.0 1.6
WEB APPENDIX 8A Calculating Beta Coefficients The CAPM is an ex ante model, which means that all of the variables represent before-thefact, expected values. In particular, the beta coefficient used in
More informationThe Point-Slope Form
7. The Point-Slope Form 7. OBJECTIVES 1. Given a point and a slope, find the graph of a line. Given a point and the slope, find the equation of a line. Given two points, find the equation of a line y Slope
More information1.7 Graphs of Functions
64 Relations and Functions 1.7 Graphs of Functions In Section 1.4 we defined a function as a special type of relation; one in which each x-coordinate was matched with only one y-coordinate. We spent most
More informationIntroduction to Quantitative Methods
Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................
More information