Section 14 Simple Linear Regression: Introduction to Least Squares Regression


 Jack Washington
 4 years ago
Slide 1: Section 14 Simple Linear Regression: Introduction to Least Squares Regression
There are several measures of statistical association for describing the quantitative relationship between two variables. If the researcher is working with numeric measures and supposes a linear relationship between the two variables, the appropriate measure of association is correlation. Additionally, if a particular set of assumptions is met, we can predict one of the two variables (an outcome) from the other variable (a predictor); this is called simple linear regression. A researcher may also wish to understand the relationships among more than two variables. This can be done with an extension of simple linear regression called multiple linear regression. Recall that any statistical hypothesis test is a method for deciding how much evidence is enough to declare a significant outcome in a research study. The hypothesis tested by a correlation, and also by simple linear regression, is whether two variables have a significant linear association with each other.
Slide 2: Linear Regression: Examples
We learned that when we have measurements of two continuous variables, we can describe their relationship visually with a scatterplot. If that relationship appears to be linear, we can measure the strength and direction of the linear association. Finally, if certain assumptions are met, we may be able to predict the value of one measure from the other. For example:
Is higher wine consumption associated with lower rates of heart disease? What is the nature of this relationship? Is the relationship linear?
What is the relationship between the number of people living on farms and the passing of time from 1935 to 1990? In other words, how fast did the number of people living on farms in the US decrease from 1935 to 1990?
What is the relationship between plasma volume in the blood and body weight? Do these two measures have a linear relationship? Can we predict plasma volume in the blood from a person's body weight? How well?
Does the estriol level of a mother have a linear relationship with the birthweight of her baby? Can we predict the birthweight of a baby from the mother's estriol level? If so, can we anticipate a low-birthweight baby from estriol levels?
Does the age at which a child first begins talking predict a score of mental ability later in childhood?
Is there a linear relationship between systolic blood pressure and age?
In all of these examples, we are investigating the relationship between two quantitative variables. We may begin this investigation with a scatterplot followed by a correlation analysis. We will now take the investigation further by introducing simple linear regression.
Slide 3: Simple Linear Regression
Simple linear regression (SLR) analysis is used to quantify the linear relationship between two quantitative variables. In this way it is like correlation, but regression goes further:
It allows us to draw the line that best describes the linear relationship between X and Y.
It allows us to predict the value of the outcome Y for a specified value of X.
It allows us to quantify how much the value of Y changes for a specified change in the value of X.
In other studies the goal is to assess the relationships among a set of variables.
Slide 4: Variable (X) and Variable (Y)
We can describe the relationship or association between two quantitative variables using a scatterplot, correlation, or simple linear regression.
Recall that we usually identify one variable as the outcome of interest, what we have mostly been thinking of as a disease variable so far. This is often called the response, or dependent, variable. The other variable is the predictor of interest, what we have mostly been thinking of as an exposure variable so far. This is often called the explanatory, or independent, variable.
When each unit (person) has two measures, we usually call one x and one y. If one variable can help predict the value of the other, we call that variable x. It is also called the predictor, explanatory, or independent variable. The other variable, y, is called the outcome, response, or dependent variable. Sometimes we cannot tell which is the predictor and which is the outcome. Simple linear regression requires that we pick one variable as the outcome.
Slide 5: Wine Consumption and Heart Disease
Is higher wine consumption associated with lower rates of heart disease? What is the nature of this relationship? Is the relationship linear? Here are some data on wine consumption and heart disease deaths. Do these data suggest a linear relationship between the two variables?
Source: Moore and McCabe, Introduction to the Practice of Statistics, 4th Edition, W. H. Freeman & Co., New York.
Slide 6: Wine Consumption and Heart Disease
The data suggest a negative trend. Can we estimate how much lower heart disease rates are for each extra liter of wine per person per year? How would we draw a line through these data to help us with this estimate? What can we say about the precision of this regression line? How much of the variability in heart disease deaths is explained by the regression line? Do you think these data come from a random sample? What assumptions are we making when using linear regression to make predictions? What confounders must we consider? These are all concepts we will investigate with linear regression.
Slide 7: Population Living on Farms
What is the relationship between the number of people living on farms and the passing of time from 1935 to 1990? How fast did the number of people living on farms in the US decrease? Do these data suggest a linear relationship between the two variables?
Slide 8: Population Living on Farms
How fast did the number of people living on farms in the US decrease? We can see a strong negative trend that appears fairly linear. How might we draw a line through these data? Is there a best way to draw this line?
Slide 9: Plasma Volume and Body Weight
What is the relationship between plasma volume in the blood and body weight? Do these two measures have a linear relationship?
[Table: subject, body weight (kg), plasma volume (l)]
Consider the association between body weight in kilograms and plasma volume in the blood in liters for eight randomly selected people. Do heavier people have more plasma? If so, how much more? Is this relationship linear?
Slide 10: Simple Linear Regression
[Scatterplot: X, body weight (kg); Y, plasma volume (liters); Pearson's correlation = 0.759]
When we plot the data we can see a positive relationship between body weight and plasma volume. The data do not fall perfectly on a line. Pearson's correlation, which helps us understand the strength of the linear relationship, is 0.759. We may want to draw a line through these data, giving us a mathematical model to estimate plasma volume from weight, but which is the best line? The white line, the green line, or the purple line? The technique of least squares regression will help us pick the line of best fit.
Slide 11: How Do We Choose the Best Line?
The least squares regression line is the line that gets closest to all of the points. How do we measure closeness to more than one point? We minimize $\sum_{i=1}^{n} (y_i - \text{point\_on\_line}_i)^2$.
The line of best fit, the regression line, is the line that gets "closest" to all the data points. "Closeness" is measured as the vertical distance from the line to each data point. Specifically, the regression line is the one that minimizes the sum of all the squared vertical distances; hence estimation of this line is called least squares and the line is called the least squares regression line.
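The least squares criterion can be tried out numerically. The sketch below is only an illustration: the eight weight and volume pairs are made-up values (not the study's measurements), the function names are hypothetical, and the closed-form slope used here is the standard textbook result, which the slides introduce later in an equivalent form. It compares the sum of squared vertical distances for the least squares line against three arbitrary competitor lines.

```python
# Made-up illustrative data, NOT the study's measurements:
# body weight (kg) and plasma volume (liters) for eight hypothetical people.
weights = [55.0, 60.0, 62.0, 65.0, 68.0, 70.0, 72.0, 75.0]
volumes = [2.5, 2.7, 2.6, 2.9, 3.0, 3.1, 3.0, 3.4]


def sum_squared_distances(intercept, slope, xs, ys):
    """Sum of squared vertical distances between the points and a line."""
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Closed-form least squares intercept and slope (standard textbook result)."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


b0, b1 = least_squares_line(weights, volumes)
best = sum_squared_distances(b0, b1, weights, volumes)

# Three arbitrary competitor lines, standing in for hand-drawn guesses.
candidates = [(b0 + 0.1, b1), (b0, b1 + 0.005), (b0 - 0.2, b1 + 0.002)]
for intercept, slope in candidates:
    total = sum_squared_distances(intercept, slope, weights, volumes)
    print(f"intercept={intercept:.3f} slope={slope:.4f} sum of squares={total:.4f}")
print(f"least squares line: sum of squares={best:.4f}")
```

Every competitor line gives a larger sum of squares than the least squares line, which is what "best fit" means under this criterion.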
Slide 12: Simple Linear Regression
Visually, we find the line that minimizes the sum of the squared vertical distances. For this line, the positive distances (points above the line) and the negative distances (points below the line) sum to zero. This would be very difficult to achieve by trial and error. We have mathematical formulas that determine this exact line.
Slide 13: Equation of a Line
Before we move further with linear regression, let's review the equation of a line, that is, how we represent a line with a mathematical function. A line is defined by the intercept a (where the line crosses the vertical axis, the value of y when x = 0) and the slope b ("rise over run," how much y changes for each one-unit change in x). We write this as $y = a + bx$.
Slide 14: Equation of a Line
We can see that the line crosses the vertical axis at the value a, when x = 0. We also see that for every one-unit increase in x, y changes by the amount b.
Slide 15: Equation of a Line: Statistical Notation
$b_0$ = intercept, $b_1$ = slope, $\hat{y} = b_0 + b_1 x$
In statistics, the symbol for the intercept is $b_0$ ("b naught") and the symbol for the slope is $b_1$ ("b sub one"). We then write the line as $\hat{y} = b_0 + b_1 x$. We use $\hat{y}$ ("y hat") instead of y to distinguish our predicted value $\hat{y}$, given a value of x, from the real data value y.
Slide 16: Equation of a Line: Statistical Notation
[Figure: the line $\hat{y} = b_0 + b_1 x$, with the intercept $b_0$ marked on the y axis at x = 0 and the slope $b_1$ marked along the line]
Using statistical notation, we have the same picture as before. Here the line crosses the vertical axis at the value $b_0$, when x = 0. We also see that for every one-unit increase in x, $\hat{y}$ changes by the amount $b_1$.
Slide 17: Estimating Intercept and Slope
$b_0 = \bar{y} - b_1 \bar{x}$, $b_1 = r \, \frac{s_y}{s_x}$, $\hat{y} = b_0 + b_1 x$
The least squares line minimizes the sum of squared vertical distances. This translates into the following: $b_0$ equals $\bar{y}$ minus the slope times $\bar{x}$. The slope is the correlation times the ratio of the standard deviation of the observed y values to the standard deviation of the observed x values. In this way, we see that the slope and the correlation are related to one another. The correlation depends on both the slope and the precision, that is, how tightly the points cluster around the line. The equations are obtained using mathematics beyond this course. It is enough to understand that these are the equations that determine the least squares regression line, $\hat{y} = b_0 + b_1 x$.
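As a rough sketch of how these two formulas are applied, the example below computes the slope first and then the intercept. The data are made-up illustrative values (not the study's measurements) and the helper name `pearson_r` is hypothetical; only the Python standard library is used.

```python
import math
import statistics

# Made-up illustrative data, NOT the study's measurements.
weights = [55.0, 60.0, 62.0, 65.0, 68.0, 70.0, 72.0, 75.0]  # x, kg
volumes = [2.5, 2.7, 2.6, 2.9, 3.0, 3.1, 3.0, 3.4]          # y, liters


def pearson_r(xs, ys):
    """Pearson's correlation computed from sums of squares and cross-products."""
    x_bar = statistics.mean(xs)
    y_bar = statistics.mean(ys)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    syy = sum((y - y_bar) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


r = pearson_r(weights, volumes)
s_x = statistics.stdev(weights)  # sample standard deviation of x
s_y = statistics.stdev(volumes)  # sample standard deviation of y

b1 = r * s_y / s_x                                             # slope first
b0 = statistics.mean(volumes) - b1 * statistics.mean(weights)  # then intercept

print(f"r = {r:.3f}, b1 = {b1:.4f}, b0 = {b0:.4f}")
```

One consequence of the intercept formula is that the fitted line always passes through the point $(\bar{x}, \bar{y})$.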
Slide 18: Slope and Correlation
[Figure: three panels showing lines with $b_1 > 0$, $b_1 = 0$, and $b_1 < 0$]
Notice that if the slope is positive then the correlation is positive. If the slope is zero then the correlation is zero. If the slope is negative then the correlation is negative.
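The sign agreement follows from the slope formula given earlier, because the standard deviations are positive whenever both variables actually vary. A short worked statement, using the same symbols:

```latex
b_1 = r \, \frac{s_y}{s_x}, \qquad s_x > 0, \; s_y > 0
\quad \Longrightarrow \quad \operatorname{sign}(b_1) = \operatorname{sign}(r)
```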
Slide 19: Simple Linear Regression
[Scatterplot: X, body weight (kg); Y, plasma volume (liters); Pearson's correlation = 0.759]
The data points are the dots in our scatterplot, but they do not fall exactly on the line. How do we compute (and write) the least squares line for these data? Once we have the line, for any x value within the range of the values in our dataset, $\hat{y}$ is the point that falls exactly on the least squares line; it is not the data value y. Thus every x value can be plugged into the equation to calculate a predicted y value, which we denote $\hat{y}$.
Slide 20: Estimating Intercept and Slope
$b_1 = r \, \frac{s_y}{s_x}$, $b_0 = \bar{y} - b_1 \bar{x}$ with $\bar{x} = 66.875$, $\hat{y} = b_0 + b_1 x$
Using the equations for the slope and intercept of the least squares regression line, we obtain the estimates shown on the slide. We must calculate the slope first because the equation for the intercept uses the estimate of the slope. Generally, we do not do these calculations by hand. We use software to compute these values.
Slide 21: Plasma Volume and Weight
Using R, we plot the least squares regression line $\hat{y} = b_0 + b_1 x$. The slope means that for every one-kilogram increase in body weight, plasma volume increases on average by $b_1$ liters. The intercept is the estimated plasma volume for a person who weighs zero kilograms. This estimate does not make biological sense. The intercept for this model is merely used to determine the line, not to make a prediction at x = 0. The only meaningful estimates are within the range of our x values, that is, weights from about 55 to 75 kilograms.
Slide 22: Plasma Volume and Weight
Measuring plasma volume is very time consuming, while body weight is easy to measure, so we can use the equation and body weight to estimate plasma volume. We may want to estimate the plasma volume of a person outside this study based on the person's weight. For example, what would you expect plasma volume to be, on average, for a 60-kilogram man? We put 60 kilograms in for x and calculate $\hat{y} = b_0 + b_1(60) \approx 2.7$ liters. Be very careful to make estimates only within the range of the data that was used to estimate the regression line. Also, be aware that the measurement unit is meaningful. We would not want to insert values in pounds when the regression line is based on kilograms.
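Both cautions, staying inside the observed range and using the right units, can be built into a small prediction helper. This is a minimal sketch under stated assumptions: the data are made-up illustrative values (not the study's measurements), and `fit_line`, `predict_in_range`, and `pounds_to_kg` are hypothetical names introduced only for this example.

```python
KG_PER_LB = 0.45359237  # exact definition of the avoirdupois pound in kilograms

# Made-up illustrative data, NOT the study's measurements.
weights_kg = [55.0, 60.0, 62.0, 65.0, 68.0, 70.0, 72.0, 75.0]
volumes_l = [2.5, 2.7, 2.6, 2.9, 3.0, 3.1, 3.0, 3.4]


def fit_line(xs, ys):
    """Least squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def predict_in_range(x_new, xs, intercept, slope):
    """Predict y-hat, refusing to extrapolate beyond the observed x values."""
    low, high = min(xs), max(xs)
    if not low <= x_new <= high:
        raise ValueError(f"x = {x_new} is outside the observed range [{low}, {high}]")
    return intercept + slope * x_new


def pounds_to_kg(pounds):
    """Convert pounds to kilograms before using a line fitted in kilograms."""
    return pounds * KG_PER_LB


b0, b1 = fit_line(weights_kg, volumes_l)
print(predict_in_range(60.0, weights_kg, b0, b1))                 # weight given in kg
print(predict_in_range(pounds_to_kg(150.0), weights_kg, b0, b1))  # weight given in lb
```

Passing a weight such as 90 kg, or a raw value in pounds, raises an error instead of silently returning an estimate that the data cannot support.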
Slide 23: RSQUARE
The square of the correlation ($r^2$ = RSQUARE) is the fraction of the variation in the values of y that is explained by the least squares regression of y on x:
$r^2 = \frac{\text{variance of predicted values } \hat{y}}{\text{variance of observed values } y} = \frac{SSM}{SST}$
Recall that Pearson's correlation measures the strength of the linear relationship between two quantitative variables. There is another measure called the coefficient of determination. Its value is Pearson's correlation squared. For this reason, it is often denoted RSQUARE. In least squares regression, the coefficient of determination is typically used to describe how much of the total variation is explained by the regression of y on x. In fact, RSQUARE = SSM/SST, the sum of squares for the model divided by the total sum of squares. Those values come from the ANOVA table in the linear regression output from the software. We will discuss the ANOVA table at length in a later lesson.
Slide 24: Plasma Volume and Weight
$r^2 = (0.759)^2 = 0.576$
Recall that the correlation between plasma volume and weight is 0.759. If we square this value, we have the coefficient of determination. The value is 0.576. This means 57.6% of the variation in plasma volume is explained by the least squares regression line of plasma volume on body weight. When RSQUARE is close to 1, the regression line (the $\hat{y}$ values) represents the original data (the y values) well. When RSQUARE is close to 0, the regression line does not represent the original data well.
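The identity RSQUARE = SSM/SST = $r^2$ can be checked numerically. The sketch below uses made-up illustrative data (not the study's measurements) to compute both sides, then repeats the slide's own arithmetic for r = 0.759.

```python
# Made-up illustrative data, NOT the study's measurements.
weights = [55.0, 60.0, 62.0, 65.0, 68.0, 70.0, 72.0, 75.0]
volumes = [2.5, 2.7, 2.6, 2.9, 3.0, 3.1, 3.0, 3.4]

n = len(weights)
x_bar = sum(weights) / n
y_bar = sum(volumes) / n
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(weights, volumes))
sxx = sum((x - x_bar) ** 2 for x in weights)
syy = sum((y - y_bar) ** 2 for y in volumes)

b1 = sxy / sxx                 # least squares slope
b0 = y_bar - b1 * x_bar        # least squares intercept
r = sxy / (sxx * syy) ** 0.5   # Pearson's correlation

predicted = [b0 + b1 * x for x in weights]
ssm = sum((p - y_bar) ** 2 for p in predicted)  # sum of squares for the model
sst = syy                                       # total sum of squares
r_square = ssm / sst

print(f"SSM/SST = {r_square:.6f}, r squared = {r ** 2:.6f}")

# The slide's arithmetic for the plasma volume study, where r = 0.759.
print(round(0.759 ** 2, 3))
```

The two printed quantities on the first line agree up to floating-point rounding, which is why the squared correlation can be read as the fraction of variation explained.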
Slide 25: Simple Linear Regression: Residuals
When we draw the least squares regression line, the line of best fit, the line does not pass directly through all the data points. That is, the $\hat{y}$ values differ from the actual y values in the data. We call these vertical distances residuals.
Slide 26: Residuals
Model: $\hat{y} = b_0 + b_1 x$. Residual: $\varepsilon_i = y_i - \hat{y}_i$, the difference between the observed and predicted value of the response for each value of x.
For each data point, $y - \hat{y}$ is the residual for that point. This value is often denoted $\varepsilon_i$. We can calculate it at any x in our dataset by taking the observed y value minus the predicted value $\hat{y}$ from the model. If the residual is positive, the data value is above the line. If the residual is negative, the data value is below the line. We will use residuals and residual plots in our next lesson to investigate how well the linear model fits the observed data.
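A small sketch of the residual calculation, again on made-up illustrative data (not the study's measurements). It also checks the property mentioned earlier in the lecture: for the least squares line, the positive and negative residuals cancel.

```python
# Made-up illustrative data, NOT the study's measurements.
weights = [55.0, 60.0, 62.0, 65.0, 68.0, 70.0, 72.0, 75.0]
volumes = [2.5, 2.7, 2.6, 2.9, 3.0, 3.1, 3.0, 3.4]

n = len(weights)
x_bar = sum(weights) / n
y_bar = sum(volumes) / n
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(weights, volumes))
      / sum((x - x_bar) ** 2 for x in weights))
b0 = y_bar - b1 * x_bar

# Residual = observed y minus predicted y-hat, one per data point.
residuals = [y - (b0 + b1 * x) for x, y in zip(weights, volumes)]

for x, e in zip(weights, residuals):
    position = "above" if e > 0 else "below"
    print(f"x = {x:5.1f}  residual = {e:+.4f}  (point is {position} the line)")

print(f"sum of residuals = {sum(residuals):.2e}")
```

The printed sum is zero up to floating-point rounding.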
Slide 27: Estriol and Infant Birthweight
Let's do another example. Obstetricians sometimes order tests for estriol levels from 24-hour urine specimens taken from pregnant women who are near term. The level of estriol (mg/24 hours) has been found to be positively related to the birthweight (grams/100) of the infant. Thus, the test can provide indirect evidence of an abnormally small fetus. [Bernard Rosner, Fundamentals of Biostatistics, page 425]
Slide 28: Estriol and Infant Birthweight
Here is the scatterplot of birthweight and estriol for 31 women and their babies. We can see that there is a positive relationship between estriol level and birthweight. The relationship is not perfect, but linear regression may still help with predictions. The slide reports Pearson's correlation, r, for these data. Notice that birthweight is in g/100. We will need this unit later for our calculations.
Slide 29: Estriol and Infant Birthweight
The values of the slope and intercept can be calculated using software or by using the equations given in earlier slides. The prediction line shown on the scatterplot is $\hat{y} = 21.52 + 0.608x$. This means that for every one-unit increase in estriol level, the birthweight of the infant is on average 0.608 g/100 higher, about 60 grams.
Slide 30: Estriol and Infant Birthweight
Using estriol level to predict infant birthweight when the estriol level is 10 mg: $\hat{y} = 21.52 + 0.608(10) = 27.6$ grams/100.
Suppose we want to estimate the birthweight of a baby whose mother has an estriol level of 10 mg. Before we begin, we verify that 10 mg is in the range of the original data. We can do this by looking at the scatterplot of the data. We can then put 10 mg in for x in the least squares regression equation and calculate an estimated weight of 27.6 g/100. This is 2,760 grams.
Slide 31: Estriol and Infant Birthweight
Using estriol level to predict infant birthweight when the estriol level is 30 mg.
Suppose we want to estimate the birthweight of a baby whose mother has an estriol level of 30 mg. Before we begin, we check whether 30 mg is in the range of the original data. We can do this by looking at the scatterplot of the data. We see that 30 mg is NOT in the range of the x data for our study. We should not use the regression line to estimate infant birthweight!
Slide 32: Estriol and Infant Birthweight
Now let's go in the reverse direction. Low birthweight may be defined as infant birthweight less than 2500 grams. For what estriol level is the predicted infant birthweight equal to 2500 grams? First we must convert to the correct units: 2500 grams = 25 grams/100. Then we set $25 = 21.52 + 0.608x$ and solve for x: $3.48 = 0.608x$, so $x = 3.48 / 0.608 = 5.72$. The estriol level that predicts a low-birthweight baby is 5.72 mg.
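The forward and reverse calculations can be written as a pair of small functions. In this sketch the slope 0.608 comes from the slide, while the intercept 21.52 is an assumption: it is the value implied by the slide's own arithmetic (a prediction of 27.6 g/100 at 10 mg). The function names are hypothetical.

```python
SLOPE = 0.608      # g/100 per mg of estriol, from the slide
INTERCEPT = 21.52  # assumption: implied by 27.6 = intercept + 0.608 * 10


def predicted_birthweight(estriol_mg):
    """Predicted birthweight in units of grams/100."""
    return INTERCEPT + SLOPE * estriol_mg


def estriol_for_birthweight(grams):
    """Estriol level (mg) at which the predicted birthweight equals `grams`."""
    target = grams / 100.0  # convert grams to the model's unit, grams/100
    return (target - INTERCEPT) / SLOPE


print(round(predicted_birthweight(10.0), 1))      # forward direction, 10 mg
print(round(estriol_for_birthweight(2500.0), 2))  # reverse direction, 2500 g
```

Solving the line for x in this way is simple algebra on the fitted equation; it is not the same as fitting a new regression of estriol on birthweight.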
Slide 33: Assumptions
Linear regression requires that we make some assumptions. Conveniently, these assumptions follow the acronym LINE:
L = linear relationship between y and x.
I = independence between values of y. (The value of one y does not affect the value of another y.)
N = normality of error around each value of y.
E = equality of variance around y for each value of x.
Our next lesson will explore techniques to evaluate each of these assumptions.
Slide 34: Cautions
In addition to evaluating the linear regression assumptions, we must take care in interpreting our results.
Predicted values should only be computed for X values that fall within the range of X values in the original data.
Just like a correlation, a regression line only summarizes the linear relationship between X and Y. If the relationship is truly nonlinear, then using the regression line can be misleading.
Seeing a relationship (an association) between X and Y does not imply causation, that is, that changes in X will cause changes in Y.
Slide 35: Cautions
In the regression context, a lurking variable is a third variable that may influence the relationship between X and Y.
Outliers and skewed data can affect the regression line, just as they can affect the correlation. Either X or Y or both could have outliers or skewness.
If including a particular data point changes the regression line compared to when it is not included, the data point is called influential.
Does that seem like many "cautions"? It is: as we learn methods that are more complicated, there will often be more limits on their use and interpretation.
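To make the idea of an influential point concrete, the sketch below fits the line twice: once on eight made-up points and once after adding a single extreme point. All values are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Least squares intercept and slope."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Made-up illustrative data, NOT the study's measurements.
weights = [55.0, 60.0, 62.0, 65.0, 68.0, 70.0, 72.0, 75.0]
volumes = [2.5, 2.7, 2.6, 2.9, 3.0, 3.1, 3.0, 3.4]

# One hypothetical extreme observation: far to the right in x and unusually low in y.
extreme_x, extreme_y = 95.0, 2.0

_, slope_without = fit_line(weights, volumes)
_, slope_with = fit_line(weights + [extreme_x], volumes + [extreme_y])

print(f"slope without the extreme point: {slope_without:.4f}")
print(f"slope with the extreme point:    {slope_with:.4f}")
```

With these particular numbers, a single added point is enough to flip the sign of the slope, so that point would be called influential.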
Linear Regression. Chapter 5. Prediction via Regression Line Number of new birds and Percent returning. Least Squares
Linear Regression Chapter 5 Regression Objective: To quantify the linear relationship between an explanatory variable (x) and response variable (y). We can then predict the average response for all subjects
More informationSection 3 Part 1. Relationships between two numerical variables
Section 3 Part 1 Relationships between two numerical variables 1 Relationship between two variables The summary statistics covered in the previous lessons are appropriate for describing a single variable.
More information2. Simple Linear Regression
Research methods  II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according
More informationCorrelation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables 2
Lesson 4 Part 1 Relationships between two numerical variables 1 Correlation Coefficient The correlation coefficient is a summary statistic that describes the linear relationship between two numerical variables
More informationChapter 10. Key Ideas Correlation, Correlation Coefficient (r),
Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables
More informationExercise 1.12 (Pg. 2223)
Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.
More informationChapter 7: Simple linear regression Learning Objectives
Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) 
More informationX X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)
CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.
More informationUnit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression
Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a
More informationCorrelation key concepts:
CORRELATION Correlation key concepts: Types of correlation Methods of studying correlation a) Scatter diagram b) Karl pearson s coefficient of correlation c) Spearman s Rank correlation coefficient d)
More informationUnivariate Regression
Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is
More informationRelationships Between Two Variables: Scatterplots and Correlation
Relationships Between Two Variables: Scatterplots and Correlation Example: Consider the population of cars manufactured in the U.S. What is the relationship (1) between engine size and horsepower? (2)
More informationSimple linear regression
Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between
More informationExample: Boats and Manatees
Figure 96 Example: Boats and Manatees Slide 1 Given the sample data in Table 91, find the value of the linear correlation coefficient r, then refer to Table A6 to determine whether there is a significant
More informationSPSS Guide: Regression Analysis
SPSS Guide: Regression Analysis I put this together to give you a stepbystep guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar
More information" Y. Notation and Equations for Regression Lecture 11/4. Notation:
Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through
More informationDescribing Relationships between Two Variables
Describing Relationships between Two Variables Up until now, we have dealt, for the most part, with just one variable at a time. This variable, when measured on many different subjects or objects, took
More informationCALCULATIONS & STATISTICS
CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 15 scale to 0100 scores When you look at your report, you will notice that the scores are reported on a 0100 scale, even though respondents
More informationBasic Statistics and Data Analysis for Health Researchers from Foreign Countries
Basic Statistics and Data Analysis for Health Researchers from Foreign Countries Volkert Siersma siersma@sund.ku.dk The Research Unit for General Practice in Copenhagen Dias 1 Content Quantifying association
More information17. SIMPLE LINEAR REGRESSION II
17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.
More informationModule 3: Correlation and Covariance
Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis
More informationCorrelation and Regression
Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look
More informationPart 2: Analysis of Relationship Between Two Variables
Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable
More informationChapter 9 Descriptive Statistics for Bivariate Data
9.1 Introduction 215 Chapter 9 Descriptive Statistics for Bivariate Data 9.1 Introduction We discussed univariate data description (methods used to eplore the distribution of the values of a single variable)