Correlation and Regression

Correlation and Regression

Fathers' and daughters' heights

Fathers' heights: mean = 67.7 inches. Daughters' heights: mean = 63.8 inches.

[Histograms of the fathers' and daughters' heights (inches).]

Reference: Pearson and Lee (1906) Biometrika 2 (father/daughter pairs).

Fathers' and daughters' heights

[Scatterplot of daughter's height against father's height (inches); corr = 0.52. Reference: Pearson and Lee (1906) Biometrika 2.]

Covariance and correlation

Let X and Y be random variables with µ_X = E(X), µ_Y = E(Y), σ_X = SD(X), σ_Y = SD(Y). For example, sample a father/daughter pair and let X = the father's height and Y = the daughter's height.

Covariance:   cov(X, Y) = E{ (X − µ_X)(Y − µ_Y) }

Correlation:  cor(X, Y) = cov(X, Y) / (σ_X σ_Y)

cov(X, Y) can be any real number, while −1 ≤ cor(X, Y) ≤ 1.

Examples

[Nine scatterplots of Y against X illustrating different strengths of linear association, with correlations ranging from 0 through 0.9, including negative values such as corr = −0.9.]

Estimated correlation

Consider n pairs of data: (x_1, y_1), (x_2, y_2), (x_3, y_3), ..., (x_n, y_n). We consider these as independent draws from some bivariate distribution. We estimate the correlation in the underlying distribution by

r = Σ_i (x_i − x̄)(y_i − ȳ) / sqrt[ Σ_i (x_i − x̄)² · Σ_i (y_i − ȳ)² ]

This is sometimes called the correlation coefficient.
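
To make the estimator concrete, here is a minimal Python sketch of r computed straight from this formula. The data values are made up purely for illustration.

```python
import math

# Sample correlation r, computed from the definition:
# r = sum((x_i - xbar)(y_i - ybar)) / sqrt(sum((x_i - xbar)^2) * sum((y_i - ybar)^2))
def sample_correlation(x, y):
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    syy = sum((yi - ybar) ** 2 for yi in y)
    return sxy / math.sqrt(sxx * syy)

# illustrative data, not the Pearson-Lee heights
print(round(sample_correlation([1, 2, 3, 4, 5], [2, 4, 5, 4, 6]), 3))  # → 0.853
```

This matches what built-in routines such as R's cor() report for the same pairs.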

Correlation measures linear association

[Three scatterplots with quite different patterns; all three have correlation 0.7!]

Correlation versus regression

Covariance / correlation: quantifies how two random variables X and Y co-vary. There is typically no particular order between the two random variables (e.g., fathers' versus daughters' heights).

Regression: assesses the relationship between a predictor X and a response Y; we model E[Y | X]. The values of the predictor are often deliberately chosen, and are therefore not random quantities. We typically assume that we observe the values of the predictor(s) without error.

Example

Measurements of degradation of heme with different concentrations of hydrogen peroxide (H2O2), for different types of heme (A and B).

[Two panels of OD against H2O2 concentration: heme A alone, and hemes A and B together.]

Linear regression

[Several straight lines Y = β0 + β1·X plotted against X, including the line Y = 0 + 5X.]

Linear regression

[Scatterplot of Y against X with the fitted regression line.]

The regression model

Let X be the predictor and Y be the response. Assume we have n observations (x_1, y_1), ..., (x_n, y_n) on X and Y. The simple linear regression model is

y_i = β0 + β1·x_i + ε_i,   ε_i iid N(0, σ²)

This implies E[Y | X] = β0 + β1·X.

Interpretation: for two subjects that differ by one unit in X, we expect the responses to differ by β1.

How do we estimate β0, β1, σ²?

Fitted values and residuals

We can write

ε_i = y_i − β0 − β1·x_i

For a pair of estimates (β̂0, β̂1) of the parameters (β0, β1), we define the fitted values as

ŷ_i = β̂0 + β̂1·x_i

The residuals are

ε̂_i = y_i − ŷ_i = y_i − β̂0 − β̂1·x_i

[Plot showing a residual ε̂ as the vertical distance between an observation y and its fitted value ŷ.]

Residual sum of squares

For every pair of values of β0 and β1 we get a different value of the residual sum of squares,

RSS(β0, β1) = Σ_i (y_i − β0 − β1·x_i)²

We can view RSS as a function of β0 and β1, and we try to minimize this function; i.e., we seek the pair (β̂0, β̂1) at which RSS(β0, β1) attains its minimum.

Hardly surprisingly, this method is called least squares estimation.

[Surface plot of RSS as a function of β0 and β1.]

Notation

Assume we have n observations: (x_1, y_1), ..., (x_n, y_n).

x̄ = Σ_i x_i / n        ȳ = Σ_i y_i / n

SXX = Σ_i (x_i − x̄)² = Σ_i x_i² − n·x̄²
SYY = Σ_i (y_i − ȳ)² = Σ_i y_i² − n·ȳ²
SXY = Σ_i (x_i − x̄)(y_i − ȳ) = Σ_i x_i·y_i − n·x̄·ȳ
RSS = Σ_i (y_i − ŷ_i)² = Σ_i ε̂_i²

Parameter estimates

The function RSS(β0, β1) = Σ_i (y_i − β0 − β1·x_i)² is minimized by

β̂1 = SXY / SXX        β̂0 = ȳ − β̂1·x̄
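
A direct transcription of these two formulas into Python; the data values are made up for illustration, and this is a sketch of the textbook formulas rather than a substitute for a statistics library.

```python
# Least-squares estimates for simple linear regression:
# beta1_hat = SXY / SXX, beta0_hat = ybar - beta1_hat * xbar
def least_squares(x, y):
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    return b0, b1

b0, b1 = least_squares([0, 1, 2, 3], [1.1, 2.9, 5.2, 6.8])
print(b0, b1)  # fitted line roughly y = 1.09 + 1.94x
```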

Useful to know

Using the parameter estimates, our best guess for y given any x is

ŷ = β̂0 + β̂1·x

Hence, at x = x̄,

β̂0 + β̂1·x̄ = ȳ − β̂1·x̄ + β̂1·x̄ = ȳ

That means every regression line goes through the point (x̄, ȳ).

Variance estimate

As variance estimate we use

σ̂² = RSS / (n − 2)

This quantity is called the residual mean square. It has the following property:

(n − 2)·σ̂² / σ² ~ χ²_{n−2}

In particular, this implies E(σ̂²) = σ².
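
Continuing in the same vein, a small Python sketch that computes the residuals, RSS and the variance estimate σ̂² = RSS/(n − 2) for a made-up data set:

```python
# Residuals, RSS and sigma^2_hat = RSS / (n - 2), via the least-squares fit.
# Illustrative data only.
x = [0, 1, 2, 3]
y = [1.0, 3.1, 4.9, 7.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / \
     sum((xi - xbar) ** 2 for xi in x)
b0 = ybar - b1 * xbar
fitted = [b0 + b1 * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]
rss = sum(e ** 2 for e in residuals)
sigma2_hat = rss / (n - 2)
print(rss, sigma2_hat)
```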

Example

For the heme A data we get x̄ = 21.25, ȳ = 0.27 and SXY = −16.48. Therefore

β̂1 = SXY / SXX ≈ −0.0039,   β̂0 = 0.27 − β̂1 × 21.25 = 0.353

Example

[Scatterplot of OD against H2O2 concentration with the fitted line Y = 0.353 − 0.0039·X.]

Comparing models

We want to test whether β1 = 0:

H0: y_i = β0 + ε_i   versus   Ha: y_i = β0 + β1·x_i + ε_i

[Plots of y against x showing the fit under Ha (a sloped line) and the fit under H0 (a horizontal line).]

Example

[The heme A data with the fit under Ha, Y = 0.353 − 0.0039·X, and the fit under H0, Y = ȳ = 0.27.]

Sum of squares

Under Ha:

RSS = Σ_i (y_i − ŷ_i)² = SYY − (SXY)²/SXX = SYY − β̂1²·SXX

Under H0:

Σ_i (y_i − β̂0)² = Σ_i (y_i − ȳ)² = SYY

Hence

SS_reg = SYY − RSS = (SXY)²/SXX

ANOVA

Source                    df     SS       MS                   F
regression on X           1      SS_reg   MS_reg = SS_reg/1    MS_reg/MSE
residuals for full model  n−2    RSS      MSE = RSS/(n−2)
total                     n−1    SYY
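
The ANOVA decomposition can be checked numerically; a Python sketch with made-up data:

```python
# ANOVA for simple regression: SS_reg = (SXY)^2 / SXX, RSS = SYY - SS_reg,
# F = MS_reg / MSE with df = (1, n - 2). Illustrative data only.
x = [0, 1, 2, 3]
y = [1.0, 3.1, 4.9, 7.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
ss_reg = sxy ** 2 / sxx          # 1 df
rss = syy - ss_reg               # n - 2 df
ms_reg = ss_reg / 1
mse = rss / (n - 2)
F = ms_reg / mse
print(F)                         # ≈ 2178 for this made-up data
```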

Example

[The ANOVA table for the heme A example, with the numerical df, SS, MS and F entries.]

Parameter estimates

One can show that

E(β̂0) = β0        E(β̂1) = β1

Var(β̂0) = σ²·( 1/n + x̄²/SXX )
Var(β̂1) = σ²/SXX
Cov(β̂0, β̂1) = −σ²·x̄/SXX
Cor(β̂0, β̂1) = −x̄ / sqrt( x̄² + SXX/n )

Note: we're thinking of the x's as fixed.

Parameter estimates

One can even show that the joint distribution of β̂0 and β̂1 is a bivariate normal distribution:

(β̂0, β̂1)ᵀ ~ N(β, Σ),   where β = (β0, β1)ᵀ and

Σ = (σ²/SXX) · [ SXX/n + x̄²    −x̄ ]
               [ −x̄              1 ]

[Simulation: scatterplot of estimated slopes against estimated y-intercepts.]

Possible outcomes

[Several simulated data sets of OD against H2O2 concentration, each with its fitted regression line.]

Confidence intervals

We know that

β̂0 ~ N( β0, σ²·( 1/n + x̄²/SXX ) )
β̂1 ~ N( β1, σ²/SXX )

We can use those distributions for hypothesis testing and to construct confidence intervals!

Statistical inference

We want to test H0: β1 = β1*  versus  Ha: β1 ≠ β1*. (Generally, β1* is 0.) We use

t = (β̂1 − β1*) / se(β̂1) ~ t_{n−2},   where se(β̂1) = sqrt( σ̂²/SXX )

Also,

[ β̂1 − t_{(1−α/2), n−2}·se(β̂1),  β̂1 + t_{(1−α/2), n−2}·se(β̂1) ]

is a (1−α)·100% confidence interval for β1.

Results

The calculations for the test H0: β0 = β0* versus Ha: β0 ≠ β0* are analogous, except that we have to use

se(β̂0) = sqrt( σ̂²·( 1/n + x̄²/SXX ) )

For the example we get the 95% confidence intervals (0.342, 0.364) for the intercept and ( , ) for the slope. Testing whether the intercept (slope) is equal to zero, we obtain 70.7 (−22.0) as test statistic; both correspond to vanishingly small p-values.

Now how about that

Testing for the slope being equal to zero, we use t = β̂1 / se(β̂1). For the squared test statistic we get

t² = ( β̂1 / se(β̂1) )² = β̂1² / (σ̂²/SXX) = β̂1²·SXX / σ̂² = [ (SYY − RSS)/1 ] / [ RSS/(n−2) ] = MS_reg / MSE = F

The squared t statistic is the same as the F statistic from the ANOVA!

Joint confidence region

A 95% joint confidence region for the two parameters is the set of all values (β0, β1) that fulfill

( Δβ0 Δβ1 ) · [ n          Σ_i x_i  ] · ( Δβ0 )   ≤   2·σ̂²·F_{(0.95), 2, n−2}
              [ Σ_i x_i    Σ_i x_i² ]   ( Δβ1 )

where Δβ0 = β0 − β̂0 and Δβ1 = β1 − β̂1.
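
The identity t² = F is easy to verify numerically; a Python sketch with made-up data:

```python
import math

# t statistic for H0: beta1 = 0, and the identity t^2 = F. Illustrative data.
x = [0, 1, 2, 3]
y = [1.0, 3.1, 4.9, 7.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
rss = syy - sxy ** 2 / sxx
sigma2_hat = rss / (n - 2)
se_b1 = math.sqrt(sigma2_hat / sxx)   # se(beta1_hat) = sigma_hat / sqrt(SXX)
t = b1 / se_b1
F = (syy - rss) / (rss / (n - 2))     # MS_reg / MSE
print(t ** 2, F)                      # the two agree
```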

Joint confidence region

[Plot of the elliptical joint confidence region for (β̂0, β̂1).]

Notation

Assume we have n observations: (x_1, y_1), ..., (x_n, y_n). We previously defined

SXX = Σ_i (x_i − x̄)² = Σ_i x_i² − n·x̄²
SYY = Σ_i (y_i − ȳ)² = Σ_i y_i² − n·ȳ²
SXY = Σ_i (x_i − x̄)(y_i − ȳ) = Σ_i x_i·y_i − n·x̄·ȳ

We also define

r_XY = SXY / sqrt( SXX·SYY )

(called the sample correlation).

Coefficient of determination

We previously wrote SS_reg = SYY − RSS = (SXY)²/SXX. Define

R² = SS_reg / SYY = 1 − RSS/SYY

R² is often called the coefficient of determination. Notice that

R² = SS_reg / SYY = (SXY)² / (SXX·SYY) = r²_XY

The Anscombe data

[Four scatterplots with very different patterns but identical fits: β̂0 = 3.0, β̂1 = 0.5, σ̂² = 13.75 and R² = 0.667 in every panel.]
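
A quick numerical check that R² = 1 − RSS/SYY coincides with the squared sample correlation (made-up data):

```python
import math

# R^2 = 1 - RSS/SYY equals r_XY^2. Illustrative data only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
rss = syy - sxy ** 2 / sxx
R2 = 1 - rss / syy
r = sxy / math.sqrt(sxx * syy)
print(R2, r ** 2)   # identical up to rounding
```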

Fathers' and daughters' heights

[Scatterplot of daughter's height against father's height (inches); corr = 0.52.]

Linear regression

[The same scatterplot with the fitted regression line.]

Linear regression

[The father/daughter scatterplot with the fitted regression line.]

Regression line

[The father/daughter scatterplot with the regression line. Slope = r · SD(Y) / SD(X).]

SD line

[The father/daughter scatterplot with the SD line. Slope = SD(Y) / SD(X).]

SD line vs regression line

[The father/daughter scatterplot with both lines. Both lines go through the point (X̄, Ȳ).]

Predicting father's height from daughter's height

[The father/daughter scatterplot, illustrating prediction of father's height from daughter's height.]

Predicting father's height from daughter's height

[A second view of the same prediction problem.]

Predicting father's height from daughter's height

[The father/daughter scatterplot with the line for predicting father's height from daughter's height.]

There are two regression lines!

[The father/daughter scatterplot with both regression lines: y on x and x on y.]

The equations

Regression of y on x (for predicting y from x):

Slope = r · SD(y)/SD(x); goes through the point (x̄, ȳ).

ŷ − ȳ = r · (SD(y)/SD(x)) · (x − x̄)

ŷ = β̂0 + β̂1·x,  where β̂1 = r · SD(y)/SD(x) and β̂0 = ȳ − β̂1·x̄

Regression of x on y (for predicting x from y):

Slope = r · SD(x)/SD(y); goes through the point (ȳ, x̄).

x̂ − x̄ = r · (SD(x)/SD(y)) · (y − ȳ)

x̂ = β̂0* + β̂1*·y,  where β̂1* = r · SD(x)/SD(y) and β̂0* = x̄ − β̂1*·ȳ

Estimating the mean response

[The heme A data with the fitted line Y = 0.353 − 0.0039·X, OD against H2O2 concentration.]

We can use the regression results to predict the expected response for a new concentration of hydrogen peroxide. But what is its variability?
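
The "y on x" slope formula can be verified against the least-squares slope; a Python sketch with made-up data (the SDs use the usual n − 1 divisor, which cancels in the ratio):

```python
import math

# Check that the least-squares slope equals r * SD(y) / SD(x). Illustrative data.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 6]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))

b1 = sxy / sxx                      # least-squares slope of y on x
r = sxy / math.sqrt(sxx * syy)      # sample correlation
sd_x = math.sqrt(sxx / (n - 1))
sd_y = math.sqrt(syy / (n - 1))
slope_via_r = r * sd_y / sd_x
print(b1, slope_via_r)              # the two agree
```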

Variability of the mean response

Let ŷ be the estimated mean response at some x, i.e. ŷ = β̂0 + β̂1·x. Then

E(ŷ) = β0 + β1·x        var(ŷ) = σ²·( 1/n + (x − x̄)²/SXX )

where β0 + β1·x is the true mean response.

Why?

E(ŷ) = E(β̂0 + β̂1·x) = E(β̂0) + x·E(β̂1) = β0 + x·β1

var(ŷ) = var(β̂0 + β̂1·x)
       = var(β̂0) + var(β̂1·x) + 2·cov(β̂0, β̂1·x)
       = var(β̂0) + x²·var(β̂1) + 2·x·cov(β̂0, β̂1)
       = σ²·( 1/n + x̄²/SXX ) + σ²·x²/SXX − 2·σ²·x·x̄/SXX
       = σ²·[ 1/n + (x − x̄)²/SXX ]

Confidence intervals

Hence

ŷ ± t_{(1−α/2), n−2} · σ̂ · sqrt( 1/n + (x − x̄)²/SXX )

is a (1−α)·100% confidence interval for the mean response given x.

Confidence limits

[The heme A data with 95% confidence limits for the mean response, OD against H2O2 concentration.]

Prediction

Now assume that we want to calculate an interval for the predicted response y* for a value of x. There are two sources of uncertainty:

(a) the mean response
(b) the natural variation σ²

The variance of ŷ* is

var(ŷ*) = σ² + σ²·( 1/n + (x − x̄)²/SXX ) = σ²·( 1 + 1/n + (x − x̄)²/SXX )

Prediction intervals

Hence

ŷ* ± t_{(1−α/2), n−2} · σ̂ · sqrt( 1 + 1/n + (x − x̄)²/SXX )

is a (1−α)·100% prediction interval for the predicted response given x. When n is very large, we get roughly ŷ* ± t_{(1−α/2), n−2}·σ̂.
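
Both interval widths come from the same ingredients; a Python sketch of the two standard errors with made-up data (multiply by t_{(1−α/2), n−2} to get the actual limits):

```python
import math

# Standard errors for (a) the estimated mean response at x and (b) a new
# observation at x:
#   se_mean = sigma_hat * sqrt(1/n + (x - xbar)^2 / SXX)
#   se_pred = sigma_hat * sqrt(1 + 1/n + (x - xbar)^2 / SXX)
# Illustrative data only.
x = [0, 1, 2, 3]
y = [1.0, 3.1, 4.9, 7.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
syy = sum((yi - ybar) ** 2 for yi in y)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
rss = syy - sxy ** 2 / sxx
sigma_hat = math.sqrt(rss / (n - 2))

x_new = 2.5
se_mean = sigma_hat * math.sqrt(1 / n + (x_new - xbar) ** 2 / sxx)
se_pred = sigma_hat * math.sqrt(1 + 1 / n + (x_new - xbar) ** 2 / sxx)
print(se_mean, se_pred)   # the prediction SE is always the larger
```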

Prediction intervals

[The heme A data with the 95% confidence limits for the mean response and the wider 95% prediction limits, OD against H2O2 concentration.]

Span and height

[Scatterplot of height (inches) against span (inches).]

With just 100 individuals

[Scatterplot of height against span for a subsample of 100 individuals.]

Regression for calibration

That prediction interval is for the case that the x's are known without error while y = β0 + β1·x + ε, where ε = error.

Another common situation: we have a number of pairs (x, y) used to fit a calibration line/curve. The x's are basically without error; the y's have measurement error. We then obtain a new value, y*, and want to estimate the corresponding x*:

y* = β0 + β1·x* + ε

Example

[Calibration example: Y against X.]

Another example

[A second calibration example: Y against X.]

Regression for calibration

Data:

(x_i, y_i) for i = 1, ..., n, with y_i = β0 + β1·x_i + ε_i, ε_i iid Normal(0, σ)
y*_j for j = 1, ..., m, with y*_j = β0 + β1·x* + ε*_j, ε*_j iid Normal(0, σ), for some x*

Goal: estimate x* and give a 95% confidence interval.

The estimate: obtain β̂0 and β̂1 by regressing the y_i on the x_i. Let

x̂* = (ȳ* − β̂0) / β̂1,  where ȳ* = Σ_j y*_j / m

95% CI for x̂*

Let T denote the 97.5th percentile of the t distribution with n − 2 d.f. Let

g = T / [ β̂1 / (σ̂/sqrt(SXX)) ] = (T·σ̂) / (β̂1·sqrt(SXX))

If g ≥ 1, we would fail to reject H0: β1 = 0! In this case, the 95% CI for x̂* is (−∞, ∞). If g < 1, our 95% CI is the following:

x̂* + (x̂* − x̄)·g² / (1 − g²)  ±  (T·σ̂/β̂1) · sqrt( (x̂* − x̄)²/SXX + (1 − g²)·(1/m + 1/n) ) / (1 − g²)

For very large n, this reduces to approximately x̂* ± (T·σ̂) / (β̂1·sqrt(m)).
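
The point estimate x̂* is simple to compute; a Python sketch with made-up calibration data (the confidence-interval formula above would then be applied around this value):

```python
# Calibration estimate: regress y on x, then map the mean of the new
# readings back through the line, x_star_hat = (ybar_star - b0) / b1.
# All data values are made up for illustration.
x = [0, 1, 2, 3]
y = [1.0, 3.1, 4.9, 7.0]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n
sxx = sum((xi - xbar) ** 2 for xi in x)
sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
b1 = sxy / sxx
b0 = ybar - b1 * xbar

y_star = [5.0, 5.2, 4.8]               # m new readings at an unknown x*
ybar_star = sum(y_star) / len(y_star)
x_star_hat = (ybar_star - b0) / b1
print(x_star_hat)
```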

Example

[Calibration example: Y against X, with a new y measurement mapped back to an interval for x.]

Another example

[A second calibration example: Y against X.]

Infinite m

[Calibration example with the number of new measurements m taken to be infinite: Y against X.]

Infinite n

[Calibration example with the calibration sample size n taken to be infinite: Y against X.]

Multiple linear regression

A and B

[OD against H2O2 concentration for hemes A and B, each with its own fitted line.]

Multiple linear regression

[Four possible configurations of two regression lines: general, parallel, concurrent, coincident.]

Multiple linear regression

A and B

[OD against H2O2 concentration for hemes A and B, fitted with parallel lines.]

More than one predictor

[Table of example data with columns Y, X1 and X2.]

The model with two parallel lines can be described as

Y = β0 + β1·X1 + β2·X2 + ε

In other words (or, equations):

Y = β0 + β1·X1 + ε             if X2 = 0
Y = (β0 + β2) + β1·X1 + ε      if X2 = 1
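
The two-equations view of the parallel-lines model can be sketched directly; the coefficient values below are made up for illustration:

```python
# Parallel-lines model Y = b0 + b1*X1 + b2*X2: the indicator X2 shifts the
# intercept but not the slope. Coefficients are made up for illustration.
b0, b1, b2 = 0.35, -0.004, 0.05

def mean_response(x1, x2):
    return b0 + b1 * x1 + b2 * x2

# group X2 = 0 has intercept b0; group X2 = 1 has intercept b0 + b2;
# both groups share the slope b1
print(mean_response(10, 0))   # b0 + 10*b1
print(mean_response(10, 1))   # shifted up by b2 at every x1
```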

Multiple linear regression

A multiple linear regression model has the form

Y = β0 + β1·X1 + ... + βk·Xk + ε,   ε ~ N(0, σ²)

The predictors (the X's) can be categorical or numerical. Often, all predictors are numerical or all are categorical. And actually, categorical variables are converted into a group of numerical ones.

Interpretation

Let X1 be the age of a subject (in years).

E[Y] = β0 + β1·X1

Comparing two subjects who differ by one year in age, we expect the responses to differ by β1. Comparing two subjects who differ by five years in age, we expect the responses to differ by 5·β1.

Interpretation

Let X1 be the age of a subject (in years), and let X2 be an indicator for the treatment arm (0/1).

E[Y] = β0 + β1·X1 + β2·X2

Comparing two subjects from the same treatment arm who differ by one year in age, we expect the responses to differ by β1. Comparing two subjects of the same age from the two different treatment arms (X2 = 1 versus X2 = 0), we expect the responses to differ by β2.

Interpretation

Let X1 be the age of a subject (in years), and let X2 be an indicator for the treatment arm (0/1).

E[Y] = β0 + β1·X1 + β2·X2 + β3·X1·X2

E[Y] = β0 + β1·X1                           if X2 = 0
E[Y] = β0 + β2 + (β1 + β3)·X1               if X2 = 1

Comparing two subjects who differ by one year in age, we expect the responses to differ by β1 if they are in the control arm (X2 = 0), and expect the responses to differ by β1 + β3 if they are in the treatment arm (X2 = 1).

Estimation

We have the model

y_i = β0 + β1·x_i1 + ... + βk·x_ik + ε_i,   ε_i iid Normal(0, σ²)

We estimate the β's by the values for which

RSS = Σ_i (y_i − ŷ_i)²

is minimized, where ŷ_i = β̂0 + β̂1·x_i1 + ... + β̂k·x_ik (aka "least squares").

We estimate σ by σ̂ = sqrt( RSS / (n − (k+1)) ).

FYI

Calculation of the β̂'s (and their SEs and correlations) is not that complicated, but without matrix algebra, the formulas are nasty. Here is what you need to know:

The SEs of the β̂'s involve σ and the x's. The β̂'s are normally distributed. Obtain confidence intervals for the β's using β̂ ± t·SE(β̂), where t is a quantile of the t distribution with n − (k+1) d.f. Test H0: β = 0 using β̂ / SE(β̂); compare this to a t distribution with n − (k+1) d.f.

The example: a full model

x1 = [H2O2]; x2 = 0 or 1, indicating type of heme; y = the OD measurement.

The model:

y = β0 + β1·X1 + β2·X2 + β3·X1·X2 + ε

i.e.,

y = β0 + β1·X1 + ε                         if X2 = 0
y = (β0 + β2) + (β1 + β3)·X1 + ε           if X2 = 1

β2 = 0: same intercepts. β3 = 0: same slopes. β2 = β3 = 0: same lines.

Results

Coefficients:
             Estimate  Std. Error  t value  Pr(>|t|)
(Intercept)                                  < 2e-16
x1                                             e-15
x2
x1:x2

Residual standard error:         on 20 degrees of freedom
Multiple R-squared: 0.98, Adjusted R-squared:
F-statistic:         on 3 and 20 DF,  p-value: < 2.2e-16

Testing many parameters

We have the model

y_i = β0 + β1·x_i1 + ... + βk·x_ik + ε_i,   ε_i iid Normal(0, σ²)

We seek to test H0: β_{r+1} = ... = βk = 0. In other words, do we really have just

y_i = β0 + β1·x_i1 + ... + βr·x_ir + ε_i,   ε_i iid Normal(0, σ²)?

What to do

1. Fit the full model (with all k x's).
2. Calculate the residual sum of squares, RSS_full.
3. Fit the reduced model (with only r x's).
4. Calculate the residual sum of squares, RSS_red.
5. Calculate F = [ (RSS_red − RSS_full) / (df_red − df_full) ] / [ RSS_full / df_full ], where df_red = n − r − 1 and df_full = n − k − 1.
6. Under H0, F ~ F(df_red − df_full, df_full).
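
The general F statistic in step 5 is a one-liner; a Python sketch in which the RSS and df values are made up for illustration:

```python
# General F test comparing nested models:
# F = [(RSS_red - RSS_full) / (df_red - df_full)] / [RSS_full / df_full]
def nested_f(rss_red, df_red, rss_full, df_full):
    return ((rss_red - rss_full) / (df_red - df_full)) / (rss_full / df_full)

# e.g. n = 24; reduced model with r = 1 predictor (df = 22),
# full model with k = 3 predictors (df = 20); RSS values made up
F = nested_f(rss_red=0.020, df_red=22, rss_full=0.010, df_full=20)
print(F)   # compare to an F(2, 20) distribution
```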

In particular...

Assume the model

y_i = β0 + β1·x_i1 + ... + βk·x_ik + ε_i,   ε_i iid Normal(0, σ²)

We seek to test H0: β1 = ... = βk = 0 (i.e., none of the x's are related to y).

Full model: all the x's. Reduced model: y = β0 + ε, so RSS_red = Σ_i (y_i − ȳ)².

F = [ ( Σ_i (y_i − ȳ)² − Σ_i (y_i − ŷ_i)² ) / k ] / [ Σ_i (y_i − ŷ_i)² / (n − k − 1) ]

Compare this to an F(k, n − k − 1) distribution.

The example

To test β2 = β3 = 0:

Analysis of Variance Table

Model 1: y ~ x1
Model 2: y ~ x1 + x2 + x1:x2
  Res.Df   RSS   Df   Sum of Sq    F    Pr(>F)
1
2                                         e-05


MULTIPLE REGRESSIONS ON SOME SELECTED MACROECONOMIC VARIABLES ON STOCK MARKET RETURNS FROM 1986-2010 Advances in Economics and International Finance AEIF Vol. 1(1), pp. 1-11, December 2014 Available online at http://www.academiaresearch.org Copyright 2014 Academia Research Full Length Research Paper MULTIPLE

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

More information

One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups

One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups In analysis of variance, the main research question is whether the sample means are from different populations. The

More information

August 2012 EXAMINATIONS Solution Part I

August 2012 EXAMINATIONS Solution Part I August 01 EXAMINATIONS Solution Part I (1) In a random sample of 600 eligible voters, the probability that less than 38% will be in favour of this policy is closest to (B) () In a large random sample,

More information

Addressing Alternative. Multiple Regression. 17.871 Spring 2012

Addressing Alternative. Multiple Regression. 17.871 Spring 2012 Addressing Alternative Explanations: Multiple Regression 17.871 Spring 2012 1 Did Clinton hurt Gore example Did Clinton hurt Gore in the 2000 election? Treatment is not liking Bill Clinton 2 Bivariate

More information

Example: Boats and Manatees

Example: Boats and Manatees Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant

More information

TRINITY COLLEGE. Faculty of Engineering, Mathematics and Science. School of Computer Science & Statistics

TRINITY COLLEGE. Faculty of Engineering, Mathematics and Science. School of Computer Science & Statistics UNIVERSITY OF DUBLIN TRINITY COLLEGE Faculty of Engineering, Mathematics and Science School of Computer Science & Statistics BA (Mod) Enter Course Title Trinity Term 2013 Junior/Senior Sophister ST7002

More information

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Section 14 Simple Linear Regression: Introduction to Least Squares Regression Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

More information

Topic 3. Chapter 5: Linear Regression in Matrix Form

Topic 3. Chapter 5: Linear Regression in Matrix Form Topic Overview Statistics 512: Applied Linear Models Topic 3 This topic will cover thinking in terms of matrices regression on multiple predictor variables case study: CS majors Text Example (NKNW 241)

More information

Factors affecting online sales

Factors affecting online sales Factors affecting online sales Table of contents Summary... 1 Research questions... 1 The dataset... 2 Descriptive statistics: The exploratory stage... 3 Confidence intervals... 4 Hypothesis tests... 4

More information

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

More information

Stat 5303 (Oehlert): Tukey One Degree of Freedom 1

Stat 5303 (Oehlert): Tukey One Degree of Freedom 1 Stat 5303 (Oehlert): Tukey One Degree of Freedom 1 > catch

More information

Copyright 2007 by Laura Schultz. All rights reserved. Page 1 of 5

Copyright 2007 by Laura Schultz. All rights reserved. Page 1 of 5 Using Your TI-83/84 Calculator: Linear Correlation and Regression Elementary Statistics Dr. Laura Schultz This handout describes how to use your calculator for various linear correlation and regression

More information

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4) Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

More information

General Regression Formulae ) (N-2) (1 - r 2 YX

General Regression Formulae ) (N-2) (1 - r 2 YX General Regression Formulae Single Predictor Standardized Parameter Model: Z Yi = β Z Xi + ε i Single Predictor Standardized Statistical Model: Z Yi = β Z Xi Estimate of Beta (Beta-hat: β = r YX (1 Standard

More information

Section 13, Part 1 ANOVA. Analysis Of Variance

Section 13, Part 1 ANOVA. Analysis Of Variance Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability

More information

An analysis method for a quantitative outcome and two categorical explanatory variables.

An analysis method for a quantitative outcome and two categorical explanatory variables. Chapter 11 Two-Way ANOVA An analysis method for a quantitative outcome and two categorical explanatory variables. If an experiment has a quantitative outcome and two categorical explanatory variables that

More information

GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial

More information

PITFALLS IN TIME SERIES ANALYSIS. Cliff Hurvich Stern School, NYU

PITFALLS IN TIME SERIES ANALYSIS. Cliff Hurvich Stern School, NYU PITFALLS IN TIME SERIES ANALYSIS Cliff Hurvich Stern School, NYU The t -Test If x 1,..., x n are independent and identically distributed with mean 0, and n is not too small, then t = x 0 s n has a standard

More information

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this

More information

Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software

Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software STATA Tutorial Professor Erdinç Please follow the directions once you locate the Stata software in your computer. Room 114 (Business Lab) has computers with Stata software 1.Wald Test Wald Test is used

More information

International Statistical Institute, 56th Session, 2007: Phil Everson

International Statistical Institute, 56th Session, 2007: Phil Everson Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: peverso1@swarthmore.edu 1. Introduction

More information

Lecture 11: Confidence intervals and model comparison for linear regression; analysis of variance

Lecture 11: Confidence intervals and model comparison for linear regression; analysis of variance Lecture 11: Confidence intervals and model comparison for linear regression; analysis of variance 14 November 2007 1 Confidence intervals and hypothesis testing for linear regression Just as there was

More information

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. 277 CHAPTER VI COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. This chapter contains a full discussion of customer loyalty comparisons between private and public insurance companies

More information

Simple Regression Theory II 2010 Samuel L. Baker

Simple Regression Theory II 2010 Samuel L. Baker SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

More information

Simple Linear Regression

Simple Linear Regression Chapter Nine Simple Linear Regression Consider the following three scenarios: 1. The CEO of the local Tourism Authority would like to know whether a family s annual expenditure on recreation is related

More information

ORTHOGONAL POLYNOMIAL CONTRASTS INDIVIDUAL DF COMPARISONS: EQUALLY SPACED TREATMENTS

ORTHOGONAL POLYNOMIAL CONTRASTS INDIVIDUAL DF COMPARISONS: EQUALLY SPACED TREATMENTS ORTHOGONAL POLYNOMIAL CONTRASTS INDIVIDUAL DF COMPARISONS: EQUALLY SPACED TREATMENTS Many treatments are equally spaced (incremented). This provides us with the opportunity to look at the response curve

More information

Simple Linear Regression, Scatterplots, and Bivariate Correlation

Simple Linear Regression, Scatterplots, and Bivariate Correlation 1 Simple Linear Regression, Scatterplots, and Bivariate Correlation This section covers procedures for testing the association between two continuous variables using the SPSS Regression and Correlate analyses.

More information

Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015

Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015 Multicollinearity Richard Williams, University of Notre Dame, http://www3.nd.edu/~rwilliam/ Last revised January 13, 2015 Stata Example (See appendices for full example).. use http://www.nd.edu/~rwilliam/stats2/statafiles/multicoll.dta,

More information

Algebra 1 Course Information

Algebra 1 Course Information Course Information Course Description: Students will study patterns, relations, and functions, and focus on the use of mathematical models to understand and analyze quantitative relationships. Through

More information

Psychology 205: Research Methods in Psychology

Psychology 205: Research Methods in Psychology Psychology 205: Research Methods in Psychology Using R to analyze the data for study 2 Department of Psychology Northwestern University Evanston, Illinois USA November, 2012 1 / 38 Outline 1 Getting ready

More information

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear.

Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear. Multiple Regression in SPSS This example shows you how to perform multiple regression. The basic command is regression : linear. In the main dialog box, input the dependent variable and several predictors.

More information

Chapter 7. One-way ANOVA

Chapter 7. One-way ANOVA Chapter 7 One-way ANOVA One-way ANOVA examines equality of population means for a quantitative outcome and a single categorical explanatory variable with any number of levels. The t-test of Chapter 6 looks

More information

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group

MISSING DATA TECHNIQUES WITH SAS. IDRE Statistical Consulting Group MISSING DATA TECHNIQUES WITH SAS IDRE Statistical Consulting Group ROAD MAP FOR TODAY To discuss: 1. Commonly used techniques for handling missing data, focusing on multiple imputation 2. Issues that could

More information

Lets suppose we rolled a six-sided die 150 times and recorded the number of times each outcome (1-6) occured. The data is

Lets suppose we rolled a six-sided die 150 times and recorded the number of times each outcome (1-6) occured. The data is In this lab we will look at how R can eliminate most of the annoying calculations involved in (a) using Chi-Squared tests to check for homogeneity in two-way tables of catagorical data and (b) computing

More information

Predictor Coef StDev T P Constant 970667056 616256122 1.58 0.154 X 0.00293 0.06163 0.05 0.963. S = 0.5597 R-Sq = 0.0% R-Sq(adj) = 0.

Predictor Coef StDev T P Constant 970667056 616256122 1.58 0.154 X 0.00293 0.06163 0.05 0.963. S = 0.5597 R-Sq = 0.0% R-Sq(adj) = 0. Statistical analysis using Microsoft Excel Microsoft Excel spreadsheets have become somewhat of a standard for data storage, at least for smaller data sets. This, along with the program often being packaged

More information

Module 3: Correlation and Covariance

Module 3: Correlation and Covariance Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

More information

Stat 704 Data Analysis I Probability Review

Stat 704 Data Analysis I Probability Review 1 / 30 Stat 704 Data Analysis I Probability Review Timothy Hanson Department of Statistics, University of South Carolina Course information 2 / 30 Logistics: Tuesday/Thursday 11:40am to 12:55pm in LeConte

More information

1.5 Oneway Analysis of Variance

1.5 Oneway Analysis of Variance Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

More information

Correlation and Regression

Correlation and Regression Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look

More information

Coefficient of Determination

Coefficient of Determination Coefficient of Determination The coefficient of determination R 2 (or sometimes r 2 ) is another measure of how well the least squares equation ŷ = b 0 + b 1 x performs as a predictor of y. R 2 is computed

More information

Simple linear regression

Simple linear regression Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

More information

Hypothesis testing - Steps

Hypothesis testing - Steps Hypothesis testing - Steps Steps to do a two-tailed test of the hypothesis that β 1 0: 1. Set up the hypotheses: H 0 : β 1 = 0 H a : β 1 0. 2. Compute the test statistic: t = b 1 0 Std. error of b 1 =

More information

Getting Correct Results from PROC REG

Getting Correct Results from PROC REG Getting Correct Results from PROC REG Nathaniel Derby, Statis Pro Data Analytics, Seattle, WA ABSTRACT PROC REG, SAS s implementation of linear regression, is often used to fit a line without checking

More information

DATA INTERPRETATION AND STATISTICS

DATA INTERPRETATION AND STATISTICS PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

More information

Session 7 Bivariate Data and Analysis

Session 7 Bivariate Data and Analysis Session 7 Bivariate Data and Analysis Key Terms for This Session Previously Introduced mean standard deviation New in This Session association bivariate analysis contingency table co-variation least squares

More information

Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480

Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480 1) The S & P/TSX Composite Index is based on common stock prices of a group of Canadian stocks. The weekly close level of the TSX for 6 weeks are shown: Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500

More information

How To Run Statistical Tests in Excel

How To Run Statistical Tests in Excel How To Run Statistical Tests in Excel Microsoft Excel is your best tool for storing and manipulating data, calculating basic descriptive statistics such as means and standard deviations, and conducting

More information