BINF 702 Chapter 11 Regression and Correlation Methods (SPRING 2014)


1 BINF 702 Chapter 11 Regression and Correlation Methods (SPRING 2014)

2 Section 11.1 Introduction Example 11.1 Obstetrics Obstetricians sometimes order tests for estriol levels from 24-hour urine specimens taken from pregnant women who are near term, since the level of estriol has been found to be related to the birthweight of the infant. The test can provide indirect evidence of an abnormally small fetus. The relationship between estriol level and birthweight can be quantified by fitting a regression line that relates the two variables. Example 11.2 Hypertension Much discussion has taken place in the literature concerning the familial aggregation of blood pressure. In general, children whose parents have high blood pressure tend to have higher blood pressure than their peers. One way of expressing this relationship is to compute a correlation coefficient relating the blood pressure of parents and children over a large collection of families.

3 Section 11.2 General Concepts Let us return to our consideration of the relationship between estriol level and birthweight. Let x = estriol level and y = birthweight. We might posit a relationship such as Eq. E(y|x) = a + bx. Our regression line is defined as Def. y = a + bx, where a is the y-intercept and b is the slope. Of course, our regression line is not expected to fit exactly; there will be some error associated with the fit: Eq. y = a + bx + e, where e ~ N(0, σ²), x is the independent variable, and y is the dependent variable.

4 Section 11.2 General Concepts A linear regression fit for our birthweight data

5 Section 11.2 General Concepts Some nuances of the fit: the noise level can vary, and the slope b may vary.

6 Section 11.3 Fitting Regression Lines The Method of Least Squares Def. The least-squares line, or estimated regression line, is the line y = a + bx minimizing the sum of squared distances of the sample points from the line, Σ_{i=1}^{n} d_i². We choose this criterion because the math is tractable. Eq. Estimation of the Least-Squares Line: the coefficients of the least-squares line y = a + bx are given by b = L_xy/L_xx and a = (Σ_{i=1}^{n} y_i − b Σ_{i=1}^{n} x_i)/n = ȳ − b x̄.

7 Section 11.3 Fitting Regression Lines The Method of Least Squares Here L_xx = Σ_{i=1}^{n} x_i² − (Σ_{i=1}^{n} x_i)²/n and L_xy = Σ_{i=1}^{n} x_i y_i − (Σ_{i=1}^{n} x_i)(Σ_{i=1}^{n} y_i)/n. Def. The predicted, or average, value of y for a given value of x, as estimated from the fitted regression line, is denoted by ŷ = a + bx.
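As a quick numerical check on these formulas, the corrected sums of squares and the least-squares coefficients can be computed by hand in R. A minimal sketch, assuming x and y are numeric vectors of equal length (for instance the estriol and birthweight vectors es and bw defined on slide 13):

Lxx = sum(x^2) - sum(x)^2/length(x)       # corrected sum of squares of x
Lxy = sum(x*y) - sum(x)*sum(y)/length(x)  # corrected cross-product
b = Lxy/Lxx                 # least-squares slope
a = mean(y) - b*mean(x)     # least-squares intercept
c(a, b)                     # should agree with coef(lm(y ~ x))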

8 Section 11.3 Fitting Regression Lines The Method of Least Squares Regression in R lm {stats} R Documentation Fitting Linear Models Description: lm is used to fit linear models. It can be used to carry out regression, single stratum analysis of variance and analysis of covariance (although aov may provide a more convenient interface for these). Usage: lm(formula, data, subset, weights, na.action, method = "qr", model = TRUE, x = FALSE, y = FALSE, qr = TRUE, singular.ok = TRUE, contrasts = NULL, offset, ...)

9 Section 11.3 Fitting Regression Lines The Method of Least Squares Regression in R (The Arguments) formula: a symbolic description of the model to be fit. The details of model specification are given below. data: an optional data frame containing the variables in the model. If not found in data, the variables are taken from environment(formula), typically the environment from which lm is called. subset: an optional vector specifying a subset of observations to be used in the fitting process. weights: an optional vector of weights to be used in the fitting process. If specified, weighted least squares is used with weights weights (that is, minimizing sum(w*e^2)); otherwise ordinary least squares is used. na.action: a function which indicates what should happen when the data contain NAs. The default is set by the na.action setting of options, and is na.fail if that is unset. The factory-fresh default is na.omit. Another possible value is NULL, no action.

10 Section 11.3 Fitting Regression Lines The Method of Least Squares Regression in R (The Arguments) method: the method to be used; for fitting, currently only method = "qr" is supported; method = "model.frame" returns the model frame (the same as with model = TRUE, see below). model, x, y, qr: logicals. If TRUE the corresponding components of the fit (the model frame, the model matrix, the response, the QR decomposition) are returned. singular.ok: logical. If FALSE (the default in S but not in R) a singular fit is an error. contrasts: an optional list. See the contrasts.arg of model.matrix.default. offset: this can be used to specify an a priori known component to be included in the linear predictor during fitting. An offset term can be included in the formula instead or as well, and if both are specified their sum is used. ...: additional arguments to be passed to the low level regression fitting functions (see below).

11 Section 11.3 Fitting Regression Lines The Method of Least Squares Regression in R (Some of the Details) Models for lm are specified symbolically. A typical model has the form response ~ terms where response is the (numeric) response vector and terms is a series of terms which specifies a linear predictor for response. A terms specification of the form first + second indicates all the terms in first together with all the terms in second with duplicates removed. A specification of the form first:second indicates the set of terms obtained by taking the interactions of all terms in first with all terms in second. The specification first*second indicates the cross of first and second. This is the same as first + second + first:second. If response is a matrix a linear model is fitted to each column of the matrix. See model.matrix for some further details. The terms in the formula will be re-ordered so that main effects come first, followed by the interactions, all second-order, all third-order and so on: to avoid this pass a terms object as the formula. A formula has an implied intercept term. To remove this use either y ~ x - 1 or y ~ 0 + x. See formula for more details of allowed formulae. lm calls the lower level functions lm.fit, etc, see below, for the actual numerical computations. For programming only, you may consider doing likewise. All of weights, subset and offset are evaluated in the same way as variables in formula, that is first in data and then in the environment of formula.

12 Section 11.3 Fitting Regression Lines The Method of Least Squares Regression in R (Some of the Details) lm returns an object of class "lm" or for multiple responses of class c("mlm", "lm"). The functions summary and anova are used to obtain and print a summary and analysis of variance table of the results. The generic accessor functions coefficients, effects, fitted.values and residuals extract various useful features of the value returned by lm. An object of class "lm" is a list containing at least the following components: coefficients: a named vector of coefficients. residuals: the residuals, that is, response minus fitted values. fitted.values: the fitted mean values. rank: the numeric rank of the fitted linear model. weights: (only for weighted fits) the specified weights. df.residual: the residual degrees of freedom. call: the matched call. terms: the terms object used. contrasts: (only where relevant) the contrasts used. xlevels: (only where relevant) a record of the levels of the factors used in fitting. y: if requested, the response used. x: if requested, the model matrix used. model: if requested (the default), the model frame used.

13 Section 11.3 Fitting Regression Lines The Method of Least Squares Example 11.8 Obstetrics Birthweight as a function of estriol in R.
es = c(7,9,9,12,14,16,16,14,16,16,17,19,21,24,15,16,17,25,27,15,15,15,16,19,18,17,18,20,22,25,24)
bw = c(25,25,25,27,27,27,24,30,30,31,30,31,30,28,32,32,32,32,34,34,34,35,35,34,35,36,37,38,40,39,43)
library(stats)
bw.lm = lm(bw ~ es)
bw.lm$coefficients   # prints the (Intercept) and es estimates
plot(es, bw)
lines(es, bw.lm$coefficients[1] + bw.lm$coefficients[2]*es)   # overlay the fitted line

14 Section 11.4 Inferences About Parameters from Regression Lines Eq. 11.5 Decomposition of the Total Sum of Squares into Regression and Residual Components (check out Figure 11.6): Σ_{i=1}^{n} (y_i − ȳ)² = Σ_{i=1}^{n} (ŷ_i − ȳ)² + Σ_{i=1}^{n} (y_i − ŷ_i)², i.e., Total Sum of Squares = Regression Sum of Squares + Residual Sum of Squares. A good-fitting regression line will have regression components large in absolute value relative to the residual components, whereas the opposite is true for poorly fitting lines.
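The three sums of squares in this decomposition can be recovered from a fitted lm object; a minimal sketch using the bw.lm fit of Example 11.8:

TotSS = sum((bw - mean(bw))^2)               # Total SS
RegSS = sum((fitted(bw.lm) - mean(bw))^2)    # Regression SS
ResSS = sum(residuals(bw.lm)^2)              # Residual SS
all.equal(TotSS, RegSS + ResSS)              # TRUE, up to rounding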

15 F Test for Simple Linear Regression We will use the ratio of the regression sum of squares to the residual sum of squares as a test of the regression; a large ratio indicates a good fit. We are testing H0: b = 0 versus H1: b ≠ 0, where b is the slope of the regression line. Some helpful notation: the regression mean square (Reg MS) is (Reg SS)/k, where k is the number of predictors in the model. The residual mean square (Res MS) is (Res SS)/(n − k − 1), where n − k − 1 is the degrees of freedom of the residual sum of squares (Res df). In the literature Res MS = s²_yx. Reg SS = bL_xy = b²L_xx = L²_xy/L_xx. Res SS = Total SS − Reg SS = L_yy − L²_xy/L_xx.

16 F Test for Simple Linear Regression Eq. F Test for Simple Linear Regression To test H0: b = 0 versus H1: b ≠ 0, use the following procedure: 1) Compute the test statistic F = Reg MS/Res MS = (L²_xy/L_xx) / {[L_yy − L²_xy/L_xx]/(n − 2)}, which follows an F_{1,n−2} distribution under H0. 2) For a two-sided test with significance level α, if F > F_{1,n−2,1−α} then reject H0; if F ≤ F_{1,n−2,1−α} then accept H0. 3) The exact p-value is given by P(F_{1,n−2} > F).

17 F Test for Simple Linear Regression Def. R² is defined as (Reg SS)/(Total SS). Interpretation of R²: R² can be thought of as the proportion of the variance of y that can be explained by the variable x. R² = 1: all of the data points fall on the regression line. R² = 0: x gives no information about the variance of y.
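R² can be read off the fitted model or recomputed from the sums of squares of the previous sketch; a minimal sketch for the obstetrics fit:

RegSS/TotSS                # definition: Reg SS / Total SS
summary(bw.lm)$r.squared   # the same value from the lm summary
cor(bw, fitted(bw.lm))^2   # R^2 as a squared correlation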

18 F Test for Simple Linear Regression The obstetrics data revisited in R
> summary(bw.lm)
Call: lm(formula = bw ~ es)
Residuals: Min 1Q Median 3Q Max
Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) e-09 ***
es ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: on 29 degrees of freedom
Multiple R-Squared: , Adjusted R-squared:
F-statistic: on 1 and 29 DF, p-value:

19 F Test for Simple Linear Regression Using aov in R to perform the regression fit on the obstetrics data
> summary(aov(bw ~ es))
Df Sum Sq Mean Sq F value Pr(>F)
es ***
Residuals
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

20 t Test for Simple Linear Regression Eq. 11.8 t Test for Simple Linear Regression To test the hypothesis H0: b = 0 versus H1: b ≠ 0, use the following procedure: 1) Compute the test statistic t = b/(s²_yx/L_xx)^{1/2}. 2) For a two-sided test with significance level α, if t > t_{n−2,1−α/2} or t < t_{n−2,α/2} = −t_{n−2,1−α/2}, then reject H0; if −t_{n−2,1−α/2} ≤ t ≤ t_{n−2,1−α/2}, then accept H0. 3) The p-value is given by p = 2 × (area to the left of t under a t_{n−2} distribution) if t < 0, and p = 2 × (area to the right of t under a t_{n−2} distribution) if t ≥ 0.
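The t statistic above can be assembled by hand from the fitted model and compared with the t value printed by summary(bw.lm); a minimal sketch (s2yx stands for the residual mean square s²_yx):

n = length(es)
s2yx = sum(residuals(bw.lm)^2)/(n - 2)   # residual mean square
Lxx = sum(es^2) - sum(es)^2/n
t = coef(bw.lm)["es"]/sqrt(s2yx/Lxx)     # t = b/se(b)
2*pt(-abs(t), df = n - 2)                # two-sided p-value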

21 t Test for Simple Linear Regression The R output of the obstetrics data revisited
> summary(bw.lm)
Call: lm(formula = bw ~ es)
Residuals: Min 1Q Median 3Q Max
Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) e-09 ***
es ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: on 29 degrees of freedom
Multiple R-Squared: , Adjusted R-squared:
F-statistic: on 1 and 29 DF, p-value:

22 11.5 Interval Estimation for Linear Regression Interval Estimates for Regression Parameters: under certain assumptions, how well can we quantify the uncertainty in our estimates of the slope and y-intercept? Interval Estimation for Predictions Made from Regression Lines: under certain assumptions, how well can we quantify the uncertainty in our predicted values?

23 11.5 Interval Estimation for Linear Regression Interval Estimates for Regression Parameters Eq. Standard Errors of Estimated Parameters in Simple Linear Regression: se(b) = √(s²_yx/L_xx) and se(a) = √[s²_yx(1/n + x̄²/L_xx)].
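These standard errors can be computed directly and checked against the Std. Error column of summary(bw.lm); a minimal sketch reusing s2yx and Lxx from the earlier sketch:

se.b = sqrt(s2yx/Lxx)
se.a = sqrt(s2yx*(1/length(es) + mean(es)^2/Lxx))
c(se.a, se.b)   # compare with summary(bw.lm)$coefficients[, "Std. Error"]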

24 11.5 Interval Estimation for Linear Regression Interval Estimates for Regression Parameters Eq. Two-Sided 100% × (1 − α) Confidence Intervals for the Parameters of a Regression Line: If b and a are, respectively, the estimated slope and intercept of a regression line as given on the previous slide, and se(b) and se(a) are their estimated standard errors, then the two-sided 100% × (1 − α) confidence intervals for b and a are given by b ± t_{n−2,1−α/2} se(b) and a ± t_{n−2,1−α/2} se(a).
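R computes these two-sided confidence intervals directly with confint; for the obstetrics fit:

confint(bw.lm, level = 0.95)   # 95% CIs for (Intercept) and es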

25 Interval Estimates for Regression Parameters Confidence intervals on regression parameters in R
> summary(bw.lm)
Call: lm(formula = bw ~ es)
Residuals: Min 1Q Median 3Q Max
Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) e-09 ***
es ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: on 29 degrees of freedom
Multiple R-Squared: , Adjusted R-squared:
F-statistic: on 1 and 29 DF, p-value:

26 Interval Estimation for Predictions Made from Regression Lines A pedagogical example Forced expiratory volume (FEV) is a standard measure of pulmonary function. To identify people with abnormal pulmonary function, standards of FEV for normal people must be established. One problem here is that FEV is related to both age and height. Let us focus on boys who are ages 10 to 15 and postulate a regression model of the form FEV = a + b(height) + e. Data were collected on FEV and height for 655 boys in this age group residing in Tecumseh, Michigan. The mean FEV in liters is presented for each of twelve 4-cm height groups in the table below. Find the best-fitting regression line and test it for statistical significance. What proportion of the variance of FEV can be explained by height?

27 Interval Estimation for Predictions Made from Regression Lines Our FEV pedagogical example continued. Table: mean height (cm) and mean FEV (L) for each of the twelve 4-cm height groups.

28 Interval Estimation for Predictions Made from Regression Lines
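The numeric table entries were lost in transcription, but the fit itself is one line in R. A hedged sketch, assuming the twelve group means have been entered into vectors ht (mean height, cm) and fev (mean FEV, L); the name fev.lm is the one used by the predict() calls on the following slides:

fev.lm = lm(fev ~ ht)   # FEV = a + b(height) + e
summary(fev.lm)         # slope, F test, and R^2 for the FEV example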

29 Interval Estimation for Predictions Made from Regression Lines Ex. Pulmonary Function Suppose we wish to use the FEV-height regression line computed previously to develop normal ranges for 10- to 15-year-old boys of particular heights. In particular, consider John H., who is 12 years old and 160 cm tall and whose FEV is 2.5 L. Can his FEV be considered abnormal for his age and height?

30 Interval Estimation for Predictions Made from Regression Lines Eq. Predictions Made from Regression Lines for Individual Observations Suppose we wish to make predictions from a regression line for an individual observation with independent variable x that was not used in constructing the regression line. The distribution of observed y values for the subset of individuals with independent variable x is normal with mean ŷ = a + bx and standard deviation se₁(ŷ) = s_yx √[1 + 1/n + (x − x̄)²/L_xx]. Furthermore, 100% × (1 − α) of the observed values will fall within the interval ŷ ± t_{n−2,1−α/2} se₁(ŷ). This interval is sometimes called a 100% × (1 − α) prediction interval for y.

31 Interval Estimation for Predictions Made from Regression Lines Prediction intervals in R
> new = list(ht=160)
> predict(fev.lm, new, interval='prediction')
fit lwr upr
[1,]
We note that John's observed value of 2.5 does not fall within the prediction interval. John merits follow-up.

32 Interval Estimation for Predictions Made from Regression Lines Suppose we wish to assess the mean FEV value for a large number of boys with the same x value. Eq. Standard Error and Confidence Interval for Predictions Made from Regression Lines for the Average Value of y for a Given x The best estimate of the average value of y for a given x is ŷ = a + bx. Its standard error is given by se₂(ŷ) = s_yx √[1/n + (x − x̄)²/L_xx]. Furthermore, a two-sided 100% × (1 − α) confidence interval for the average value of y is ŷ ± t_{n−2,1−α/2} se₂(ŷ).

33 Interval Estimation for Predictions Made from Regression Lines Confidence intervals in R for the average value of y
> predict(fev.lm, new, interval='confidence')
fit lwr upr
[1,]
This is sometimes referred to within the statistics community as the confidence interval for the regression function.

34 Interval Estimation for Predictions Made from Regression Lines Example

35 11.6 Assessing the Goodness of Fit of Regression Lines Eq. Assumptions Made in Linear-Regression Models 1) For any given value of x, the corresponding value of y has an average value of a + bx, which is a linear function of x. 2) For any given value of x, the corresponding value of y is normally distributed about a + bx with the same variance σ² for any x. 3) For any two data points (x₁, y₁), (x₂, y₂), the error terms e₁, e₂ are independent of each other.

36 11.6 Assessing the Goodness of Fit of Regression Lines The simplest type of diagnostic plot. There may be more variability for larger values of es. Which assumption is this violating?

37 11.6 Assessing the Goodness of Fit of Regression Lines Eq. Standard Deviation of Residuals About the Fitted Regression Line Let (x_i, y_i) be a sample point used in estimating the regression line y = a + bx. If y = a + bx is the estimated regression line, then the residual for the point (x_i, y_i) about the estimated regression line is ê_i = y_i − (a + bx_i), with sd(ê_i) = √{σ̂²[1 − (1/n + (x_i − x̄)²/L_xx)]}. The Studentized residual corresponding to the point (x_i, y_i) is given by ê_i/sd(ê_i).

38 11.6 Assessing the Goodness of Fit of Regression Lines (Regression Diagnostic Plots in R - I)

39 11.6 Assessing the Goodness of Fit of Regression Lines (Regression Diagnostic Plots in R - II)
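The panels shown on these two slides are the standard diagnostics produced by the plot method for lm objects; a minimal sketch for the obstetrics fit:

par(mfrow = c(2, 2))   # 2 x 2 grid of panels
plot(bw.lm)            # residuals vs fitted, normal QQ, scale-location, leverage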

40 11.6 Assessing the Goodness of Fit of Regression Lines (Interpreting the Regression Diagnostic Plots in R) Assessing uniformity of variance and linearity of residual structure.

41 11.6 Assessing the Goodness of Fit of Regression Lines (Interpreting the Regression Diagnostic Plots in R) Assessing normality of residual structure with QQ plots.
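A QQ plot of the residuals can also be drawn directly; a minimal sketch:

qqnorm(residuals(bw.lm))   # sample quantiles vs theoretical normal quantiles
qqline(residuals(bw.lm))   # reference line through the quartiles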

42 11.6 Assessing the Goodness of Fit of Regression Lines (Interpreting the Regression Diagnostic Plots in R) A few EDA-type plots for assessment of normality.

43 11.6 Assessing the Goodness of Fit of Regression Lines (Interpreting the Regression Diagnostic Plots in R) QQ plots for various types of distributions.

44 11.6 Assessing the Goodness of Fit of Regression Lines (Interpreting the Regression Diagnostic Plots in R) Cook's distance for the i-th observation is based on the differences between the predicted responses from the model constructed from all of the data and the predicted responses from the model constructed with the i-th observation set aside. For each observation, the sum of these squared differences is divided by (p+1) times the Residual Mean Square from the full model. Some analysts suggest investigating observations for which Cook's distance is greater than 1; others suggest looking at a dot plot to find extreme values. Cook's distance plots.
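Cook's distances are available from cooks.distance, and plot(model, which = 4) draws the index plot shown on the slide; a minimal sketch:

d = cooks.distance(bw.lm)
which(d > 1)             # observations flagged by the D > 1 rule of thumb
plot(bw.lm, which = 4)   # index plot of Cook's distance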

45 11.6 Assessing the Goodness of Fit of Regression Lines (Interpreting the Regression Diagnostic Plots in R) A pedagogical example. Age is age at first word (x-values) and gesell (y-values) is the Gesell adaptive score.
age = c(15,26,10,9,15,20,18,11,8,20,7,9,10,11,11,10,12,42,17,11,10)
gesell = c(95,71,83,91,102,87,93,100,104,94,113,96,83,84,102,100,105,57,121,86,100)
> plot(gesell ~ age)
> identify(gesell ~ age)
[1]

46 11.6 Assessing the Goodness of Fit of Regression Lines (Interpreting the Regression Diagnostic Plots in R) Gesell example continued

47 11.6 Assessing the Goodness of Fit of Regression Lines (Interpreting the Regression Diagnostic Plots in R)

48 11.7 The Correlation Coefficient The sample correlation coefficient offers an alternative way to measure the linear association between two variables; one can use it rather than the regression coefficient. The sample (Pearson) correlation coefficient is given by r = L_xy/√(L_xx L_yy). Properties of r: r > 0, positively correlated; r < 0, negatively correlated; r = 0, uncorrelated.

49 11.7 The Correlation Coefficient Relationship between the sample correlation coefficient r and the population correlation coefficient ρ: r = [L_xy/(n − 1)] / √{[L_xx/(n − 1)][L_yy/(n − 1)]} = s_xy/(s_x s_y).

50 11.7 The Correlation Coefficient There is actually a simple relationship between the sample correlation coefficient and the regression coefficient: b = r(s_y/s_x). So these two quantities really are just rescaled versions of one another.

51 11.7 The Correlation Coefficient The sample Pearson correlation coefficient, r, in R Example
> es = c(7,9,9,12,14,16,16,14,16,16,17,19,21,24,15,16,17,25,27,15,15,15,16,19,18,17,18,20,22,25,24)
> bw = c(25,25,25,27,27,27,24,30,30,31,30,31,30,28,32,32,32,32,34,34,34,35,35,34,35,36,37,38,40,39,43)
> cor(es, bw, method='pearson')
[1]
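The rescaling relationship b = r(s_y/s_x) from the previous slide is easy to verify numerically with these data; a minimal sketch:

cor(es, bw)*sd(bw)/sd(es)   # r * (s_y/s_x)
coef(bw.lm)["es"]           # the fitted slope b; the same value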

52 11.8 Statistical Inference for Correlation Coefficients: One-Sample t Test for a Correlation Coefficient Eq. One-Sample t Test for a Correlation Coefficient To test the hypothesis H0: ρ = 0 versus H1: ρ ≠ 0, use the following procedure: 1) Compute the sample correlation coefficient r. 2) Compute the test statistic t = r(n − 2)^{1/2}/(1 − r²)^{1/2}, which under H0 follows a t distribution with n − 2 df. 3) For a two-sided level α test, if t > t_{n−2,1−α/2} or t < −t_{n−2,1−α/2}, then reject H0; if −t_{n−2,1−α/2} ≤ t ≤ t_{n−2,1−α/2}, then accept H0. 4) The p-value is given by p = 2 × (area to the left of t under a t_{n−2} distribution) if t < 0, and p = 2 × (area to the right of t under a t_{n−2} distribution) if t ≥ 0. 5) We assume an underlying normal distribution for each of the random variables used to compute r.

53 11.8 Statistical Inference for Correlation Coefficients: One-Sample t Test for a Correlation Coefficient Problem p. 505 in R
> logmort = c(-2.35,-2.20,-2.12,-1.95,-1.85,-1.80,-1.70,-1.58)
> logcig = c(-0.26,-0.03,0.30,0.37,0.40,0.50,0.55,0.55)
> cor(logmort, logcig)
[1]
> cor.test(logmort, logcig)
Pearson's product-moment correlation
data: logmort and logcig
t = , df = 6, p-value =
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
sample estimates: cor

54 11.8 Statistical Inference for Correlation Coefficients: One-Sample z Test for a Correlation Coefficient Eq. One-Sample z Test for a Correlation Coefficient To test the hypothesis H0: ρ = ρ₀ versus H1: ρ ≠ ρ₀, use the following procedure: 1) Compute the sample correlation coefficient r and the z transformation of r. 2) Compute the test statistic λ = (z − z₀)√(n − 3). 3) If λ > z_{1−α/2} or λ < −z_{1−α/2}, reject H0; if −z_{1−α/2} ≤ λ ≤ z_{1−α/2}, accept H0. 4) The exact p-value is given by p = 2 × Φ(λ) if λ ≤ 0, and p = 2 × [1 − Φ(λ)] if λ > 0. 5) Assume an underlying normal distribution for each of the random variables used to compute r and z.

55 11.8 Statistical Inference for Correlation Coefficients: One-Sample z Test for a Correlation Coefficient Here z = (1/2) ln[(1 + r)/(1 − r)], which under H0 is approximately N((1/2) ln[(1 + ρ₀)/(1 − ρ₀)], 1/(n − 3)), and z₀ = (1/2) ln[(1 + ρ₀)/(1 − ρ₀)].

56 11.8 Statistical Inference for Correlation Coefficients: One-Sample z Test for a Correlation Coefficient There is no direct implementation of this test in R, but this method is used to compute confidence intervals when the number of observations is larger than 6 when one calls cor.test.
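Because base R offers no one-sample z test against a specified ρ₀, the test above is short to code directly. A hedged sketch, where the vectors x and y and the null value rho0 are assumptions supplied by the user:

r = cor(x, y); n = length(x)
z = 0.5*log((1 + r)/(1 - r))          # Fisher's z transform (atanh(r))
z0 = 0.5*log((1 + rho0)/(1 - rho0))
lambda = (z - z0)*sqrt(n - 3)         # test statistic
2*pnorm(-abs(lambda))                 # two-sided p-value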

57 11.9 Multiple Regression Consider the example on pg. 466 of the text. Eq. y = a + b₁x₁ + b₂x₂ + e, where y is the systolic blood pressure, x₁ is the birthweight, x₂ is the age in days, and e ~ N(0, σ²). We use the method of least squares to minimize the sum of [y − (a + b₁x₁ + b₂x₂)]². In general, if we have k independent variables x₁, …, x_k, then a linear-regression model relating y to x₁, …, x_k is of the form Eq. y = a + Σ_{j=1}^{k} b_j x_j + e, where e ~ N(0, σ²).

58 11.9 Multiple Regression Def. In the model y = a + Σ_{j=1}^{k} b_j x_j + e, the b_j are referred to as partial regression coefficients.

59 11.9 Multiple Regression Def. The standardized regression coefficient b_s is given by b × (s_x/s_y).

60 Hypothesis Testing Eq. F Test for Testing the Hypothesis H0: b₁ = b₂ = … = b_k = 0 versus H1: at least one of the b_j ≠ 0 in Multiple Regression 1) Fit the regression parameters using the method of least squares, and compute Reg SS and Res SS, where Res SS = Σ_{i=1}^{n} (y_i − ŷ_i)², Reg SS = Total SS − Res SS, Total SS = Σ_{i=1}^{n} (y_i − ȳ)², ŷ_i = a + Σ_{j=1}^{k} b_j x_ij, and x_ij = the jth independent variable for the ith subject, j = 1, …, k; i = 1, …, n.

61 Hypothesis Testing Eq. F Test for Testing the Hypothesis H0: b₁ = b₂ = … = b_k = 0 versus H1: at least one of the b_j ≠ 0 in Multiple Regression 2) Compute Reg MS = Reg SS/k and Res MS = Res SS/(n − k − 1). 3) Compute the test statistic F = Reg MS/Res MS, which follows an F_{k,n−k−1} distribution under H0. 4) For a level α test, if F > F_{k,n−k−1,1−α} then reject H0; if F ≤ F_{k,n−k−1,1−α} then accept H0. 5) The exact p-value is given by the area to the right of F under an F_{k,n−k−1} distribution = P(F_{k,n−k−1} > F).

62 Hypothesis Testing Eq. t Test for Testing the Hypothesis H0: b_l = 0, all other b_j ≠ 0 versus H1: b_l ≠ 0, all other b_j ≠ 0 in Multiple Linear Regression 1) Compute t = b_l/se(b_l). 2) If t < t_{n−k−1,α/2} or t > t_{n−k−1,1−α/2}, then reject H0; if t_{n−k−1,α/2} ≤ t ≤ t_{n−k−1,1−α/2}, then accept H0. 3) The exact p-value is given by 2 × P(t_{n−k−1} > t) if t ≥ 0, and 2 × P(t_{n−k−1} ≤ t) if t < 0.

63 11.9 Multiple Regression (EX in R)
> bwmv = c(135,120,100,105,130,125,125,105,120,90,120,95,120,150,160,125)
> agemv = c(3,4,3,2,4,5,2,3,5,4,2,3,3,4,3,3)
> bpmv = c(89,90,83,77,92,98,82,85,96,95,80,79,86,97,92,88)
> bpmv.lm = lm(bpmv ~ bwmv + agemv)
> summary(bpmv.lm)
Call: lm(formula = bpmv ~ bwmv + agemv)
Residuals: Min 1Q Median 3Q Max
Coefficients: Estimate Std. Error t value Pr(>|t|)
(Intercept) e-08 ***
bwmv **
agemv e-07 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: on 13 degrees of freedom
Multiple R-Squared: , Adjusted R-squared:
F-statistic: on 2 and 13 DF, p-value: 9.844e-07

64 Regression Diagnostics

65 11.9 Multiple Regression (EX in R)

66 11.9 Multiple Regression (EX in R)
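The diagnostic figures on these slides can be regenerated from the fitted multiple-regression object exactly as for the simple fit; a minimal sketch:

par(mfrow = c(2, 2))
plot(bpmv.lm)   # diagnostics for the two-predictor blood pressure fit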

67 Chapter 11 Homework
