# Notes on Applied Linear Regression

Size: px
Start display at page:

Transcription

1 Notes on Applied Linear Regression Jamie DeCoster Department of Social Psychology Free University Amsterdam Van der Boechorststraat BT Amsterdam The Netherlands phone: +31 (0) April 10, 2003 These were compiled from notes made by Jamie DeCoster and Julie Gephart from Dr. Rebecca Doerge s class on applied linear regression at Purdue University. Textbook references refer to Neter, Kutner, Nachtsheim, & Wasserman s Applied Linear Statistical Models, fourth edition. For help with data analysis visit ALL RIGHTS TO THIS DOCUMENT ARE RESERVED.

2 Contents 1 Linear Regression with One Independent Variable 2 2 Inferences in Regression Analysis 4 3 Diagnostic and Remedial Measures I 11 4 Simultaneous Inferences and Other Topics 15 5 Matrix Approach to Simple Linear Regression 17 6 Multiple Regression I 22 7 Multiple Regression II 26 8 Building the Regression Model I: Selection of Predictor Variables 32 9 Building the Regression Model II: Diagnostics Qualitative Predictor Variables Single Factor ANOVA Analysis of Factor Level Effects 43 1

3 Chapter 1 Linear Regression with One Independent Variable 1.1 General information about regression Regression analysis is a statistical tool that utilizes the relation between two or more quantitative variables so that one variable can predicted from another. The variable we are trying to predict is called the response or dependent variable. predicting this is called the explanatory or independent variable. The variable Relationships between variables can be either functional or statistical. exact, while a statistical relationship has error associated with it. A functional relationship is Regression analysis serves three purposes: Description, Control, and Prediction. Whenever reporting results, be sure to use at least four decimal places. 1.2 Simple linear regression model The simple linear regression function is Y i = β 0 + β 1 X i + ɛ i. (1.1) In this model β 0, β 1, and ɛ i are parameters and Y i and X i are measured values. This is called a first order model because it is linear in both the parameters and the independent variable. For every level of X, there is a probability distribution for Y having mean E(Y i ) = β 0 + β 1 X i and variance σ 2 (Y i ) = σ 2, where σ 2 is the variance of the entire population. β 0 is called the intercept and β 1 is called the slope of the regression line. Data for the regression analysis may be either observational or experimental. Observational data is simply recorded from naturally occurring phenomena while experimental data is the result of some manipulation by the experimenter. 1.3 Estimated regression function The estimated regression function is Y i = b 0 + b 1 X i. (1.2) 2

4 We calculate b 0 and b 1 using the methods of least squares. This chooses estimates that minimizes the sum of squared errors. These estimates can be calculated as and b 1 = (Xi X)(Y i Ȳ ) Xi Yi Xi Y i (Xi X) n = 2 2 Xi ( X i) 2 b 0 = Ȳ b 1 X = 1 n n (1.3) ( Yi b 1 Xi ). (1.4) Under the conditions of the regression model given above, the least squares estimates are unbiased and have minimum variance among all unbiased linear estimators. This means that the estimates get us as close to the true unknown parameter values as we can get. E(Y ) is referred to as the mean response or the expected value of Y. It is the center of the probability distribution of Y. The least squares regression line always passes through the point ( X, Ȳ ). The predicted or fitted values of the regression equation calculated as Ŷ i = b 0 + b 1 X i. (1.5) Residuals are the difference between the response and the fitted value, e i = Y i Ŷi. (1.6) We often examine the distribution of the residuals to check the fit of our model. Some general properties of the residuals are: e i = 0 e 2 i Y i = Ŷ i X i e i = 0 Ŷ i e i = 0 is minimum when using least squares estimates The sample variance s 2 estimates the population variance σ 2. This estimate is calculated as s 2 (Yi = Ŷi) 2, (1.7) n 2 and is also referred to as the Mean Square Error (MSE). Note that estimated values are always represented by English letters while parameter values are always represented by Greek letters. 1.4 Normal error regression model The normal error regression model is Y i = β 0 + β 1 X i + ɛ i, (1.8) where i = 1... n are the trials of the experiment, Y i is the observed response on trial i, X i is the level of the independent variable on trial i, β 0 and β 1 are parameters, and the residuals ɛ i are all normally distributed with a mean of 0 and a variance of σ 2. In this model Y i and X i are known values while β 0, β 1, and ɛ i are parameters. The additional assumptions in this model allow us to draw inferences from our data. This means that we can test our parameter estimates and build confidence intervals. 3

5 Chapter 2 Inferences in Regression Analysis 2.1 General information about statistical inferences Every point estimate has a sampling distribution that describes the behavior of that estimate. By knowing the sampling distribution you know the center and the spread (expected value and variance, respectively) of the distribution of the estimate. Recall that a sampling distribution is also a probability distribution. We use two different ways to estimate parameters: point estimates and confidence intervals. 2.2 Hypothesis tests Specifically we are often interested in building hypothesis tests pitting a null hypothesis versus an alternative hypothesis. One hypothesis test commonly performed in simple linear regression is H 0 : β 1 = 0 H a : β 1 0. (2.1) There are four possible results to a hypothesis test. You can: 1. Fail to reject the null when the null is true (correct). 2. Reject the null when the alternative is true (correct). 3. Reject the null when the null is true (Type I error). 4. Fail to reject the null when the alternative is true (Type II error). The seven general steps to presenting hypothesis tests are: 1. State the parameters of interest 2. State the null hypothesis 3. State the alternative hypothesis 4. State the test statistic and its distribution 5. State the rejection region 6. Perform necessary calculations 7. State your conclusion 4

6 2.3 Test statistics Knowing the sampling distribution of an estimate allows us to form test statistics and to make inferences about the corresponding population parameter. The general form of a test statistic is point estimate E(point estimate) standard deviation of point estimate, (2.2) where E(point estimate) is the expected value of our point estimate under H Determining the significance of a test statistic Once we calculate a test statistic we can talk about how likely the results were due to chance if we know the underlying distribution. The p-value of our test statistic is a measure of how likely our observed results are just due to chance. To get the p-value we must know the distribution from which our test statistic was drawn. Typically this involves knowing the general form of the distribution (t, F, etc.) and any specific parameters (such as the degrees of freedom). Once we know the distribution we look up our calculated test statistic in the appropriate table. If the specific value of our calculated test statistic is not in the table we use the p-value associated with the largest test statistic that is smaller than the one we calculated. This gives us a conservative estimate. Similarly, if there is no table with the exact degrees of freedom that we have in our statistic we use the table with largest degrees of freedom that is less than the degrees of freedom associated with our calculated statistic. Before calculating our test statistic we establish a significance level α. We reject the null hypothesis if we obtain a p-value less than α and fail to reject the null when we obtain a p-value greater than α. The significance level is the probability that we make a Type I error. Many fields use a standard α = Testing β 1 and β 0 To test whether there is a linear relationship between two variables we can perform a hypothesis test on the slope parameter in the corresponding simple linear regression model. β 0 is the expected value when X = 0. If X cannot take on the value 0 then β 0 has no meaningful interpretation. When testing β 1 : Point estimate = least squares estimate b 1 E(point estimate) = β 1 under H 0 (This is typically = 0) Standard deviation = σ 2 {b 1 } We estimate σ 2 {b 1 } using the following equation: s 2 {b 1 } = MSE from model (Xi X) 2 (2.3) The resulting test statistic t = b 1 β 1 s{b 1 } follows a t distribution with n 2 degrees of freedom. (2.4) 5

7 To make tests about the intercept β 0 we use Point estimate = least squares estimate b 0 E(point estimate) = β 0 under H 0 Standard deviation = σ 2 {b 0 } We estimate σ 2 {b 0 } using the following equation: ( ) 1 s 2 {b 0 } = MSE n + X2 (Xi X) 2 The resulting test statistic t = b 0 β 0 s{b 0 } follows a t distribution with n 2 degrees of freedom. (2.5) (2.6) 2.6 Violations of normality If our Y s are truly normally distributed then we can be confident about our estimates of β 0 and β 1. If our Y s are approximately normal then our estimates are approximately correct. If our Y s are distinctly not normal then our estimates are likely to be incorrect, but they will approach the true parameters as we increase our sample size. 2.7 Confidence intervals Point estimates tell us about the central tendency of a distribution while confidence intervals tell us about both the central tendency and the spread of a distribution. The general form of a confidence interval is point estimate ± (critical value)(standard deviation of point estimate) (2.7) We often generate a 1 α confidence interval for β 1 : b 1 ± (t 1 α 2,n 2 )s{b 1 }. (2.8) When constructing confidence intervals it s important to remember to divide your significance level by 2 when determining your critical value. This is because half of your probability is going into each side. You can perform significance tests using confidence intervals. If your interval contains the value of the parameter under the null hypothesis you fail to reject the null. If the value under the null hypothesis falls outside of your confidence interval then you reject the null. 2.8 Predicting new observations Once we have calculated a regression equation we can predict a new observation at X h by substituting in this value and calculating Ŷ h = b 0 + b 1 X h. (2.9) It inappropriate to predict new values when X h is outside the range of X s used to build the regression equation. This is called extrapolation. 6

8 We can build several different types of intervals around Ŷh since we know its sampling distribution. In each case we have (n 2) degrees of freedom and the (1 α) prediction interval takes on the form Ŷ h ± (t 1 α 2,n 2 )s{pred}, (2.10) where n is the number of observations to build the regression equation and s{pred} is the standard deviation of our prediction. The exact value of s{pred} depends on what we are specifically trying to predict. It is made up of the variability we have in our estimate and the variability we have in what we re trying to predict. This last part is different in each case described below. To predict the mean response at X h we use s{pred} = To predict a single new observation we use s{pred} = MSE MSE ( 1 n + (X h X) 2 (Xi X) 2 ), (2.11) ( n + (X h X) 2 (Xi X) 2 ), (2.12) To predict the mean of m new observations we use ( 1 s{pred} = MSE m + 1 n + (X h X) ), 2 (Xi X) (2.13) 2 These equations are related. Equation 2.11 is equal to equation 2.13 where m =. Similarly, equation 2.12 is equal to equation 2.13 where m = ANOVA approach to regression Our observed variable Y will always have some variability associated with it. We can break this into the variability related to our predictor and the variability unrelated to our predictor. This is called an Analysis of Variance. SSTO = Total sums of squares = (Y i Ȳ )2 is the total variability in Y. It has n 1 degrees of freedom associated with it. SSR = Regression sums of squares = (Ŷi Ȳ )2 is the variability in Y accounted for by our regression model. Since we are using one predictor (X) it has 1 degree of freedom associated with it. SSE = Error sums of squares = (Y i Ŷi) 2 = e i 2 is the variability in Y that is not accounted for by our regression model. It has n 2 degrees of freedom associated with it. These are all related because and SSTO = SSR + SSE (2.14) df total = df model + df error (2.15) The MSR (regression mean square) is equal to the SSR divided by its degrees of freedom. Similarly, the MSE (error mean square) is equal to the SSE divided by its degrees of freedom. 7

9 All of this is summarized in the following ANOVA table: Source of variation SS df MS Model (R) Error (E) Total ( Ŷ i Ȳ )2 1 SSR 1 (Yi Ŷi) 2 n 2 SSE n 2 (Yi Ȳ )2 n The General Linear Test We can use the information from the ANOVA table to perform a general linear test of the slope parameter. In this we basically try to see if a full model predicts significantly more variation in our dependent variable than a reduced model. The full model is the model corresponding to your alternative hypothesis while your reduced model is the model corresponding to your null hypothesis. The basic procedure is to fit both your full and your reduced model and then record the SSE and DF from each. You then calculate the following test statistic: F = SSE reduced SSE full df reduced df full SSE full, (2.16) df full which follows an F distribution with (df full df reduced ) numerator and df full denominator degrees of freedom. When testing the slope in simple linear regression, F is equal to SSTO SSE MSE F 1,n 2,1 α = (t n 2,1 α ) Descriptive measures These measure the degree of association between X and Y. Coefficient of determination: r 2 This is the effect of X in reducing the variation in Y. r 2 = SSTO SSR 0 r 2 1 Coefficient of correlation: r This is a measure of the linear association between X and Y. r = ± r 2, where the sign indicates the slope direction 1 r 1 = MSR MSE. 8

10 2.12 Using SAS Something like the following commands typically appear at the top of any SAS program you run: options ls=72; title1 Computer program 1 ; data one; infile mydata.dat ; input Y X; The first line formats the output for a letter-sized page. The second line puts a title at the top of each page. The third establishes the name of the SAS data set you are working on as one. The fourth line tells the computer that you want to read in some data from a text file called mydata.dat. The fifth tells SAS that the first column in the dataset contains the variable Y and the second column contains the variable X. Each column in the text file is separated by one or more spaces. To print the contents of a dataset use the following code: proc print data=one; var X Y; where the variables after the var statement are the ones you want to see. You can omit the var line altogether if you want to see all the variables in the dataset. To plot one variable against another use the following code: proc plot data=one; plot Y*X = A ; In this case variable Y will appear on the Y-axis, variable X will appear on the X-axis, and each point will be marked by the letter A. If you substitute the following for the second line: plot Y*X; then SAS will mark each point with a character corresponding to how many times that observation is repeated. You can have SAS do multiple plots within the same proc plot statement by listing multiple combinations on the plot line: plot Y*X= A Y*Z= B A*B; as long as all of the variables are in the same dataset. Each plot will produce its own graph. If you want them printed on the same axes you must also enter the /overlay switch on the plot line: plot Y*X Z*X /overlay; To have SAS generate a least-square regression line predicting Y by X you would use the following code: proc reg data=one; model Y = X; 9

12 Chapter 3 Diagnostic and Remedial Measures I 3.1 General information about diagnostics It is important that we make sure that our data fits the assumptions of the normal error regression model. Violations of the assumptions lessen the validity of our hypothesis tests. Typical violations of the simple linear regression model are: 1. Regression function is not linear 2. Error terms do not have constant variance 3. Error terms are not independent 4. One or more observations are outliers 5. Error terms are not normally distributed 6. One or more important predictors have been omitted from the model 3.2 Examining the distribution of the residuals We don t want to examine the raw Y i s because they are dependent on the X i s. Therefore we examine the e i s. We will often examine the standardized residuals instead because they are easier to interpret. Standardized residuals are calculated by the following formula: e i = e i MSE. (3.1) To test whether the residuals are normally distributed we can examine a normal probability plot. This is a plot of the residuals against the normal order statistic. Plots from a normal distribution will look approximately like straight lines. If your Y i s are non-normal you should transform them into normal data. To test whether the variance is constant across the error terms we plot the residuals against Ŷi. The graph should look like a random scattering. If the variance is not constant we should either transform Y or use weighted least squares estimators. If we transform Y we may also have to transform X to keep the same basic relationship. To test whether the error terms are independent we plot the residuals against potential variables not included in the model. The graph should look like a random scattering. If our terms are not independent we should add the appropriate terms to our model to account for the dependence. 11

13 We should also check for the possibility that the passage of time might have influenced our results. We should therefore plot our residuals versus time to see if there is any dependency. 3.3 Checking for a linear relationship To see if there is a linear relationship between X and Y you can examine a graph of Y i versus X i. We would expect the data to look roughly linear, though with error. If the graph appears to have any systematic non-linear component we would suspect a non-linear relationship. You can also examine a graph of e i versus X i. In this case we would expect the graph to look completely random. Any sort of systematic variation would be an indicator of a non-linear relationship. If there is evidence of a non-linear relationship and the variance is not constant then we want to transform our Y i s. If there is evidence of a non-linear relationship but the variance is constant then we want to transform our X i s. The exact nature of the transformation would depend on the observed type of non-linearity. Common transformations are logarithms and square roots. 3.4 Outliers To test for outliers we plot X against Y or X against the residuals. You can also check univariate plots of X and Y individually. There should not be any points that look like they are far away from the others. We should check any outlying data points to make sure that they are not the result of a typing error. True outliers should be dropped from the analysis. Dealing with outliers will be discussed more thoroughly in later chapters. 3.5 Lack of Fit Test In general, this tests whether a categorical model fits the data significantly better than a linear model. The assumptions of the test are that the Y i s are independent, normally distributed, and have constant variance. In order to do a lack of fit test you need to have repeated observations at one or more levels of X. We perform the following hypothesis test: H 0 : Y i = β 0 + β 1 X i H a : Y i β 0 + β 1 X i (3.2) Our null hypothesis is a linear model while our alternative hypothesis is a categorical model. The categorical model uses the mean response at each level of X to predict Y. This model will always fit the data better than the linear model because the linear model has the additional constraint that its predicted values must all fall on the same line. When performing this test you typically hope to show that the categorical model fits no better than the linear model. This test is a little different than the others we ve done because we don t want to reject the null. We perform a general linear test. Our full model is Y ij = µ j + ɛ ij i = 1... n j j = 1... c (3.3) 12

14 where c is the number of different levels of X, n j is the number of observations at level X j, and µ j is the expected value of Y ij. From our full model we calculate SSE full = j (Y ij ˆµ j ) 2, (3.4) i where ˆµ j = 1 Y i = c Ȳj. (3.5) i There are n c degrees of freedom associated with the full model. Our reduced model is a simple linear regression, namely that Y ij = β 0 + β 1 X i + ɛ ij i = 1... n j j = 1... c (3.6) The SSE reduced is the standard SSE from our model, equal to [Y ij (b 0 + b 1 X i )] 2. (3.7) There are n 2 degrees of freedom associated with the reduced model. j i When performing a lack of fit test we often talk about two specific components of our variability. The sum of squares from pure error (SSPE) measures the variability within each level of X. The sum of squares from lack of fit (SSLF) measures the non-linear variability in our data. These are defined as follows: SSLF = SSE reduced SSE full = (Ȳj Ŷij) 2 (3.8) j i SSPE = SSE full = (Y ij Ȳj) 2. (3.9) j i The SSPE has n c degrees of freedom associated with it while the SSLF has c 2 degrees of freedom associated with it. These components are related because SSE reduced = SSPE + SSLF. (3.10) and df SSEreduced = df SSPE + df SSLF. (3.11) From this we can construct a more detailed ANOVA table: Source of variation SS df MS Model (R) ( Ŷ ij Ȳ )2 1 SSR 1 Error (E) (Yij Ŷij) 2 n 2 SSE n 2 LF (Ȳj Ŷij) 2 c 2 SSLF c 2 PE (Y ij Ȳj) 2 n c SSPE n c Total (Yij Ȳ )2 n 1 13

15 We calculate the following test statistic: F = SSE reduced SSE full df reduced df full = SSE full df full SSE - SSPE c 2 SSPE n c = SSLF c 2 SSPE n c (3.12) which follows an F distribution with c 2 numerator and n c denominator degrees of freedom. Our test statistic is a measure of how much our data deviate from a simple linear model. We must reject our linear model if it is statistically significant. 3.6 Using SAS To get boxplots and stem and leaf graphs of a variable you use the plot option of proc univariate: proc univariate plot; var X; There are two different ways to generate a normal probability plot. One way is to use the plot option of proc univariate: proc univariate plot; var Y; While this is the easiest method, the plot is small and difficult to read. The following code will generate a full-sized normal probability plot: proc rank data=one normal=blom; var Y; ranks ynorm; proc plot; plot Y*ynorm; SAS refers to standardized residuals as studentized residuals. You can store these in an output data set using the following code: proc reg data=one; model Y=X; output out=two student=sresid; To transform data in SAS all you need to do is create a new variable as a function of the old one, and then perform your analysis with the new variable. You must create the variable in the data step, which means that it must be done ahead of any procedure statements. For example, to do a log base 10 transform on your Y you would use the code: ytrans = log10(y); proc reg data=one; model ytrans = X; SAS contains funcions for performing square roots, logarithms, exponentials, and just about any other sort of transformation you might possibly want. 14

16 Chapter 4 Simultaneous Inferences and Other Topics 4.1 General information about simultaneous inferences When you perform multiple hypothesis tests on the same set of data you inadvertently inflate your chances of getting a type I error. For example, if you perform two tests each at a.05 significance level, your overall chance of having a false positive are actually ( ) =.0975, which is greater than the.05 we expect. We therefore establish a family confidence coefficient α, specifying how certain we want to be that none of our results are due to chance. We modify the significance levels of our individual tests so that put together our total probability of getting a type I error is less than this value. 4.2 Joint estimates of β 0 and β 1 Our tests of β 0 and β 1 are drawn from the same data, so we should present a family confidence coefficient when discussing our results. We describe the Bonferroni method, although there are others available. In this method we divide α by the number of tests we re performing to get the significance level for each individual test. For example, we would generate the following confidence intervals for β 0 and β 1 using a family confidence coefficient of α: β 0 : b 0 ± (t n 2,1 α )s{b 4 0} β 1 : b 1 ± (t n 2,1 α )s{b (4.1) 4 1} While we usually select our critical value at (1 α 2 ) in confidence intervals (because it is a two-tailed test), we use (1 α 4 ) here because we are estimating two parameters simultaneously. 4.3 Regression through the origin Sometimes we have a theoretically meaningful (0,0) point in our data. For example, the gross profit of a company that produces no product would have to be zero. In this case we can build a regression model with β 0 set to zero: Y i = β 1 X i + ɛ i, (4.2) where i = 1... n. 15

17 The least squares estimate of β 1 in this model is b 1 = Xi Y i Xi 2. (4.3) The residuals from this model are not guaranteed to sum to zero. We have one more degree of freedom in this model than in the simple linear regression model because we don t need to estimate β 0. Our estimate of the variance (MSE) is therefore Similarly, df error = n 1. s 2 (Yi = Ŷi) 2. (4.4) n Measurement error Error with respect to our Y i s is accommodated by our model, so measurement error with regard to the response variable is not problematic as long as it is random. It does, however, reduce the power of our tests. Error with respect to our X i s, however, is problematic. It can cause our error terms to be correlated with our explanatory variables which violates the assumptions of our model. In this case the leastsquares estimates are no longer unbiased. 4.5 Inverse prediction We have already learned how to predict Y from X. Sometimes, however, we might want to predict X from Y. We can calculate an estimate with the following equation: ˆX h = Y h b 0 b 1, (4.5) which has a variance of s 2 {predx} = MSE b 1 2 ( n + ( ˆX h X) ) 2 (Xi X). (4.6) 2 While this can be used for informative purposes, it should not be used to fill in missing data. That requires more complicated procedures. 4.6 Choosing X levels The variance of our parameter estimates are all divided by (X i X) 2. To get more precise estimates we would want to maximize this. So you want to have a wide range to your X s to have precise parameter estimates. It is generally recommended that you use equal spacing between X levels. When attempting to predict Y at a single value of X, the variance of our predicted value related to (X h X) 2. To get more precise estimate we would want to minimize this. So you want to sample all of your points at X h if you want a precise estimate of Ŷh. 16

18 Chapter 5 Matrix Approach to Simple Linear Regression 5.1 General information about matrices A matrix is a collection of numbers organized into rows and columns. Rows go horizontally across while columns go vertically down. The following is an example of a matrix: M = (5.1) The dimension of the matrix (the numbers in the subscript) describes the number of rows and columns. Non-matrix numbers (the type we deal with every day) are called scalars. Matrix variables are generally typed as capital, bold-faced letters, such as A to distinguish them from scalars. When writing matrix equations by hand, the name of the matrix is usually written over a tilde, such as A ~. A single value inside the matrix is called an element. Elements are generally referred to by the lowercase letter of the matrix, with subscripts indicating the element s position. For example: A = a 11 a 12 a 13 a 21 a 22 a 23. (5.2) 3 3 a 31 a 32 a 33 We use matrices because they allow us to represent complicated calculations with very simple notation. Any moderately advanced discussion of regression typically uses matrix notation. 5.2 Special types of matrices A square matrix is a matrix with the same number of rows and columns: (5.3) A column vector is a matrix with exactly one column: 7 2. (5.4) 8 17

19 A row vector is a matrix with exactly one row: ( ). (5.5) A symmetric matrix is a square matrix that reflects along the diagonal. This means that the element in row i column j is the same as the element in row j column i: (5.6) A diagonal matrix is a square matrix that has 0 s everywhere except on the diagonal: (5.7) An identity matrix is a square matrix with 1 s on the diagonal and 0 s everywhere else: I = (5.8) Multiplying a matrix by the identity always gives you back the original matrix. Note that we always use I to represent an identity matrix. A unity matrix is a square matrix where every element equal to 1: J = (5.9) We can also have unity row vectors and unity column vectors. We use unity matrices for summing. Note that we always use J to represent a unity matrix and 1 to represent a unity vector. A zero matrix is a square matrix where every element is 0: 0 = (5.10) Note that we always use 0 to represent a zero matrix. 5.3 Basic matrix operations Two matrices are considered equal if they are equal in dimension and if all their corresponding elements are equal. To transpose a matrix you switch the rows and the columns. For example, the transpose of M in equation 5.1 above would be M =. (5.11) The transpose of a symmetric matrix is the same as the original matrix. 18

20 To add two matrices you simply add together the corresponding elements of each matrix. You can only add matricies with the same dimension. For example, ( ) ( ) ( ) =. (5.12) To subtract one matrix from another you subtract the corresponding element from the second matrix from the element of the first matrix. Like addition, you can only subtract a matrix from one of the same dimension. For example, ( ) ( ) = ( ) 1 3. (5.13) 6 3 You can multiply a matrix by a scalar. To obtain the result you just multiply each element in the matrix by the scalar. For example: ( ) ( ) =. (5.14) Similarly, you can factor a scalar out of a matrix. 5.4 Matrix multiplication You can multiply two matrices together, but only under certain conditions. The number of columns in the first matrix has to be equal to the number of rows in the second matrix. The resulting matrix will have the same number of rows as the first matrix and the same number of columns as the second matrix. For example: A B = C. (5.15) Basically, the two dimensions on the inside have to be the same to multiply matrices (3 = 3) and the resulting matrix has the dimensions on the outside (2 4). An important fact to remember is that matrix multiplication is not commutative: (AB) (BA). You always need to keep track of which matrix is on the left and which is on the right. Always multiply from left to right, unless parentheses indicate otherwise. Each element of the product matrix is calculated by taking the cross product of a row of the first matrix with a column of the second matrix. Consider the following example: ( ) ( ) ( ) 7 2 3(8) + 6(7) + 1(5) 3(11) + 6(2) + 1(9) = = (5.16) (8) + 10(7) + 12(5) 4(11) + 10(2) + 12(9) Basically, to calculate the element in cell ij of the resulting matrix you multiply the elements in row i of the first matrix with the corresponding elements of column j of the second matrix and then add the products. 5.5 Linear dependence Two variables are linearly dependent if one can be written as a linear function of the other. For example, if x = 4y then x and y are linearly dependent. If no linear function exists between two variables then they are linearly independent. 19

21 We are often interested in determining whether the columns of a matrix are linearly dependent. Let c be the column vectors composing a matrix. If any set of λ s exists (where at least one λ is nonzero) such that λ 1 c 1 + λ 2 c λ i c i = 0, (5.17) (where there are i columns) then we say that the columns are linearly dependent. If equation 5.17 can be satisfied it means that at least one column of the matrix can be written as a linear function of the others. If no set of λ s exists satisfying equation 5.17 then your columns are linearlly independent. The rank of a matrix is the maximum number of linearly independent columns. If all the columns in a matrix are linearly independent then we say that the matrix is full rank. 5.6 Inverting a matrix If we have a square matrix S, then its inverse S 1 is a matrix such that SS 1 = I. (5.18) The inverse of a matrix always has the same dimensions as the original matrix. You cannot calculate the inverse of a matrix that is not full rank. There is a simple procedure to calculate the inverse of a 2 2 matrix. If ( ) a b A =, (5.19) 2 2 c d then A 1 1 = 2 2 ad cb ( d b c a ). (5.20) The quantity ad cb is called the determinant. If the matrix is not full rank this is equal to 0. There are ways to calculate more complicated inverses by hand, but people typically just have computers do the work. 5.7 Simple linear regression model in matrix terms We can represent our simple linear model and its least squares estimates in matrix notation. Consider our model Y i = β 0 + β 1 X i + ɛ i, (5.21) where i = 1... n. We can define the following matrices using different parts of this model: Y 1 1 X 1 ɛ 1 ( ) Y Y = 2 1 X n 1.. X = 2 n 2.. β0.. β = ɛ 2 1 β = ɛ n 1 (5.22) Y n 1 X n We can write the simple linear regression model in terms of these matrices: The matrix equation for the least squares estimates is Y = X β + ɛ. (5.23) n 1 n n 1 b = (X X) 1 X Y. (5.24) 2 1 ɛ n 20

22 The matrix equation for the predicted values is To simplify this we define the hat matrix H such that and Ŷ = Xb = n 1 X(X X) 1 X Y. (5.25) H = n n X(X X) 1 X (5.26) Ŷ = HY. (5.27) We shall see later that there are some important properties of the hat matrix. The matrix equation for the residuals is 5.8 Covariance matrices e = Y Ŷ = (I H)Y. (5.28) n 1 Whenever you are working with more than one random variable at the same time you can consider the covariances between the variables. The covariance between two variables is a measure of how much the varibles tend to be high and low at the same time. It is related to the correlation between the variables. The covariance between variables Y 1 and Y 2 would be represented as σ{y 1, Y 2 } = σ{y 2, Y 1 }. (5.29) We can display the covariances between a set of variables with a variance-covariance matrix. This matrix is always symmetric. For example, if we had k different variables we could build the following matrix: 5.9 Using SAS 2 σ 1 σ 12 σ 1k 2 σ 12 σ 2 σ 2k σ 1k σ 2k σ k. (5.30) SAS does have an operating system for performing matrix computations called IML. However, we will not need to perform any complex matrix operations for this class so it will not be discussed in detail. You can get X X, X Y, and Y Y when you use proc reg by adding a print xpx statement: proc reg; model Y=X; print xpx; You can get (X X) 1, the parameter estimates, and the SSE when you use proc reg by adding a print i statement: proc reg; model Y=X; print i; If you want both you can include them in the same print statement. 21

23 Chapter 6 Multiple Regression I 6.1 General information about multiple regression We use multiple regression when we want to relate the variation in our dependent variable to several different independent variables. The simple linear regression model (which we ve used up to this point) can be done using multiple regression methodology. 6.2 First order model with two independent variables The general form of this model is Y i = β 0 + β 1 X i1 + β 2 X i2 + ɛ i (6.1) where i = 1... n and the ɛ i s are normally distributed with a mean of 0 and a variance of σ 2. We interpret the parameters in this model as follows: β 0 : If X 1 and X 2 have meaningful zero points then β 0 is the mean response at X 1 = 0 and X 2 = 0. Otherwise there is no useful interpretation. β 1 : This is the change in the mean response per unit increase in X 1 when X 2 is held constant. β 2 : This is the change in the mean response per unit increase in X 2 when X 1 is held constant. β 1 and β 2 are called partial regression coefficients. If X 1 and X 2 are independent then they are called additive effects. 6.3 The General linear model (GLM) This is the generalization of the first-order model with two predictor variables. In this model we can have as many predictors as we like. The general form of this model is Y i = β 0 + β 1 X i1 + β 2 X i2 + + β p 1 X i,p 1 + ɛ i, (6.2) where i = 1... n and ɛ i follows a normal distribution with mean 0 and variance σ 2. This model can be used for many different types of regression. For example, you can: 22

24 Have qualitative variables, where the variable codes the observation as being in a particular category. You can then test how this categorization predicts some response variable. This will be discussed in detail in chapter 11. Perform polynomial regression, where your response variable is related to some power of your predictor. You could then explain quadratic, cubic, and other such relationships. This will be discussed in detail in chapter 7. Consider interactions between variables. In this case one of your predictors is actually the product of two other variables in your model. This lets you see if the effect of one predictor is dependent on the level of another. This will be discussed in detail in chapter The matrix form of the GLM The GLM can be expressed in matrix terms: Y = X n 1 n p β p 1 + ɛ n 1, (6.3) where Y = n 1 β = p 1 Y 1 Y 2. Y n β 0 β 1. β p 1 X = n p 1 X 11 X 12 X 1,p 1 1 X 21 X 22 X 2,p X n1 X n2 X n,p 1 ɛ n 1 = ɛ 1 ɛ 2. ɛ n. (6.4) As in simple linear regression, the least squares estimates may be calculated as b = (X X) 1 X Y, (6.5) where b = b 0 b 1.. b p 1. (6.6) Using this information you can write an ANOVA table: Source of variation SS df MS Model (R) b X Y 1 n Y JY p 1 SSR p 1 Error (E) e e = Y Y b X Y n p SSE n p Total Y Y 1 n Y JY n Statistical inferences You can test whether there is a significant relationship between your response variable and any of your predictors: H 0 : β 1 = β 2 = = β p 1 = 0 (6.7) H a : at least one β k 0. 23

25 Note that you are not testing β 0 because it is not associated with a predictor variable. You perform a general linear test decide between these hypotheses. You calculate F = MSR MSE, (6.8) which follows an F distribution with p 1 numerator and n p denominator degrees of freedom. You can calculate a coefficient of multiple determination for your model: R 2 = SSR SSTO = 1 SSE SSTO. (6.9) This is the proportion of the variation in Y that can be explained by your set of predictors. A large R 2 does not necessarily mean that you have a good model. Adding parameters always increases R 2, but might not increase its significance. Also, your regression function was designed to work with your data set, so your R 2 could be inflated. You should therefore be cautious until you ve replicated your findings. You can also test each β k individually: H 0 : β k = 0 H a : β k 0. (6.10) To do this you calculate t = b k β k s{b k } = b k s{b k }, (6.11) which follows a t distribution with n p degrees of freedom. You can obtain s{b k } by taking the square root of the kth diagonal element of MSE (X X) 1, (6.12) which is the variance/covariance matrix of your model s 2 {b}. Remember to use a family confidence coefficient if you decide to perform several of these test simultaneously. You can also build prediction intervals for new observations. You first must define the vector X h1 X X h = h2.., (6.13) X h,p 1 where X h1... X h,p 1 is the set of values at which you want to predict Y. You may then calculate your predicted value Ŷ h = X h b. (6.14) It is inappropriate to select combinations of X s that are very different from those used to build the regression model. As in simple linear regression, we can build several different types of intervals around Ŷh depending on exactly what we want to predict. In each case we have (n p) degrees of freedom and the (1 α) prediction interval takes on the form Ŷ h ± (t 1 α 2,n p )s{pred}, (6.15) where n is the number of observations to build the regression equation, and s{pred} depends on exactly what you want to predict. To predict the mean response at X h you use s{pred} = MSE (X h (X X) 1 X h ). (6.16) 24

26 To predict a single new observation you use s{pred} = MSE (1 + X h (X X) 1 X h ). (6.17) To predict the mean of m new observations you use ( ) 1 s{pred} = MSE m + X h (X X) 1 X h. (6.18) 6.6 Diagnostics and remedial measures Locating outliers takes more effort in multiple regression. An observation could be an outlier because it has an unusually high or low value on a predictor. But it also could be an outlier if it simply represents an unusual combination of the predictor values, even if they all are within the normal range. You should therefore examine plots of the predictors against each other in addition to scatterplots and normal probability plots to locate outliers. To assess the appropriateness of the multiple regression model and the consistancy of variance you should plot the residuals against the fitted values. You should also plot the residuals against each predictor variable seperately. In addition to other important variables, you should also check to see if you should include any interactions between your predictors in your model. 6.7 Using SAS Running a multiple regression is SAS is very easy. The code looks just like that for simple linear regression, but you have more than one variable after the equals sign: proc reg model Y = X1 X2 X3; Predicting new observations in multiple regression also works the same way. You just insert an observation in your data set (with a missing value for Y ) and then add a print cli or print clm line after your proc reg statement. 25

27 Chapter 7 Multiple Regression II 7.1 General information about extra sums of squares Extra sums of squares measure the marginal reduction in the error sum of squares when one or more independent variables are added to a regression model. Extra sums of squares are also referred to as partial sums of squares. You can also view the extra sums of squares as the marginal increase in regression sum of squares when one or more independent variables are added to a regression model. In general, we use extra sums of squares to determine whether specific variables are making substantial contributions to our model. 7.2 Extra sums of squares notation The regression sum of squares due to a single factor X 1 is denoted by SSR(X 1 ). The corresponding error sum of squares is denoted by SSE(X 1 ). This is the SSR from the regression model where i = 1... n. Y i = β 0 + β 1 X i + ɛ i, (7.1) The sums of squares due to multiple factors is represented in a similar way. For example, the regression sum of squares due to factors X 1, X 2, and X 3 is denoted by SSR(X 1, X 2, X 3 ) and the error sum of squares by SSE(X 1, X 2, X 3 ). This is the SSR from the regression model where i = 1... n. Y i = β 0 + β 1 X i1 + β 2 X i2 + β 3 X i3 + ɛ i, (7.2) When we have more than one predictor variable we can talk about the extra sum of sqares associated with a factor or group of factors. In model 7.2 above, the extra sum of squares uniquely associated with X 1 is calculated as SSR(X 1 X 2, X 3 ) = SSE(X 2, X 3 ) SSE(X 1, X 2, X 3 ) = SSR(X 1, X 2, X 3 ) SSR(X 2, X 3 ). (7.3) As mentioned above, SSR(X 1 X 2, X 3 ) is a measure of how much factor X 1 contributes to the model above and beyond the contribution of factors X 2 and X 3. The unique sums of squares associated with each factor are often referred to as Type II sums of squares. 26

28 7.3 Decomposing SSR into extra sums of squares Just as we can partition SSTO into SSR and SSE, we can partition SSR into the extra sums of squares associated with each of our variables. For example, the SSR from model 7.2 above can be broken down as follows: SSR(X 1, X 2, X 3 ) = SSR(X 1 ) + SSR(X 2 X 1 ) + SSR(X 3 X 1, X 2 ). (7.4) These components are often referred to as the Type I sums of squares. We can represent the relationships between our sums of squares in an ANOVA table. For example, the table corresponding to model 7.2 would be: Source of variation SS df MS Model (R) SSR(X 1, X 2, X 3 ) (p 1) = 3 MSR(X 1, X 2, X 3 ) X 1 SSR(X 1 ) 1 MSR(X 1 ) X 2 X 1 SSR(X 2 X 1 ) 1 MSR(X 2 X 1 ) X 3 X 1, X 2 SSR(X 3 X 1, X 2 ) 1 MSR(X 3 X 1, X 2 ) Error SSE(X 1, X 2, X 3 ) n 4 MSE(X 1, X 2, X 3 ) Total SSTO n 1 Note that putting our parameters in a different order would give us different Type I SS. 7.4 Comparing Type I and Type II sums of squares To review, to calculate the Type I SS you consider the variability accounted for by each predictor, conditioned on the the predictors that occur earlier in the model: X 1 : SSR(X 1 ) X 2 : SSR(X 2 X 1 ) X 3 : SSR(X 3 X 1, X 2 ). (7.5) To calculate the Type II SS you consider the unique variablility accounted for by each predictor: X 1 : SSR(X 1 X 2, X 3 ) X 2 : SSR(X 2 X 1, X 3 ) X 3 : SSR(X 3 X 1, X 2 ). (7.6) The Type I SS are guaranteed to sum up to SSR(X 1, X 2, X 3 ), while the Type II SS are not guaranteed to sum up to anything. We say that the Type I SS partition the SSR. The Type II SS are guaranteed to be the same no matter how you enter your variables in your model, while the Type I SS may change if you use different orders. Testing the significance of parameters is the same as testing the Type II SS. 7.5 Testing parameters using extra sums of squares To test whether a single β k = 0 we can perform a general linear model test using the extra sums of squares associated with that parameter. 27

29 For example, if we wanted to test parameter β 2 from model 7.2 our hypotheses would be H 0 : β 2 = 0 H a : β 2 0. (7.7) The full model is Y i = β 0 + β 1 X i1 + β 2 X i2 + β 3 X i3 + ɛ i. (7.8) Our reduced model is the same except without the variable we want to test: Y i = β 0 + β 1 X i1 + β 3 Xi3 + ɛ i. (7.9) We then calculate the following test statistic: F = SSE reduced SSE full df reduced df full = SSE full df full SSE(X 1,X 3) SSE(X 1,X 2,X 3) 1 SSE(X 1,X 2,X 3), (7.10) n 4 which follows an F distribution with 1 numerator and n 4 denominator degrees of freedom. Testing parameters in this way is referred to as performing partial F-tests. We can also test the contribution of several of your variables simultaneously. For example, if we wanted to simultaneously test the contribution of X 1 and X 3 in model 7.2 above our hypotheses would be H 0 : β 1 = β 3 = 0 H a : at least one 0. (7.11) The full model is Y i = β 0 + β 1 X i1 + β 2 X i2 + β 3 X i3 + ɛ i (7.12) and the reduced model is Y i = β 0 + β 2 X i2 + ɛ i. (7.13) We then calculate the following test statistic: F = SSE(X 2 ) SSE(X 1,X 2,X 3 ) 2 SSE(X 1,X 2,X 3 ), (7.14) n 4 which follows an F distribution with 2 numerator and n 4 denominator degrees of freedom. We can perform other tests as well. For example, if we wanted to test whether β 1 = β 2 our hypotheses would be H 0 : β 1 = β 2 (7.15) H a : β 1 β 2. The full model would be and the reduced model would be To build the reduced model we could create a new variable Y i = β 0 + β 1 X i1 + β 2 X i2 + β 3 X i3 + ɛ i (7.16) Y i = β 0 + β c (X i1 + X i2 ) + β 3 X i3 + ɛ i. (7.17) X ic = X i1 + X i2 (7.18) and then build a model predicting Y from X c and X 3. Once we obtain the SSR from these models we perform a general linear test to make inferences. 28

### Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

### Simple Regression Theory II 2010 Samuel L. Baker

SIMPLE REGRESSION THEORY II 1 Simple Regression Theory II 2010 Samuel L. Baker Assessing how good the regression equation is likely to be Assignment 1A gets into drawing inferences about how close the

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Exercise 1.12 (Pg. 22-23)

Individuals: The objects that are described by a set of data. They may be people, animals, things, etc. (Also referred to as Cases or Records) Variables: The characteristics recorded about each individual.

### Univariate Regression

Univariate Regression Correlation and Regression The regression line summarizes the linear relationship between 2 variables Correlation coefficient, r, measures strength of relationship: the closer r is

### " Y. Notation and Equations for Regression Lecture 11/4. Notation:

Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

### 1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

### Chapter 10. Key Ideas Correlation, Correlation Coefficient (r),

Chapter 0 Key Ideas Correlation, Correlation Coefficient (r), Section 0-: Overview We have already explored the basics of describing single variable data sets. However, when two quantitative variables

### An analysis appropriate for a quantitative outcome and a single quantitative explanatory. 9.1 The model behind linear regression

Chapter 9 Simple Linear Regression An analysis appropriate for a quantitative outcome and a single quantitative explanatory variable. 9.1 The model behind linear regression When we are examining the relationship

### 2. Simple Linear Regression

Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

### Topic 3. Chapter 5: Linear Regression in Matrix Form

Topic Overview Statistics 512: Applied Linear Models Topic 3 This topic will cover thinking in terms of matrices regression on multiple predictor variables case study: CS majors Text Example (NKNW 241)

### Module 5: Multiple Regression Analysis

Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

### Statistics courses often teach the two-sample t-test, linear regression, and analysis of variance

2 Making Connections: The Two-Sample t-test, Regression, and ANOVA In theory, there s no difference between theory and practice. In practice, there is. Yogi Berra 1 Statistics courses often teach the two-sample

: Table of Contents... 1 Overview of Model... 1 Dispersion... 2 Parameterization... 3 Sigma-Restricted Model... 3 Overparameterized Model... 4 Reference Coding... 4 Model Summary (Summary Tab)... 5 Summary

### Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 )

Chapter 13 Introduction to Nonlinear Regression( 非 線 性 迴 歸 ) and Neural Networks( 類 神 經 網 路 ) 許 湘 伶 Applied Linear Regression Models (Kutner, Nachtsheim, Neter, Li) hsuhl (NUK) LR Chap 10 1 / 35 13 Examples

### Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

### Least Squares Estimation

Least Squares Estimation SARA A VAN DE GEER Volume 2, pp 1041 1045 in Encyclopedia of Statistics in Behavioral Science ISBN-13: 978-0-470-86080-9 ISBN-10: 0-470-86080-4 Editors Brian S Everitt & David

### MULTIPLE REGRESSION EXAMPLE

MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if

### MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

### Estimation of σ 2, the variance of ɛ

Estimation of σ 2, the variance of ɛ The variance of the errors σ 2 indicates how much observations deviate from the fitted surface. If σ 2 is small, parameters β 0, β 1,..., β k will be reliably estimated

### Profile analysis is the multivariate equivalent of repeated measures or mixed ANOVA. Profile analysis is most commonly used in two cases:

Profile Analysis Introduction Profile analysis is the multivariate equivalent of repeated measures or mixed ANOVA. Profile analysis is most commonly used in two cases: ) Comparing the same dependent variables

### Introduction to Matrix Algebra

Psychology 7291: Multivariate Statistics (Carey) 8/27/98 Matrix Algebra - 1 Introduction to Matrix Algebra Definitions: A matrix is a collection of numbers ordered by rows and columns. It is customary

### CALCULATIONS & STATISTICS

CALCULATIONS & STATISTICS CALCULATION OF SCORES Conversion of 1-5 scale to 0-100 scores When you look at your report, you will notice that the scores are reported on a 0-100 scale, even though respondents

### 5. Linear Regression

5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

### POLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model.

Polynomial Regression POLYNOMIAL AND MULTIPLE REGRESSION Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model. It is a form of linear regression

### Chapter 7. One-way ANOVA

Chapter 7 One-way ANOVA One-way ANOVA examines equality of population means for a quantitative outcome and a single categorical explanatory variable with any number of levels. The t-test of Chapter 6 looks

### Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY

Biostatistics: DESCRIPTIVE STATISTICS: 2, VARIABILITY 1. Introduction Besides arriving at an appropriate expression of an average or consensus value for observations of a population, it is important to

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

### Week 5: Multiple Linear Regression

BUS41100 Applied Regression Analysis Week 5: Multiple Linear Regression Parameter estimation and inference, forecasting, diagnostics, dummy variables Robert B. Gramacy The University of Chicago Booth School

### How To Check For Differences In The One Way Anova

MINITAB ASSISTANT WHITE PAPER This paper explains the research conducted by Minitab statisticians to develop the methods and data checks used in the Assistant in Minitab 17 Statistical Software. One-Way

### Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures

Testing Group Differences using T-tests, ANOVA, and Nonparametric Measures Jamie DeCoster Department of Psychology University of Alabama 348 Gordon Palmer Hall Box 870348 Tuscaloosa, AL 35487-0348 Phone:

### December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B. KITCHENS

December 4, 2013 MATH 171 BASIC LINEAR ALGEBRA B KITCHENS The equation 1 Lines in two-dimensional space (1) 2x y = 3 describes a line in two-dimensional space The coefficients of x and y in the equation

### Section 13, Part 1 ANOVA. Analysis Of Variance

Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability

### Recall this chart that showed how most of our course would be organized:

Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

### MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +

### Statistical Functions in Excel

Statistical Functions in Excel There are many statistical functions in Excel. Moreover, there are other functions that are not specified as statistical functions that are helpful in some statistical analyses.

### Example: Boats and Manatees

Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant

### Multivariate Analysis of Variance (MANOVA)

Chapter 415 Multivariate Analysis of Variance (MANOVA) Introduction Multivariate analysis of variance (MANOVA) is an extension of common analysis of variance (ANOVA). In ANOVA, differences among various

### 1.5 Oneway Analysis of Variance

Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

### Simple Linear Regression Inference

Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

### CS 147: Computer Systems Performance Analysis

CS 147: Computer Systems Performance Analysis One-Factor Experiments CS 147: Computer Systems Performance Analysis One-Factor Experiments 1 / 42 Overview Introduction Overview Overview Introduction Finding

### Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis

### SIMPLE LINEAR CORRELATION. r can range from -1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables.

SIMPLE LINEAR CORRELATION Simple linear correlation is a measure of the degree to which two variables vary together, or a measure of the intensity of the association between two variables. Correlation

### HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

### CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA

We Can Early Learning Curriculum PreK Grades 8 12 INSIDE ALGEBRA, GRADES 8 12 CORRELATED TO THE SOUTH CAROLINA COLLEGE AND CAREER-READY FOUNDATIONS IN ALGEBRA April 2016 www.voyagersopris.com Mathematical

### Module 3: Correlation and Covariance

Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

### UNDERSTANDING THE TWO-WAY ANOVA

UNDERSTANDING THE e have seen how the one-way ANOVA can be used to compare two or more sample means in studies involving a single independent variable. This can be extended to two independent variables

### X X X a) perfect linear correlation b) no correlation c) positive correlation (r = 1) (r = 0) (0 < r < 1)

CORRELATION AND REGRESSION / 47 CHAPTER EIGHT CORRELATION AND REGRESSION Correlation and regression are statistical methods that are commonly used in the medical literature to compare two or more variables.

### AP Physics 1 and 2 Lab Investigations

AP Physics 1 and 2 Lab Investigations Student Guide to Data Analysis New York, NY. College Board, Advanced Placement, Advanced Placement Program, AP, AP Central, and the acorn logo are registered trademarks

### An analysis method for a quantitative outcome and two categorical explanatory variables.

Chapter 11 Two-Way ANOVA An analysis method for a quantitative outcome and two categorical explanatory variables. If an experiment has a quantitative outcome and two categorical explanatory variables that

### DATA INTERPRETATION AND STATISTICS

PholC60 September 001 DATA INTERPRETATION AND STATISTICS Books A easy and systematic introductory text is Essentials of Medical Statistics by Betty Kirkwood, published by Blackwell at about 14. DESCRIPTIVE

### Study Guide for the Final Exam

Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

### Quadratic forms Cochran s theorem, degrees of freedom, and all that

Quadratic forms Cochran s theorem, degrees of freedom, and all that Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Why We Care Cochran s theorem tells us

### A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution

A Primer on Mathematical Statistics and Univariate Distributions; The Normal Distribution; The GLM with the Normal Distribution PSYC 943 (930): Fundamentals of Multivariate Modeling Lecture 4: September

### Algebra I Vocabulary Cards

Algebra I Vocabulary Cards Table of Contents Expressions and Operations Natural Numbers Whole Numbers Integers Rational Numbers Irrational Numbers Real Numbers Absolute Value Order of Operations Expression

### Part II. Multiple Linear Regression

Part II Multiple Linear Regression 86 Chapter 7 Multiple Regression A multiple linear regression model is a linear model that describes how a y-variable relates to two or more xvariables (or transformations

### Data Analysis Tools. Tools for Summarizing Data

Data Analysis Tools This section of the notes is meant to introduce you to many of the tools that are provided by Excel under the Tools/Data Analysis menu item. If your computer does not have that tool

### Multivariate Analysis of Variance (MANOVA)

Multivariate Analysis of Variance (MANOVA) Aaron French, Marcelo Macedo, John Poulsen, Tyler Waterson and Angela Yu Keywords: MANCOVA, special cases, assumptions, further reading, computations Introduction

### 1 Simple Linear Regression I Least Squares Estimation

Simple Linear Regression I Least Squares Estimation Textbook Sections: 8. 8.3 Previously, we have worked with a random variable x that comes from a population that is normally distributed with mean µ and

### Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

### Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

### Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data

Using Excel (Microsoft Office 2007 Version) for Graphical Analysis of Data Introduction In several upcoming labs, a primary goal will be to determine the mathematical relationship between two variable

### Algebraic expressions are a combination of numbers and variables. Here are examples of some basic algebraic expressions.

Page 1 of 13 Review of Linear Expressions and Equations Skills involving linear equations can be divided into the following groups: Simplifying algebraic expressions. Linear expressions. Solving linear

### SYSTEMS OF EQUATIONS AND MATRICES WITH THE TI-89. by Joseph Collison

SYSTEMS OF EQUATIONS AND MATRICES WITH THE TI-89 by Joseph Collison Copyright 2000 by Joseph Collison All rights reserved Reproduction or translation of any part of this work beyond that permitted by Sections

### Multivariate Analysis of Variance (MANOVA): I. Theory

Gregory Carey, 1998 MANOVA: I - 1 Multivariate Analysis of Variance (MANOVA): I. Theory Introduction The purpose of a t test is to assess the likelihood that the means for two groups are sampled from the

### Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

### Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

### GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial

### One-Way ANOVA using SPSS 11.0. SPSS ANOVA procedures found in the Compare Means analyses. Specifically, we demonstrate

1 One-Way ANOVA using SPSS 11.0 This section covers steps for testing the difference between three or more group means using the SPSS ANOVA procedures found in the Compare Means analyses. Specifically,

### Coefficient of Determination

Coefficient of Determination The coefficient of determination R 2 (or sometimes r 2 ) is another measure of how well the least squares equation ŷ = b 0 + b 1 x performs as a predictor of y. R 2 is computed

### Analysis of Variance ANOVA

Analysis of Variance ANOVA Overview We ve used the t -test to compare the means from two independent groups. Now we ve come to the final topic of the course: how to compare means from more than two populations.

### Summary of Formulas and Concepts. Descriptive Statistics (Ch. 1-4)

Summary of Formulas and Concepts Descriptive Statistics (Ch. 1-4) Definitions Population: The complete set of numerical information on a particular quantity in which an investigator is interested. We assume

### MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996)

MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL by Michael L. Orlov Chemistry Department, Oregon State University (1996) INTRODUCTION In modern science, regression analysis is a necessary part

### 5. Multiple regression

5. Multiple regression QBUS6840 Predictive Analytics https://www.otexts.org/fpp/5 QBUS6840 Predictive Analytics 5. Multiple regression 2/39 Outline Introduction to multiple linear regression Some useful

### Regression Analysis: A Complete Example

Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

### Fairfield Public Schools

Mathematics Fairfield Public Schools AP Statistics AP Statistics BOE Approved 04/08/2014 1 AP STATISTICS Critical Areas of Focus AP Statistics is a rigorous course that offers advanced students an opportunity

### Correlation and Regression

Correlation and Regression Scatterplots Correlation Explanatory and response variables Simple linear regression General Principles of Data Analysis First plot the data, then add numerical summaries Look

### Section 1: Simple Linear Regression

Section 1: Simple Linear Regression Carlos M. Carvalho The University of Texas McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction

### 15.062 Data Mining: Algorithms and Applications Matrix Math Review

.6 Data Mining: Algorithms and Applications Matrix Math Review The purpose of this document is to give a brief review of selected linear algebra concepts that will be useful for the course and to develop

### Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round \$200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

### MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

### UNDERSTANDING ANALYSIS OF COVARIANCE (ANCOVA)

UNDERSTANDING ANALYSIS OF COVARIANCE () In general, research is conducted for the purpose of explaining the effects of the independent variable on the dependent variable, and the purpose of research design

### Analysis of Variance. MINITAB User s Guide 2 3-1

3 Analysis of Variance Analysis of Variance Overview, 3-2 One-Way Analysis of Variance, 3-5 Two-Way Analysis of Variance, 3-11 Analysis of Means, 3-13 Overview of Balanced ANOVA and GLM, 3-18 Balanced

### II. DISTRIBUTIONS distribution normal distribution. standard scores

Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

### Review of Fundamental Mathematics

Review of Fundamental Mathematics As explained in the Preface and in Chapter 1 of your textbook, managerial economics applies microeconomic theory to business decision making. The decision-making tools

### Equations, Inequalities & Partial Fractions

Contents Equations, Inequalities & Partial Fractions.1 Solving Linear Equations 2.2 Solving Quadratic Equations 1. Solving Polynomial Equations 1.4 Solving Simultaneous Linear Equations 42.5 Solving Inequalities

### Elements of statistics (MATH0487-1)

Elements of statistics (MATH0487-1) Prof. Dr. Dr. K. Van Steen University of Liège, Belgium December 10, 2012 Introduction to Statistics Basic Probability Revisited Sampling Exploratory Data Analysis -

### Simple linear regression

Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

### Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

### 1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number

1) Write the following as an algebraic expression using x as the variable: Triple a number subtracted from the number A. 3(x - x) B. x 3 x C. 3x - x D. x - 3x 2) Write the following as an algebraic expression

### Hypothesis testing - Steps

Hypothesis testing - Steps Steps to do a two-tailed test of the hypothesis that β 1 0: 1. Set up the hypotheses: H 0 : β 1 = 0 H a : β 1 0. 2. Compute the test statistic: t = b 1 0 Std. error of b 1 =

### Using R for Linear Regression

Using R for Linear Regression In the following handout words and symbols in bold are R functions and words and symbols in italics are entries supplied by the user; underlined words and symbols are optional

### Final Exam Practice Problem Answers

Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

### Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

### QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS

QUANTITATIVE METHODS BIOLOGY FINAL HONOUR SCHOOL NON-PARAMETRIC TESTS This booklet contains lecture notes for the nonparametric work in the QM course. This booklet may be online at http://users.ox.ac.uk/~grafen/qmnotes/index.html.