4.5.1 Model Utility Test To test if any of the predictors has an effect test the null hypothesis that all slopes are 0

Size: px
Start display at page:

Download "4.5.1 Model Utility Test To test if any of the predictors has an effect test the null hypothesis that all slopes are 0"

Transcription

1 4. Hypothesis Tests After estimating the parameters of the model, we should check if the model is supported by the data. The first questions to be answered will be if any of the predictors has an effect on the mean response. If evidence shows this to be true, then we should answer the question, which of predictors has an effect. 4.. Model Utility Test To test if any of the predictors has an effect test the null hypothesis that all slopes are 0 H 0 : β β β k 0 versus H a : at least on slope is not 0 The test is based on an ANOVA approach, where we analyse the total sum of squares in Y, and find how much of the variation is due to the variables in the model and how much of the variation remains unexplained. The Total Sum of Squares: Let J n be the n n matrix with all entries being. SS T Y Y The Residual Sum of Squares (see above): n Y J n Y nx i (Y i Ȳ ) SS Res e e Y (I n X(X X) X ) Y Y Y ˆβ X Y nx The Sum of Squares for Regression: SS R SS T SS Res ˆβ X Y n Y J n Y nx i i (Ŷi Ȳ ) (Y i Ŷi) The F-score relating the Sum of Squares for Regression with the Sum of Squares for Residuals is used for the test SS R /k F SS Res /(n k ) MS R MS Res Under the assumptions of the MLRM and if β β β k 0 this F-score is F-distributed with df n k and df d n k. The proof of this result uses the fact that SS R is also χ distributed with df k and that SS R and SS Res are independent. The information for this test is usually summarized in an ANOVA table. Source SS df MS F Regression SS R k MS R MS R /MS Res Residual SS Res n k MS Res Total SS T n

2 ANOVA F-test. Hyp.: Choose α. H 0 : β β β k 0 versus H a : at least on slope is not 0. Assumptions: Random samples, the MLRM is a proper description of the population, including homoscedasticity and normality of the error.. Test statistic: F 0 4. P-value: P (F > F 0 ) SS R /k SS Res /(n k ) MS R MS Res, df n k p, df d n k n p Continue Example.?? Continue the example on the effect of age and weight on blood pressure and the analysis of the model: bp β 0 + β (weight) + β (age) + ε, ε N (0, σ ) First find the ANOVA table. We already calculated SS Res.8. It is SS T y y n y J y (0, 4, 4,,, 9) Therefore SS R SS T SS Res (Use that n y J y ( P n i y ) /n) ANOVA table C A () 4.8 Source SS df MS F Regression Residual.8.9 Total 4.8 ANOVA F-test. Hyp.: Choose α 0.0. H 0 : β β 0 versus H a : at least one slope is not 0. Assumptions: Random samples, the MLRM is a proper description of the population, including homoscedacity and normality of the error.

3 . Test statistic: F , df n, df d 4. P-value: P (F > F 0 ), use F-distribution table from my website, since F 0 is greater than the 0.00 critical value of the F distribution with df n and df d, the P-value is smaller than Therefore the P-value is smaller than α, reject H 0.. Context: At significance level of % the data provide sufficient evidence that at least one of the slopes is different from zero, i.e. that at least one of age or weight has an effect on the mean blood pressure. From here arises the question which of the slopes is not zero. The model utility test does not indicate that the model is a good description for the population, only that if it is a good fit then at least on of the regressors has an effect on the mean response. A residual analysis will help to judge if the model is appropriate for the population under consideration. Adjusted coefficient of determination The adjusted coefficient of determination, R adj, indicates how well the data is represented by the estimated model. In comparison to the coefficient of determination, R, it is adjusted for the number of variables in the model. When adding an additional variable to an existing model the coefficient of determination will always increase, regardless of the relevance of the variable for the response. The adjusted coefficient of determination includes an adjustment for the number of variables in the model to correct for this effect, but otherwise is to be interpreted in the same way as R. R adj MS Res MS T Continue Example.?? The adjusted coefficient of determination for the blood pressure example is R adj.9 4.8/ 0.9 9% of the sample variation in blood pressure can be explained by the changes in weight and age. The adjusted coefficient of determination is often used in the model building process. 4.. Test for individual slopes In order to find out if a variable x i has a significant linear effect on the mean response when correcting for the other variables in the model, a test for H 0 : β i 0, i k should be conducted. The test can be based on the t-score standardizing ˆβ i : Theorem 4.. In the MLRM t ˆβ i β i se( ˆβ i ) ˆβ i β i ˆσ C ii, where C ii is the i th diagonal entry in (X X), is t-distributed with df n p.

4 Conclusion: Let i {,,..., k}. A ( α) 00% confidence interval for slope β i is given by ˆβ i ± t n p α/ Ȉσ C ii, Test for individual slopes correcting for the other variables in the model:. Hypotheses: Choose α. Type Hypotheses upper tail H 0 : β i β i0 versus H a : β i > β i0 lower tail H 0 : β i β i0 versus H a : β i < β i0 tail H 0 : β i β i0 versus H a : β i β i0. Assumptions: Random samples, the MLRM is a proper description of the population, including homoscedacity and normality of the error.. Test statistic: t 0 ˆβ i β i0 ˆσ C ii, df n p 4. P-value: Type P-value upper tail P (t > t 0 ) lower tail P (t < t 0 ) tail P (t > t 0 ) Continue Example.?? Test if the mean blood pressure increases with increasing weight(x ) within the proposed model bp β 0 + β (weight) + β (age) + ε, ε N (0, σ ). Hypotheses: H 0 : β 0 versus H a : β > 0, choose α Assumptions: Random samples, the MLRM is a proper description of the population, including homoscedasticity and normality of the error.. Test statistic: t È.9(0.08).8, df 4. P-value: P-value P (t >.8), using the t-table with df, find that the P-value< Decision: Reject H 0.. Context: At significance level of % the data provide sufficient evidence that on average the blood pressure increases with weight when correcting for age. 4

5 4.. Test for a subset of slopes In some applications it is of interest to test if in the presence of some (usually confounding) variables at least one in a set of slopes for different variables is not zero. Such a question would for example arise, when it is of interest to find the effect of the amount of humour, violence, and drama on the popularity of a movie, after correcting for the effects of age and sex (the confounding variables). For the development of the test statistic the extra sum of squares principle is applied. It is based on the comparison of the residual sum of squares for two models:. The full model: model which includes all variables.. The reduced model: the model only including the variables to be corrected for. Essentially it is tested if the inclusion of the extra variables results in a significant decrease in the residual sum of squares.. The full model with k predictors, therefore p k + parameters (including the intercept) Y X β + ε where Y and ε are n dimensional random vectors, the design matrix X R n p, and the parameter vector β R p. To test for the effect of r < k predictors partition the design matrix and the parameter vector such that Y [X X ] " β β # + ε X β + X β + ε with partitioned design matrix where X R n (p r) and X R n r, and partitioned parameter vector where β R (p r) and β R r. For the full model with ˆβ (X X) X Y with df n p. We wish to test H 0 : β 0. SS Res ( β) Y Y ˆβ X Y. The reduced model, where we assume β 0, with k r predictors and k r + p r parameters: Y X β + ε and with ˆβ (X X ) X Y with df n (p r) n p + r. SS Res ( β ) Y Y ˆβ X Y

6 The difference in the Sum of Squares for Residuals when adding variables x k r+,..., x k to the model with variables x,..., x k r is SS Res ( β β ) SS Res ( β ) SS Res ( β) with df n p + r (n p) r. These are called the extra sum of squares due to β (extra sum of squares for regression, since the decrease in the residual sum of squares implies an increase by the same amount in the regression sum of squares). One can prove that SS Res ( β β ) is independent of MS Res therefore is F distributed with df n r and df d n p. ANOVA Extra Sum of Squares Test:. Hypotheses: Choose α. F SS Res( β β )/r MS Res H 0 : β k r+ β k 0( β 0) versus H a : at least one slope is not 0. Assumptions: Random samples, the MLRM is a proper description of the population, including homoscedasticity and normality of the error.. Test statistic: F 0 SS Res( β β )/r MS Res, df n r, df d n k 4. P-value: P (F > F 0 ) Continue Example.?? The model is bp β 0 + β (weight) + β (age) + ε, ε N (0, σ ) Use the extra sum of squares test to see if age has a significant effect on the mean blood pressure when correcting for weight. The full model: The reduced model:. Hypotheses: Choose α. bp β 0 + β (weight) + β (age) + ε, ε N (0, σ ) bp β 0 + β (weight) + ε, ε N (0, σ ) H 0 : β 0 versus H a : β 0

7 . Assumptions: Random samples, the MLRM is a proper description of the population, including homoscedasticity and normality of the error.. Test statistic: r. From previous work SS Res ( β).8 and MS Res.9. For finding SS Res (β ) X C A with X X Œ and (X X ) Œ and X y 0 Œ gives SS Res (β ) y y (X y) (X X ) (X y) 8. F 0 (8.8.8)/.9 4.4, df n r, df d n p 4. P-value: 0.00 < P (F > F 0 ) < 0.0 according to the F-distribution table.. Decision: Reject H 0.. Context: At significance level of % the data provide sufficient evidence that age has a significant effect on the blood pressure when correcting for weight. R reduced<-lm(bp~weight) full<-lm(bp~weight+age) anova(reduced, full) 4. Orthogonal Columns in the Design Matrix Definition 4.. Matrices A R n m and B R n k are orthogonal, iff A B 0 m k. Theorem 4.. If the design matric can be partitioned, so that X [X X ] with X X 0 then the least squares estimator for β ( β, β ) can be partitioned into ˆβ ( ˆβ, ˆβ ), so that ˆβ (X X ) X Y, and ˆβ (X X ) X Y

8 Proof: It is ˆβ (X X) X Y. First find (X X) : X X X [X X X ] X X X X X X X X X X 0 (p r) r 0 r (p r) X X Then is the inverse of X X, because X XA A 4 X X 0 (p r) r 0 r (p r) X X (X X ) 0 (p r) r 0 r (p r) (X X ) 4 (X X ) 0 (p r) r 0 r (p r) (X X ) 4 (X X )(X X ) + 0 (p r) r 0 r (p r) (X X )0 (p r) r + 0 r (p r) (X X ) 0 r (p r) (X X ) + (X X )0 r (p r) 0 r (p r) 0 (p r) r + (X X )(X X ) 4 I p r 0 (p r) r 0 r (p r) I r With this result now find ˆβ I p ˆβ (X X) X Y 4 (X X ) 0 (p r) r 0 r (p r) (X X ) 4 (X X ) 0 (p r) r 0 r (p r) (X X ) 4 (X X ) X Y (X X ) X Y 4 ˆβ ˆβ 4 X X 4 X Y X Y Y q.e.d. Conclusion: Because of the theorem, we find in the case of orthogonal columns in the design matrix, that the parameter vector can be partitioned, so that the parts are independent from each other, and can be found independently. The extra sum of squares test for H 0 : β 0 is in the case of orthogonal columns the same as the model utility test in the partitioned model Y X β + ε. For the full model, it is now easy to prove that SS R ( β) SS R ( β ) + SS R ( β ) 8

9 I.e. the sum of squares for regression for the complete parameter vector equals the sum of squares for regression for the partitioned parameter vector, and therefore according to the extra sum of squares principle the two are independent, and tests for one partitioned vector is independent from the other. 9

Regression Analysis: A Complete Example

Regression Analysis: A Complete Example Regression Analysis: A Complete Example This section works out an example that includes all the topics we have discussed so far in this chapter. A complete example of regression analysis. PhotoDisc, Inc./Getty

More information

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96

1. What is the critical value for this 95% confidence interval? CV = z.025 = invnorm(0.025) = 1.96 1 Final Review 2 Review 2.1 CI 1-propZint Scenario 1 A TV manufacturer claims in its warranty brochure that in the past not more than 10 percent of its TV sets needed any repair during the first two years

More information

" Y. Notation and Equations for Regression Lecture 11/4. Notation:

 Y. Notation and Equations for Regression Lecture 11/4. Notation: Notation: Notation and Equations for Regression Lecture 11/4 m: The number of predictor variables in a regression Xi: One of multiple predictor variables. The subscript i represents any number from 1 through

More information

Final Exam Practice Problem Answers

Final Exam Practice Problem Answers Final Exam Practice Problem Answers The following data set consists of data gathered from 77 popular breakfast cereals. The variables in the data set are as follows: Brand: The brand name of the cereal

More information

2013 MBA Jump Start Program. Statistics Module Part 3

2013 MBA Jump Start Program. Statistics Module Part 3 2013 MBA Jump Start Program Module 1: Statistics Thomas Gilbert Part 3 Statistics Module Part 3 Hypothesis Testing (Inference) Regressions 2 1 Making an Investment Decision A researcher in your firm just

More information

Part 2: Analysis of Relationship Between Two Variables

Part 2: Analysis of Relationship Between Two Variables Part 2: Analysis of Relationship Between Two Variables Linear Regression Linear correlation Significance Tests Multiple regression Linear Regression Y = a X + b Dependent Variable Independent Variable

More information

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression

Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Unit 31 A Hypothesis Test about Correlation and Slope in a Simple Linear Regression Objectives: To perform a hypothesis test concerning the slope of a least squares line To recognize that testing for a

More information

SPSS Guide: Regression Analysis

SPSS Guide: Regression Analysis SPSS Guide: Regression Analysis I put this together to give you a step-by-step guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar

More information

Notes on Applied Linear Regression

Notes on Applied Linear Regression Notes on Applied Linear Regression Jamie DeCoster Department of Social Psychology Free University Amsterdam Van der Boechorststraat 1 1081 BT Amsterdam The Netherlands phone: +31 (0)20 444-8935 email:

More information

One-Way Analysis of Variance

One-Way Analysis of Variance One-Way Analysis of Variance Note: Much of the math here is tedious but straightforward. We ll skim over it in class but you should be sure to ask questions if you don t understand it. I. Overview A. We

More information

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression

CHAPTER 13 SIMPLE LINEAR REGRESSION. Opening Example. Simple Regression. Linear Regression Opening Example CHAPTER 13 SIMPLE LINEAR REGREION SIMPLE LINEAR REGREION! Simple Regression! Linear Regression Simple Regression Definition A regression model is a mathematical equation that descries the

More information

Chapter 13 Introduction to Linear Regression and Correlation Analysis

Chapter 13 Introduction to Linear Regression and Correlation Analysis Chapter 3 Student Lecture Notes 3- Chapter 3 Introduction to Linear Regression and Correlation Analsis Fall 2006 Fundamentals of Business Statistics Chapter Goals To understand the methods for displaing

More information

Regression step-by-step using Microsoft Excel

Regression step-by-step using Microsoft Excel Step 1: Regression step-by-step using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression

More information

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

More information

Introduction to General and Generalized Linear Models

Introduction to General and Generalized Linear Models Introduction to General and Generalized Linear Models General Linear Models - part I Henrik Madsen Poul Thyregod Informatics and Mathematical Modelling Technical University of Denmark DK-2800 Kgs. Lyngby

More information

Multiple Linear Regression

Multiple Linear Regression Multiple Linear Regression A regression with two or more explanatory variables is called a multiple regression. Rather than modeling the mean response as a straight line, as in simple regression, it is

More information

Introduction to Analysis of Variance (ANOVA) Limitations of the t-test

Introduction to Analysis of Variance (ANOVA) Limitations of the t-test Introduction to Analysis of Variance (ANOVA) The Structural Model, The Summary Table, and the One- Way ANOVA Limitations of the t-test Although the t-test is commonly used, it has limitations Can only

More information

Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480

Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480 1) The S & P/TSX Composite Index is based on common stock prices of a group of Canadian stocks. The weekly close level of the TSX for 6 weeks are shown: Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500

More information

Review Jeopardy. Blue vs. Orange. Review Jeopardy

Review Jeopardy. Blue vs. Orange. Review Jeopardy Review Jeopardy Blue vs. Orange Review Jeopardy Jeopardy Round Lectures 0-3 Jeopardy Round $200 How could I measure how far apart (i.e. how different) two observations, y 1 and y 2, are from each other?

More information

2. Simple Linear Regression

2. Simple Linear Regression Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

More information

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( ) Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

More information

Premaster Statistics Tutorial 4 Full solutions

Premaster Statistics Tutorial 4 Full solutions Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for

More information

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

More information

POLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model.

POLYNOMIAL AND MULTIPLE REGRESSION. Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model. Polynomial Regression POLYNOMIAL AND MULTIPLE REGRESSION Polynomial regression used to fit nonlinear (e.g. curvilinear) data into a least squares linear regression model. It is a form of linear regression

More information

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1)

Class 19: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.1) Spring 204 Class 9: Two Way Tables, Conditional Distributions, Chi-Square (Text: Sections 2.5; 9.) Big Picture: More than Two Samples In Chapter 7: We looked at quantitative variables and compared the

More information

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the

More information

1 Introduction to Matrices

1 Introduction to Matrices 1 Introduction to Matrices In this section, important definitions and results from matrix algebra that are useful in regression analysis are introduced. While all statements below regarding the columns

More information

Section 13, Part 1 ANOVA. Analysis Of Variance

Section 13, Part 1 ANOVA. Analysis Of Variance Section 13, Part 1 ANOVA Analysis Of Variance Course Overview So far in this course we ve covered: Descriptive statistics Summary statistics Tables and Graphs Probability Probability Rules Probability

More information

MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996)

MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL. by Michael L. Orlov Chemistry Department, Oregon State University (1996) MULTIPLE LINEAR REGRESSION ANALYSIS USING MICROSOFT EXCEL by Michael L. Orlov Chemistry Department, Oregon State University (1996) INTRODUCTION In modern science, regression analysis is a necessary part

More information

Linear Models in STATA and ANOVA

Linear Models in STATA and ANOVA Session 4 Linear Models in STATA and ANOVA Page Strengths of Linear Relationships 4-2 A Note on Non-Linear Relationships 4-4 Multiple Linear Regression 4-5 Removal of Variables 4-8 Independent Samples

More information

Quadratic forms Cochran s theorem, degrees of freedom, and all that

Quadratic forms Cochran s theorem, degrees of freedom, and all that Quadratic forms Cochran s theorem, degrees of freedom, and all that Dr. Frank Wood Frank Wood, fwood@stat.columbia.edu Linear Regression Models Lecture 1, Slide 1 Why We Care Cochran s theorem tells us

More information

Module 5: Multiple Regression Analysis

Module 5: Multiple Regression Analysis Using Statistical Data Using to Make Statistical Decisions: Data Multiple to Make Regression Decisions Analysis Page 1 Module 5: Multiple Regression Analysis Tom Ilvento, University of Delaware, College

More information

MULTIPLE REGRESSION EXAMPLE

MULTIPLE REGRESSION EXAMPLE MULTIPLE REGRESSION EXAMPLE For a sample of n = 166 college students, the following variables were measured: Y = height X 1 = mother s height ( momheight ) X 2 = father s height ( dadheight ) X 3 = 1 if

More information

Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

More information

Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

More information

One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups

One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups One-Way Analysis of Variance: A Guide to Testing Differences Between Multiple Groups In analysis of variance, the main research question is whether the sample means are from different populations. The

More information

1.5 Oneway Analysis of Variance

1.5 Oneway Analysis of Variance Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

More information

Hypothesis testing - Steps

Hypothesis testing - Steps Hypothesis testing - Steps Steps to do a two-tailed test of the hypothesis that β 1 0: 1. Set up the hypotheses: H 0 : β 1 = 0 H a : β 1 0. 2. Compute the test statistic: t = b 1 0 Std. error of b 1 =

More information

Study Guide for the Final Exam

Study Guide for the Final Exam Study Guide for the Final Exam When studying, remember that the computational portion of the exam will only involve new material (covered after the second midterm), that material from Exam 1 will make

More information

The importance of graphing the data: Anscombe s regression examples

The importance of graphing the data: Anscombe s regression examples The importance of graphing the data: Anscombe s regression examples Bruce Weaver Northern Health Research Conference Nipissing University, North Bay May 30-31, 2008 B. Weaver, NHRC 2008 1 The Objective

More information

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS. + + x 2. x n. a 11 a 12 a 1n b 1 a 21 a 22 a 2n b 2 a 31 a 32 a 3n b 3. a m1 a m2 a mn b m MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS 1. SYSTEMS OF EQUATIONS AND MATRICES 1.1. Representation of a linear system. The general system of m equations in n unknowns can be written a 11 x 1 + a 12 x 2 +

More information

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

More information

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics.

Business Statistics. Successful completion of Introductory and/or Intermediate Algebra courses is recommended before taking Business Statistics. Business Course Text Bowerman, Bruce L., Richard T. O'Connell, J. B. Orris, and Dawn C. Porter. Essentials of Business, 2nd edition, McGraw-Hill/Irwin, 2008, ISBN: 978-0-07-331988-9. Required Computing

More information

Example: Boats and Manatees

Example: Boats and Manatees Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant

More information

GLM I An Introduction to Generalized Linear Models

GLM I An Introduction to Generalized Linear Models GLM I An Introduction to Generalized Linear Models CAS Ratemaking and Product Management Seminar March 2009 Presented by: Tanya D. Havlicek, Actuarial Assistant 0 ANTITRUST Notice The Casualty Actuarial

More information

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not.

Example: Credit card default, we may be more interested in predicting the probabilty of a default than classifying individuals as default or not. Statistical Learning: Chapter 4 Classification 4.1 Introduction Supervised learning with a categorical (Qualitative) response Notation: - Feature vector X, - qualitative response Y, taking values in C

More information

Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Section 14 Simple Linear Regression: Introduction to Least Squares Regression Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship

More information

Multivariate normal distribution and testing for means (see MKB Ch 3)

Multivariate normal distribution and testing for means (see MKB Ch 3) Multivariate normal distribution and testing for means (see MKB Ch 3) Where are we going? 2 One-sample t-test (univariate).................................................. 3 Two-sample t-test (univariate).................................................

More information

International Statistical Institute, 56th Session, 2007: Phil Everson

International Statistical Institute, 56th Session, 2007: Phil Everson Teaching Regression using American Football Scores Everson, Phil Swarthmore College Department of Mathematics and Statistics 5 College Avenue Swarthmore, PA198, USA E-mail: peverso1@swarthmore.edu 1. Introduction

More information

Statistical Models in R

Statistical Models in R Statistical Models in R Some Examples Steven Buechler Department of Mathematics 276B Hurley Hall; 1-6233 Fall, 2007 Outline Statistical Models Linear Models in R Regression Regression analysis is the appropriate

More information

hp calculators HP 50g Trend Lines The STAT menu Trend Lines Practice predicting the future using trend lines

hp calculators HP 50g Trend Lines The STAT menu Trend Lines Practice predicting the future using trend lines The STAT menu Trend Lines Practice predicting the future using trend lines The STAT menu The Statistics menu is accessed from the ORANGE shifted function of the 5 key by pressing Ù. When pressed, a CHOOSE

More information

Introduction to Linear Regression

Introduction to Linear Regression 14. Regression A. Introduction to Simple Linear Regression B. Partitioning Sums of Squares C. Standard Error of the Estimate D. Inferential Statistics for b and r E. Influential Observations F. Regression

More information

August 2012 EXAMINATIONS Solution Part I

August 2012 EXAMINATIONS Solution Part I August 01 EXAMINATIONS Solution Part I (1) In a random sample of 600 eligible voters, the probability that less than 38% will be in favour of this policy is closest to (B) () In a large random sample,

More information

Simple linear regression

Simple linear regression Simple linear regression Introduction Simple linear regression is a statistical method for obtaining a formula to predict values of one variable from another where there is a causal relationship between

More information

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,

More information

Chapter Four. Data Analyses and Presentation of the Findings

Chapter Four. Data Analyses and Presentation of the Findings Chapter Four Data Analyses and Presentation of the Findings The fourth chapter represents the focal point of the research report. Previous chapters of the report have laid the groundwork for the project.

More information

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

More information

MULTIPLE REGRESSION WITH CATEGORICAL DATA

MULTIPLE REGRESSION WITH CATEGORICAL DATA DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting

More information

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics

Course Text. Required Computing Software. Course Description. Course Objectives. StraighterLine. Business Statistics Course Text Business Statistics Lind, Douglas A., Marchal, William A. and Samuel A. Wathen. Basic Statistics for Business and Economics, 7th edition, McGraw-Hill/Irwin, 2010, ISBN: 9780077384470 [This

More information

Penalized regression: Introduction

Penalized regression: Introduction Penalized regression: Introduction Patrick Breheny August 30 Patrick Breheny BST 764: Applied Statistical Modeling 1/19 Maximum likelihood Much of 20th-century statistics dealt with maximum likelihood

More information

General Regression Formulae ) (N-2) (1 - r 2 YX

General Regression Formulae ) (N-2) (1 - r 2 YX General Regression Formulae Single Predictor Standardized Parameter Model: Z Yi = β Z Xi + ε i Single Predictor Standardized Statistical Model: Z Yi = β Z Xi Estimate of Beta (Beta-hat: β = r YX (1 Standard

More information

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS

MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MULTIPLE REGRESSION AND ISSUES IN REGRESSION ANALYSIS MSR = Mean Regression Sum of Squares MSE = Mean Squared Error RSS = Regression Sum of Squares SSE = Sum of Squared Errors/Residuals α = Level of Significance

More information

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements

More information

TRINITY COLLEGE. Faculty of Engineering, Mathematics and Science. School of Computer Science & Statistics

TRINITY COLLEGE. Faculty of Engineering, Mathematics and Science. School of Computer Science & Statistics UNIVERSITY OF DUBLIN TRINITY COLLEGE Faculty of Engineering, Mathematics and Science School of Computer Science & Statistics BA (Mod) Enter Course Title Trinity Term 2013 Junior/Senior Sophister ST7002

More information

Regression III: Advanced Methods

Regression III: Advanced Methods Lecture 16: Generalized Additive Models Regression III: Advanced Methods Bill Jacoby Michigan State University http://polisci.msu.edu/jacoby/icpsr/regress3 Goals of the Lecture Introduce Additive Models

More information

Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

More information

A Primer on Forecasting Business Performance

A Primer on Forecasting Business Performance A Primer on Forecasting Business Performance There are two common approaches to forecasting: qualitative and quantitative. Qualitative forecasting methods are important when historical data is not available.

More information

Recall this chart that showed how most of our course would be organized:

Recall this chart that showed how most of our course would be organized: Chapter 4 One-Way ANOVA Recall this chart that showed how most of our course would be organized: Explanatory Variable(s) Response Variable Methods Categorical Categorical Contingency Tables Categorical

More information

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES.

COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. 277 CHAPTER VI COMPARISONS OF CUSTOMER LOYALTY: PUBLIC & PRIVATE INSURANCE COMPANIES. This chapter contains a full discussion of customer loyalty comparisons between private and public insurance companies

More information

Simple Linear Regression Inference

Simple Linear Regression Inference Simple Linear Regression Inference 1 Inference requirements The Normality assumption of the stochastic term e is needed for inference even if it is not a OLS requirement. Therefore we have: Interpretation

More information

UNDERSTANDING ANALYSIS OF COVARIANCE (ANCOVA)

UNDERSTANDING ANALYSIS OF COVARIANCE (ANCOVA) UNDERSTANDING ANALYSIS OF COVARIANCE () In general, research is conducted for the purpose of explaining the effects of the independent variable on the dependent variable, and the purpose of research design

More information

Multiple Linear Regression in Data Mining

Multiple Linear Regression in Data Mining Multiple Linear Regression in Data Mining Contents 2.1. A Review of Multiple Linear Regression 2.2. Illustration of the Regression Process 2.3. Subset Selection in Linear Regression 1 2 Chap. 2 Multiple

More information

5. Linear Regression

5. Linear Regression 5. Linear Regression Outline.................................................................... 2 Simple linear regression 3 Linear model............................................................. 4

More information

Simple Methods and Procedures Used in Forecasting

Simple Methods and Procedures Used in Forecasting Simple Methods and Procedures Used in Forecasting The project prepared by : Sven Gingelmaier Michael Richter Under direction of the Maria Jadamus-Hacura What Is Forecasting? Prediction of future events

More information

Orthogonal Diagonalization of Symmetric Matrices

Orthogonal Diagonalization of Symmetric Matrices MATH10212 Linear Algebra Brief lecture notes 57 Gram Schmidt Process enables us to find an orthogonal basis of a subspace. Let u 1,..., u k be a basis of a subspace V of R n. We begin the process of finding

More information

Factor Analysis. Chapter 420. Introduction

Factor Analysis. Chapter 420. Introduction Chapter 420 Introduction (FA) is an exploratory technique applied to a set of observed variables that seeks to find underlying factors (subsets of variables) from which the observed variables were generated.

More information

Inner product. Definition of inner product

Inner product. Definition of inner product Math 20F Linear Algebra Lecture 25 1 Inner product Review: Definition of inner product. Slide 1 Norm and distance. Orthogonal vectors. Orthogonal complement. Orthogonal basis. Definition of inner product

More information

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA)

INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) INTERPRETING THE ONE-WAY ANALYSIS OF VARIANCE (ANOVA) As with other parametric statistics, we begin the one-way ANOVA with a test of the underlying assumptions. Our first assumption is the assumption of

More information

These axioms must hold for all vectors ū, v, and w in V and all scalars c and d.

These axioms must hold for all vectors ū, v, and w in V and all scalars c and d. DEFINITION: A vector space is a nonempty set V of objects, called vectors, on which are defined two operations, called addition and multiplication by scalars (real numbers), subject to the following axioms

More information

[1] Diagonal factorization

[1] Diagonal factorization 8.03 LA.6: Diagonalization and Orthogonal Matrices [ Diagonal factorization [2 Solving systems of first order differential equations [3 Symmetric and Orthonormal Matrices [ Diagonal factorization Recall:

More information

10. Analysis of Longitudinal Studies Repeat-measures analysis

10. Analysis of Longitudinal Studies Repeat-measures analysis Research Methods II 99 10. Analysis of Longitudinal Studies Repeat-measures analysis This chapter builds on the concepts and methods described in Chapters 7 and 8 of Mother and Child Health: Research methods.

More information

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ

1. The parameters to be estimated in the simple linear regression model Y=α+βx+ε ε~n(0,σ) are: a) α, β, σ b) α, β, ε c) a, b, s d) ε, 0, σ STA 3024 Practice Problems Exam 2 NOTE: These are just Practice Problems. This is NOT meant to look just like the test, and it is NOT the only thing that you should study. Make sure you know all the material

More information

1 Simple Linear Regression I Least Squares Estimation

1 Simple Linear Regression I Least Squares Estimation Simple Linear Regression I Least Squares Estimation Textbook Sections: 8. 8.3 Previously, we have worked with a random variable x that comes from a population that is normally distributed with mean µ and

More information

Part II. Multiple Linear Regression

Part II. Multiple Linear Regression Part II Multiple Linear Regression 86 Chapter 7 Multiple Regression A multiple linear regression model is a linear model that describes how a y-variable relates to two or more xvariables (or transformations

More information

Chapter 23. Inferences for Regression

Chapter 23. Inferences for Regression Chapter 23. Inferences for Regression Topics covered in this chapter: Simple Linear Regression Simple Linear Regression Example 23.1: Crying and IQ The Problem: Infants who cry easily may be more easily

More information

Concepts of Experimental Design

Concepts of Experimental Design Design Institute for Six Sigma A SAS White Paper Table of Contents Introduction...1 Basic Concepts... 1 Designing an Experiment... 2 Write Down Research Problem and Questions... 2 Define Population...

More information

Coefficient of Determination

Coefficient of Determination Coefficient of Determination The coefficient of determination R 2 (or sometimes r 2 ) is another measure of how well the least squares equation ŷ = b 0 + b 1 x performs as a predictor of y. R 2 is computed

More information

Lecture 11: Confidence intervals and model comparison for linear regression; analysis of variance

Lecture 11: Confidence intervals and model comparison for linear regression; analysis of variance Lecture 11: Confidence intervals and model comparison for linear regression; analysis of variance 14 November 2007 1 Confidence intervals and hypothesis testing for linear regression Just as there was

More information

Simple Linear Regression, Scatterplots, and Bivariate Correlation

Simple Linear Regression, Scatterplots, and Bivariate Correlation 1 Simple Linear Regression, Scatterplots, and Bivariate Correlation This section covers procedures for testing the association between two continuous variables using the SPSS Regression and Correlate analyses.

More information

The Method of Least Squares

The Method of Least Squares Hervé Abdi 1 1 Introduction The least square methods (LSM) is probably the most popular technique in statistics. This is due to several factors. First, most common estimators can be casted within this

More information

Parametric and non-parametric statistical methods for the life sciences - Session I

Parametric and non-parametric statistical methods for the life sciences - Session I Why nonparametric methods What test to use? Rank Tests Parametric and non-parametric statistical methods for the life sciences - Session I Liesbeth Bruckers Geert Molenberghs Interuniversity Institute

More information

SAS Software to Fit the Generalized Linear Model

SAS Software to Fit the Generalized Linear Model SAS Software to Fit the Generalized Linear Model Gordon Johnston, SAS Institute Inc., Cary, NC Abstract In recent years, the class of generalized linear models has gained popularity as a statistical modeling

More information

SIMPLE LINEAR CORRELATION. r can range from -1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables.

SIMPLE LINEAR CORRELATION. r can range from -1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables. SIMPLE LINEAR CORRELATION Simple linear correlation is a measure of the degree to which two variables vary together, or a measure of the intensity of the association between two variables. Correlation

More information

Elementary Statistics Sample Exam #3

Elementary Statistics Sample Exam #3 Elementary Statistics Sample Exam #3 Instructions. No books or telephones. Only the supplied calculators are allowed. The exam is worth 100 points. 1. A chi square goodness of fit test is considered to

More information

USING THE ETS MAJOR FIELD TEST IN BUSINESS TO COMPARE ONLINE AND CLASSROOM STUDENT LEARNING

USING THE ETS MAJOR FIELD TEST IN BUSINESS TO COMPARE ONLINE AND CLASSROOM STUDENT LEARNING USING THE ETS MAJOR FIELD TEST IN BUSINESS TO COMPARE ONLINE AND CLASSROOM STUDENT LEARNING Andrew Tiger, Southeastern Oklahoma State University, atiger@se.edu Jimmy Speers, Southeastern Oklahoma State

More information

E3: PROBABILITY AND STATISTICS lecture notes

E3: PROBABILITY AND STATISTICS lecture notes E3: PROBABILITY AND STATISTICS lecture notes 2 Contents 1 PROBABILITY THEORY 7 1.1 Experiments and random events............................ 7 1.2 Certain event. Impossible event............................

More information

2. What is the general linear model to be used to model linear trend? (Write out the model) = + + + or

2. What is the general linear model to be used to model linear trend? (Write out the model) = + + + or Simple and Multiple Regression Analysis Example: Explore the relationships among Month, Adv.$ and Sales $: 1. Prepare a scatter plot of these data. The scatter plots for Adv.$ versus Sales, and Month versus

More information

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm

Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm Mgt 540 Research Methods Data Analysis 1 Additional sources Compilation of sources: http://lrs.ed.uiuc.edu/tseportal/datacollectionmethodologies/jin-tselink/tselink.htm http://web.utk.edu/~dap/random/order/start.htm

More information

Introduction to Principal Components and FactorAnalysis

Introduction to Principal Components and FactorAnalysis Introduction to Principal Components and FactorAnalysis Multivariate Analysis often starts out with data involving a substantial number of correlated variables. Principal Component Analysis (PCA) is a

More information