# Regression Analysis Prof. Soumen Maity Department of Mathematics Indian Institute of Technology, Kharagpur


## Transcription

Lecture - 7: Multiple Linear Regression (Contd.)

This is my second lecture on multiple linear regression, and the content of today's lecture is hypothesis testing in multiple linear regression.

(Refer Slide Time: 01:06)

We will be talking about how to test the significance of the regression model, tests on individual regression coefficients, and tests for several parameters being 0 using the extra sum of squares method. The extra sum of squares method is a very important technique in regression analysis. In the last lecture we learned how to fit a multiple linear regression model: given a set of observations on the response variable and the regressor variables, we know how to estimate the regression coefficients by the least squares method. Once you have a fitted model, the next important job is to test the significance of the model. By testing the significance of a multiple linear regression model, I mean testing whether there is a linear relationship between the response variable and the regressor variables.

(Refer Slide Time: 02:59)

Test for significance of the regression model. As I said, testing the significance of the regression model means testing whether there is a linear relationship between the response and any of the regressor variables X_1, X_2, ..., X_{k-1}. This can be done by testing the null hypothesis

H_0: beta_1 = beta_2 = ... = beta_{k-1} = 0

against the alternative hypothesis

H_1: beta_j ≠ 0 for at least one j.

The null hypothesis says that there is no linear relationship between the response variable and any of the regressor variables; the alternative says that there is at least one regressor variable which contributes significantly to the model. Rejecting the null hypothesis implies that at least one of X_1, X_2, ..., X_{k-1} contributes significantly to the model, since beta_j ≠ 0 means that the regressor variable X_j contributes significantly. To test this hypothesis we take the ANOVA approach. We know that the total sum of squares decomposes as SS_T = SS_regression + SS_residual, and that SS_T has n - 1 degrees of freedom, SS_residual has n - k degrees of freedom, and SS_regression has k - 1 degrees of freedom.

(Refer Slide Time: 08:08)

Now, SS_regression / sigma^2 follows a chi-square distribution with k - 1 degrees of freedom, SS_residual / sigma^2 follows a chi-square distribution with n - k degrees of freedom, and the two are independent. Then, by the definition of the F statistic,

F = [SS_regression / (k - 1)] / [SS_residual / (n - k)],

and this random variable follows an F distribution with k - 1 and n - k degrees of freedom. We also know that MS_residual, which is SS_residual / (n - k), is an unbiased estimator of sigma^2, so E(MS_residual) = sigma^2. It can also be proved that E(MS_regression) = sigma^2 + beta*' X_c' X_c beta* / (k - 1). I just need to define what beta* and X_c are.
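Collected in one place, in standard notation matching the quantities above, the distributional facts behind the global F-test are:

```latex
\[
\frac{SS_{\mathrm{reg}}}{\sigma^{2}} \sim \chi^{2}_{k-1},
\qquad
\frac{SS_{\mathrm{res}}}{\sigma^{2}} \sim \chi^{2}_{n-k}
\quad \text{(independent)}
\]
\[
F = \frac{SS_{\mathrm{reg}}/(k-1)}{SS_{\mathrm{res}}/(n-k)}
  = \frac{MS_{\mathrm{reg}}}{MS_{\mathrm{res}}}
  \sim F_{k-1,\,n-k},
\qquad
E(MS_{\mathrm{res}}) = \sigma^{2},
\qquad
E(MS_{\mathrm{reg}}) = \sigma^{2}
  + \frac{\beta^{*\prime} X_{c}^{\prime} X_{c}\, \beta^{*}}{k-1}
\]
```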

(Refer Slide Time: 11:31)

Basically, beta* = (beta_1, beta_2, ..., beta_{k-1})'. The full coefficient vector is beta = (beta_0, beta_1, ..., beta_{k-1})', so beta* is obtained by excluding beta_0 from beta. X_c is obtained from the X matrix by centering each regressor column about its mean: the (i, j)th entry of X_c is X_ij - X_bar_j, where X_bar_j is the mean of the observations on the jth regressor. So the first row of X_c is (X_11 - X_bar_1, ..., X_{1,k-1} - X_bar_{k-1}) and the last row is (X_n1 - X_bar_1, ..., X_{n,k-1} - X_bar_{k-1}).

(Refer Slide Time: 12:54)
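For concreteness, the centering that produces X_c can be sketched as follows (a hypothetical simulated X; numpy assumed):

```python
import numpy as np

# X has an intercept column plus two regressor columns; X_c is formed by
# dropping the intercept column and subtracting each column's mean.
rng = np.random.default_rng(4)
n = 6
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
X_c = X[:, 1:] - X[:, 1:].mean(axis=0)
print(np.allclose(X_c.mean(axis=0), 0.0))  # each centered column has mean 0
```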

Now look at these two expected values. Because F = MS_regression / MS_residual, they indicate what a large observed value of F means. If the observed value of F is large, then there is at least one beta_j which is not equal to 0. If all the regression coefficients beta_j are equal to 0, then beta*' X_c' X_c beta* is 0, both mean squares have expectation sigma^2, and F will be close to 1. So a high observed value of F indicates that at least one beta_j is not equal to 0.

(Refer Slide Time: 14:39)

Based on this, we reject the null hypothesis H_0: beta_1 = beta_2 = ... = beta_{k-1} = 0 if the F value is high, that is, if F is greater than F_{alpha, k-1, n-k}; this value you can get from the statistical table. Now let us summarize the whole thing using the ANOVA table; here is the ANOVA table for multiple linear regression.

(Refer Slide Time: 15:37)

The columns are source of variation, degrees of freedom, sum of squares, mean square, and the F value. The sources of variation are the variation explained by the regression (SS_regression), the variation that remains unexplained (SS_residual), and the total variation (SS_T). I explained what SS_T, SS_residual and SS_regression are in the previous lecture.

| Source of variation | Degrees of freedom | Sum of squares | Mean square | F |
|---|---|---|---|---|
| Regression | k - 1 | SS_regression | MS_regression = SS_regression / (k - 1) | MS_regression / MS_residual |
| Residual | n - k | SS_residual | MS_residual = SS_residual / (n - k) | |
| Total | n - 1 | SS_T | | |

We know that this F follows an F distribution with k - 1 and n - k degrees of freedom, and we reject H_0 if the observed F is greater than the tabulated value F_{alpha, k-1, n-k}.

Next I move to tests on individual regression coefficients. Once you have determined that the null hypothesis of the previous test is rejected, there is a linear relationship between the response variable and the regressor variables; that is, there is at least one regressor variable which makes a significant contribution to the response variable.
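The ANOVA table and the global F-test can be sketched numerically as follows (simulated data; numpy/scipy assumed, not part of the lecture). The per-coefficient t-statistics of the partial tests discussed next are included for completeness, since software prints both together:

```python
import numpy as np
from scipy import stats

# Simulated data: n observations, intercept + 2 regressors (k = 3 columns).
rng = np.random.default_rng(0)
n, k = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 2.0, -1.5]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
beta_hat = XtX_inv @ X.T @ y

# ANOVA decomposition: SS_T = SS_regression + SS_residual.
ss_t = np.sum((y - y.mean()) ** 2)        # df = n - 1
ss_res = np.sum((y - X @ beta_hat) ** 2)  # df = n - k
ss_reg = ss_t - ss_res                    # df = k - 1
ms_reg, ms_res = ss_reg / (k - 1), ss_res / (n - k)

# Global F-test of H0: beta_1 = beta_2 = 0.
F = ms_reg / ms_res
F_crit = stats.f.ppf(0.95, k - 1, n - k)
print(f"F = {F:.2f} (critical value {F_crit:.2f}), reject H0: {F > F_crit}")

# Partial t-statistic per coefficient: t_j = beta_j_hat / sqrt(MS_res * [(X'X)^-1]_jj).
t = beta_hat / np.sqrt(ms_res * np.diag(XtX_inv))
t_crit = stats.t.ppf(0.975, n - k)
print("t statistics:", np.round(t, 2), "|t| > t_crit:", np.abs(t) > t_crit)
```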

Once the null hypothesis of the previous test is rejected, we know that there is at least one regressor which contributes significantly to explaining the variability in the response variable. The next obvious question is which regressor variables have a significant contribution, so we need to test the regression coefficients individually.

(Refer Slide Time: 20:28)

The test on an individual regression coefficient is also called a partial or marginal test; the previous one is a global test, I forgot to mention that. Here, once you have determined that at least one of the regressor variables is significant, the next question is which one. For this we test the null hypothesis H_0: beta_j = 0 against the alternative H_1: beta_j ≠ 0; this tests the significance of X_j in the presence of the other regressors in the model. How do we test it? We know that the least squares estimator is beta_hat = (X'X)^{-1} X'Y, and that beta_hat follows a normal distribution with mean beta and variance-covariance matrix sigma^2 (X'X)^{-1}; that is what I proved in the last class. Here we are concerned only with beta_j, so we take the (j, j)th element of sigma^2 (X'X)^{-1}. Then (beta_j_hat - beta_j) / sqrt(sigma^2 [(X'X)^{-1}]_{jj}) follows a normal(0, 1) distribution. Of course, sigma^2 is not known; if we replace sigma^2 by MS_residual, this random variable follows a t distribution.

(Refer Slide Time: 25:00)

So the test statistic is

t = beta_j_hat / sqrt(MS_residual [(X'X)^{-1}]_{jj}),

which follows a t distribution with n - k degrees of freedom under H_0. (I made a mistake earlier: it is not beta_j_hat itself but beta_j_hat - beta_j that standardizes to normal(0, 1); under H_0, beta_j = 0, so the statistic above is normal(0, 1) when sigma^2 is known, and t when sigma^2 is replaced by MS_residual.) We reject H_0: beta_j = 0 if the t value is far from 0, that is, if |t| > t_{alpha/2, n-k}. This is the critical region for testing beta_j = 0.

So the first step is to test whether the fitted model is significant, that is, whether there is any linear relationship between the response variable and any of the regressor variables, using the global test we did at the beginning. If that test is significant, there is a significant contribution from at least one regressor, and you go for the partial tests: once you know that at least one of the regressor variables contributes significantly to the response variable, you need to determine which regressor it is. Next we move to the test for several parameters being 0.

(Refer Slide Time: 28:23)

The technique here is called the extra sum of squares method, and it has very important applications in regression analysis. What we want to do is test for several parameters being 0. Suppose you have the multiple linear regression model Y = X beta + epsilon, where beta is a k x 1 vector, so it involves beta_0 and k - 1 regression coefficients. We would like to determine whether some subset of r regressors (r < k - 1, where k - 1 is the total number of regressors) contributes significantly to the regression model. For example, suppose you are given a problem with 4 regressor variables and 1 response variable, so you fit the model Y = beta_0 + beta_1 X_1 + beta_2 X_2 + beta_3 X_3 + beta_4 X_4 + epsilon. After fitting the model, you may feel that some of the regressor variables, say X_3 and X_4, are not significant. Then what you want to test is whether beta_3 = beta_4 = 0, against the alternative that at least one of them is not 0: that is, to test the significance of the regressor variables X_3 and X_4 in the presence of X_1 and X_2. The extra sum of squares technique is used to test such hypotheses.

Let me explain again. Suppose you have 1 response variable and 4 regressor variables X_1, X_2, X_3, X_4, and you fit the model Y = beta_0 + beta_1 X_1 + beta_2 X_2 + beta_3 X_3 + beta_4 X_4 + epsilon. After fitting this multiple linear regression model involving 4 regressors, you believe that some subset, say X_3 and X_4 (or, for example, X_1 and X_4), is not significant, that is, does not contribute significantly to explaining the variability in Y. To test this belief statistically, you need to test the hypothesis H_0: (beta_3, beta_4)' = 0 against the alternative H_1: (beta_3, beta_4)' ≠ 0, where "not equal to 0" means at least one of them is not 0. This type of hypothesis can be tested using the extra sum of squares method.

(Refer Slide Time: 34:59)

In general, take the model Y = X beta + epsilon, where beta is a k x 1 vector, and split beta into two parts: beta_1, which is a (k - r) x 1 vector, and beta_2, which is an r x 1 vector; you believe that the last r regressor variables are not significant for Y. Using this partition, we can also partition the matrix X into X_1 and X_2 and write the model as Y = X_1 beta_1 + X_2 beta_2 + epsilon. The hypothesis we want to test is that the restricted model Y = X_1 beta_1 + epsilon is enough for Y; in other words, H_0: beta_2 = 0, where beta_2 is a vector involving r regression coefficients. The alternative hypothesis H_1 is that the full model Y = X_1 beta_1 + X_2 beta_2 + epsilon is needed, which in other words says that beta_2 ≠ 0. So the claim under H_0 is that the restricted model, with only the first k - 1 - r regressors, is enough to explain the variability in Y and you do not need the last r regressors; the alternative hypothesis says that the last r regressors are also significant for explaining the variability in Y.

To test this type of hypothesis we use the extra sum of squares technique: we compute SS_regression for both the full and the restricted model. The full model involves all the regressors; the restricted model involves only the first k - 1 - r regressors. One thing you have to understand is that SS_residual always decreases as the number of regressor variables increases. Intuitively this is clear: SS_residual decreases as you increase the number of regressor variables, whether or not the newly added regressor variable is relevant for the response variable.
If you add one more regressor, say you have a model with k regressors and you make it k + 1 regressor variables, then SS_residual decreases. If the newly added regressor variable is significant, that is, very relevant for the response variable, SS_residual decreases more; if it is not that relevant, it decreases less. The same holds in reverse for SS_regression, since SS_residual + SS_regression = SS_T, which is fixed.
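A small simulation of this claim (numpy assumed; the data and column names are illustrative only): SS_residual can only decrease when a regressor is added, even a pure-noise column unrelated to y.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 40
x1 = rng.normal(size=n)
noise_col = rng.normal(size=n)          # irrelevant regressor
y = 2.0 * x1 + rng.normal(size=n)

def ss_residual(cols):
    """Residual sum of squares for a least-squares fit with an intercept."""
    X = np.column_stack([np.ones(n)] + cols)
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return np.sum((y - X @ beta_hat) ** 2)

ss_small = ss_residual([x1])            # model with x1 only
ss_big = ss_residual([x1, noise_col])   # same model plus the noise column
print(ss_big <= ss_small)               # adding a column never raises SS_residual
```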

In other words, SS_regression increases as we increase the number of regressor variables; it increases more if the newly added regressor variable is relevant to the response variable, otherwise it increases less. This is the basic idea behind the extra sum of squares technique. Let me compute SS_regression.

(Refer Slide Time: 42:57)

First, SS_regression for the full model. By the full model I mean Y = X beta + epsilon, with all the regressors. We know that SS_regression for the full model is

SS_regression(full) = beta_hat' X' Y - n Y_bar^2,

where Y_bar = (1/n) sum Y_i (you can refer to the previous lecture for this), and it has k - 1 degrees of freedom. We also know that

MS_residual(full) = (Y'Y - beta_hat' X' Y) / (n - k),

where Y'Y - beta_hat' X' Y is SS_residual and n - k is its degrees of freedom. Now, under H_0, that is, under the restricted model Y = X_1 beta_1 + epsilon, we do not have the last r regressors in the model. Under this restricted model (the notation "restricted" indicates SS_regression under the restricted model),

SS_regression(restricted) = beta_1_hat' X_1' Y - n Y_bar^2;

this is the same expression with beta_hat replaced by beta_1_hat and X by X_1.

Here you have not k - 1 regressors but k - 1 - r regressors in the model, because we have removed r regressors, so SS_regression(restricted) has k - 1 - r degrees of freedom. Now, I said that SS_regression increases as the number of regressor variables increases, so SS_regression under the full model is greater than SS_regression under the restricted model, because the full model has more regressor variables. The difference,

SS_regression(full) - SS_regression(restricted),

is called the extra sum of squares due to beta_2, given that beta_1 is already in the model. This is very important, so let me explain a little. What is beta_2? We split beta into two parts, beta_1 and beta_2; beta_2 is the r x 1 vector of regression coefficients associated with the last r regressor variables. The full model contains all the regressor variables, while the restricted model contains only the first k - 1 - r of them. So if you subtract the restricted regression sum of squares from the full one, you get the extra regression sum of squares due to the last r regressors, given that the first k - 1 - r regressors are present in the model. Now we can compute the degrees of freedom of this quantity: SS_regression(full) has k - 1 degrees of freedom and SS_regression(restricted) has k - 1 - r, so the difference has r degrees of freedom.

(Refer Slide Time: 50:36)

Now, SS_regression(full) - SS_regression(restricted) is the extra sum of squares, and it has r degrees of freedom, so this quantity divided by sigma^2 follows a chi-square distribution with r degrees of freedom. Also, SS_residual(full) / sigma^2 follows a chi-square distribution with n - k degrees of freedom, and you can check that the two are independent. Now we are in a position to compute the F statistic. By the definition of the F statistic,

F = { [SS_regression(full) - SS_regression(restricted)] / r } / { SS_residual(full) / (n - k) },

and this follows an F distribution with r and n - k degrees of freedom under H_0. Intuitively it is very clear: the numerator is the extra regression sum of squares due to the last r regressor variables. If this quantity is large, the last r regressors have a significant contribution to SS_regression, which means they have a significant contribution to explaining the variability in Y. So, intuitively, if this quantity is large we are going to reject the null hypothesis: the null hypothesis says you go for the restricted model, while the alternative hypothesis says you go for the full model.
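The partial F-test just derived can be sketched with simulated data (numpy/scipy assumed; here x3 and x4 are constructed to contribute nothing, so H_0 is in fact true):

```python
import numpy as np
from scipy import stats

# Full model: intercept + 4 regressors (k = 5 columns); test H0: beta_3 = beta_4 = 0,
# i.e. r = 2 coefficients under test, via the extra sum of squares.
rng = np.random.default_rng(3)
n, k, r = 80, 5, 2
Xfull = np.column_stack([np.ones(n), rng.normal(size=(n, 4))])
y = Xfull @ np.array([1.0, 2.0, -1.0, 0.0, 0.0]) + rng.normal(size=n)

def ss_regression(X):
    """SS_regression = beta_hat' X' Y - n * Ybar^2 for a least-squares fit."""
    beta_hat, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta_hat @ X.T @ y - n * y.mean() ** 2

ss_reg_full = ss_regression(Xfull)               # df = k - 1
ss_reg_restr = ss_regression(Xfull[:, :k - r])   # df = k - 1 - r
extra_ss = ss_reg_full - ss_reg_restr            # df = r

beta_hat, *_ = np.linalg.lstsq(Xfull, y, rcond=None)
ms_res_full = np.sum((y - Xfull @ beta_hat) ** 2) / (n - k)

F = (extra_ss / r) / ms_res_full                 # ~ F(r, n - k) under H0
F_crit = stats.f.ppf(0.95, r, n - k)
print(f"F = {F:.2f} (critical value {F_crit:.2f}), reject H0: {F > F_crit}")
```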

(Refer Slide Time: 53:44)

If this quantity is large, that is, if F is greater than F_{alpha, r, n-k}, then we reject H_0 and conclude that at least one of the regressors in beta_2 is significant. I hope you understood the extra sum of squares method; it is very interesting and also very important. That is all for today. Thank you very much.


### DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9

DEPARTMENT OF PSYCHOLOGY UNIVERSITY OF LANCASTER MSC IN PSYCHOLOGICAL RESEARCH METHODS ANALYSING AND INTERPRETING DATA 2 PART 1 WEEK 9 Analysis of covariance and multiple regression So far in this course,

### Regression step-by-step using Microsoft Excel

Step 1: Regression step-by-step using Microsoft Excel Notes prepared by Pamela Peterson Drake, James Madison University Type the data into the spreadsheet The example used throughout this How to is a regression

### MULTIPLE REGRESSION WITH CATEGORICAL DATA

DEPARTMENT OF POLITICAL SCIENCE AND INTERNATIONAL RELATIONS Posc/Uapp 86 MULTIPLE REGRESSION WITH CATEGORICAL DATA I. AGENDA: A. Multiple regression with categorical variables. Coding schemes. Interpreting

### Notes on Applied Linear Regression

Notes on Applied Linear Regression Jamie DeCoster Department of Social Psychology Free University Amsterdam Van der Boechorststraat 1 1081 BT Amsterdam The Netherlands phone: +31 (0)20 444-8935 email:

### Section 1: Simple Linear Regression

Section 1: Simple Linear Regression Carlos M. Carvalho The University of Texas McCombs School of Business http://faculty.mccombs.utexas.edu/carlos.carvalho/teaching/ 1 Regression: General Introduction

### Elementary Statistics Sample Exam #3

Elementary Statistics Sample Exam #3 Instructions. No books or telephones. Only the supplied calculators are allowed. The exam is worth 100 points. 1. A chi square goodness of fit test is considered to

### Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011

Chicago Booth BUSINESS STATISTICS 41000 Final Exam Fall 2011 Name: Section: I pledge my honor that I have not violated the Honor Code Signature: This exam has 34 pages. You have 3 hours to complete this

### 2. What are the theoretical and practical consequences of autocorrelation?

Lecture 10 Serial Correlation In this lecture, you will learn the following: 1. What is the nature of autocorrelation? 2. What are the theoretical and practical consequences of autocorrelation? 3. Since

### e = random error, assumed to be normally distributed with mean 0 and standard deviation σ

1 Linear Regression 1.1 Simple Linear Regression Model The linear regression model is applied if we want to model a numeric response variable and its dependency on at least one numeric factor variable.

### Introduction to Fixed Effects Methods

Introduction to Fixed Effects Methods 1 1.1 The Promise of Fixed Effects for Nonexperimental Research... 1 1.2 The Paired-Comparisons t-test as a Fixed Effects Method... 2 1.3 Costs and Benefits of Fixed

### Chapter 5 Analysis of variance SPSS Analysis of variance

Chapter 5 Analysis of variance SPSS Analysis of variance Data file used: gss.sav How to get there: Analyze Compare Means One-way ANOVA To test the null hypothesis that several population means are equal,

### Econometrics Simple Linear Regression

Econometrics Simple Linear Regression Burcu Eke UC3M Linear equations with one variable Recall what a linear equation is: y = b 0 + b 1 x is a linear equation with one variable, or equivalently, a straight

### NCSS Statistical Software Principal Components Regression. In ordinary least squares, the regression coefficients are estimated using the formula ( )

Chapter 340 Principal Components Regression Introduction is a technique for analyzing multiple regression data that suffer from multicollinearity. When multicollinearity occurs, least squares estimates

### Data Analysis. Lecture Empirical Model Building and Methods (Empirische Modellbildung und Methoden) SS Analysis of Experiments - Introduction

Data Analysis Lecture Empirical Model Building and Methods (Empirische Modellbildung und Methoden) Prof. Dr. Dr. h.c. Dieter Rombach Dr. Andreas Jedlitschka SS 2014 Analysis of Experiments - Introduction

### 1. Complete the sentence with the correct word or phrase. 2. Fill in blanks in a source table with the correct formuli for df, MS, and F.

Final Exam 1. Complete the sentence with the correct word or phrase. 2. Fill in blanks in a source table with the correct formuli for df, MS, and F. 3. Identify the graphic form and nature of the source

### Hypothesis testing - Steps

Hypothesis testing - Steps Steps to do a two-tailed test of the hypothesis that β 1 0: 1. Set up the hypotheses: H 0 : β 1 = 0 H a : β 1 0. 2. Compute the test statistic: t = b 1 0 Std. error of b 1 =

### Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500 6 8480

1) The S & P/TSX Composite Index is based on common stock prices of a group of Canadian stocks. The weekly close level of the TSX for 6 weeks are shown: Week TSX Index 1 8480 2 8470 3 8475 4 8510 5 8500

### Estimation of σ 2, the variance of ɛ

Estimation of σ 2, the variance of ɛ The variance of the errors σ 2 indicates how much observations deviate from the fitted surface. If σ 2 is small, parameters β 0, β 1,..., β k will be reliably estimated

### HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION

HYPOTHESIS TESTING: CONFIDENCE INTERVALS, T-TESTS, ANOVAS, AND REGRESSION HOD 2990 10 November 2010 Lecture Background This is a lightning speed summary of introductory statistical methods for senior undergraduate

### Inferential Statistics

Inferential Statistics Sampling and the normal distribution Z-scores Confidence levels and intervals Hypothesis testing Commonly used statistical methods Inferential Statistics Descriptive statistics are

### (Refer Slide Time: 2:03)

Control Engineering Prof. Madan Gopal Department of Electrical Engineering Indian Institute of Technology, Delhi Lecture - 11 Models of Industrial Control Devices and Systems (Contd.) Last time we were

### Hypothesis Testing COMP 245 STATISTICS. Dr N A Heard. 1 Hypothesis Testing 2 1.1 Introduction... 2 1.2 Error Rates and Power of a Test...

Hypothesis Testing COMP 45 STATISTICS Dr N A Heard Contents 1 Hypothesis Testing 1.1 Introduction........................................ 1. Error Rates and Power of a Test.............................

### Simple treatment structure

Chapter 3 Simple treatment structure 3.1 Replication of control treatments Suppose that treatment 1 is a control treatment and that treatments 2,..., t are new treatments which we want to compare with

### Outline. Correlation & Regression, III. Review. Relationship between r and regression

Outline Correlation & Regression, III 9.07 4/6/004 Relationship between correlation and regression, along with notes on the correlation coefficient Effect size, and the meaning of r Other kinds of correlation

### Notes for STA 437/1005 Methods for Multivariate Data

Notes for STA 437/1005 Methods for Multivariate Data Radford M. Neal, 26 November 2010 Random Vectors Notation: Let X be a random vector with p elements, so that X = [X 1,..., X p ], where denotes transpose.

### The Multiple Regression Model: Hypothesis Tests and the Use of Nonsample Information

Chapter 8 The Multiple Regression Model: Hypothesis Tests and the Use of Nonsample Information An important new development that we encounter in this chapter is using the F- distribution to simultaneously

### Multivariate Analysis of Variance (MANOVA): I. Theory

Gregory Carey, 1998 MANOVA: I - 1 Multivariate Analysis of Variance (MANOVA): I. Theory Introduction The purpose of a t test is to assess the likelihood that the means for two groups are sampled from the

### Testing for Lack of Fit

Chapter 6 Testing for Lack of Fit How can we tell if a model fits the data? If the model is correct then ˆσ 2 should be an unbiased estimate of σ 2. If we have a model which is not complex enough to fit

### EPS 625 ANALYSIS OF COVARIANCE (ANCOVA) EXAMPLE USING THE GENERAL LINEAR MODEL PROGRAM

EPS 6 ANALYSIS OF COVARIANCE (ANCOVA) EXAMPLE USING THE GENERAL LINEAR MODEL PROGRAM ANCOVA One Continuous Dependent Variable (DVD Rating) Interest Rating in DVD One Categorical/Discrete Independent Variable

### Calculating P-Values. Parkland College. Isela Guerra Parkland College. Recommended Citation

Parkland College A with Honors Projects Honors Program 2014 Calculating P-Values Isela Guerra Parkland College Recommended Citation Guerra, Isela, "Calculating P-Values" (2014). A with Honors Projects.

### ELEC-E8104 Stochastics models and estimation, Lecture 3b: Linear Estimation in Static Systems

Stochastics models and estimation, Lecture 3b: Linear Estimation in Static Systems Minimum Mean Square Error (MMSE) MMSE estimation of Gaussian random vectors Linear MMSE estimator for arbitrarily distributed

### Low Power VLSI Circuits and Systems Prof. Pro. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur

Low Power VLSI Circuits and Systems Prof. Pro. Ajit Pal Department of Computer Science and Engineering Indian Institute of Technology, Kharagpur Lecture No. # 14 Pass Transistor Logic Circuits - I Hello

### What is Linear Programming?

Chapter 1 What is Linear Programming? An optimization problem usually has three essential ingredients: a variable vector x consisting of a set of unknowns to be determined, an objective function of x to

### Testing for Granger causality between stock prices and economic growth

MPRA Munich Personal RePEc Archive Testing for Granger causality between stock prices and economic growth Pasquale Foresti 2006 Online at http://mpra.ub.uni-muenchen.de/2962/ MPRA Paper No. 2962, posted

### individualdifferences

1 Simple ANalysis Of Variance (ANOVA) Oftentimes we have more than two groups that we want to compare. The purpose of ANOVA is to allow us to compare group means from several independent samples. In general,

### Solución del Examen Tipo: 1

Solución del Examen Tipo: 1 Universidad Carlos III de Madrid ECONOMETRICS Academic year 2009/10 FINAL EXAM May 17, 2010 DURATION: 2 HOURS 1. Assume that model (III) verifies the assumptions of the classical

### Multivariate normal distribution and testing for means (see MKB Ch 3)

Multivariate normal distribution and testing for means (see MKB Ch 3) Where are we going? 2 One-sample t-test (univariate).................................................. 3 Two-sample t-test (univariate).................................................

### MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS

MATRIX ALGEBRA AND SYSTEMS OF EQUATIONS Systems of Equations and Matrices Representation of a linear system The general system of m equations in n unknowns can be written a x + a 2 x 2 + + a n x n b a

### IAPRI Quantitative Analysis Capacity Building Series. Multiple regression analysis & interpreting results

IAPRI Quantitative Analysis Capacity Building Series Multiple regression analysis & interpreting results How important is R-squared? R-squared Published in Agricultural Economics 0.45 Best article of the

### 7 Hypothesis testing - one sample tests

7 Hypothesis testing - one sample tests 7.1 Introduction Definition 7.1 A hypothesis is a statement about a population parameter. Example A hypothesis might be that the mean age of students taking MAS113X

### Factor Analysis. Chapter 420. Introduction

Chapter 420 Introduction (FA) is an exploratory technique applied to a set of observed variables that seeks to find underlying factors (subsets of variables) from which the observed variables were generated.

### Lesson 1: Comparison of Population Means Part c: Comparison of Two- Means

Lesson : Comparison of Population Means Part c: Comparison of Two- Means Welcome to lesson c. This third lesson of lesson will discuss hypothesis testing for two independent means. Steps in Hypothesis

### Concepts of Experimental Design

Design Institute for Six Sigma A SAS White Paper Table of Contents Introduction...1 Basic Concepts... 1 Designing an Experiment... 2 Write Down Research Problem and Questions... 2 Define Population...

### COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared. jn2@ecs.soton.ac.uk

COMP6053 lecture: Relationship between two variables: correlation, covariance and r-squared jn2@ecs.soton.ac.uk Relationships between variables So far we have looked at ways of characterizing the distribution

### 1 Sufficient statistics

1 Sufficient statistics A statistic is a function T = rx 1, X 2,, X n of the random sample X 1, X 2,, X n. Examples are X n = 1 n s 2 = = X i, 1 n 1 the sample mean X i X n 2, the sample variance T 1 =

### ANOVA Analysis of Variance

ANOVA Analysis of Variance What is ANOVA and why do we use it? Can test hypotheses about mean differences between more than 2 samples. Can also make inferences about the effects of several different IVs,

### SIMPLE LINEAR CORRELATION. r can range from -1 to 1, and is independent of units of measurement. Correlation can be done on two dependent variables.

SIMPLE LINEAR CORRELATION Simple linear correlation is a measure of the degree to which two variables vary together, or a measure of the intensity of the association between two variables. Correlation

### Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model

Overview of Violations of the Basic Assumptions in the Classical Normal Linear Regression Model 1 September 004 A. Introduction and assumptions The classical normal linear regression model can be written

### Electronics for Analog Signal Processing - II Prof. K. Radhakrishna Rao Department of Electrical Engineering Indian Institute of Technology Madras

Electronics for Analog Signal Processing - II Prof. K. Radhakrishna Rao Department of Electrical Engineering Indian Institute of Technology Madras Lecture - 18 Wideband (Video) Amplifiers In the last class,

### Hypothesis Testing for Beginners

Hypothesis Testing for Beginners Michele Piffer LSE August, 2011 Michele Piffer (LSE) Hypothesis Testing for Beginners August, 2011 1 / 53 One year ago a friend asked me to put down some easy-to-read notes

### Section 14 Simple Linear Regression: Introduction to Least Squares Regression

Slide 1 Section 14 Simple Linear Regression: Introduction to Least Squares Regression There are several different measures of statistical association used for understanding the quantitative relationship