2. Linear regression with multiple regressors


[Slide 5] Aim of this section:
- Introduction of the multiple regression model
- OLS estimation in multiple regression
- Measures of fit in multiple regression
- Assumptions in the multiple regression model
- Violations of the assumptions (omitted-variable bias, multicollinearity, heteroskedasticity, autocorrelation)

[Slide 6] 2.1. The multiple regression model

Intuition:
- A regression model specifies a functional (parametric) relationship between a dependent (endogenous) variable $Y$ and a set of $k$ independent (exogenous) regressors $X_1, X_2, \ldots, X_k$
- In a first step, we consider the linear multiple regression model

[Slide 7] Definition 2.1: (Multiple linear regression model)

The multiple (linear) regression model is given by

$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ldots + \beta_k X_{ki} + u_i, \quad i = 1, \ldots, n, \tag{2.1}$$

where
- $Y_i$ is the $i$th observation on the dependent variable,
- $X_{1i}, X_{2i}, \ldots, X_{ki}$ are the $i$th observations on each of the $k$ regressors,
- $u_i$ is the stochastic error term.

The population regression line is the relationship that holds between $Y$ and the $X$s on average:

$$E(Y_i \mid X_{1i} = x_1, X_{2i} = x_2, \ldots, X_{ki} = x_k) = \beta_0 + \beta_1 x_1 + \ldots + \beta_k x_k.$$

[Slide 8] Meaning of the coefficients:
- The intercept $\beta_0$ is the expected value of $Y_i$ (for all $i = 1, \ldots, n$) when all $X$-regressors equal 0
- $\beta_1, \ldots, \beta_k$ are the slope coefficients on the respective regressors $X_1, \ldots, X_k$
- $\beta_1$, for example, is the expected change in $Y_i$ resulting from changing $X_{1i}$ by one unit, holding constant $X_{2i}, \ldots, X_{ki}$ (and analogously $\beta_2, \ldots, \beta_k$)

Definition 2.2: (Homoskedasticity, Heteroskedasticity)

The error term $u_i$ is called homoskedastic if the conditional variance of $u_i$ given $X_{1i}, \ldots, X_{ki}$, $\mathrm{Var}(u_i \mid X_{1i}, \ldots, X_{ki})$, is constant for $i = 1, \ldots, n$ and does not depend on the values of $X_{1i}, \ldots, X_{ki}$. Otherwise, the error term is called heteroskedastic.
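Definition 2.2 can be made concrete with simulated error terms. The sketch below is only an illustration under assumed, made-up distributions (a single regressor drawn uniformly on [0.5, 2.5], normal shocks, and the hypothetical helper `variance_ratio`); none of these choices come from the lecture. It compares the sample variance of $u_i$ for large and small values of the regressor.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility
n = 10_000

# One regressor X, drawn uniformly on an arbitrary (assumed) interval.
x = rng.uniform(0.5, 2.5, size=n)

# Homoskedastic errors: Var(u | X) = 1 for every value of X.
u_homo = rng.normal(0.0, 1.0, size=n)

# Heteroskedastic errors: Var(u | X) = X^2, so the spread grows with X.
u_hetero = x * rng.normal(0.0, 1.0, size=n)


def variance_ratio(u, x):
    """Sample variance of u where X is above its median, divided by the
    sample variance of u where X is at or below its median."""
    high = x > np.median(x)
    return u[high].var(ddof=1) / u[~high].var(ddof=1)


ratio_homo = variance_ratio(u_homo, x)
ratio_hetero = variance_ratio(u_hetero, x)
print(f"homoskedastic ratio:   {ratio_homo:.2f}")
print(f"heteroskedastic ratio: {ratio_hetero:.2f}")
```

A ratio close to 1 is what a constant conditional variance would produce; a ratio well above 1 indicates that the error variance depends on the regressor.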

[Slide 9] Example 1: (Student performance)

Regression of student performance ($Y$) in $n = 420$ US districts on distinct school characteristics (factors):
- $Y_i$: average test score in the $i$th district (TEST_SCORE)
- $X_{1i}$: average class size in the $i$th district (measured by the student-teacher ratio, STR)
- $X_{2i}$: percentage of English learners in the $i$th district (PCTEL)

Expected signs of the coefficients: $\beta_1 < 0$, $\beta_2 < 0$

[Slide 10] Example 2: (House prices)

Regression of house prices ($Y$) recorded for $n = 546$ houses sold in Windsor (Canada) on distinct housing characteristics:
- $Y_i$: sale price (in Canadian dollars) of the $i$th house (SALEPRICE)
- $X_{1i}$: lot size (in square feet) of the $i$th property (LOTSIZE)
- $X_{2i}$: number of bedrooms in the $i$th house (BEDROOMS)
- $X_{3i}$: number of bathrooms in the $i$th house (BATHROOMS)
- $X_{4i}$: number of storeys (excluding the basement) in the $i$th house (STOREYS)

Expected signs of the coefficients: $\beta_1, \beta_2, \beta_3, \beta_4 > 0$

[Slide 11] 2.2. The OLS estimator in multiple regression

Now:
- Estimation of the coefficients $\beta_0, \beta_1, \ldots, \beta_k$ in the multiple regression model on the basis of $n$ observations by applying the Ordinary Least Squares (OLS) technique

Idea:
- Let $b_0, b_1, \ldots, b_k$ be estimators of $\beta_0, \beta_1, \ldots, \beta_k$
- We can predict $Y_i$ by $b_0 + b_1 X_{1i} + \ldots + b_k X_{ki}$
- The prediction error is $Y_i - b_0 - b_1 X_{1i} - \ldots - b_k X_{ki}$

[Slide 12] Idea: [continued]
- The sum of the squared prediction errors over all $n$ observations is

$$\sum_{i=1}^{n} (Y_i - b_0 - b_1 X_{1i} - \ldots - b_k X_{ki})^2 \tag{2.2}$$

Definition 2.3: (OLS estimators, predicted values, residuals)

The OLS estimators $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ are the values of $b_0, b_1, \ldots, b_k$ that minimize the sum of squared prediction errors (2.2). The OLS predicted values $\hat Y_i$ and residuals $\hat u_i$ (for $i = 1, \ldots, n$) are

$$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \ldots + \hat\beta_k X_{ki} \tag{2.3}$$

and

$$\hat u_i = Y_i - \hat Y_i. \tag{2.4}$$
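Definition 2.3 can be sketched numerically. The lecture itself uses EViews; the snippet below instead uses NumPy on simulated data, and the coefficient values, sample size and variable names are all assumptions made for the illustration. It minimizes the sum of squared prediction errors (2.2) with `numpy.linalg.lstsq`, forms the predicted values (2.3) and residuals (2.4), and checks that an arbitrarily perturbed coefficient vector gives a larger sum of squares.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 200, 2

# Hypothetical data-generating process with made-up coefficients.
X = rng.normal(size=(n, k))
beta_true = np.array([1.0, 2.0, -0.5])  # (beta_0, beta_1, beta_2)
y = beta_true[0] + X @ beta_true[1:] + rng.normal(size=n)

# Prepend a column of ones so that the first coefficient is the intercept.
X_design = np.column_stack([np.ones(n), X])


def ssr(b):
    """Sum of squared prediction errors (2.2) for candidate coefficients b."""
    return float(np.sum((y - X_design @ b) ** 2))


# Least-squares solution: the coefficient vector that minimizes ssr(b).
beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)

y_hat = X_design @ beta_hat  # predicted values (2.3)
u_hat = y - y_hat            # residuals (2.4)

# A coefficient vector shifted away from beta_hat gives a larger SSR.
perturbed = beta_hat + np.array([0.05, -0.05, 0.05])
print("beta_hat:          ", beta_hat.round(3))
print("SSR at beta_hat:   ", round(ssr(beta_hat), 3))
print("SSR at perturbed b:", round(ssr(perturbed), 3))
```

Because the regression includes an intercept, the residuals in `u_hat` sum to zero up to floating-point error.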

[Slide 13] Remarks:
- The OLS estimators $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ and the residuals $\hat u_i$ are computed from a sample of $n$ observations of $(X_{1i}, \ldots, X_{ki}, Y_i)$ for $i = 1, \ldots, n$
- They are estimators of the unknown true population coefficients $\beta_0, \beta_1, \ldots, \beta_k$ and of the error terms $u_i$
- There are closed-form formulas for calculating the OLS estimates from the data (see the lectures Econometrics I+II)
- In this lecture, we use the software package EViews
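The slides defer the closed-form formulas to other lectures. For orientation only, the standard textbook matrix form is given below. The matrix notation is an assumption that does not appear in these slides: $X$ denotes the $n \times (k+1)$ matrix whose first column consists of ones and whose remaining columns hold the regressors, and $Y$ denotes the $n \times 1$ vector of observations on the dependent variable.

```latex
\hat\beta
  = \begin{pmatrix} \hat\beta_0 \\ \hat\beta_1 \\ \vdots \\ \hat\beta_k \end{pmatrix}
  = (X'X)^{-1} X'Y,
\qquad \text{provided that } X'X \text{ is invertible.}
```

The invertibility condition fails exactly when one column of $X$ is a perfect linear function of the others.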

[Slide 14] Regression estimation results (EViews) for the student-performance dataset

Dependent Variable: TEST_SCORE
Method: Least Squares
Date: 07/02/12 Time: 16:29
Included observations: 420

(The numerical entries of this output were lost in transcription; only the labels survive.)

Columns: Variable, Coefficient, Std. Error, t-statistic, Prob.
Rows: C, STR, PCTEL
Summary statistics: R-squared, Adjusted R-squared, S.E. of regression, Sum squared resid, Log likelihood, F-statistic, Prob(F-statistic), Mean dependent var, S.D. dependent var, Akaike info criterion, Schwarz criterion, Hannan-Quinn criter., Durbin-Watson stat

[Slide 15] Predicted values $\hat Y_i$ and residuals $\hat u_i$ for the student-performance dataset

[Figure; plotted series: Residual, Actual, Fitted]

[Slide 16] Regression estimation results (EViews) for the house-prices dataset

Dependent Variable: SALEPRICE
Method: Least Squares
Date: 07/02/12 Time: 16:50
Included observations: 546

(Most numerical entries of this output were lost in transcription; the labels and the one surviving value are listed.)

Columns: Variable, Coefficient, Std. Error, t-statistic, Prob.
Rows: C, LOTSIZE, BEDROOMS, BATHROOMS, STOREYS
Sum squared resid: 1.80E+11
Other summary statistics: R-squared, Adjusted R-squared, S.E. of regression, Log likelihood, F-statistic, Prob(F-statistic), Mean dependent var, S.D. dependent var, Akaike info criterion, Schwarz criterion, Hannan-Quinn criter., Durbin-Watson stat

[Slide 17] Predicted values $\hat Y_i$ and residuals $\hat u_i$ for the house-prices dataset

[Figure; plotted series: Residual, Actual, Fitted]

[Slide 18] OLS assumptions in the multiple regression model (2.1):
1. $u_i$ has conditional mean zero given $X_{1i}, X_{2i}, \ldots, X_{ki}$: $E(u_i \mid X_{1i}, X_{2i}, \ldots, X_{ki}) = 0$
2. $(X_{1i}, X_{2i}, \ldots, X_{ki}, Y_i)$, $i = 1, \ldots, n$, are independently and identically distributed (i.i.d.) draws from their joint distribution
3. Large outliers are unlikely: $X_{1i}, X_{2i}, \ldots, X_{ki}$ and $Y_i$ have nonzero finite fourth moments
4. There is no perfect multicollinearity

Remarks:
- Note that we do not assume any specific parametric distribution for the $u_i$
- The OLS assumptions imply specific distribution results

[Slide 19] Theorem 2.4: (Unbiasedness, consistency, normality)

Given the OLS assumptions, the following properties of the OLS estimators $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ hold:
1. $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ are unbiased estimators of $\beta_0, \ldots, \beta_k$.
2. $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ are consistent estimators of $\beta_0, \ldots, \beta_k$ (convergence in probability).
3. In large samples $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ are jointly normally distributed, and each single OLS estimator $\hat\beta_j$, $j = 0, \ldots, k$, is normally distributed with mean $\beta_j$ and variance $\sigma^2_{\hat\beta_j}$, that is, $\hat\beta_j \sim N(\beta_j, \sigma^2_{\hat\beta_j})$.
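The statements of Theorem 2.4 can be checked informally by Monte Carlo simulation. The sketch below is an illustration, not a proof, and every design choice in it is an assumption: the true coefficients, the sample size of 100, the 2000 replications, and the deliberately non-normal (uniform) error terms. It repeatedly draws i.i.d. samples, re-estimates the model, and looks at the distribution of the estimates across replications.

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps = 100, 2000
beta_true = np.array([1.0, 2.0, -0.5])  # hypothetical (beta_0, beta_1, beta_2)

estimates = np.empty((reps, 3))
for r in range(reps):
    # Fresh i.i.d. sample in every replication.
    X = rng.normal(size=(n, 2))
    # Uniform errors: mean zero and independent of X, but not normal.
    u = rng.uniform(-1.0, 1.0, size=n)
    y = beta_true[0] + X @ beta_true[1:] + u
    X_design = np.column_stack([np.ones(n), X])
    beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    estimates[r] = beta_hat

# Unbiasedness: the average estimate should be close to the true value.
mean_est = estimates.mean(axis=0)

# Approximate normality: share of standardized beta_1 estimates inside
# (-1.96, 1.96), which is about 0.95 for a normal distribution.
b1 = estimates[:, 1]
z = (b1 - b1.mean()) / b1.std(ddof=1)
share_within_196 = float(np.mean(np.abs(z) < 1.96))

print("mean of estimates:", mean_est.round(3))
print("share within 1.96 standard deviations:", round(share_within_196, 3))
```

Rerunning the loop with a larger `n` shrinks the spread of the estimates around the true values, which is the behaviour that consistency describes.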

[Slide 20] Remarks:
- In general, the OLS estimators are correlated
- This correlation among $\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k$ arises from the correlation among the regressors $X_1, \ldots, X_k$
- The sampling distribution of the OLS estimators will become relevant in Section 3 (hypothesis testing, confidence intervals)

[Slide 21] 2.3. Measures of fit in multiple regression

Now:
- Three well-known summary statistics that measure how well the OLS estimates fit the data

Standard error of regression (SER):
- The SER estimates the standard deviation of the error term $u_i$ (under the assumption of homoskedasticity):

$$SER = \sqrt{\frac{1}{n - k - 1} \sum_{i=1}^{n} \hat u_i^2}$$

[Slide 22] Standard error of regression: [continued]
- We denote the sum of squared residuals by

$$SSR \equiv \sum_{i=1}^{n} \hat u_i^2$$

so that

$$SER = \sqrt{\frac{SSR}{n - k - 1}}$$

- Given the OLS assumptions and homoskedasticity, the squared SER, $(SER)^2$, is an unbiased estimator of the unknown constant variance of the $u_i$
- SER is a measure of the spread of the distribution of $Y_i$ around the population regression line
- Both measures, SER and SSR, are reported in the EViews regression output
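As a small worked example of the formula $SER = \sqrt{SSR/(n-k-1)}$, the sketch below plugs in the two sums of squared residuals that survive in the EViews outputs for the house-prices dataset ($n = 546$): 1.80E+11 for the regression on all four regressors and 3.36E+11 for the regression on BEDROOMS alone. Both values are rounded as displayed by EViews, so the results are approximate; the function name is a hypothetical helper.

```python
import math


def standard_error_of_regression(ssr, n, k):
    """SER = sqrt(SSR / (n - k - 1)) for n observations and k regressors."""
    return math.sqrt(ssr / (n - k - 1))


# House-prices regressions, n = 546 observations, SSR as displayed (rounded).
ser_full = standard_error_of_regression(1.80e11, n=546, k=4)   # four regressors
ser_short = standard_error_of_regression(3.36e11, n=546, k=1)  # BEDROOMS only

print(f"SER, four regressors: {ser_full:,.0f} Canadian dollars")
print(f"SER, BEDROOMS only:   {ser_short:,.0f} Canadian dollars")
```

The SER is measured in the units of the dependent variable, here Canadian dollars.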

[Slide 23] $R^2$:
- The $R^2$ is the fraction of the sample variance of the $Y_i$ explained by the regressors
- Equivalently, the $R^2$ is 1 minus the fraction of the variance of the $Y_i$ not explained by the regressors (i.e. explained by the residuals)
- Denoting the explained sum of squares (ESS) and the total sum of squares (TSS) by

$$ESS = \sum_{i=1}^{n} (\hat Y_i - \bar Y)^2 \quad \text{and} \quad TSS = \sum_{i=1}^{n} (Y_i - \bar Y)^2,$$

respectively, we define the $R^2$ as

$$R^2 = \frac{ESS}{TSS} = 1 - \frac{SSR}{TSS}$$

[Slide 24] $R^2$: [continued]
- In multiple regression, the $R^2$ increases whenever an additional regressor $X_{k+1}$ is added to the regression model, unless the estimated coefficient $\hat\beta_{k+1}$ is exactly equal to zero
- Since in practice it is extremely unusual to have exactly $\hat\beta_{k+1} = 0$, the $R^2$ generally increases (and never decreases) when a new regressor is added to the regression model
- An increase in the $R^2$ due to the inclusion of a new regressor does not necessarily indicate an actually improved fit of the model
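The claim that the $R^2$ never decreases when a regressor is added can be illustrated with simulated data. In the sketch below, all numbers and names are assumptions made for the example: the dependent variable depends on `x1` only, and the added regressor `noise` is unrelated to it by construction.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50


def r_squared(y, X):
    """R^2 = 1 - SSR/TSS for an OLS regression of y on X plus an intercept."""
    X_design = np.column_stack([np.ones(len(y)), X])
    beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    ssr = np.sum((y - X_design @ beta_hat) ** 2)
    tss = np.sum((y - y.mean()) ** 2)
    return float(1.0 - ssr / tss)


x1 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + rng.normal(size=n)  # y depends on x1 only
noise = rng.normal(size=n)               # irrelevant regressor by construction

r2_small = r_squared(y, x1)
r2_big = r_squared(y, np.column_stack([x1, noise]))

print(f"R^2 with x1 only:      {r2_small:.4f}")
print(f"R^2 with x1 and noise: {r2_big:.4f}")
```

The second value is at least as large as the first even though `noise` has no relationship with `y`, which is why a higher $R^2$ alone does not indicate a better model.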

[Slide 25] Adjusted $R^2$:
- The adjusted $R^2$ (in symbols: $\bar R^2$) deflates the conventional $R^2$:

$$\bar R^2 = 1 - \frac{n - 1}{n - k - 1} \cdot \frac{SSR}{TSS}$$

- It is always true that $\bar R^2 < R^2$ (why?)
- When adding a new regressor $X_{k+1}$ to the model, the $\bar R^2$ can increase or decrease (why?)
- The $\bar R^2$ can be negative (why?)
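The three "why?" questions can be explored with simple arithmetic. The factor $(n-1)/(n-k-1)$ exceeds 1 whenever $k \geq 1$, so it scales $SSR/TSS$ upwards. The sketch below uses made-up values of SSR, TSS, $n$ and $k$ (none of them taken from the lecture's datasets) and a hypothetical helper function to show one case where the adjusted $R^2$ is negative and one where it falls although the $R^2$ rises.

```python
def r2_and_adjusted_r2(ssr, tss, n, k):
    """Return (R^2, adjusted R^2) for n observations and k regressors."""
    r2 = 1.0 - ssr / tss
    adj_r2 = 1.0 - (n - 1) / (n - k - 1) * ssr / tss
    return r2, adj_r2


# Case 1 (made-up numbers): weak fit with many regressors relative to n.
r2, adj_r2 = r2_and_adjusted_r2(ssr=90.0, tss=100.0, n=10, k=5)
print(f"case 1: R^2 = {r2:.3f}, adjusted R^2 = {adj_r2:.3f}")

# Case 2 (made-up numbers): a second regressor lowers SSR only slightly.
r2_k1, adj_k1 = r2_and_adjusted_r2(ssr=50.0, tss=100.0, n=10, k=1)
r2_k2, adj_k2 = r2_and_adjusted_r2(ssr=49.9, tss=100.0, n=10, k=2)
print(f"case 2, k = 1: R^2 = {r2_k1:.4f}, adjusted R^2 = {adj_k1:.4f}")
print(f"case 2, k = 2: R^2 = {r2_k2:.4f}, adjusted R^2 = {adj_k2:.4f}")
```

In case 2 the drop in SSR is too small to offset the larger penalty factor, so the adjusted $R^2$ decreases while the conventional $R^2$ increases.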

[Slide 26] 2.4. Omitted-variable bias

Now:
- Discussion of a phenomenon that implies violation of the first OLS assumption on Slide 18
- This issue is known as omitted-variable bias and is extremely relevant in practice
- Although theoretically easy to grasp, avoiding this specification problem turns out to be a nontrivial task in many empirical applications

[Slide 27] Definition 2.5: (Omitted-variable bias)

Consider the multiple regression model in Definition 2.1 on Slide 7. Omitted-variable bias is the bias in the OLS estimator $\hat\beta_j$ of the coefficient $\beta_j$ (for $j = 1, \ldots, k$) that arises when the associated regressor $X_j$ is correlated with an omitted variable. More precisely, for omitted-variable bias to occur, the following two conditions must hold:
1. $X_j$ is correlated with the omitted variable.
2. The omitted variable is a determinant of the dependent variable $Y$.

[Slide 28] Example:
- Consider the house-prices dataset (Slides 16, 17)
- Using the entire set of regressors, we obtain an OLS estimate $\hat\beta_2$ for the BEDROOMS coefficient (the numerical value was lost in transcription)
- The slide reports the correlation coefficients between the regressors BEDROOMS, BATHROOMS, LOTSIZE and STOREYS as a 4 x 4 table (the numerical entries were lost in transcription)

[Slide 29] Example: [continued]
- There is positive (significant) correlation between the variable BEDROOMS and all other regressors
- Excluding the other variables from the regression yields the following OLS estimates:

Dependent Variable: SALEPRICE
Method: Least Squares
Date: 14/02/12 Time: 16:10
Included observations: 546

(Most numerical entries of this output were lost in transcription; the labels and the one surviving value are listed.)

Columns: Variable, Coefficient, Std. Error, t-statistic, Prob.
Rows: C, BEDROOMS
Sum squared resid: 3.36E+11
Other summary statistics: R-squared, Adjusted R-squared, S.E. of regression, Log likelihood, F-statistic, Prob(F-statistic), Mean dependent var, S.D. dependent var, Akaike info criterion, Schwarz criterion, Hannan-Quinn criter., Durbin-Watson stat

- The alternative OLS estimates of the BEDROOMS coefficient differ substantially

[Slide 30] Intuitive explanation of the omitted-variable bias:
- Consider the variable LOTSIZE as omitted
- LOTSIZE is an important variable for explaining SALEPRICE
- If we omit LOTSIZE in the regression, it will try to enter in the only way it can, namely through its positive correlation with the included variable BEDROOMS
- The coefficient on BEDROOMS will confound the effect of BEDROOMS and LOTSIZE on SALEPRICE

[Slide 31] More formal explanation:
- Omitted-variable bias means that the first OLS assumption on Slide 18 is violated
- Reasoning: In the multiple regression model the error term $u_i$ represents all factors other than the included regressors $X_1, \ldots, X_k$ that are determinants of $Y_i$
- If an omitted variable is correlated with at least one of the included regressors $X_1, \ldots, X_k$, then $u_i$ (which contains this factor) is correlated with the set of regressors
- This implies that $E(u_i \mid X_{1i}, \ldots, X_{ki}) \neq 0$

[Slide 32] Important result:
- In the case of omitted-variable bias
  - the OLS estimators on the corresponding included regressors are biased in finite samples
  - this bias does not vanish in large samples, so the OLS estimators are inconsistent

Solutions to omitted-variable bias:
- To be discussed in Section 5
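Omitted-variable bias can be reproduced in a small simulation. The sketch below does not use the house-prices data; it generates two correlated regressors with made-up coefficients (all values and names are assumptions) and compares the regression that includes both regressors with the one that omits `x2`. In this particular design the standard omitted-variable formula gives a large-sample value of $2 + 3 \cdot 0.8 = 4.4$ for the slope in the short regression, far from the true value of 2.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000

x1 = rng.normal(size=n)
x2 = 0.8 * x1 + 0.6 * rng.normal(size=n)  # correlated with x1 (condition 1)
u = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + u         # x2 determines y (condition 2)


def ols(y, *regressors):
    """OLS coefficients (intercept first) of y on the given regressors."""
    X_design = np.column_stack([np.ones(len(y)), *regressors])
    beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    return beta_hat


b_long = ols(y, x1, x2)  # both regressors included
b_short = ols(y, x1)     # x2 omitted, so it ends up in the error term

print("coefficient on x1, x2 included:", round(float(b_long[1]), 3))
print("coefficient on x1, x2 omitted: ", round(float(b_short[1]), 3))
```

Increasing `n` does not move the short-regression coefficient towards 2, which matches the statement that the bias does not vanish in large samples.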

[Slide 33] 2.5. Multicollinearity

Definition 2.6: (Perfect multicollinearity)

Consider the multiple regression model in Definition 2.1 on Slide 7. The regressors $X_1, \ldots, X_k$ are said to be perfectly multicollinear if one of the regressors is a perfect linear function of the other regressors.

Remarks:
- Under perfect multicollinearity the OLS estimates cannot be calculated due to division by zero in the OLS formulas
- Perfect multicollinearity often reflects a logical mistake in choosing the regressors or some unrecognized feature in the data set

[Slide 34] Example: (Dummy variable trap)
- Consider the student-performance dataset
- Suppose we partition the school districts into the 3 categories (1) rural, (2) suburban, (3) urban
- We represent the categories by the dummy regressors

$$\text{RURAL}_i = \begin{cases} 1 & \text{if district } i \text{ is rural} \\ 0 & \text{otherwise} \end{cases}$$

and by $\text{SUBURBAN}_i$ and $\text{URBAN}_i$, defined analogously
- Since each district belongs to one and only one category, we have for each district $i$:

$$\text{RURAL}_i + \text{SUBURBAN}_i + \text{URBAN}_i = 1$$

[Slide 35] Example: [continued]
- Now, let us define the constant regressor $X_0$ associated with the intercept coefficient $\beta_0$ in the multiple regression model on Slide 7 by $X_{0i} \equiv 1$ for $i = 1, \ldots, n$
- Then, for $i = 1, \ldots, n$, the following relationship holds among the regressors:

$$X_{0i} = \text{RURAL}_i + \text{SUBURBAN}_i + \text{URBAN}_i$$

This is perfect multicollinearity.
- To estimate the regression we must exclude either one of the dummy regressors or the constant regressor $X_0$ (the intercept $\beta_0$) from the regression

[Slide 36] Theorem 2.7: (Dummy variable trap)

Let there be $G$ different categories in the data set represented by $G$ dummy regressors. If
1. each observation $i$ falls into one and only one category,
2. there is an intercept (constant regressor) in the regression,
3. all $G$ dummy regressors are included as regressors,

then regression estimation fails because of perfect multicollinearity.

Usual remedy:
- Exclude one of the dummy regressors ($G - 1$ dummy regressors are sufficient)
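The dummy variable trap can be seen directly in the rank of the regressor matrix. The sketch below builds a toy data set of 12 hypothetical districts with $G = 3$ categories (the category assignment is made up, not taken from the student-performance data) and uses `numpy.linalg.matrix_rank` to compare the design with all three dummies plus a constant against the design with one dummy dropped.

```python
import numpy as np

n = 12
# Hypothetical category labels: 0 = rural, 1 = suburban, 2 = urban.
category = np.arange(n) % 3

rural = (category == 0).astype(float)
suburban = (category == 1).astype(float)
urban = (category == 2).astype(float)
const = np.ones(n)  # constant regressor X_0

# Trap: intercept plus all G = 3 dummies (4 columns).
X_trap = np.column_stack([const, rural, suburban, urban])
# Remedy: drop one dummy, here RURAL, keeping G - 1 = 2 dummies (3 columns).
X_ok = np.column_stack([const, suburban, urban])

rank_trap = int(np.linalg.matrix_rank(X_trap))
rank_ok = int(np.linalg.matrix_rank(X_ok))

print("columns / rank with all dummies:", X_trap.shape[1], "/", rank_trap)
print("columns / rank with one dropped:", X_ok.shape[1], "/", rank_ok)
```

A rank below the number of columns means that one regressor is an exact linear combination of the others, so the coefficients are not uniquely determined.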

[Slide 37] Definition 2.8: (Imperfect multicollinearity)

Consider the multiple regression model in Definition 2.1 on Slide 7. The regressors $X_1, \ldots, X_k$ are said to be imperfectly multicollinear if two or more of the regressors are highly correlated in the sense that there is a linear function of the regressors that is highly correlated with another regressor.

Remarks:
- Imperfect multicollinearity does not pose any (numeric) problems in calculating OLS estimates
- However, if regressors are imperfectly multicollinear, then the coefficient on at least one individual regressor will be imprecisely estimated
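The loss of precision under imperfect multicollinearity can be illustrated by comparing standard errors across two simulated designs. The sketch below is an assumption-laden illustration: the coefficients, the sample size and the correlation levels are made up, and the standard error is the homoskedasticity-only textbook formula (the square root of a diagonal element of $s^2 (X'X)^{-1}$), which these slides do not state.

```python
import numpy as np


def slope_standard_error(rho, n=500, seed=5):
    """Homoskedasticity-only standard error of the coefficient on x1 when
    the population correlation between x1 and x2 equals rho."""
    rng = np.random.default_rng(seed)
    x1 = rng.normal(size=n)
    x2 = rho * x1 + np.sqrt(1.0 - rho**2) * rng.normal(size=n)
    y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

    X_design = np.column_stack([np.ones(n), x1, x2])
    beta_hat, *_ = np.linalg.lstsq(X_design, y, rcond=None)
    u_hat = y - X_design @ beta_hat
    s2 = u_hat @ u_hat / (n - X_design.shape[1])     # squared SER, k = 2
    cov = s2 * np.linalg.inv(X_design.T @ X_design)  # estimated covariance
    return float(np.sqrt(cov[1, 1]))


se_low = slope_standard_error(rho=0.0)    # uncorrelated regressors
se_high = slope_standard_error(rho=0.99)  # highly correlated regressors

print(f"standard error of the x1 coefficient, rho = 0.00: {se_low:.4f}")
print(f"standard error of the x1 coefficient, rho = 0.99: {se_high:.4f}")
```

The estimates can still be computed in both designs; only the precision of the individual coefficient deteriorates as the correlation approaches 1.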

[Slide 38] Remarks: [continued]
- Techniques for identifying and mitigating imperfect multicollinearity are presented in econometric textbooks (e.g. Hill et al., 2010, pp )


More information

Quick Stata Guide by Liz Foster

Quick Stata Guide by Liz Foster by Liz Foster Table of Contents Part 1: 1 describe 1 generate 1 regress 3 scatter 4 sort 5 summarize 5 table 6 tabulate 8 test 10 ttest 11 Part 2: Prefixes and Notes 14 by var: 14 capture 14 use of the

More information

Price volatility in the silver spot market: An empirical study using Garch applications

Price volatility in the silver spot market: An empirical study using Garch applications Price volatility in the silver spot market: An empirical study using Garch applications ABSTRACT Alan Harper, South University Zhenhu Jin Valparaiso University Raufu Sokunle UBS Investment Bank Manish

More information

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition)

INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) INDIRECT INFERENCE (prepared for: The New Palgrave Dictionary of Economics, Second Edition) Abstract Indirect inference is a simulation-based method for estimating the parameters of economic models. Its

More information

2. Simple Linear Regression

2. Simple Linear Regression Research methods - II 3 2. Simple Linear Regression Simple linear regression is a technique in parametric statistics that is commonly used for analyzing mean response of a variable Y which changes according

More information

17. SIMPLE LINEAR REGRESSION II

17. SIMPLE LINEAR REGRESSION II 17. SIMPLE LINEAR REGRESSION II The Model In linear regression analysis, we assume that the relationship between X and Y is linear. This does not mean, however, that Y can be perfectly predicted from X.

More information

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade

Answer: C. The strength of a correlation does not change if units change by a linear transformation such as: Fahrenheit = 32 + (5/9) * Centigrade Statistics Quiz Correlation and Regression -- ANSWERS 1. Temperature and air pollution are known to be correlated. We collect data from two laboratories, in Boston and Montreal. Boston makes their measurements

More information

Clustering in the Linear Model

Clustering in the Linear Model Short Guides to Microeconometrics Fall 2014 Kurt Schmidheiny Universität Basel Clustering in the Linear Model 2 1 Introduction Clustering in the Linear Model This handout extends the handout on The Multiple

More information

Department of Economics Session 2012/2013. EC352 Econometric Methods. Solutions to Exercises from Week 10 + 0.0077 (0.052)

Department of Economics Session 2012/2013. EC352 Econometric Methods. Solutions to Exercises from Week 10 + 0.0077 (0.052) Department of Economics Session 2012/2013 University of Essex Spring Term Dr Gordon Kemp EC352 Econometric Methods Solutions to Exercises from Week 10 1 Problem 13.7 This exercise refers back to Equation

More information

Uniwersytet Ekonomiczny

Uniwersytet Ekonomiczny Uniwersytet Ekonomiczny George Matysiak Introduction to modelling & forecasting December 15 th, 2014 Agenda Modelling and forecasting - Models Approaches towards modelling and forecasting Forecasting commercial

More information

FORECASTING DEPOSIT GROWTH: Forecasting BIF and SAIF Assessable and Insured Deposits

FORECASTING DEPOSIT GROWTH: Forecasting BIF and SAIF Assessable and Insured Deposits Technical Paper Series Congressional Budget Office Washington, DC FORECASTING DEPOSIT GROWTH: Forecasting BIF and SAIF Assessable and Insured Deposits Albert D. Metz Microeconomic and Financial Studies

More information

Financial Risk Management Exam Sample Questions/Answers

Financial Risk Management Exam Sample Questions/Answers Financial Risk Management Exam Sample Questions/Answers Prepared by Daniel HERLEMONT 1 2 3 4 5 6 Chapter 3 Fundamentals of Statistics FRM-99, Question 4 Random walk assumes that returns from one time period

More information

Correlation of International Stock Markets Before and During the Subprime Crisis

Correlation of International Stock Markets Before and During the Subprime Crisis 173 Correlation of International Stock Markets Before and During the Subprime Crisis Ioana Moldovan 1 Claudia Medrega 2 The recent financial crisis has spread to markets worldwide. The correlation of evolutions

More information

EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION

EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION EDUCATION AND VOCABULARY MULTIPLE REGRESSION IN ACTION EDUCATION AND VOCABULARY 5-10 hours of input weekly is enough to pick up a new language (Schiff & Myers, 1988). Dutch children spend 5.5 hours/day

More information

Review of Bivariate Regression

Review of Bivariate Regression Review of Bivariate Regression A.Colin Cameron Department of Economics University of California - Davis accameron@ucdavis.edu October 27, 2006 Abstract This provides a review of material covered in an

More information

Premaster Statistics Tutorial 4 Full solutions

Premaster Statistics Tutorial 4 Full solutions Premaster Statistics Tutorial 4 Full solutions Regression analysis Q1 (based on Doane & Seward, 4/E, 12.7) a. Interpret the slope of the fitted regression = 125,000 + 150. b. What is the prediction for

More information

II. DISTRIBUTIONS distribution normal distribution. standard scores

II. DISTRIBUTIONS distribution normal distribution. standard scores Appendix D Basic Measurement And Statistics The following information was developed by Steven Rothke, PhD, Department of Psychology, Rehabilitation Institute of Chicago (RIC) and expanded by Mary F. Schmidt,

More information

How Far is too Far? Statistical Outlier Detection

How Far is too Far? Statistical Outlier Detection How Far is too Far? Statistical Outlier Detection Steven Walfish President, Statistical Outsourcing Services steven@statisticaloutsourcingservices.com 30-325-329 Outline What is an Outlier, and Why are

More information

Directions for using SPSS

Directions for using SPSS Directions for using SPSS Table of Contents Connecting and Working with Files 1. Accessing SPSS... 2 2. Transferring Files to N:\drive or your computer... 3 3. Importing Data from Another File Format...

More information

From the help desk: Bootstrapped standard errors

From the help desk: Bootstrapped standard errors The Stata Journal (2003) 3, Number 1, pp. 71 80 From the help desk: Bootstrapped standard errors Weihua Guan Stata Corporation Abstract. Bootstrapping is a nonparametric approach for evaluating the distribution

More information

Chapter 7: Simple linear regression Learning Objectives

Chapter 7: Simple linear regression Learning Objectives Chapter 7: Simple linear regression Learning Objectives Reading: Section 7.1 of OpenIntro Statistics Video: Correlation vs. causation, YouTube (2:19) Video: Intro to Linear Regression, YouTube (5:18) -

More information

4. Multiple Regression in Practice

4. Multiple Regression in Practice 30 Multiple Regression in Practice 4. Multiple Regression in Practice The preceding chapters have helped define the broad principles on which regression analysis is based. What features one should look

More information

Chapter 1. Vector autoregressions. 1.1 VARs and the identi cation problem

Chapter 1. Vector autoregressions. 1.1 VARs and the identi cation problem Chapter Vector autoregressions We begin by taking a look at the data of macroeconomics. A way to summarize the dynamics of macroeconomic data is to make use of vector autoregressions. VAR models have become

More information

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares

Outline. Topic 4 - Analysis of Variance Approach to Regression. Partitioning Sums of Squares. Total Sum of Squares. Partitioning sums of squares Topic 4 - Analysis of Variance Approach to Regression Outline Partitioning sums of squares Degrees of freedom Expected mean squares General linear test - Fall 2013 R 2 and the coefficient of correlation

More information

The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series.

The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Cointegration The VAR models discussed so fare are appropriate for modeling I(0) data, like asset returns or growth rates of macroeconomic time series. Economic theory, however, often implies equilibrium

More information

Time Series Analysis

Time Series Analysis Time Series Analysis Forecasting with ARIMA models Andrés M. Alonso Carolina García-Martos Universidad Carlos III de Madrid Universidad Politécnica de Madrid June July, 2012 Alonso and García-Martos (UC3M-UPM)

More information

Wooldridge, Introductory Econometrics, 4th ed. Chapter 10: Basic regression analysis with time series data

Wooldridge, Introductory Econometrics, 4th ed. Chapter 10: Basic regression analysis with time series data Wooldridge, Introductory Econometrics, 4th ed. Chapter 10: Basic regression analysis with time series data We now turn to the analysis of time series data. One of the key assumptions underlying our analysis

More information

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression

Data Mining and Data Warehousing. Henryk Maciejewski. Data Mining Predictive modelling: regression Data Mining and Data Warehousing Henryk Maciejewski Data Mining Predictive modelling: regression Algorithms for Predictive Modelling Contents Regression Classification Auxiliary topics: Estimation of prediction

More information

3.2 Measures of Spread

3.2 Measures of Spread 3.2 Measures of Spread In some data sets the observations are close together, while in others they are more spread out. In addition to measures of the center, it's often important to measure the spread

More information

Module 3: Correlation and Covariance

Module 3: Correlation and Covariance Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

More information

Moderator and Mediator Analysis

Moderator and Mediator Analysis Moderator and Mediator Analysis Seminar General Statistics Marijtje van Duijn October 8, Overview What is moderation and mediation? What is their relation to statistical concepts? Example(s) October 8,

More information

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling

What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling What s New in Econometrics? Lecture 8 Cluster and Stratified Sampling Jeff Wooldridge NBER Summer Institute, 2007 1. The Linear Model with Cluster Effects 2. Estimation with a Small Number of Groups and

More information

Causal Forecasting Models

Causal Forecasting Models CTL.SC1x -Supply Chain & Logistics Fundamentals Causal Forecasting Models MIT Center for Transportation & Logistics Causal Models Used when demand is correlated with some known and measurable environmental

More information

PARTNERSHIP IN SOCIAL MARKETING PROGRAMS. SOCIALLY RESPONSIBLE COMPANIES AND NON-PROFIT ORGANIZATIONS ENGAGEMENT IN SOLVING SOCIETY S PROBLEMS

PARTNERSHIP IN SOCIAL MARKETING PROGRAMS. SOCIALLY RESPONSIBLE COMPANIES AND NON-PROFIT ORGANIZATIONS ENGAGEMENT IN SOLVING SOCIETY S PROBLEMS PARTNERSHIP IN SOCIAL MARKETING PROGRAMS. SOCIALLY RESPONSIBLE COMPANIES AND NON-PROFIT ORGANIZATIONS ENGAGEMENT IN SOLVING SOCIETY S PROBLEMS Corina Şerban The Bucharest Academy of Economic Studies, Romania

More information

Introduction to Quantitative Methods

Introduction to Quantitative Methods Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

More information

1.5 Oneway Analysis of Variance

1.5 Oneway Analysis of Variance Statistics: Rosie Cornish. 200. 1.5 Oneway Analysis of Variance 1 Introduction Oneway analysis of variance (ANOVA) is used to compare several means. This method is often used in scientific or medical experiments

More information

problem arises when only a non-random sample is available differs from censored regression model in that x i is also unobserved

problem arises when only a non-random sample is available differs from censored regression model in that x i is also unobserved 4 Data Issues 4.1 Truncated Regression population model y i = x i β + ε i, ε i N(0, σ 2 ) given a random sample, {y i, x i } N i=1, then OLS is consistent and efficient problem arises when only a non-random

More information

SPSS Guide: Regression Analysis

SPSS Guide: Regression Analysis SPSS Guide: Regression Analysis I put this together to give you a step-by-step guide for replicating what we did in the computer lab. It should help you run the tests we covered. The best way to get familiar

More information

Chapter 3 Quantitative Demand Analysis

Chapter 3 Quantitative Demand Analysis Managerial Economics & Business Strategy Chapter 3 uantitative Demand Analysis McGraw-Hill/Irwin Copyright 2010 by the McGraw-Hill Companies, Inc. All rights reserved. Overview I. The Elasticity Concept

More information

MARKETING COMMUNICATION IN ONLINE SOCIAL PROGRAMS: OHANIAN MODEL OF SOURCE CREDIBILITY

MARKETING COMMUNICATION IN ONLINE SOCIAL PROGRAMS: OHANIAN MODEL OF SOURCE CREDIBILITY MARKETING COMMUNICATION IN ONLINE SOCIAL PROGRAMS: OHANIAN MODEL OF SOURCE CREDIBILITY Serban Corina The Bucharest Academy of Economic Studies The Faculty of Marketing The development of the Internet as

More information

Stepwise Regression. Chapter 311. Introduction. Variable Selection Procedures. Forward (Step-Up) Selection

Stepwise Regression. Chapter 311. Introduction. Variable Selection Procedures. Forward (Step-Up) Selection Chapter 311 Introduction Often, theory and experience give only general direction as to which of a pool of candidate variables (including transformed variables) should be included in the regression model.

More information

Integrating Financial Statement Modeling and Sales Forecasting

Integrating Financial Statement Modeling and Sales Forecasting Integrating Financial Statement Modeling and Sales Forecasting John T. Cuddington, Colorado School of Mines Irina Khindanova, University of Denver ABSTRACT This paper shows how to integrate financial statement

More information

A General Approach to Variance Estimation under Imputation for Missing Survey Data

A General Approach to Variance Estimation under Imputation for Missing Survey Data A General Approach to Variance Estimation under Imputation for Missing Survey Data J.N.K. Rao Carleton University Ottawa, Canada 1 2 1 Joint work with J.K. Kim at Iowa State University. 2 Workshop on Survey

More information

3.1 Stationary Processes and Mean Reversion

3.1 Stationary Processes and Mean Reversion 3. Univariate Time Series Models 3.1 Stationary Processes and Mean Reversion Definition 3.1: A time series y t, t = 1,..., T is called (covariance) stationary if (1) E[y t ] = µ, for all t Cov[y t, y t

More information

The Basic Two-Level Regression Model

The Basic Two-Level Regression Model 2 The Basic Two-Level Regression Model The multilevel regression model has become known in the research literature under a variety of names, such as random coefficient model (de Leeuw & Kreft, 1986; Longford,

More information