Ordinary Least Squares: the univariate case


1 Ordinary Least Squares: the univariate case. Majeure Economie, September 2011

2 Outline
1 Introduction
2 The OLS method
   Objective and principles of OLS
   Deriving the OLS estimates
   Do OLS keep their promises?
3 The linear causal model
   Assumptions
   Identification and estimation
   Limits
4 A simulation & applications
   OLS do not always yield good estimates...
   But things can be improved...
   Empirical applications
5 Conclusion and exercises

3 Objectives
Objective 1: to make the best possible guess on a variable Y based on X. Find a function of X which yields good predictions for Y. Example: given cigarette prices, what will cigarette sales be in September 2010 in France?
Objective 2: to determine the causal mechanism by which X influences Y. This is a ceteris paribus type of analysis: everything else being equal, how does a change in X affect Y? By how much does one more year of education increase an individual's wage? By how much would the hiring of more policemen decrease the crime rate in Paris?
The tool we use: a data set, in which we have the wages and number of years of education of N individuals.

4 Objective and principles of OLS: What we have and what we want
For each individual in our data set we observe his wage and his number of years of education. Assume we have a graph such as the one below. The relationship between the two variables seems to be linear. We want to find the line which best describes the relationship between these variables.
[Figure: scatter plot of Wage against Years of Schooling.]

5 Objective and principles of OLS: The principle of OLS
A line is characterized by an intercept and a slope, which we denote $\hat{\alpha}$ and $\hat{\beta}$. Idea: choose for $\hat{\alpha}$ and $\hat{\beta}$ the values which minimize $\sum (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2$. These values are the estimates.
Let us denote $\hat{Y}_i = \hat{\alpha} + \hat{\beta} X_i$. It represents the wage of individual i as predicted by our model. We also denote $\hat{\varepsilon}_i = Y_i - \hat{Y}_i$. The $\hat{\varepsilon}_i$ are called the estimated residuals and represent the mistake made by our model when predicting individual i's wage based on his number of years of schooling.
=> the principle of OLS is merely to minimize the sum of the squared mistakes we make when we use an affine function of $X_i$ to predict $Y_i$.
Why do we take the square of $\hat{\varepsilon}_i$? Could we have used another function?

6 Objective and principles of OLS: A graphical example
[Figure: Wage against Years of Schooling.]

7 Deriving the OLS estimates: Finding $\hat{\alpha}$ and $\hat{\beta}$ (Theorem 1.1)
We denote $\bar{Y} = \frac{1}{N}\sum Y_i$ the empirical mean of $(Y_i)$, $\bar{X}$ the empirical mean of $(X_i)$, $V_e(X) = \frac{1}{N}\sum X_i^2 - \left(\frac{1}{N}\sum X_i\right)^2$ the empirical variance of $(X_i)$, and finally $cov_e(X, Y) = \frac{1}{N}\sum X_i Y_i - \bar{X}\bar{Y}$ the empirical covariance of $(X_i)$ and $(Y_i)$.
We want to minimize $f(\hat{\alpha}, \hat{\beta}) = \sum (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2$.
Solution: $\hat{\beta} = \frac{cov_e(X, Y)}{V_e(X)}$ and $\hat{\alpha} = \bar{Y} - \frac{cov_e(X, Y)}{V_e(X)}\bar{X}$.
Can we compute $\hat{\beta}$ from the sample? Any problem with the computations? Any idea to interpret this result?
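
The closed-form solution of Theorem 1.1 can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `ols_fit` and the toy data are assumptions made for the example.

```python
def ols_fit(x, y):
    """Return (alpha_hat, beta_hat) from the closed-form univariate OLS formulas."""
    n = len(x)
    if n == 0 or n != len(y):
        raise ValueError("x and y must be non-empty and of equal length")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Empirical variance: mean of the squares minus square of the mean.
    var_x = sum(xi * xi for xi in x) / n - x_bar ** 2
    # Empirical covariance: mean of the products minus product of the means.
    cov_xy = sum(xi * yi for xi, yi in zip(x, y)) / n - x_bar * y_bar
    if var_x == 0:
        raise ValueError("all x values are equal: the slope is not defined")
    beta_hat = cov_xy / var_x
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


# Hypothetical toy data (not from the slides): wage = 500 + 100 * schooling exactly.
schooling = [10, 12, 14, 16]
wage = [1500, 1700, 1900, 2100]
alpha_hat, beta_hat = ols_fit(schooling, wage)
print(alpha_hat, beta_hat)
```

Because the toy wages lie exactly on a line, the estimates recover its intercept and slope. The guard on `var_x` reflects the one computational problem with the formula: the slope is undefined when every individual has the same number of years of schooling.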

8 Deriving the OLS estimates: An example
Compute $\hat{\beta}$ in this simple example.
[Table with columns: Individual, Years of Schooling, Wage.]

9 Do OLS keep their promises? Do OLS attain objectives 1 and 2?
Objective 1: find the best prediction for Y based on X, that is, find a function $P(X_i)$ of $X_i$ which yields good predictions for $Y_i$.
Objective 2: determine the causal mechanism by which X influences Y.

10 Do OLS keep their promises? OLS partially reach objective 1
Once agreed that a good prediction is one which minimizes the sum of squared errors, OLS yield by construction the best prediction function for Y among all affine functions of X. But:
First, the criterion can be challenged: one could minimize $\sum |\hat{\varepsilon}_i|$ instead of $\sum \hat{\varepsilon}_i^2$. This is not so big an issue: quantile (median) regression models minimize $\sum |\hat{\varepsilon}_i|$ and their results are usually close to those of OLS.
Second, even if the criterion is accepted, OLS yield the best prediction function among all affine functions of X, not among all functions of X. There might for instance exist a polynomial function of X, $\tilde{\alpha} + \tilde{\beta} X + \tilde{\gamma} X^2$, which yields errors $\tilde{\varepsilon}_i$ such that $\sum \tilde{\varepsilon}_i^2 < \sum \hat{\varepsilon}_i^2$. Not so big an issue either, see next chapter.
How to measure the extent to which Objective 1 is reached?

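11 Do OLS keep their promises? The $R^2$: a measure of the quality of our predictions
$SST = \sum (Y_i - \bar{Y})^2$: the dispersion of wages.
$SSE = \sum (\hat{Y}_i - \bar{Y})^2$: the dispersion of predicted wages.
$SSR = \sum (Y_i - \hat{Y}_i)^2$: the sum of the squared errors.
$SST = \sum (Y_i - \bar{Y})^2 = \sum (Y_i - \hat{Y}_i + \hat{Y}_i - \bar{Y})^2 = \sum (Y_i - \hat{Y}_i)^2 + \sum (\hat{Y}_i - \bar{Y})^2 + 2\sum \hat{\varepsilon}_i (\hat{Y}_i - \bar{Y}) = SSE + SSR + 2\hat{\alpha}\sum \hat{\varepsilon}_i + 2\hat{\beta}\sum \hat{\varepsilon}_i X_i - 2\bar{Y}\sum \hat{\varepsilon}_i$.
According to FOC1 (the first-order condition of the minimization with respect to $\hat{\alpha}$), $\sum \hat{\varepsilon}_i = 0$; according to FOC2 (the first-order condition with respect to $\hat{\beta}$), $\sum \hat{\varepsilon}_i X_i = 0$. Therefore, $SST = SSE + SSR$.
$R^2 = \frac{SSE}{SST}$. The $R^2$ always lies between 0 and 1 (why?). It measures the share of the variance observed in the sample that our model is able to account for, that is, the quality of our predictions for Y based on X. However, a model with a low $R^2$ can still be helpful, and a model with a high $R^2$ can be useless.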
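
The decomposition SST = SSE + SSR can be checked numerically. The sketch below is an illustration under assumed toy data (the function name `r_squared_decomposition` and the five observations are hypothetical); it fits the OLS line, then computes the three sums of squares and their ratio.

```python
def r_squared_decomposition(x, y):
    """Fit univariate OLS and return (SST, SSE, SSR, R2)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Centered sums: sxy / sxx equals cov_e(X, Y) / V_e(X), since the 1/N factors cancel.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    y_hat = [alpha_hat + beta_hat * xi for xi in x]
    sst = sum((yi - y_bar) ** 2 for yi in y)                 # dispersion of Y
    sse = sum((yh - y_bar) ** 2 for yh in y_hat)             # dispersion of predictions
    ssr = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))    # sum of squared residuals
    return sst, sse, ssr, sse / sst


# Hypothetical toy data, chosen so that the arithmetic is easy to follow by hand.
schooling = [8, 10, 12, 14, 16]
wage = [1200, 1500, 1400, 1900, 2000]
sst, sse, ssr, r2 = r_squared_decomposition(schooling, wage)
print(sst, sse, ssr, round(r2, 4))
```

On these data the fitted line is 400 + 100 X, the residuals are 0, 100, -200, 100, 0, and the three sums are 460000, 400000 and 60000, so the identity holds and the R-squared is 400000 / 460000.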

12 Do OLS keep their promises? But OLS do not necessarily reach objective 2
[Figure: Wage against Years of Schooling.]
Individuals with more schooling have higher wages. Does it imply that schooling has a causal impact on wages?

13 Do OLS keep their promises? But OLS do not necessarily reach objective 2
Reverse causality. The line can be inverted: causality might go in the other direction. Here this is not an issue: higher wages cannot cause longer education, because schooling takes place before labor market participation.
Omitted variable bias. Individuals with many years of schooling make more money than those with few years of schooling. But do those two groups differ only in their number of years of schooling? Probably not. For instance, those with more years of schooling might have richer parents, or might also be more clever. Is the correlation between wages and education due only to the effect of education on wages, or also to the fact that those with more education are more clever and have richer parents?

14 Do OLS keep their promises? A causal framework
Parents' wage -> Children's education: well-paid parents can afford to send their children to school, then to college and finally to university.
Parents' wage -> Children's wage: well-paid parents have good networking skills and know how to get good positions => they can help their children.
Children's education -> Children's wage (green cell): education increases children's productivity and their ability to find a well-paid job (signalling theory).
True causal impact of education on wages = green cell. If this framework is true, does $\hat{\beta}$, i.e. the correlation between children's education and wage, measure the green cell only? Does it overestimate or underestimate the green cell?

15 Assumptions: Positing a linear causal model
We assume that for every individual, his income is generated according to the following model: Income = α + β × Number of Years of Education + ε. More formally: $Y_i = \alpha + \beta X_i + \varepsilon_i$. $Y_i$ is the dependent variable, $X_i$ the explanatory variable, and $\varepsilon_i$ the error term: all other determinants of income (cleverness, gender...).
Assumption 1. β measures by how much wage changes when the education of an individual increases by one year and all the other determinants of income (ε) remain unchanged (ceteris paribus impact of education), i.e. the causal impact of education on income.
Assuming that education has an influence on income does not seem to be too big an assumption. However, we assume that this influence is linear: when the number of years of education is increased by 1, wage increases by β. Realistic? Moreover, we assume that this influence is the same for everyone: β does not depend on i. Realistic?

16 Assumptions: Why is linearity not so stupid an assumption...
If the relationship between the data does not look linear at all, you can try to estimate a different equation: $Y_i = \alpha + \beta X_i^2 + \varepsilon_i$, for instance, if the relationship is quadratic. If the data looks as in the graph below, which relationship do you want to estimate?

17 Assumptions: Other assumptions
Assumption 2: random sampling. $(X_i, \varepsilon_i)$ is independent from $(X_j, \varepsilon_j)$. This amounts to saying that the number of years of education completed by Mr Dupont, or his marital status, is not related to that of Mr Duchamp, who lives fifty kilometers from him and whom he does not know. This seems fairly credible.
Assumption 3: sample variation. In our sample, not all the $X_i$ are equal. Trivial assumption: if it is not verified, that is to say if all the individuals in our sample have the same number of years of education, it is impossible to determine the impact of education on wage from our data. This implies that $V_e(X_i) > 0$.
Assumption 4: $\varepsilon_i \perp X_i$, i.e. the error term is unrelated to the explanatory variable, so that $cov(\varepsilon_i, X_i) = 0$.
Question: in our example of wage and education, do you believe that $\varepsilon_i \perp X_i$?

18 Identification and estimation: What is identification?
Identification amounts to finding a formula relating an unknown parameter (here, β, the causal impact of education on wages) to quantities that we can estimate from the data.

19 Identification and estimation: Identification of the linear model
Theorem: under assumptions 1 to 4, β is identified.
Proof:
$cov(Y_i, X_i) = cov(\alpha + \beta X_i + \varepsilon_i, X_i)$ according to assumption 1
$= cov(\alpha, X_i) + \beta\, cov(X_i, X_i) + cov(\varepsilon_i, X_i)$ according to the properties of covariance
$= \beta V(X_i)$ since $cov(\alpha, X_i) = 0$ and $cov(\varepsilon_i, X_i) = 0$ according to assumption 4.
Therefore, provided $V(X_i) > 0$, $\beta = \frac{cov(Y_i, X_i)}{V(X_i)}$.

20 Identification and estimation: How to estimate β?
As shown above, $\beta = \frac{cov(Y_i, X_i)}{V(X_i)}$. Any idea on a good estimator $\hat{\beta}$?

21 Identification and estimation: Consistency of $\hat{\beta}$
$\hat{\beta} = \frac{cov_e(Y_i, X_i)}{V_e(X_i)}$. Law of large numbers: $cov_e(Y_i, X_i) \rightarrow cov(Y_i, X_i)$ and $V_e(X_i) \rightarrow V(X_i)$. Therefore $\hat{\beta} \rightarrow \beta = \frac{cov(Y_i, X_i)}{V(X_i)}$ when the number of observations in the sample goes to infinity.
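
Consistency can be illustrated by simulation: draw samples of growing size from a linear model in which X is generated independently of the error term, and watch the estimate settle near the true slope. This is a sketch under assumptions: the true values alpha = 1000 and beta = 100, the uniform error and the function names are hypothetical choices for the example.

```python
import random


def ols_slope(x, y):
    """Slope estimate cov_e(X, Y) / V_e(X), written with centered sums."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def simulate_slope(n, rng, alpha=1000.0, beta=100.0, noise_scale=2000.0):
    """Draw n observations from Y = alpha + beta * X + eps and return beta_hat.

    alpha, beta and noise_scale are hypothetical values chosen for illustration.
    X is drawn independently of eps, so assumption 4 holds by construction.
    """
    x = [rng.randint(10, 19) for _ in range(n)]                    # years of schooling
    eps = [noise_scale * (rng.random() - 0.5) for _ in range(n)]   # mean-zero error
    y = [alpha + beta * xi + e for xi, e in zip(x, eps)]
    return ols_slope(x, y)


rng = random.Random(0)  # fixed seed so that the run is reproducible
estimates = {n: simulate_slope(n, rng) for n in (20, 200, 2000, 100000)}
for n, beta_hat in estimates.items():
    print(f"N = {n:>6}: beta_hat = {beta_hat:.2f}")
```

With small N the estimate can land far from 100; with N = 100000 it should be within a few units of it, which is what the law of large numbers predicts.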

22 Identification and estimation: Asymptotic normality of $\hat{\beta}$
The OLS estimators are asymptotically normal, in the sense that $\sqrt{N}(\hat{\beta} - \beta) \rightarrow N\left(0, \frac{\sigma^2}{V(X)}\right)$ (central limit theorem). The meaning of this is that when the size of the sample is large, we can state that $\sqrt{N}(\hat{\beta} - \beta)$ is approximately normally distributed. Proof at page 177 of your textbook. This result is important to build confidence intervals for β.

23 Identification and estimation: Variance of $\hat{\beta}$
Let us denote $\sigma^2 = V(\varepsilon_i)$. The variance of $\hat{\beta}$ is equal to $\frac{\sigma^2}{\sum (X_i - \bar{X})^2}$ (you can find a proof at page 55 of the textbook):
It is increasing with $\sigma^2$. The more spread out the error term is, the harder it is to estimate β precisely. For instance, assume that unobserved determinants of wage (ambition, ability, age...) play an important role in wage setting. For some individuals, $\varepsilon_i$ will take very high positive values, and for others it will take very low negative values. We are therefore likely to face individuals with low levels of education and high wages, and conversely, which makes the estimation of β difficult.
The more volatile $X_i$ is in our sample, the more precisely we estimate β.
Finally, $\sum (X_i - \bar{X})^2$ is increasing with N, the number of people in our sample.

24 Identification and estimation: Estimating $\sigma^2$
In the next session we will need an estimator of the variance of the error term. Usually, to estimate for instance a theoretical mean, we use the empirical one. Here, we use the same idea: to estimate the variance of the error term, a natural idea would be to use the empirical variance of the estimated residuals: $\frac{1}{N}\sum \hat{\varepsilon}_i^2$. This estimator indeed converges to $\sigma^2$ (LLN). However it is biased: one can show that $E\left(\frac{1}{N}\sum \hat{\varepsilon}_i^2\right) = \frac{N-2}{N}\sigma^2$. Thus, we prefer to use the following unbiased estimator: $\hat{\sigma}^2 = \frac{1}{N-2}\sum \hat{\varepsilon}_i^2$. It is easy to show that this estimator also converges to $\sigma^2$.
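
The unbiased variance estimator and the variance formula for the slope combine into a standard error. The sketch below is illustrative only: the function name and toy data are hypothetical, and the interval uses the normal approximation (critical value 1.96), which is only justified in large samples.

```python
import math


def ols_with_standard_error(x, y):
    """Return (alpha_hat, beta_hat, sigma2_hat, se_beta) for univariate OLS."""
    n = len(x)
    if n <= 2:
        raise ValueError("at least 3 observations are needed to divide by N - 2")
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    residuals = [yi - alpha_hat - beta_hat * xi for xi, yi in zip(x, y)]
    # Unbiased estimator of the error variance: divide by N - 2, not by N.
    sigma2_hat = sum(e * e for e in residuals) / (n - 2)
    # Estimated variance of beta_hat: sigma2_hat / sum((X_i - X_bar)^2).
    se_beta = math.sqrt(sigma2_hat / sxx)
    return alpha_hat, beta_hat, sigma2_hat, se_beta


# Hypothetical toy data (far too few points for the asymptotics to apply in practice).
schooling = [8, 10, 12, 14, 16]
wage = [1200, 1500, 1400, 1900, 2000]
alpha_hat, beta_hat, sigma2_hat, se_beta = ols_with_standard_error(schooling, wage)
ci_low = beta_hat - 1.96 * se_beta
ci_high = beta_hat + 1.96 * se_beta
print(beta_hat, sigma2_hat, se_beta, (ci_low, ci_high))
```

On these data the sum of squared residuals is 60000, so the unbiased estimate is 60000 / 3 = 20000, whereas dividing by N would give 12000.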

25 Limits: Link with OLS
In the linear model, β represents the causal impact of X on Y. Under various (very strong) assumptions, one can show that $\beta = \frac{cov(Y_i, X_i)}{V(X_i)}$, which can be estimated from the sample by the quantity $\hat{\beta} = \frac{cov_e(X, Y)}{V_e(X)}$. As you may have noticed, this estimator $\hat{\beta}$ is the same as the quantity we derived in section 2 with the OLS method.
=> if the linear model assumptions are verified, then predictions based on OLS are not only the best predictions for Y based on X, but $\hat{\beta}$ also describes the causal impact of X on Y.
But are the linear model assumptions credible?

26 Limits: Review of the assumptions of the linear model
Assumption 1: fairly credible, up to the linear approximation (the impact of education on wage might not be linear) and to the constant effect assumption.
Assumptions 2 and 3: credible.
Assumption 4: extremely strong assumption. It amounts to stating that X is uncorrelated with all other determinants of Y. Credible in the wage / education example?

27 Limits: What happens if assumption 4 is not verified?
Theorem: if assumption 4 is not verified, then the OLS estimator $\hat{\beta}$ is not a consistent estimator of β, the causal impact of X on Y.
Proof: $cov(Y_i, X_i) = cov(\alpha + \beta X_i + \varepsilon_i, X_i) = \beta\, cov(X_i, X_i) + cov(\varepsilon_i, X_i)$. Therefore, $\beta = \frac{cov(Y_i, X_i)}{V(X_i)} - \frac{cov(\varepsilon_i, X_i)}{V(X_i)}$. Since $\hat{\beta} \rightarrow \frac{cov(Y_i, X_i)}{V(X_i)}$, $\hat{\beta}$ is not consistent.
The asymptotic bias, that is to say the difference between the limit of $\hat{\beta}$ and β, is equal to $\frac{cov(\varepsilon_i, X_i)}{V(X_i)}$: the stronger the correlation between ε and X, the larger the bias. If X and ε are positively (resp. negatively) related, $\hat{\beta}$ overestimates (resp. underestimates) β.
In the wage / education example, do you think $\hat{\beta}$ over- or underestimates β?
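
The bias formula can be checked with a simulation in which an unobserved variable drives both X and the error term. Everything in this sketch is a hypothetical design chosen for illustration: an "ability" variable raises both schooling and the unobserved part of the wage, with coefficients picked so that the population value of cov(ε, X) / V(X) is 100 and the estimate should settle near 200 rather than the true slope of 100.

```python
import random


def centered_cov(u, v):
    """Empirical covariance (1/N) * sum((u_i - u_bar) * (v_i - v_bar))."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    return sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v)) / n


rng = random.Random(1)       # fixed seed for reproducibility
n = 100000
alpha, beta = 1000.0, 100.0  # hypothetical true parameters

# Unobserved ability affects schooling AND wages, which violates assumption 4.
ability = [rng.random() for _ in range(n)]
x = [10 + 8 * a + rng.uniform(-2, 2) for a in ability]        # years of schooling
eps = [1000 * a + rng.uniform(-500, 500) for a in ability]    # error term
y = [alpha + beta * xi + e for xi, e in zip(x, eps)]

var_x = centered_cov(x, x)
beta_hat = centered_cov(x, y) / var_x
sample_bias = centered_cov(eps, x) / var_x   # in-sample analogue of cov(eps, X) / V(X)

print(f"beta_hat    = {beta_hat:.2f}")
print(f"true beta   = {beta:.2f}")
print(f"sample bias = {sample_bias:.2f}")
```

In any sample, the gap between the estimate and the true slope equals the in-sample ratio of the covariance between the error and X to the variance of X, which mirrors the proof above. Here the error and X are positively related, so the estimate overstates the causal effect.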

28 OLS do not always yield good estimates...: Generating 18 random pairs for wage and education (1/2)
Open an Excel file and write in cells A1 to A18: =2000*(alea()-0,5) if you have the French version of Excel (in the English version: =2000*(RAND()-0.5)). The 18 random numbers you have generated stand for the ε in our model. They are supposed to be independent. Do they verify the other assumptions we made on the ε? What kind of distribution do they follow? What is their expectation and their variance?
Then, write in cells B1 to B18: =ent(10+alea()*10) (in the English version: =INT(10+RAND()*10)). These 18 random numbers stand for the number of schooling years. Do they verify the assumptions we imposed on the $X_i$?
Finally, write in cell C1 = B1 + A1, and extend this formula until C18. What do these 18 numbers stand for? Do the $X_i$ truly have a causal impact on the $Y_i$ here? In this experiment, what are the true values of α and β?

29 OLS do not always yield good estimates...: Generating 18 random pairs for wage and education (2/2)
Select cells B1 to C18, go to the assistant graphique (chart wizard) and make a graph, choosing the option nuage de points (scatter plot). Once this is done, select your graph, go to the graphic menu and select the option Ajouter une courbe de tendance (add a trendline). Choose the linear type of curve and go to options. Select Afficher l'équation sur le graphique (display the equation on the chart) and Afficher le coefficient de détermination sur le graphique (display the R-squared on the chart). Once this is done, write down on a sheet of paper the value of $\hat{\beta}$ that appears on the graph. Is it close to the true β? Any idea why this is the case?
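
For readers without Excel, the experiment can be sketched in Python. The draws for columns A and B follow the formulas described above; the true intercept and slope used for column C (alpha = 1000, beta = 100) are assumptions made for this illustration, as are all function names.

```python
import random


def simulate_sample(n, noise_scale, alpha, beta, rng):
    """Mimic the spreadsheet: column A (error), column B (schooling), column C (wage)."""
    eps = [noise_scale * (rng.random() - 0.5) for _ in range(n)]     # like 2000*(RAND()-0.5)
    schooling = [int(10 + rng.random() * 10) for _ in range(n)]      # like INT(10+RAND()*10)
    wage = [alpha + beta * s + e for s, e in zip(schooling, eps)]    # assumed wage equation
    return schooling, wage


def ols_fit(x, y):
    """Return (alpha_hat, beta_hat, r2), the numbers shown by a linear trendline."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    beta_hat = sxy / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    sst = sum((yi - y_bar) ** 2 for yi in y)
    sse = sum((alpha_hat + beta_hat * xi - y_bar) ** 2 for xi in x)
    return alpha_hat, beta_hat, sse / sst


rng = random.Random(2011)  # fixed seed; change it to draw another sample of 18
schooling, wage = simulate_sample(18, 2000.0, alpha=1000.0, beta=100.0, rng=rng)
alpha_hat, beta_hat, r2 = ols_fit(schooling, wage)
print(f"y = {beta_hat:.3f} x + {alpha_hat:.1f},  R2 = {r2:.4f}")
```

Changing the seed plays the role of recalculating the sheet: with only 18 observations and an error term of this size, the estimated slope moves a lot from one draw to the next.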

30 OLS do not always yield good estimates...: What I get
[Figure: Wage against Years of Schooling, with fitted line y = 32,612x ,6 (intercept truncated) and R² = 0,0369.]

31 But things can be improved...: Illustrating some points of the course
In the first column, write =200*(alea()-0,5) instead of =2000*(alea()-0,5). Is your new estimate $\hat{\beta}$ closer to the true β? What is your intuition to explain this result?
Now write =4000*(alea()-0,5) in cell A1 and extend the formulas in cells A1, B1 and C1 up to A200, B200 and C200. Draw a new graph similar to the previous one, but selecting cells B1 to C200. Is your new estimate $\hat{\beta}$ closer to the true β? What is your intuition to explain this result?
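
A single draw can be lucky or unlucky, so one way to see the effect of the error variance and of the sample size is to repeat each experiment many times and look at the spread of the slope estimates. The sketch below does this for the three designs discussed in the simulation; the true parameters (alpha = 1000, beta = 100), the number of replications and the function names are assumptions made for the illustration.

```python
import random
import statistics


def ols_slope(x, y):
    """Slope estimate from centered sums, equal to cov_e(X, Y) / V_e(X)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    return sxy / sxx


def one_estimate(n, noise_scale, rng, alpha=1000.0, beta=100.0):
    """One simulated sample and its slope estimate (alpha, beta are hypothetical)."""
    x = [int(10 + rng.random() * 10) for _ in range(n)]
    eps = [noise_scale * (rng.random() - 0.5) for _ in range(n)]
    y = [alpha + beta * xi + e for xi, e in zip(x, eps)]
    return ols_slope(x, y)


def spread(n, noise_scale, rng, replications=500):
    """Mean and standard deviation of beta_hat across repeated samples."""
    draws = [one_estimate(n, noise_scale, rng) for _ in range(replications)]
    return statistics.mean(draws), statistics.stdev(draws)


rng = random.Random(2)  # fixed seed for reproducibility
designs = {
    "noise 2000, N = 18": (18, 2000.0),
    "noise 200, N = 18": (18, 200.0),
    "noise 4000, N = 200": (200, 4000.0),
}
results = {label: spread(n, scale, rng) for label, (n, scale) in designs.items()}
for label, (mean_beta, sd_beta) in results.items():
    print(f"{label:<20} mean = {mean_beta:7.2f}   sd = {sd_beta:6.2f}")
```

The pattern to look for matches the variance formula: shrinking the error term tightens the estimates sharply, and a much larger sample more than compensates for a noisier error term.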

32 But things can be improved...: What I get
[Figure: Wage against Years of Schooling, with fitted line y = 101,08x ,8 (intercept truncated) and R² = 0,9661.]

33 But things can be improved...: What I get
[Figure: Wage against Years of Schooling, with fitted line y = 97,233x ,2 (intercept truncated) and R² = 0, (truncated).]

34 Empirical applications: Consequences of smoking when pregnant
In a sample of American mothers who gave birth to a child in 1988, we estimate the following relationship: weight of the child in grams = α + β × daily cigarettes smoked by the mother during pregnancy + ε. Results: $\hat{\alpha} = 3395$, $\hat{\beta} = -14,57$. How to interpret $\hat{\beta}$? In your opinion, are the various assumptions needed for OLS to be unbiased etc. verified here?

35 Empirical applications: Consequences of attending a class on exam grade
Assume we want to estimate the following model among students attending an econometrics course: final grade = α + β × number of classes attended + ε. Do you think that the estimated value $\hat{\beta}$ would properly estimate the true causal impact of attendance on final grade?

36 Conclusion
Today, we have seen the OLS technique to make a prediction for Y based on X. We have seen that, up to two small limits, this prediction is the best we can make => our first goal was reached.
However, we have seen that OLS estimators also describe the causal impact of X on Y if and only if a very restrictive assumption is made, which is that X is uncorrelated with all other determinants of Y. In many situations this is unlikely to hold => in most cases we will not be able to achieve our second goal with OLS.
Finally, we have seen with some simulations that even in situations where all OLS assumptions are verified (which we can be sure of because we used data generated by the computer), OLS estimators can be far from the true values when the sample size is small => do not do statistics with small samples!
References for this chapter: chapters 2 and 5 of your textbook.
Clément de Chaisemartin, Ordinary Least Squares.


More information

Zeros of Polynomial Functions

Zeros of Polynomial Functions Zeros of Polynomial Functions The Rational Zero Theorem If f (x) = a n x n + a n-1 x n-1 + + a 1 x + a 0 has integer coefficients and p/q (where p/q is reduced) is a rational zero, then p is a factor of

More information

Coefficient of Determination

Coefficient of Determination Coefficient of Determination The coefficient of determination R 2 (or sometimes r 2 ) is another measure of how well the least squares equation ŷ = b 0 + b 1 x performs as a predictor of y. R 2 is computed

More information

Introduction to Regression and Data Analysis

Introduction to Regression and Data Analysis Statlab Workshop Introduction to Regression and Data Analysis with Dan Campbell and Sherlock Campbell October 28, 2008 I. The basics A. Types of variables Your variables may take several forms, and it

More information

Module 3: Correlation and Covariance

Module 3: Correlation and Covariance Using Statistical Data to Make Decisions Module 3: Correlation and Covariance Tom Ilvento Dr. Mugdim Pašiƒ University of Delaware Sarajevo Graduate School of Business O ften our interest in data analysis

More information

An Introduction to Regression Analysis

An Introduction to Regression Analysis The Inaugural Coase Lecture An Introduction to Regression Analysis Alan O. Sykes * Regression analysis is a statistical tool for the investigation of relationships between variables. Usually, the investigator

More information

Concepts in Investments Risks and Returns (Relevant to PBE Paper II Management Accounting and Finance)

Concepts in Investments Risks and Returns (Relevant to PBE Paper II Management Accounting and Finance) Concepts in Investments Risks and Returns (Relevant to PBE Paper II Management Accounting and Finance) Mr. Eric Y.W. Leung, CUHK Business School, The Chinese University of Hong Kong In PBE Paper II, students

More information

CURVE FITTING LEAST SQUARES APPROXIMATION

CURVE FITTING LEAST SQUARES APPROXIMATION CURVE FITTING LEAST SQUARES APPROXIMATION Data analysis and curve fitting: Imagine that we are studying a physical system involving two quantities: x and y Also suppose that we expect a linear relationship

More information

Example: Boats and Manatees

Example: Boats and Manatees Figure 9-6 Example: Boats and Manatees Slide 1 Given the sample data in Table 9-1, find the value of the linear correlation coefficient r, then refer to Table A-6 to determine whether there is a significant

More information

CORRELATION ANALYSIS

CORRELATION ANALYSIS CORRELATION ANALYSIS Learning Objectives Understand how correlation can be used to demonstrate a relationship between two factors. Know how to perform a correlation analysis and calculate the coefficient

More information

Introduction to Quantitative Methods

Introduction to Quantitative Methods Introduction to Quantitative Methods October 15, 2009 Contents 1 Definition of Key Terms 2 2 Descriptive Statistics 3 2.1 Frequency Tables......................... 4 2.2 Measures of Central Tendencies.................

More information

Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13

Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Missing Data: Part 1 What to Do? Carol B. Thompson Johns Hopkins Biostatistics Center SON Brown Bag 3/20/13 Overview Missingness and impact on statistical analysis Missing data assumptions/mechanisms Conventional

More information

Chapter 4: Vector Autoregressive Models

Chapter 4: Vector Autoregressive Models Chapter 4: Vector Autoregressive Models 1 Contents: Lehrstuhl für Department Empirische of Wirtschaftsforschung Empirical Research and und Econometrics Ökonometrie IV.1 Vector Autoregressive Models (VAR)...

More information

Second Order Linear Nonhomogeneous Differential Equations; Method of Undetermined Coefficients. y + p(t) y + q(t) y = g(t), g(t) 0.

Second Order Linear Nonhomogeneous Differential Equations; Method of Undetermined Coefficients. y + p(t) y + q(t) y = g(t), g(t) 0. Second Order Linear Nonhomogeneous Differential Equations; Method of Undetermined Coefficients We will now turn our attention to nonhomogeneous second order linear equations, equations with the standard

More information

Describing Relationships between Two Variables

Describing Relationships between Two Variables Describing Relationships between Two Variables Up until now, we have dealt, for the most part, with just one variable at a time. This variable, when measured on many different subjects or objects, took

More information

Multiple Regression: What Is It?

Multiple Regression: What Is It? Multiple Regression Multiple Regression: What Is It? Multiple regression is a collection of techniques in which there are multiple predictors of varying kinds and a single outcome We are interested in

More information

1. Suppose that a score on a final exam depends upon attendance and unobserved factors that affect exam performance (such as student ability).

1. Suppose that a score on a final exam depends upon attendance and unobserved factors that affect exam performance (such as student ability). Examples of Questions on Regression Analysis: 1. Suppose that a score on a final exam depends upon attendance and unobserved factors that affect exam performance (such as student ability). Then,. When

More information

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS

Chapter Seven. Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Chapter Seven Multiple regression An introduction to multiple regression Performing a multiple regression on SPSS Section : An introduction to multiple regression WHAT IS MULTIPLE REGRESSION? Multiple

More information

So, using the new notation, P X,Y (0,1) =.08 This is the value which the joint probability function for X and Y takes when X=0 and Y=1.

So, using the new notation, P X,Y (0,1) =.08 This is the value which the joint probability function for X and Y takes when X=0 and Y=1. Joint probabilit is the probabilit that the RVs & Y take values &. like the PDF of the two events, and. We will denote a joint probabilit function as P,Y (,) = P(= Y=) Marginal probabilit of is the probabilit

More information

Chapter 5. Conditional CAPM. 5.1 Conditional CAPM: Theory. 5.1.1 Risk According to the CAPM. The CAPM is not a perfect model of expected returns.

Chapter 5. Conditional CAPM. 5.1 Conditional CAPM: Theory. 5.1.1 Risk According to the CAPM. The CAPM is not a perfect model of expected returns. Chapter 5 Conditional CAPM 5.1 Conditional CAPM: Theory 5.1.1 Risk According to the CAPM The CAPM is not a perfect model of expected returns. In the 40+ years of its history, many systematic deviations

More information
