THE TWO-VARIABLE LINEAR REGRESSION MODEL




Herman J. Bierens
Pennsylvania State University
April 30, 2012

1. Introduction

Suppose you are an economics or business major in a college close to the beach in the southern part of the US, for example southern California, where the weather is almost always nice the whole year around. In order to support yourself through college, you have started your own (weekend) business: an ice cream parlor on the beach. You have experienced that on hot weekends you usually sell more ice cream than on cold weekends. Also, you have recorded the average temperature and the sales of ice cream during eight weekends. Let $Y_j$ be the sales of ice cream on weekend $j$, measured in units of \$100, and let $X_j$ be the average temperature on weekend $j$, measured in units of 10 degrees Fahrenheit:

Table 1: Ice cream data

  Sales (unit = $100)    Temperature (unit = 10 degrees)
  $Y_1 = 8$              $X_1 = 5$
  $Y_2 = 10$             $X_2 = 7$
  $Y_3 = 8$              $X_3 = 6$
  $Y_4 = 13$             $X_4 = 8$
  $Y_5 = 15$             $X_5 = 10$
  $Y_6 = 14$             $X_6 = 9$
  $Y_7 = 11$             $X_7 = 7$
  $Y_8 = 9$              $X_8 = 8$

You want to use this information to forecast next weekend's sales of ice cream, given a good forecast of next weekend's temperature. Such a forecast of the sales will enable you to reduce your costs by adjusting your purchase of ice cream to the expected demand, because the ice cream you don't sell has to be thrown away. [1]

[1] These lecture notes are based on lecture notes that I wrote while teaching at the University of California, San Diego, in the winter of 1987.

Let your forecasting scheme be

$$\hat Y = \hat\alpha + \hat\beta X,$$

i.e., given a temperature of $X$ times 10 degrees and given the values of $\hat\alpha$ and $\hat\beta$, $\hat Y$ times \$100 will be your forecast of the sales of ice cream. This forecasting scheme, together with the points $(X_j, Y_j)$, $j = 1,2,\dots,8$, is plotted in Figure 1.

Figure 1: Scatter plot of $(X_j, Y_j)$, $j = 1,2,\dots,8$, together with the line $\hat Y = \hat\alpha + \hat\beta X$.

The best values of $\hat\alpha$ and $\hat\beta$ are those for which the forecast errors (= actual sales minus forecasted sales) are minimal. However, you do not know yet the actual sales in the next weekend, but you do know the actual sales in the eight weekends for which you have recorded your sales and the corresponding temperature. So what you could do is to forecast the sales of ice cream on each of these eight weekends and determine $\hat\alpha$ and $\hat\beta$ such that the forecast errors are minimal. Because forecast errors can be positive and negative, as can be seen from Figure 1, the sum of the forecast errors is not a good measure of the performance of your forecasting scheme, because large positive errors can be offset by large negative errors.

Therefore, use the sum of squared errors as your measure of the accuracy of your forecasts:

$$Q(\hat\alpha,\hat\beta) = \sum_{j=1}^n (Y_j - \hat Y_j)^2 = \sum_{j=1}^n (Y_j - \hat\alpha - \hat\beta X_j)^2,$$

where $n$ is the sample size ($n = 8$ in our example), and minimize $Q(\hat\alpha,\hat\beta)$ with respect to $\hat\alpha$ and $\hat\beta$. It can be shown (see the Appendix) that $Q(\hat\alpha,\hat\beta)$ is minimal for

$$\hat\beta = \frac{\sum_{j=1}^n (X_j - \bar X)(Y_j - \bar Y)}{\sum_{j=1}^n (X_j - \bar X)^2} = \frac{\sum_{j=1}^n (X_j - \bar X)Y_j}{\sum_{j=1}^n (X_j - \bar X)^2}, \qquad \hat\alpha = \bar Y - \hat\beta\,\bar X, \qquad (1)$$

where $\bar X = (1/n)\sum_{j=1}^n X_j$ and $\bar Y = (1/n)\sum_{j=1}^n Y_j$.

In the ice cream parlor case we have $n = 8$, $\bar X = 7.5$, $\bar Y = 11$, $\sum_{j=1}^n X_j^2 = 468$, $\sum_{j=1}^n X_j Y_j = 687$,

$$\sum_{j=1}^n (X_j - \bar X)(Y_j - \bar Y) = \sum_{j=1}^n X_j Y_j - n\,\bar X\,\bar Y = 27, \qquad \sum_{j=1}^n (X_j - \bar X)^2 = \sum_{j=1}^n X_j^2 - n\,\bar X^2 = 18,$$

so that $\hat\beta = 1.5$ and $\hat\alpha = -0.25$. Thus, our best forecasting scheme is $\hat Y = -0.25 + 1.5X$. This is the straight line in Figure 1.

Now suppose that the forecast of next weekend's temperature is 75 degrees. Then $X = 7.5$, hence the best forecast of next weekend's sales is $\hat Y = -0.25 + 1.5 \times 7.5 = 11$, i.e., $\hat Y \times \$100 = \$1{,}100$.
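All of the numbers above are easy to verify. The following is a minimal sketch in Python (numpy is assumed to be available, and the variable names are mine, not part of the notes):

# Check of formula (1) and the forecast, using the Table 1 data.
import numpy as np

X = np.array([5, 7, 6, 8, 10, 9, 7, 8], dtype=float)    # temperature, units of 10 F
Y = np.array([8, 10, 8, 13, 15, 14, 11, 9], dtype=float) # sales, units of $100

n = len(Y)
beta_hat = np.sum((X - X.mean()) * (Y - Y.mean())) / np.sum((X - X.mean())**2)
alpha_hat = Y.mean() - beta_hat * X.mean()
print(alpha_hat, beta_hat)            # -0.25, 1.5

X_next = 7.5                          # forecast temperature: 75 degrees
print(alpha_hat + beta_hat * X_next)  # 11.0, i.e., $1,100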

2. The two-variable linear regression model

In order to answer the question how good this forecast is, we have to make assumptions about the true relationship between the dependent variable $Y_j$ and the independent variable $X_j$ (also called the explanatory variable). The true relationship we are going to assume is the two-variable linear regression model:

$$Y_j = \alpha + \beta X_j + U_j, \qquad j = 1,2,\dots,n. \qquad (2)$$

The $U_j$'s are random error variables, called error terms, for which we assume:

Assumption I: The $U_j$'s are independent and identically distributed (i.i.d.) random variables.

Assumption II: The mathematical expectation of $U_j$ equals zero: $E(U_j) = 0$ for $j = 1,2,\dots,n$.

Assumption III: The variance $\sigma^2 = \mathrm{var}(U_j) = E[(U_j - E(U_j))^2] = E[U_j^2]$ of the $U_j$'s is constant and finite.

Regarding the explanatory variables $X_j$ we shall assume for the time being that:

Assumption IV: The independent variables $X_j$ are non-random.

This assumption is not strictly necessary, and is actually quite unrealistic in economics, but it will be made for the sake of convenience, as it eases the argument. Finally, we will assume that the errors are normally distributed:

Assumption V: The errors $U_j$ are $N(0,\sigma^2)$ distributed.

We shall need the latter assumption in particular in order to say something about the reliability of the forecast. These assumptions will be relaxed later on.

3. The properties of $\hat\alpha$ and $\hat\beta$

Although we have motivated model (2) by the need to forecast out-of-sample values of the dependent variable $Y_j$, a linear regression model is more often used for testing economic hypotheses. For example, let $Y_j$ be the hourly wage of wage earner $j$ in a random sample of size $n$ of wage earners, and let $X_j$ be a gender indicator, say $X_j = 1$ if person $j$ is a female, and $X_j = 0$ if person $j$ is a male. If you suspect gender discrimination in the workplace, you can test this suspicion by testing the null hypothesis that $\beta = 0$ (no gender discrimination) against one of

three possible alternative hypotheses:

(a) $\beta \neq 0$: women are paid different hourly wages than men, either higher or lower;
(b) $\beta > 0$: women are paid higher hourly wages than men;
(c) $\beta < 0$: women are paid lower hourly wages than men.

The last hypothesis is usually what is meant by gender discrimination. A test of the null hypothesis $\beta = 0$ against one of these alternative hypotheses can be based on the estimate $\hat\beta$ of $\beta$, provided that we know how $\hat\beta$ is related to $\beta$. It will be shown below that $\hat\alpha$ and $\hat\beta$ are indeed reasonable approximations of $\alpha$ and $\beta$, respectively, possessing particular desirable properties. In general, an estimator of an unknown parameter is a function of the data that serves as an approximation of the parameter involved. It follows from (1) that $\hat\alpha$ and $\hat\beta$ are functions of the data $(Y_1,X_1),\dots,(Y_n,X_n)$. Because $\hat\alpha$ and $\hat\beta$ will be used as approximations of $\alpha$ and $\beta$, respectively, and were obtained by minimizing the sum of squared errors, we will call $\hat\alpha$ and $\hat\beta$ the Ordinary Least Squares (OLS) estimators of $\alpha$ and $\beta$, respectively. [2]

3.1 Unbiasedness

The first property of $\hat\alpha$ and $\hat\beta$ is that they are unbiased estimators of $\alpha$ and $\beta$:

Proposition 1. Under Assumptions II and IV the OLS estimators $\hat\alpha$ and $\hat\beta$ are unbiased, which means that $E[\hat\alpha] = \alpha$ and $E[\hat\beta] = \beta$.

This result follows from the fact that we can write

$$\hat\alpha = \alpha + \sum_{j=1}^n \left(\frac{1}{n} - \frac{\bar X(X_j - \bar X)}{\sum_{i=1}^n (X_i - \bar X)^2}\right)U_j, \qquad \hat\beta = \beta + \frac{\sum_{j=1}^n (X_j - \bar X)U_j}{\sum_{j=1}^n (X_j - \bar X)^2}. \qquad (3)$$

See the Appendix.

[2] The estimators $\hat\alpha$ and $\hat\beta$ are called "Ordinary" least squares estimators to distinguish them from "Nonlinear" least squares estimators.

3.2 The variances of $\hat\alpha$ and $\hat\beta$

Our next issue concerns the variances of $\hat\alpha$ and $\hat\beta$. For deriving these variances the following two lemmas are convenient.

Lemma 1. Let $U_1, U_2,\dots,U_n$ be independent random variables with zero mathematical expectation (thus $E(U_j) = 0$) and variance $\sigma^2$ (thus $E[(U_j - E(U_j))^2] = E(U_j^2) = \sigma^2$). Let $v_1,\dots,v_n$ and $w_1,\dots,w_n$ be given constants. Then

$$E\left[\left(\sum_{j=1}^n v_j U_j\right)\left(\sum_{j=1}^n w_j U_j\right)\right] = \sigma^2\sum_{j=1}^n v_j w_j.$$

Proof: See the Appendix.

Note that if we choose $v_j = w_j$ for $j = 1,2,\dots,n$ in Lemma 1, then it reads:

Lemma 2. Let $U_1,\dots,U_n$ be independent random variables with zero mathematical expectation and variance $\sigma^2$. Let $w_1,\dots,w_n$ be given constants. Then $E[(\sum_{j=1}^n w_j U_j)^2] = \sigma^2\sum_{j=1}^n w_j^2$.

Using (3) and Lemmas 1 and 2, it can be shown that:

Proposition 2. Under Assumptions I-IV,

$$\mathrm{var}(\hat\alpha) = \frac{\sigma^2\,(1/n)\sum_{j=1}^n X_j^2}{\sum_{j=1}^n (X_j - \bar X)^2} = \sigma^2_{\hat\alpha}, \text{ say}, \qquad \mathrm{var}(\hat\beta) = \frac{\sigma^2}{\sum_{j=1}^n (X_j - \bar X)^2} = \sigma^2_{\hat\beta}, \text{ say},$$

and

$$\mathrm{cov}(\hat\alpha,\hat\beta) = \frac{-\sigma^2\,\bar X}{\sum_{j=1}^n (X_j - \bar X)^2}. \qquad (4)$$

Proof: See the Appendix.

3.3 Normality of $\hat\alpha$ and $\hat\beta$

If we also assume normality of the error terms $U_j$, then $\hat\alpha$ and $\hat\beta$ are also normally distributed. This result follows from the following lemma.

Lemma 3. Let $Z_1, Z_2,\dots,Z_m$ be independent $N(\mu,\sigma^2)$ distributed random variables and let $w_1,\dots,w_m$ be constants. Then $\sum_{j=1}^m w_j Z_j$ is distributed $N[(\sum_{j=1}^m w_j)\mu,\ (\sum_{j=1}^m w_j^2)\sigma^2]$.

The proof of this lemma requires advanced probability theory and is therefore omitted. It now follows straightforwardly from Proposition 2, Lemma 3, and (3) that:

Proposition 3. Under Assumptions I-V,

$$\hat\alpha - \alpha \sim N\!\left(0,\ \frac{\sigma^2\,(1/n)\sum_{j=1}^n X_j^2}{\sum_{j=1}^n (X_j - \bar X)^2}\right), \qquad \hat\beta - \beta \sim N\!\left(0,\ \frac{\sigma^2}{\sum_{j=1}^n (X_j - \bar X)^2}\right), \qquad (5)$$

where $\sim$ is the symbol for "is distributed as". Moreover, applying Lemma 3 again for $m = n$, it follows from (5) (Exercise: Why?) that:

Proposition 4. Under Assumptions I-V,

$$\frac{(\hat\alpha - \alpha)\sqrt{\sum_{j=1}^n (X_j - \bar X)^2}}{\sigma\sqrt{(1/n)\sum_{j=1}^n X_j^2}} \sim N[0,1], \qquad \frac{(\hat\beta - \beta)\sqrt{\sum_{j=1}^n (X_j - \bar X)^2}}{\sigma} \sim N[0,1]. \qquad (6)$$

These results play a key role in testing hypotheses about $\alpha$ and $\beta$. The only problem that prevents us from using these results for testing is that $\sigma$ is unknown. This problem will be addressed in the next section.

4. How to estimate the error variance $\sigma^2$?

If $\alpha$ and $\beta$ were known, then we could estimate $\sigma^2$ by

$$\tilde\sigma^2 = \frac{1}{n}\sum_{j=1}^n (Y_j - \alpha - \beta X_j)^2 = \frac{1}{n}\sum_{j=1}^n U_j^2. \qquad (7)$$

However, $\alpha$ and $\beta$ are not known, but we do have OLS estimators of them. This suggests replacing $\alpha$ and $\beta$ in (7) by their OLS estimators:

$$\tilde\sigma^2 = \frac{1}{n}\sum_{j=1}^n (Y_j - \hat\alpha - \hat\beta X_j)^2 = \frac{1}{n}\sum_{j=1}^n \hat U_j^2, \qquad (8)$$

where

$$\hat U_j = Y_j - \hat\alpha - \hat\beta X_j \qquad (9)$$

is called the regression residual. However, the estimator (8) is biased, due to the fact that:

Proposition 5. Under Assumptions I-V, $E[\sum_{j=1}^n \hat U_j^2] = (n-2)\sigma^2$.

Proof: See the Appendix.

This result suggests using

$$\hat\sigma^2 = \frac{1}{n-2}\sum_{j=1}^n \hat U_j^2 \qquad (10)$$

as an estimator of $\sigma^2$ instead of (8), because then, by Proposition 5, $\hat\sigma^2$ is an unbiased estimator of $\sigma^2$:

$$E[\hat\sigma^2] = \sigma^2. \qquad (11)$$

The sum $\sum_{j=1}^n \hat U_j^2$ is called the Sum of Squared Residuals, shortly SSR, or also the Residual Sum of Squares (RSS), and $\hat\sigma = \sqrt{\hat\sigma^2}$ is called the Standard Error of the Residuals, shortly SER. Thus,

$$\mathrm{SSR} = \sum_{j=1}^n \hat U_j^2, \qquad \mathrm{SER} = \sqrt{\frac{\sum_{j=1}^n \hat U_j^2}{n-2}} = \sqrt{\frac{\mathrm{SSR}}{n-2}}\ (= \hat\sigma). \qquad (12)$$

Finally, note that the sum of squared residuals can be computed as follows:

$$\mathrm{SSR} = \sum_{j=1}^n (Y_j - \bar Y)^2 - \hat\beta^2\sum_{j=1}^n (X_j - \bar X)^2. \qquad (13)$$

See the Appendix.
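Continuing the Python sketch above, the residuals (9), formula (13), and the estimators (10) and (12) can be checked as follows:

# Residuals and the error variance estimate for the ice cream data.
U_hat = Y - alpha_hat - beta_hat * X          # residuals, formula (9)
SSR_direct = np.sum(U_hat**2)
SSR_via_13 = np.sum((Y - Y.mean())**2) - beta_hat**2 * np.sum((X - X.mean())**2)
print(SSR_direct, SSR_via_13)                 # both 11.5
sigma2_hat = SSR_direct / (n - 2)             # formula (10): 1.91667
SER = np.sqrt(sigma2_hat)                     # formula (12): 1.384437
print(sigma2_hat, SER)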

5. Standard errors, t-values and p-values of the OLS estimators

The variances of $\hat\alpha$ and $\hat\beta$ can now be estimated by replacing $\sigma^2$ in (4) by $\hat\sigma^2$:

$$\widehat{\mathrm{var}}(\hat\alpha) = \frac{\hat\sigma^2\,(1/n)\sum_{j=1}^n X_j^2}{\sum_{j=1}^n (X_j - \bar X)^2} = \hat\sigma^2_{\hat\alpha}, \text{ say}, \qquad \widehat{\mathrm{var}}(\hat\beta) = \frac{\hat\sigma^2}{\sum_{j=1}^n (X_j - \bar X)^2} = \hat\sigma^2_{\hat\beta}, \text{ say}. \qquad (14)$$

Then $\hat\sigma_{\hat\alpha} = \sqrt{\hat\sigma^2_{\hat\alpha}}$ is called the standard error of $\hat\alpha$, also denoted by $SE(\hat\alpha)$, and $\hat\sigma_{\hat\beta} = \sqrt{\hat\sigma^2_{\hat\beta}}$ is called the standard error of $\hat\beta$, also denoted by $SE(\hat\beta)$.

If we replace $\sigma$ in Proposition 4 by the SER, $\hat\sigma$, the standard normality results involved change:

Proposition 6. Under Assumptions I-V,

$$\frac{\hat\alpha - \alpha}{\hat\sigma_{\hat\alpha}} = \frac{(\hat\alpha - \alpha)\sqrt{\sum_{j=1}^n (X_j - \bar X)^2}}{\hat\sigma\sqrt{(1/n)\sum_{j=1}^n X_j^2}} \sim t_{n-2}, \qquad \frac{\hat\beta - \beta}{\hat\sigma_{\hat\beta}} = \frac{(\hat\beta - \beta)\sqrt{\sum_{j=1}^n (X_j - \bar X)^2}}{\hat\sigma} \sim t_{n-2}. \qquad (15)$$

The proof of Proposition 6 is based on the fact that under these assumptions $\mathrm{SSR}/\sigma^2$ is distributed $\chi^2_{n-2}$ and is independent of $\hat\alpha$ and $\hat\beta$, but the proof involved requires advanced probability theory and is therefore omitted. Because for large degrees of freedom the t distribution is approximately equal to the standard normal distribution, and because, due to the central limit theorem, Proposition 4 holds if $n$ is large even when the errors are not normally distributed, we also have:

Proposition 7. If the sample size $n$ is large, then under Assumptions I-IV we have approximately

$$\frac{\hat\alpha - \alpha}{\hat\sigma_{\hat\alpha}} = \frac{(\hat\alpha - \alpha)\sqrt{\sum_{j=1}^n (X_j - \bar X)^2}}{\hat\sigma\sqrt{(1/n)\sum_{j=1}^n X_j^2}} \sim N(0,1), \qquad \frac{\hat\beta - \beta}{\hat\sigma_{\hat\beta}} = \frac{(\hat\beta - \beta)\sqrt{\sum_{j=1}^n (X_j - \bar X)^2}}{\hat\sigma} \sim N(0,1). \qquad (16)$$

The results in Proposition 6 now enable us to test hypotheses about $\alpha$ and $\beta$. In particular the null hypothesis that $\beta = 0$ is of importance, because this hypothesis implies that $X_j$ has no effect on $Y_j$. The test statistic for testing this hypothesis is the t-value (or t-statistic) of $\hat\beta$:

$$\hat t_{\hat\beta}\ (= \text{t-value of } \hat\beta) \stackrel{\text{def.}}{=} \frac{\hat\beta}{\hat\sigma_{\hat\beta}} = \frac{\hat\beta\sqrt{\sum_{j=1}^n (X_j - \bar X)^2}}{\hat\sigma} \sim t_{n-2} \text{ if } \beta = 0. \qquad (17)$$

If $\beta > 0$ and $n \to \infty$, then the t-value of $\hat\beta$ converges in probability to $+\infty$, and if $\beta < 0$ and $n \to \infty$, then the t-value of $\hat\beta$ converges in probability to $-\infty$. Moreover, if the sample size $n$ is large, then by Proposition 7 we may use the standard normal distribution instead of the t distribution to find the critical values of the test. Similarly,

$$\hat t_{\hat\alpha}\ (= \text{t-value of } \hat\alpha) \stackrel{\text{def.}}{=} \frac{\hat\alpha}{\hat\sigma_{\hat\alpha}} \sim t_{n-2} \text{ if } \alpha = 0. \qquad (18)$$

However, the hypothesis $\alpha = 0$ is often of no interest.

In the ice cream example, $\sum_{j=1}^n (X_j - \bar X)^2 = 18$, hence $\sqrt{\sum_{j=1}^n (X_j - \bar X)^2} = \sqrt{18} \approx 4.24264$, and by (13),

$$\sum_{j=1}^n (Y_j - \bar Y)^2 = \sum_{j=1}^n Y_j^2 - n\,\bar Y^2 = 1020 - 8 \times 11^2 = 52,$$

$$\hat\sigma^2 = \frac{1}{n-2}\sum_{j=1}^n \hat U_j^2 = \frac{1}{n-2}\left(\sum_{j=1}^n (Y_j - \bar Y)^2 - \hat\beta^2\sum_{j=1}^n (X_j - \bar X)^2\right) = \frac{52 - (1.5)^2 \times 18}{8-2} = \frac{11.5}{6} \approx 1.91667.$$

Hence $\hat\sigma \approx 1.384437$ and

$$\hat t_{\hat\beta} = \frac{\hat\beta\sqrt{\sum_{j=1}^n (X_j - \bar X)^2}}{\hat\sigma} = \frac{1.5 \times 4.24264}{1.384437} \approx 4.597. \qquad (19)$$
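Continuing the Python sketch, the standard errors, t-values and the 5% critical value used below can be computed as follows (scipy is assumed to be available for the t quantiles):

# Standard errors and t-values for the ice cream regression.
from scipy import stats

S_xx = np.sum((X - X.mean())**2)                 # 18
se_beta = SER / np.sqrt(S_xx)                    # 0.32632
se_alpha = SER * np.sqrt(np.mean(X**2) / S_xx)   # 2.49583
t_beta = beta_hat / se_beta                      # 4.597
t_alpha = alpha_hat / se_alpha                   # -0.100
t_crit = stats.t.ppf(1 - 0.025, df=n - 2)        # 2.447, two-sided 5% level
print(t_beta, t_alpha, t_crit)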

Assuming that the conditions of Proposition 6 hold, the null hypothesis $H_0\colon \beta = 0$ can be tested against the alternative hypothesis $H_1\colon \beta \neq 0$ using the two-sided t-test at, say, the 5% significance level, as follows. Under the null hypothesis, (19) is a random drawing from the t distribution with $n - 2 = 6$ degrees of freedom. Look up in the table of the t distribution the value $t_*$ such that for $T \sim t_6$, $P[|T| > t_*] = 0.05$. This value is $t_* = 2.447$. Then accept the null hypothesis if $-t_* = -2.447 \leq \hat t_{\hat\beta} \leq 2.447 = t_*$, and reject the null hypothesis in favor of the alternative hypothesis if $|\hat t_{\hat\beta}| > t_* = 2.447$. Thus, in the ice cream example we reject the null hypothesis $H_0\colon \beta = 0$ because $\hat t_{\hat\beta} = 4.597 > 2.447 = t_*$. This test is illustrated in Figure 2 below. The curved line in Figure 2 is the density of the t distribution with 6 degrees of freedom. The grey areas are each 0.025, so that the total grey area is 0.05.

Figure 2: Two-sided t-test of $H_0\colon \beta = 0$ against the alternative hypothesis $H_1\colon \beta \neq 0$.

The null hypothesis $H_0\colon \beta = 0$ can be tested against the alternative hypothesis $H_1\colon \beta > 0$ at the 5% significance level by the right-sided t-test. Now look up in the table of the t distribution the value $t_*$ such that for $T \sim t_6$, $P[T > t_*] = 0.05$. This value corresponds to the critical value of the two-sided t-test at the 10% significance level: $t_* = 1.943$. Then accept the null hypothesis if $\hat t_{\hat\beta} \leq t_* = 1.943$, and reject the null hypothesis in favor of the alternative hypothesis if $\hat t_{\hat\beta} > t_* = 1.943$. Thus, in the ice cream case we reject the null hypothesis

$H_0\colon \beta = 0$ in favor of the alternative hypothesis $H_1\colon \beta > 0$. This right-sided t-test is illustrated in Figure 3 below. Again, the curved line in Figure 3 is the density of the t distribution with 6 degrees of freedom, and the grey area is 0.05.

Figure 3: Right-sided t-test of $H_0\colon \beta = 0$ against the alternative hypothesis $H_1\colon \beta > 0$.

If the sample size $n$ is large, so that $\hat t_{\hat\beta} \sim N(0,1)$ if $\beta = 0$, then an alternative way of testing the null hypothesis $\beta = 0$ against the alternative hypothesis $\beta \neq 0$ is to use the (two-sided) p-value:

$$\hat p_{\hat\beta}\ (= \text{p-value of } \hat\beta) \stackrel{\text{def.}}{=} P[|U| > |\hat t_{\hat\beta}|], \text{ where } U \sim N(0,1). \qquad (20)$$

For example, if $\hat p_{\hat\beta} < 0.05$ we reject the null hypothesis $\beta = 0$ in favor of the alternative hypothesis $\beta \neq 0$ at the 5% significance level, and if $\hat p_{\hat\beta} \geq 0.05$ we accept the null hypothesis $\beta = 0$. The p-value for $\hat\alpha$ is defined and used similarly.

Although a t-value is a test statistic of the null hypothesis that the corresponding coefficient in the regression model is zero, it is quite easy to rebuild the t-value for testing other null hypotheses, as follows. Suppose you want to test the null hypothesis that $\beta = \beta_0$, where $\beta_0$ is a given number, for example $\beta_0 = 1$. Then

$$\frac{\hat\beta - \beta_0}{\hat\sigma_{\hat\beta}} = \frac{\hat\beta}{\hat\sigma_{\hat\beta}} - \frac{\beta_0}{\hat\sigma_{\hat\beta}} = \frac{\hat\beta}{\hat\sigma_{\hat\beta}} - \frac{\beta_0}{\hat\beta}\cdot\frac{\hat\beta}{\hat\sigma_{\hat\beta}} = \left(1 - \frac{\beta_0}{\hat\beta}\right)\frac{\hat\beta}{\hat\sigma_{\hat\beta}} = \frac{\hat\beta - \beta_0}{\hat\beta}\cdot\hat t_{\hat\beta}, \qquad (21)$$

so that by Proposition 6,

$$\hat t_{\hat\beta,\beta=\beta_0} = \frac{\hat\beta - \beta_0}{\hat\beta}\cdot\hat t_{\hat\beta} \sim t_{n-2}. \qquad (22)$$

For example, suppose that in the ice cream case we want to test the null hypothesis $H_0\colon \beta = 1$. Then

$$\hat t_{\hat\beta,\beta=1} = \frac{\hat\beta - 1}{\hat\beta}\cdot\hat t_{\hat\beta} = \frac{1.5 - 1}{1.5} \times 4.597 \approx 1.532, \qquad (23)$$

which under the null hypothesis $H_0\colon \beta = 1$ is a random drawing from the t distribution with 6 degrees of freedom. Note that the value of this test statistic is in the acceptance regions in Figures 2 and 3.

This trick is useful if the econometric software you are using only reports the t-values but not the standard errors. If the standard errors are reported, you can compute $\hat t_{\hat\beta,\beta=\beta_0}$ directly as $\hat t_{\hat\beta,\beta=\beta_0} = (\hat\beta - \beta_0)/\hat\sigma_{\hat\beta}$. Of course, if only the standard errors are reported and not the t-values, you can compute the t-value of $\hat\beta$ as $\hat t_{\hat\beta} = \hat\beta/\hat\sigma_{\hat\beta}$.
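Continuing the Python sketch, the test of $H_0\colon \beta = 1$ and the large-sample p-value (20) look as follows:

# Testing H0: beta = 1, formula (22), and the two-sided p-value (20).
t_beta_1 = (beta_hat - 1.0) / se_beta            # 1.532
print(t_beta_1)
# Large-sample two-sided p-value, based on the standard normal as in (20);
# note that with n = 8 the normal approximation is rough.
p_beta = 2 * (1 - stats.norm.cdf(abs(t_beta)))
print(p_beta)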

6. The $R^2$

The $R^2$ of a regression model compares the sum of squared residuals (SSR) of the model with the SSR of a regression model without regressors:

$$Y_j = \alpha + U_j, \qquad j = 1,2,\dots,n. \qquad (24)$$

It is easy to verify that the OLS estimator $\tilde\alpha$ of $\alpha$ in (24) is just the sample mean of the $Y_j$'s:

$$\tilde\alpha = \bar Y = \frac{1}{n}\sum_{j=1}^n Y_j. \qquad (25)$$

Therefore, the SSR of regression model (24) is $\sum_{j=1}^n (Y_j - \bar Y)^2$, which is called the Total Sum of Squares (TSS):

$$\mathrm{TSS} = \sum_{j=1}^n (Y_j - \bar Y)^2. \qquad (26)$$

The $R^2$ is now defined as

$$R^2 \stackrel{\text{def.}}{=} 1 - \frac{\mathrm{SSR}}{\mathrm{TSS}}. \qquad (27)$$

The $R^2$ is always between zero and one, because $\mathrm{SSR} \leq \mathrm{TSS}$. (Exercise: Why?) If $\mathrm{SSR} = \mathrm{TSS}$, so that $R^2 = 0$, then model (24) explains the dependent variable $Y_j$ equally well as model (2). In other words, the explanatory variables $X_j$ in (2) do not matter: $\beta = 0$. The other extreme case is where $R^2 = 1$, which corresponds to $\mathrm{SSR} = 0$. Then the dependent variable $Y_j$ in model (2) is completely explained by $X_j$, without error: $Y_j \equiv \alpha + \beta X_j$. Thus, the $R^2$ measures how well the explanatory variables $X_j$ are able to explain the corresponding dependent variables $Y_j$. For example, in the ice cream case, $\mathrm{SSR} = 11.5$ and $\mathrm{TSS} = 52$, hence $R^2 = 0.778846$. Loosely speaking, this means that about 78% of the variation in ice cream sales can be explained by the variation in temperature.
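In the running Python sketch:

# R-squared for the ice cream regression, formula (27).
TSS = np.sum((Y - Y.mean())**2)   # 52
R2 = 1 - SSR_direct / TSS         # 0.778846
print(R2)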

7. Presenting regression results

When you report regression results you should include, next to the OLS estimates of course, either the corresponding t-values or the standard errors, the sample size $n$, the standard error of the residuals (SER), and the $R^2$, because this information will enable the reader to judge your results. For example, our ice cream estimation results should be displayed as either

  Sales = -0.25 + 1.5 Temp.,   n = 8,   SER = 1.384437,   R^2 = 0.778846
          (-0.100)  (4.597)                       (t-values between brackets)

or

  Sales = -0.25 + 1.5 Temp.,   n = 8,   SER = 1.384437,   R^2 = 0.778846
          (2.49583) (0.32632)              (standard errors between brackets)

It is helpful to the reader if you indicate whether you have displayed the t-values or the standard errors between brackets, but you only need to mention this once.

8. Out-of-sample forecasting

The linear regression model was introduced as a forecasting scheme. The question we now address is: How reliable is an out-of-sample forecast? Consider the linear regression model (2), and suppose we observe $X_{n+1}$. Then the forecast of $Y_{n+1}$ is $\hat Y_{n+1} = \hat\alpha + \hat\beta X_{n+1}$, where the OLS estimators $\hat\alpha$ and $\hat\beta$ are computed on the basis of the observations for $j = 1,2,\dots,n$. The actual but unknown value of $Y_{n+1}$ is $Y_{n+1} = \alpha + \beta X_{n+1} + U_{n+1}$, so that the forecast error is

$$Y_{n+1} - \hat Y_{n+1} = U_{n+1} - (\hat\alpha - \alpha) - (\hat\beta - \beta)X_{n+1} = U_{n+1} - \sum_{j=1}^n \left(\frac{1}{n} + \frac{(X_{n+1} - \bar X)(X_j - \bar X)}{\sum_{i=1}^n (X_i - \bar X)^2}\right)U_j. \qquad (28)$$

See the Appendix for the latter equality. It now follows from Lemma 3 that under Assumptions I through V, $Y_{n+1} - \hat Y_{n+1} \sim N[0,\ \sigma^2_{Y_{n+1}-\hat Y_{n+1}}]$, where

$$\sigma^2_{Y_{n+1}-\hat Y_{n+1}} = \sigma^2\left(1 + \frac{1}{n} + \frac{(X_{n+1} - \bar X)^2}{\sum_{j=1}^n (X_j - \bar X)^2}\right). \qquad (29)$$

See the Appendix. Denoting

$$\hat\sigma^2_{Y_{n+1}-\hat Y_{n+1}} = \hat\sigma^2\left(1 + \frac{1}{n} + \frac{(X_{n+1} - \bar X)^2}{\sum_{j=1}^n (X_j - \bar X)^2}\right), \qquad (30)$$

it now follows, similarly to Proposition 6, that:

Proposition 8. Under Assumptions I-V, $(Y_{n+1} - \hat Y_{n+1})/\hat\sigma_{Y_{n+1}-\hat Y_{n+1}} \sim t_{n-2}$.

This result can be used to construct a 95% confidence interval, say, for $Y_{n+1}$. Look up in the table of the t distribution the critical value $t_*$ of the two-sided t-test with $n-2$ degrees of freedom. Then it follows from Proposition 8 that

$$0.95 = P[-t_* \leq (Y_{n+1} - \hat Y_{n+1})/\hat\sigma_{Y_{n+1}-\hat Y_{n+1}} \leq t_*] = P[-t_*\hat\sigma_{Y_{n+1}-\hat Y_{n+1}} \leq Y_{n+1} - \hat Y_{n+1} \leq t_*\hat\sigma_{Y_{n+1}-\hat Y_{n+1}}]$$
$$= P[\hat Y_{n+1} - t_*\hat\sigma_{Y_{n+1}-\hat Y_{n+1}} \leq Y_{n+1} \leq \hat Y_{n+1} + t_*\hat\sigma_{Y_{n+1}-\hat Y_{n+1}}]. \qquad (31)$$

Thus, the 95% confidence interval for $Y_{n+1}$ is $[\hat Y_{n+1} - t_*\hat\sigma_{Y_{n+1}-\hat Y_{n+1}},\ \hat Y_{n+1} + t_*\hat\sigma_{Y_{n+1}-\hat Y_{n+1}}]$.

Observe from (30) that $\hat\sigma_{Y_{n+1}-\hat Y_{n+1}}$ increases with $(X_{n+1} - \bar X)^2$, and so does the width of the confidence interval. Thus, the farther $X_{n+1}$ is away from $\bar X$, the more unreliable the forecast $\hat Y_{n+1}$ of $Y_{n+1}$ becomes. Also observe from (30) that $\hat\sigma_{Y_{n+1}-\hat Y_{n+1}} \geq \hat\sigma$, and that $\hat\sigma_{Y_{n+1}-\hat Y_{n+1}}$ gets close to $\hat\sigma$ if $n$ is large, because $\lim_{n\to\infty}\sum_{j=1}^n (X_j - \bar X)^2 = \infty$.
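For the ice cream example, with $X_{n+1} = 7.5$, the 95% forecast interval can be computed by continuing the Python sketch (the interval is in units of $100):

# 95% forecast interval for next weekend's sales, formulas (30)-(31).
Y_fc = alpha_hat + beta_hat * X_next
se_fc = SER * np.sqrt(1 + 1/n + (X_next - X.mean())**2 / S_xx)
t_star = stats.t.ppf(1 - 0.025, df=n - 2)            # 2.447
print(Y_fc - t_star * se_fc, Y_fc + t_star * se_fc)  # roughly [7.41, 14.59]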

9. Relaxing the non-random regressor assumption

As said before, the assumption that the regressors $X_j$ are non-random is too strong an assumption in economics. Therefore, we now assume that the $X_j$'s are random variables. This requires the following modifications of Assumptions I-V:

Assumption I*: The pairs $(X_j, Y_j)$, $j = 1,2,3,\dots,n$, are independent and identically distributed.

Assumption II*: The conditional expectations $E[U_j|X_j]$ are equal to zero: $E[U_j|X_j] \equiv 0$.

Assumption III*: The conditional expectations $E[U_j^2|X_j]$ do not depend on the $X_j$'s and are finite, constant and equal: $E[U_j^2|X_j] \equiv \sigma^2 < \infty$. (This is called the homoskedasticity assumption.)

Assumption IV*: Conditional on $X_j$, $U_j$ is $N(0,\sigma^2)$ distributed.

Assumptions I* and II* imply that for $j = 1,\dots,n$,

$$E[U_j|X_1,X_2,\dots,X_n] \equiv 0, \qquad (32)$$

and similarly Assumptions I* and III* imply that for $j = 1,\dots,n$,

$$E[U_j^2|X_1,X_2,\dots,X_n] \equiv \sigma^2. \qquad (33)$$

Because (loosely speaking) conditioning on $X_1,X_2,\dots,X_n$ is effectively the same as treating them as given constants, most of the previous propositions carry over:

Proposition 9. Under Assumptions I*-IV*, Propositions 1 and 4 through 7 carry over, and the results in Propositions 2 and 3 now hold conditional on $X_1,X_2,\dots,X_n$.

However, without Assumption IV* we need an additional condition in order to use the central limit theorem, namely:

Proposition 10. If the sample size $n$ is large, then under Assumptions I*-III* and the additional condition $E[X_j^2] < \infty$, the approximate normality results in Proposition 7 carry over.

Moreover, without Assumption IV* Propositions 6 and 8 are no longer true. As to Proposition 6, this is not a big deal, as in large samples we can still use Proposition 7, but without Assumption IV* we can no longer derive confidence intervals for the forecasts, as these confidence intervals are based on Proposition 8. It is therefore important to test the normality assumption.

10. Testing the normality assumption

For a normal random variable $U$ with zero expectation and variance $\sigma^2$ it can be shown that

$$\text{Kurtosis} \stackrel{\text{def.}}{=} E[U^4]/\sigma^4 - 3 = 0, \qquad \text{Skewness} \stackrel{\text{def.}}{=} E[U^3] = 0. \qquad (34)$$

Therefore, the normality condition can be tested by testing whether the kurtosis and the skewness of the model errors are zero, using the residuals. This is the idea behind the Jarque-Bera [3] and Kiefer-Salmon [4] tests. Under the null hypothesis (34) the test statistic involved has a $\chi^2_2$ distribution.

[3] Jarque, C. M. and A. K. Bera (1980), "Efficient Tests for Normality, Homoscedasticity and Serial Independence of Regression Residuals", Economics Letters 6, 255-259.
[4] Kiefer, N. and M. Salmon (1983), "Testing Normality in Econometric Models", Economics Letters 11, 123-127.
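As an illustration, here is a sketch of a Jarque-Bera-type statistic computed from the residuals of the running Python example. The exact finite-sample form of the test statistic varies across implementations; the common $n(S^2/6 + K^2/24)$ version is used here, and with only $n = 8$ observations the $\chi^2_2$ approximation is of course poor:

# Jarque-Bera-type normality check: sample skewness S and excess kurtosis K
# of the residuals, combined into JB = n*(S^2/6 + K^2/24), which is
# approximately chi-squared with 2 degrees of freedom under (34).
def jarque_bera(resid):
    m2 = np.mean(resid**2)
    S = np.mean(resid**3) / m2**1.5        # sample skewness
    K = np.mean(resid**4) / m2**2 - 3.0    # sample excess kurtosis
    jb = len(resid) * (S**2 / 6 + K**2 / 24)
    return jb, 1 - stats.chi2.cdf(jb, df=2)

print(jarque_bera(U_hat))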

11. Heteroscedasticity [5]

We say that the errors $U_j$ of regression model (2) are heteroskedastic if Assumption III* does not hold:

$$E[U_j^2|X_j] = R(X_j) \text{ for some function } R(\cdot). \qquad (35)$$

Heteroscedasticity often occurs in practice. It is actually the rule rather than the exception. The main consequence of heteroscedasticity is that the conditional variance formulas in Propositions 2 and 3 no longer hold, although the unbiasedness result in Proposition 1 is not affected by heteroscedasticity. Therefore, Propositions 4-8 are no longer valid either. In particular, the conditional variance of $\hat\beta$ [see (60)] under heteroscedasticity takes the form

$$\mathrm{var}(\hat\beta|X_1,\dots,X_n) = E[(\hat\beta - \beta)^2|X_1,\dots,X_n] = \frac{\sum_{j=1}^n (X_j - \bar X)^2 R(X_j)}{\left(\sum_{i=1}^n (X_i - \bar X)^2\right)^2}. \qquad (36)$$

[5] Also spelled as "Heteroskedasticity."

A cure for the heteroscedasticity problem is to replace the standard error of $\hat\beta$ by

$$\tilde\sigma_{\hat\beta} = \sqrt{\frac{n}{n-2}\cdot\frac{\sum_{j=1}^n (X_j - \bar X)^2\hat U_j^2}{\left(\sum_{i=1}^n (X_i - \bar X)^2\right)^2}}. \qquad (37)$$

This is known as the Heteroscedasticity Consistent (H.C.) standard error. The H.C. t-value then becomes $\tilde t_{\hat\beta} = \hat\beta/\tilde\sigma_{\hat\beta}$. Under the null hypothesis $\beta = 0$ this t-value is no longer t distributed, but the standard normal approximation remains valid if the sample size $n$ is large.

A popular test for heteroscedasticity is the Breusch-Pagan [6] test. Given that

$$E[U_j^2|X_j] = g(\gamma_0 + \gamma_1 X_j) \text{ for some unknown function } g(\cdot), \qquad (38)$$

the Breusch-Pagan test tests the null hypothesis

$$H_0\colon \gamma_1 = 0 \;\Leftrightarrow\; E[U_j^2|X_j] = g(\gamma_0) = \sigma^2, \text{ say}, \qquad (39)$$

against the alternative hypothesis

$$H_1\colon \gamma_1 \neq 0 \;\Leftrightarrow\; E[U_j^2|X_j] = g(\gamma_0 + \gamma_1 X_j) = R(X_j), \text{ say}. \qquad (40)$$

Under the null hypothesis (39) of homoskedasticity the test statistic of the Breusch-Pagan test has a $\chi^2_1$ distribution [7], and the test is conducted right-sided.

[6] Breusch, T. and A. Pagan (1979), "A Simple Test for Heteroscedasticity and Random Coefficient Variation", Econometrica 47, 1287-1294.
[7] In the multiple regression case the number of degrees of freedom is equal to the number of parameters minus 1 for the intercept.
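A sketch of (37) and the H.C. t-value in the running Python example (the $n/(n-2)$ degrees-of-freedom factor follows (37); other implementations omit it or use different small-sample corrections):

# Heteroskedasticity-consistent (White-type) standard error for beta-hat.
hc_var = (n / (n - 2)) * np.sum((X - X.mean())**2 * U_hat**2) / S_xx**2
se_beta_hc = np.sqrt(hc_var)
t_beta_hc = beta_hat / se_beta_hc   # compare with N(0,1) critical values in large samples
print(se_beta_hc, t_beta_hc)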

12. How close are OLS estimators?

The ice cream data in Table 1 are not based on any actual observations on sales and temperature; I have picked the numbers for $X_j$ and $Y_j$ quite arbitrarily. Therefore, there is no way to find out how close the OLS estimates $\hat\alpha = -0.25$, $\hat\beta = 1.5$ are to the unknown parameters $\alpha$ and $\beta$. Actually, we do not know either whether the linear regression model (2) and its assumptions are applicable to these artificial data.

In order to show how well OLS estimators approximate the corresponding parameters, I have generated random samples [8] $(Y_1,X_1),\dots,(Y_n,X_n)$ for three sample sizes, $n = 10$, $n = 100$ and $n = 1000$, as follows. The explanatory variables $X_j$ have been drawn independently from the $\chi^2_1$ distribution, the regression errors $U_j$ have been drawn independently from the $N(0,1)$ distribution, and the $Y_j$'s have been generated by

$$Y_j = 1 + X_j + U_j, \qquad j = 1,2,\dots,n. \qquad (41)$$

Thus, in this case the parameters $\alpha$ and $\beta$ in model (2) are $\alpha = 1$ and $\beta = 1$, and the standard error of $U_j$ is $\sigma = 1$. Moreover, note that Assumptions I*-IV* hold for model (41). The true $R^2$ can be defined by

$$R^2_0 = 1 - \frac{E[\mathrm{SSR}]}{E[\mathrm{TSS}]} = 1 - \frac{(n-2)\sigma^2}{E[\sum_{j=1}^n (Y_j - \bar Y)^2]}.$$

In the case (41), $\sigma^2 = 1$, $\mu_Y = E(Y_j) = 1 + E(X_j) = 2$,

$$E\left[\sum_{j=1}^n (Y_j - \bar Y)^2\right] = E\left[\sum_{j=1}^n \left((Y_j - \mu_Y) - (\bar Y - \mu_Y)\right)^2\right] = E\left[\sum_{j=1}^n (Y_j - \mu_Y)^2 - n(\bar Y - \mu_Y)^2\right] = (n-1)\mathrm{var}(Y_j),$$

and

$$\mathrm{var}(Y_j) = E[(X_j - 1 + U_j)^2] = E[(X_j - 1)^2] + E[U_j^2] = E[(X_j - 1)^2] + 1 = 3,$$

because $X_j$ is $\chi^2_1$ distributed and therefore has the same distribution as $U_j^2$, and it can be shown that for a standard normal random variable $U$, $E[(U^2 - 1)^2] = 2$. Thus, the true $R^2$ in this case is

$$R^2_0 = 1 - \frac{n-2}{3(n-1)} = \frac{2n-1}{3n-3} \approx \begin{cases} 0.7037 & \text{for } n = 10, \\ 0.6700 & \text{for } n = 100, \\ 0.6670 & \text{for } n = 1000. \end{cases}$$

The estimation results involved are given in Table 2.

[8] Via the EasyReg International menus File → Choose an input file → Create artificial data. Rather than generating one random sample of size n = 1000 and then using subsamples of sizes n = 10 and n = 100, these samples have been generated separately for n = 10, n = 100 and n = 1000.

Table 2: Artificial regression estimation results

  n     |              | $\hat\beta$ | $\hat\alpha$ | SER $(=\hat\sigma)$ | $R^2$
  10    | estimate:    | 1.1748      | 0.5592       | 0.99045             | 0.8842
        | (t-value):   | (7.87)      | (1.675)      |                     |
  100   | estimate:    | 1.03309     | 0.96028      | 0.992502            | 0.8284
        | (t-value):   | (12.753)    | (8.237)      |                     |
  1000  | estimate:    | 1.02360     | 0.9858       | 0.983608            | 0.6899
        | (t-value):   | (47.24)     | (26.037)     |                     |

Even for a sample size of $n = 10$ the OLS estimator $\hat\beta$ is already pretty close to its true value, and the same applies to $\hat\sigma$, but $\hat\alpha$ is too far away from the true value $\alpha = 1$. However, for $n = 100$ the OLS estimators $\hat\beta$ and $\hat\alpha$ deviate only about $\pm 4\%$ from their true values $\alpha = \beta = 1$, and $\hat\sigma$ deviates only about $-1\%$ from its true value. In the case $n = 1000$ these deviations reduce to about $\pm 2\%$. The $R^2$'s are too high, and only for $n = 1000$ is the $R^2$ reasonably close to its true value. However, the $R^2$ is only a descriptive statistic; it does not play a role in hypothesis testing, so the unreliability of the $R^2$ in small samples is harmless.

Notice the quite dramatic increase of the t-values with the sample size. Recall that these t-values are the test statistics of the null hypotheses that the corresponding parameters are zero. Because the true parameters are equal to 1, what you see in Table 2 is the increase of the power of the t-test with the sample size.
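The experiment is easy to replicate. A sketch with numpy (the notes used EasyReg International, so the draws, and hence the estimates, will differ from Table 2; the seed is arbitrary):

# Replication sketch of the Table 2 experiment: X ~ chi-squared(1),
# U ~ N(0,1), Y = 1 + X + U, for n = 10, 100, 1000.
rng = np.random.default_rng(0)
for n_sim in (10, 100, 1000):
    X_s = rng.chisquare(df=1, size=n_sim)
    U_s = rng.standard_normal(n_sim)
    Y_s = 1 + X_s + U_s
    b = np.sum((X_s - X_s.mean()) * (Y_s - Y_s.mean())) / np.sum((X_s - X_s.mean())**2)
    a = Y_s.mean() - b * X_s.mean()
    resid = Y_s - a - b * X_s
    ser = np.sqrt(np.sum(resid**2) / (n_sim - 2))
    r2 = 1 - np.sum(resid**2) / np.sum((Y_s - Y_s.mean())**2)
    print(n_sim, a, b, ser, r2)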

APPENDIX

Proof of (1): The first-order conditions for a minimum of $Q(\hat\alpha,\hat\beta) = \sum_{j=1}^n (Y_j - \hat\alpha - \hat\beta X_j)^2$ are:

$$\frac{\partial Q(\hat\alpha,\hat\beta)}{\partial\hat\alpha} = 0 \;\Leftrightarrow\; \sum_{j=1}^n 2(Y_j - \hat\alpha - \hat\beta X_j)(-1) = 0 \;\Leftrightarrow\; \sum_{j=1}^n (Y_j - \hat\alpha - \hat\beta X_j) = 0$$
$$\Leftrightarrow\; \frac{1}{n}\sum_{j=1}^n Y_j - \hat\alpha - \hat\beta\,\frac{1}{n}\sum_{j=1}^n X_j = 0 \;\Leftrightarrow\; \bar Y = \hat\alpha + \hat\beta\,\bar X, \qquad (42)$$

and

$$\frac{\partial Q(\hat\alpha,\hat\beta)}{\partial\hat\beta} = 0 \;\Leftrightarrow\; \sum_{j=1}^n 2(Y_j - \hat\alpha - \hat\beta X_j)(-X_j) = 0 \;\Leftrightarrow\; \sum_{j=1}^n (X_j Y_j - \hat\alpha X_j - \hat\beta X_j^2) = 0$$
$$\Leftrightarrow\; \frac{1}{n}\sum_{j=1}^n X_j Y_j - \hat\alpha\,\bar X - \hat\beta\,\frac{1}{n}\sum_{j=1}^n X_j^2 = 0 \;\Leftrightarrow\; \frac{1}{n}\sum_{j=1}^n X_j Y_j = \hat\alpha\,\bar X + \hat\beta\,\frac{1}{n}\sum_{j=1}^n X_j^2, \qquad (43)$$

where $\bar X = (1/n)\sum_{j=1}^n X_j$ and $\bar Y = (1/n)\sum_{j=1}^n Y_j$ are the sample means of the $X_j$'s and $Y_j$'s, respectively. The last equations in (42) and (43) are called the normal equations:

$$\bar Y = \hat\alpha + \hat\beta\,\bar X, \qquad (44)$$

$$\frac{1}{n}\sum_{j=1}^n X_j Y_j = \hat\alpha\,\bar X + \hat\beta\,\frac{1}{n}\sum_{j=1}^n X_j^2. \qquad (45)$$

To solve these normal equations, substitute $\hat\alpha = \bar Y - \hat\beta\,\bar X$ in (45). Then we get

$$\frac{1}{n}\sum_{j=1}^n X_j Y_j = (\bar Y - \hat\beta\,\bar X)\bar X + \hat\beta\,\frac{1}{n}\sum_{j=1}^n X_j^2 = \bar X\,\bar Y + \hat\beta\left(\frac{1}{n}\sum_{j=1}^n X_j^2 - \bar X^2\right),$$

hence

$$\frac{1}{n}\sum_{j=1}^n X_j Y_j - \bar X\,\bar Y = \hat\beta\left(\frac{1}{n}\sum_{j=1}^n X_j^2 - \bar X^2\right). \qquad (46)$$

Equation (46) can also be written as

$$\frac{1}{n}\sum_{j=1}^n (X_j - \bar X)(Y_j - \bar Y) = \hat\beta\,\frac{1}{n}\sum_{j=1}^n (X_j - \bar X)^2, \qquad (47)$$

because

$$\frac{1}{n}\sum_{j=1}^n (X_j - \bar X)(Y_j - \bar Y) = \frac{1}{n}\sum_{j=1}^n X_j Y_j - \bar X\,\frac{1}{n}\sum_{j=1}^n Y_j - \bar Y\,\frac{1}{n}\sum_{j=1}^n X_j + \bar X\,\bar Y = \frac{1}{n}\sum_{j=1}^n X_j Y_j - \bar X\,\bar Y, \qquad (48)$$

and similarly

$$\frac{1}{n}\sum_{j=1}^n (X_j - \bar X)^2 = \frac{1}{n}\sum_{j=1}^n X_j^2 - \bar X^2. \qquad (49)$$

Moreover,

$$\sum_{j=1}^n (X_j - \bar X)(Y_j - \bar Y) = \sum_{j=1}^n (X_j - \bar X)Y_j - \sum_{j=1}^n (X_j - \bar X)\bar Y = \sum_{j=1}^n (X_j - \bar X)Y_j - \left(\sum_{j=1}^n X_j - n\bar X\right)\bar Y = \sum_{j=1}^n (X_j - \bar X)Y_j. \qquad (50)$$

The result (1) now follows from (44) and (46) through (50).

Proof of Proposition 1: Recall from (1) that

$$\hat\beta = \frac{\sum_{j=1}^n (X_j - \bar X)Y_j}{\sum_{j=1}^n (X_j - \bar X)^2}. \qquad (51)$$

Substitute model (2) in (51). Then

$$\hat\beta = \frac{\sum_{j=1}^n (X_j - \bar X)(\alpha + \beta X_j + U_j)}{\sum_{j=1}^n (X_j - \bar X)^2} = \frac{\alpha\sum_{j=1}^n (X_j - \bar X) + \beta\sum_{j=1}^n (X_j - \bar X)X_j + \sum_{j=1}^n (X_j - \bar X)U_j}{\sum_{j=1}^n (X_j - \bar X)^2}$$
$$= \beta\,\frac{\sum_{j=1}^n (X_j - \bar X)X_j}{\sum_{j=1}^n (X_j - \bar X)^2} + \frac{\sum_{j=1}^n (X_j - \bar X)U_j}{\sum_{j=1}^n (X_j - \bar X)^2} = \beta + \frac{\sum_{j=1}^n (X_j - \bar X)U_j}{\sum_{j=1}^n (X_j - \bar X)^2}, \qquad (52)$$

where the second-to-last step uses $\sum_{j=1}^n (X_j - \bar X) = 0$, and the last step follows from the fact that, similar to (50),

$$\sum_{j=1}^n (X_j - \bar X)^2 = \sum_{j=1}^n (X_j - \bar X)(X_j - \bar X) = \sum_{j=1}^n (X_j - \bar X)X_j. \qquad (53)$$

Now take the mathematical expectation on both sides of (52). Then

$$E[\hat\beta] = \beta + E\left[\frac{\sum_{j=1}^n (X_j - \bar X)U_j}{\sum_{j=1}^n (X_j - \bar X)^2}\right] = \beta + \frac{\sum_{j=1}^n (X_j - \bar X)E(U_j)}{\sum_{j=1}^n (X_j - \bar X)^2} = \beta, \qquad (54)$$

because taking the mathematical expectation of a constant ($\beta$) does not affect that constant, and the mathematical expectation of a linear function of random variables equals the linear function of the mathematical expectations of these random variables. The last conclusion in (54) follows from Assumption II, and the second step in (54) can be taken because

we have assumed that the $X_j$'s are non-random (Assumption IV).

Next consider $\hat\alpha$. We have already established that $\hat\alpha = \bar Y - \hat\beta\,\bar X$. Substituting the right-hand side of (52) for $\hat\beta$ in this equation yields

$$\hat\alpha = \bar Y - \left(\beta + \frac{\sum_{j=1}^n (X_j - \bar X)U_j}{\sum_{j=1}^n (X_j - \bar X)^2}\right)\bar X = \bar Y - \beta\,\bar X - \frac{\sum_{j=1}^n \bar X(X_j - \bar X)U_j}{\sum_{j=1}^n (X_j - \bar X)^2}. \qquad (55)$$

Substituting

$$\bar Y = \frac{1}{n}\sum_{j=1}^n Y_j = \frac{1}{n}\sum_{j=1}^n (\alpha + \beta X_j + U_j) = \alpha + \beta\,\bar X + \frac{1}{n}\sum_{j=1}^n U_j$$

in (55) yields

$$\hat\alpha = \alpha + \frac{1}{n}\sum_{j=1}^n U_j - \frac{\sum_{j=1}^n \bar X(X_j - \bar X)U_j}{\sum_{i=1}^n (X_i - \bar X)^2} = \alpha + \sum_{j=1}^n \left(\frac{1}{n} - \frac{\bar X(X_j - \bar X)}{\sum_{i=1}^n (X_i - \bar X)^2}\right)U_j. \qquad (56)$$

Similarly as for $\hat\beta$ we therefore have

$$E[\hat\alpha] = \alpha + \sum_{j=1}^n \left(\frac{1}{n} - \frac{\bar X(X_j - \bar X)}{\sum_{i=1}^n (X_i - \bar X)^2}\right)E[U_j] = \alpha. \qquad (57)$$

This completes the proof of Proposition 1.

Proof of Lemma 1: We have

$$E\left[\left(\sum_{j=1}^n v_j U_j\right)\left(\sum_{j=1}^n w_j U_j\right)\right] = E\left[\sum_{i=1}^n\sum_{j=1}^n v_i w_j U_i U_j\right] = \sum_{i=1}^n\sum_{j=1}^n v_i w_j E(U_i U_j) = \sum_{j=1}^n v_j w_j\,\sigma^2, \qquad (58)$$

where the last equality in (58) follows from

$$E(U_i U_j) = E(U_i)E(U_j) = 0 \text{ if } i \neq j, \qquad E(U_i U_j) = E(U_j^2) = \sigma^2 \text{ if } i = j. \qquad (59)$$

Proof of Proposition 2: It follows from formula (52) and Lemma 2 that

$$\mathrm{var}(\hat\beta) = E[(\hat\beta - \beta)^2] = E\left[\left(\sum_{j=1}^n \frac{X_j - \bar X}{\sum_{i=1}^n (X_i - \bar X)^2}\,U_j\right)^2\right] = \sigma^2\sum_{j=1}^n \left(\frac{X_j - \bar X}{\sum_{i=1}^n (X_i - \bar X)^2}\right)^2$$
$$= \sigma^2\,\frac{\sum_{j=1}^n (X_j - \bar X)^2}{\left(\sum_{i=1}^n (X_i - \bar X)^2\right)^2} = \frac{\sigma^2}{\sum_{j=1}^n (X_j - \bar X)^2}. \qquad (60)$$

Similarly, it follows from formula (56) and Lemma 2 that

$$\mathrm{var}(\hat\alpha) = E[(\hat\alpha - \alpha)^2] = E\left[\left(\sum_{j=1}^n \left(\frac{1}{n} - \frac{\bar X(X_j - \bar X)}{\sum_{i=1}^n (X_i - \bar X)^2}\right)U_j\right)^2\right] = \sigma^2\sum_{j=1}^n \left(\frac{1}{n} - \frac{\bar X(X_j - \bar X)}{\sum_{i=1}^n (X_i - \bar X)^2}\right)^2$$
$$= \sigma^2\sum_{j=1}^n \left(\frac{1}{n^2} - \frac{2}{n}\cdot\frac{\bar X(X_j - \bar X)}{\sum_{i=1}^n (X_i - \bar X)^2} + \frac{\bar X^2(X_j - \bar X)^2}{\left(\sum_{i=1}^n (X_i - \bar X)^2\right)^2}\right)$$
$$= \sigma^2\left(\frac{1}{n} - \frac{2\bar X\,(1/n)\sum_{j=1}^n (X_j - \bar X)}{\sum_{i=1}^n (X_i - \bar X)^2} + \frac{\bar X^2}{\sum_{j=1}^n (X_j - \bar X)^2}\right) = \sigma^2\left(\frac{1}{n} + \frac{\bar X^2}{\sum_{j=1}^n (X_j - \bar X)^2}\right)$$
$$= \sigma^2\,\frac{(1/n)\sum_{j=1}^n (X_j - \bar X)^2 + \bar X^2}{\sum_{j=1}^n (X_j - \bar X)^2} = \frac{\sigma^2\,(1/n)\sum_{j=1}^n X_j^2}{\sum_{j=1}^n (X_j - \bar X)^2}, \qquad (61)$$

where the last equality follows from the fact that $(1/n)\sum_{j=1}^n (X_j - \bar X)^2 = (1/n)\sum_{j=1}^n X_j^2 - \bar X^2$.

Finally, it follows from Lemma 1 and the formulas (52) and (56) that

cov($",$$) ' E[($"&")($$&$)] ' E ' F 2 & X(X & X) X(X & & X ) (X & X ) ' i' (X i & X ) 2 ' i' (X i & X ) 2 U ' i' (X i & X) 2 X & X ' i' (X i & X ) 2 U (62) which ca be rewritte as (/)' cov($",$$) ' F 2 (X & X ) & X' (X & X ) 2 ' i' (X i & X ) 2 2 Proof of Propositio 5. Observe first from (44) ad (9) that ' &F 2. X ' (X & X ) 2. (63) so that we ca write $U ' Ȳ & $" & $$. X ' 0 (64) $U ' $U & i' $U i ' (Y & Ȳ ) & $$.(X & X ). (65) Next, observe from (2) that Substitutig the former equatio i (65) yields hece Y & Ȳ ' U & Ū % β.(x & X ), where Ū ' (/)' U. $U ' (U & Ū ) & ($$&$)(X & X ), (66) $U 2 ' (U &Ū ) & ($$&$)(X & X ) 2 ' (U &Ū ) 2 & 2($$&$) (X & X )(U &Ū ) % ($$&$) 2 (X & X ) 2 (67) ' (U &Ū ) 2 & 2($$&$) (X & X )U % ($$&$) 2 (X & X ) 2, 27

It follows from (52), (67) and the equality $\sum_{j=1}^n (U_j - \bar U)^2 = \sum_{j=1}^n U_j^2 - n\bar U^2$ that

$$\sum_{j=1}^n \hat U_j^2 = \sum_{j=1}^n (U_j - \bar U)^2 - (\hat\beta - \beta)^2\sum_{j=1}^n (X_j - \bar X)^2 = \sum_{j=1}^n U_j^2 - n\bar U^2 - (\hat\beta - \beta)^2\sum_{j=1}^n (X_j - \bar X)^2$$
$$= \sum_{j=1}^n U_j^2 - \frac{1}{n}\left(\sum_{i=1}^n U_i\right)^2 - (\hat\beta - \beta)^2\sum_{j=1}^n (X_j - \bar X)^2, \qquad (68)$$

where the first equality uses that, by (52), $(\hat\beta - \beta)\sum_{j=1}^n (X_j - \bar X)U_j = (\hat\beta - \beta)^2\sum_{j=1}^n (X_j - \bar X)^2$. Taking expectations and using Lemma 2 and Proposition 2, it now follows from (68) that

$$E\left[\sum_{j=1}^n \hat U_j^2\right] = \sum_{j=1}^n E[U_j^2] - \frac{1}{n}E\left[\left(\sum_{i=1}^n U_i\right)^2\right] - E[(\hat\beta - \beta)^2]\sum_{j=1}^n (X_j - \bar X)^2 = n\sigma^2 - \sigma^2 - \sigma^2 = (n-2)\sigma^2. \qquad (69)$$

Proof of (13):

$$\mathrm{SSR} = \sum_{j=1}^n \hat U_j^2 = \sum_{j=1}^n (Y_j - \hat\alpha - \hat\beta X_j)^2 = \sum_{j=1}^n (Y_j - (\bar Y - \hat\beta\,\bar X) - \hat\beta X_j)^2 = \sum_{j=1}^n \left((Y_j - \bar Y) - \hat\beta(X_j - \bar X)\right)^2$$
$$= \sum_{j=1}^n (Y_j - \bar Y)^2 - 2\hat\beta\sum_{j=1}^n (Y_j - \bar Y)(X_j - \bar X) + \hat\beta^2\sum_{j=1}^n (X_j - \bar X)^2 = \sum_{j=1}^n (Y_j - \bar Y)^2 - \hat\beta^2\sum_{j=1}^n (X_j - \bar X)^2, \qquad (70)$$

where the last step uses (47): $\sum_{j=1}^n (X_j - \bar X)(Y_j - \bar Y) = \hat\beta\sum_{j=1}^n (X_j - \bar X)^2$.

Proof of (28): It follows from (3) that

$$Y_{n+1} - \hat Y_{n+1} = U_{n+1} - (\hat\alpha - \alpha) - (\hat\beta - \beta)X_{n+1}$$
$$= U_{n+1} - \sum_{j=1}^n \left(\frac{1}{n} - \frac{\bar X(X_j - \bar X)}{\sum_{i=1}^n (X_i - \bar X)^2}\right)U_j - X_{n+1}\,\frac{\sum_{j=1}^n (X_j - \bar X)U_j}{\sum_{i=1}^n (X_i - \bar X)^2}$$
$$= U_{n+1} - \sum_{j=1}^n \left(\frac{1}{n} + \frac{(X_{n+1} - \bar X)(X_j - \bar X)}{\sum_{i=1}^n (X_i - \bar X)^2}\right)U_j. \qquad (71)$$

Proof of (29): It follows from (28) and Lemma 3 that

$$\sigma^2_{Y_{n+1}-\hat Y_{n+1}} = \sigma^2 + \sum_{j=1}^n \left(\frac{1}{n} + \frac{(X_{n+1} - \bar X)(X_j - \bar X)}{\sum_{i=1}^n (X_i - \bar X)^2}\right)^2\sigma^2$$
$$= \sigma^2\left(1 + \frac{1}{n} + \frac{2(X_{n+1} - \bar X)\,(1/n)\sum_{j=1}^n (X_j - \bar X)}{\sum_{i=1}^n (X_i - \bar X)^2} + \frac{(X_{n+1} - \bar X)^2\sum_{j=1}^n (X_j - \bar X)^2}{\left(\sum_{i=1}^n (X_i - \bar X)^2\right)^2}\right)$$
$$= \sigma^2\left(1 + \frac{1}{n} + \frac{(X_{n+1} - \bar X)^2}{\sum_{j=1}^n (X_j - \bar X)^2}\right). \qquad (72)$$