The simple linear Regression Model




The correlation coefficient is non-parametric and just indicates that two variables are associated with one another, but it does not give any idea of the kind of relationship.

Regression models help to investigate bivariate and multivariate relationships between variables, where we can hypothesize that one variable depends on another variable or on a combination of other variables.

Normally, relationships between variables in political science and economics are not exact unless true by definition. Most often they include a non-structural or random component, due to the probabilistic nature of theories and hypotheses in political science, measurement errors etc.

Regression analysis makes it possible to find average relationships that may not be obvious from just eye-balling the data, through an explicit formulation of the structural and random components of a hypothesized relationship between variables.

Example: positive relationship between unemployment and government spending.

Simple linear regression analysis

Linear relationship between x (explanatory variable) and y (dependent variable):

$$y_i = \alpha + \beta x_i + \varepsilon_i$$

Epsilon describes the random component of the linear relationship between x and y.

[Figure: scatter plot]

$y_i$ is the value of the dependent variable (spending) in observation i (e.g. the UK). $y_i$ is determined by 2 components:

1. the non-random/structural component $\alpha + \beta x_i$, where $x_i$ is the independent/explanatory variable (unemployment) in observation i (UK) and alpha and beta are fixed quantities, the parameters of the model; alpha is called the constant or intercept and measures the value where the regression line crosses the y-axis; beta is called the coefficient/slope and measures the steepness of the regression line.
2. the random component, called disturbance or error term, $\varepsilon_i$ in observation i.

A simple example:

x has 10 observations: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9.

The true relationship between y and x is y = 5 + 1*x; thus, the true y takes on the values 5, 6, 7, 8, 9, 10, 11, 12, 13, 14.

There is some disturbance, e.g. a measurement error, which is standard normally distributed. Thus the y we can measure takes on the values 6.95, 5.22, 6.36, 7.03, 9.71, 9.67, 10.69, 13.85, 13.21, 14.82, which are close to the true values, but for any given observation the observed value is a little larger or smaller than the true value. The relationship between x and y should hold true on average but is not exact.

When we do our analysis, we don't know the true relationship and the true y; we just have the observed x and y. We know that the relationship between x and y should have the following form: y = alpha + beta*x + epsilon (we hypothesize a linear relationship). The regression analysis estimates the parameters alpha and beta by using the given observations for x and y.

The simplest form of estimating alpha and beta is called ordinary least squares (OLS) regression.
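The gap between true and observed values in this example can be made explicit with a few lines of Python. This is only a sketch using the numbers listed above; the variable names (`true_y`, `observed_y`, `disturbance`) are illustrative and not part of the slides.

```python
# Data from the example: x, the true y = 5 + 1*x, and the observed y.
x = list(range(10))
true_y = [5 + 1 * xi for xi in x]
observed_y = [6.95, 5.22, 6.36, 7.03, 9.71, 9.67, 10.69, 13.85, 13.21, 14.82]

# The disturbance of each observation is the observed minus the true value.
disturbance = [round(obs - true, 2) for obs, true in zip(observed_y, true_y)]

# With only 10 draws the average disturbance is small but not exactly zero.
mean_disturbance = sum(disturbance) / len(disturbance)

print(true_y)
print(disturbance)
print(round(mean_disturbance, 3))
```

Some disturbances are positive and some negative, which is what "holds on average but is not exact" means for a sample this small.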

OLS-Regression:

Draw a line through the scatter plot in a way that minimizes the deviations of the single observations from the line.

[Figure: scatter plot of y against x with the fitted line (fitted values); annotations mark the intercept alpha, the pairs y1/yhat1 and y7/yhat7, and the residual epsilon7]

Minimize the sum of all squared deviations from the line (squared residuals):

$$\hat{y}_i = \hat\alpha + \hat\beta x_i, \qquad \hat\varepsilon_i = y_i - \hat{y}_i = y_i - \hat\alpha - \hat\beta x_i$$

This is done mathematically by the statistical program at hand. The values of the dependent variable on the line are called the predicted values of the regression (yhat): 4.97, 6.03, 7.10, 8.16, 9.22, 10.28, 11.34, 12.41, 13.47, 14.53. These are very close to the true values of y; the estimated alpha = 4.97 and beta = 1.06.

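OLS regression

Ordinary least squares regression minimizes the sum of squared residuals:

$$y_i = \hat\alpha + \hat\beta x_i + \hat\varepsilon_i, \qquad \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 = \sum_{i=1}^{n} \hat\varepsilon_i^{\,2} \rightarrow \min$$

Components:
DV: y; at least 1 IV: x
Constant or intercept term: alpha
Regression coefficient, slope: beta
Error term, residuals: epsilon

[Figure: component-plus-residual plot]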

Derivation of the OLS parameters alpha and beta:

The relationship between x and y is described by the function:

$$y_i = \alpha + \beta x_i + \varepsilon_i$$

The difference between the dependent variable y and the estimated systematic influence of x on y is called the residual:

$$e_i = y_i - \hat\alpha - \hat\beta x_i$$

To obtain the optimal estimates for alpha and beta we need a choice criterion; in the case of OLS this criterion is the sum of squared residuals. We calculate alpha and beta for the case in which the sum of all squared deviations (residuals) is minimal:

$$\min_{\hat\alpha,\hat\beta} \sum_{i=1}^{n} e_i^2 = \min_{\hat\alpha,\hat\beta} \sum_{i=1}^{n} \left(y_i - \hat\alpha - \hat\beta x_i\right)^2 = \min_{\hat\alpha,\hat\beta} S(\hat\alpha, \hat\beta)$$

Taking the squares of the residuals is necessary since a) positive and negative deviations then do not cancel each other out, and b) positive and negative estimation errors enter with the same weight because of the squaring; it is therefore irrelevant whether the expected value for observation $y_i$ is underestimated or overestimated.

Since the measure is additive, no single value is of utmost relevance. However, especially large residuals receive a stronger weight due to squaring.
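To make the criterion concrete, the sum of squared residuals can be evaluated for a few candidate lines on the example data from earlier in the transcript. The following Python sketch is illustrative only: the function name `sum_squared_residuals` and the choice of candidate lines are assumptions made here, not part of the slides.

```python
def sum_squared_residuals(alpha, beta, x, y):
    """S(alpha, beta) = sum over i of (y_i - alpha - beta * x_i)^2."""
    return sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))


# Example data from the slides (observed values).
x = list(range(10))
y = [6.95, 5.22, 6.36, 7.03, 9.71, 9.67, 10.69, 13.85, 13.21, 14.82]

# Three candidate lines, chosen for illustration:
s_true = sum_squared_residuals(5.0, 1.0, x, y)        # the true parameters
s_reported = sum_squared_residuals(4.97, 1.06, x, y)  # estimates reported in the slides
s_flat = sum_squared_residuals(9.75, 0.0, x, y)       # a flat line near the mean of y

print(s_true, s_reported, s_flat)
```

On this sample the reported estimates give a smaller sum than the true parameters do, because OLS fits the observed data rather than the unknown true relationship.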

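Minimizing the function requires calculating the first order conditions with respect to alpha and beta and setting them to zero:

$$\text{I:}\quad \frac{\partial S(\hat\alpha,\hat\beta)}{\partial \hat\alpha} = -2 \sum_{i=1}^{n} \left(y_i - \hat\alpha - \hat\beta x_i\right) = 0$$

$$\text{II:}\quad \frac{\partial S(\hat\alpha,\hat\beta)}{\partial \hat\beta} = -2 \sum_{i=1}^{n} \left(y_i - \hat\alpha - \hat\beta x_i\right) x_i = 0$$

This is just a linear system of two equations with two unknowns, alpha and beta, which we can solve mathematically for alpha:

$$\text{I:}\quad \sum_{i=1}^{n} \left(y_i - \hat\alpha - \hat\beta x_i\right) = 0 \;\Rightarrow\; n\hat\alpha = \sum_{i=1}^{n} y_i - \hat\beta \sum_{i=1}^{n} x_i \;\Rightarrow\; \hat\alpha = \bar{y} - \hat\beta \bar{x}$$

where $\bar{x}$ and $\bar{y}$ denote the sample means of x and y,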

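and for beta:

$$\text{II:}\quad \sum_{i=1}^{n} \left(y_i - \hat\alpha - \hat\beta x_i\right) x_i = 0 \;\Rightarrow\; \sum_{i=1}^{n} x_i y_i - \hat\alpha \sum_{i=1}^{n} x_i - \hat\beta \sum_{i=1}^{n} x_i^2 = 0$$

Substituting $\hat\alpha = \bar{y} - \hat\beta \bar{x}$ and $\sum_{i=1}^{n} x_i = n\bar{x}$:

$$\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y} + \hat\beta\, n\bar{x}^2 - \hat\beta \sum_{i=1}^{n} x_i^2 = 0$$

$$\hat\beta = \frac{\sum_{i=1}^{n} x_i y_i - n\bar{x}\bar{y}}{\sum_{i=1}^{n} x_i^2 - n\bar{x}^2} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2} = \frac{Cov(x, y)}{Var(x)}$$

In matrix notation, with X containing a column of ones and the column of x, the estimator is $(X'X)^{-1} X'y$.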
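The two first order conditions can also be solved numerically as a 2x2 linear system. The sketch below does this for the example data from earlier in the transcript, using Cramer's rule on the normal equations (X'X) b = X'y. It is plain Python with illustrative variable names, not the procedure of any particular statistics package.

```python
# Example data from the slides (observed values).
x = list(range(10))
y = [6.95, 5.22, 6.36, 7.03, 9.71, 9.67, 10.69, 13.85, 13.21, 14.82]

n = len(x)
sum_x = sum(x)
sum_y = sum(y)
sum_xx = sum(xi * xi for xi in x)
sum_xy = sum(xi * yi for xi, yi in zip(x, y))

# Normal equations, written out:
#   n     * alpha + sum_x  * beta = sum_y    (condition I)
#   sum_x * alpha + sum_xx * beta = sum_xy   (condition II)
det = n * sum_xx - sum_x ** 2  # determinant of X'X

# Cramer's rule for the 2x2 system.
alpha_hat = (sum_xx * sum_y - sum_x * sum_xy) / det
beta_hat = (n * sum_xy - sum_x * sum_y) / det

# Both first order conditions should hold up to floating-point error.
residuals = [yi - alpha_hat - beta_hat * xi for xi, yi in zip(x, y)]
condition_1 = sum(residuals)
condition_2 = sum(e * xi for e, xi in zip(residuals, x))

print(round(alpha_hat, 2), round(beta_hat, 2))
```

The system has a unique solution only if `det` is non-zero, that is, if x takes at least two different values.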

Naturally we still have to verify whether $\hat\alpha$ and $\hat\beta$ really minimize the sum of squared residuals and satisfy the second order conditions of the minimization problem. Thus we need the second derivatives of the function S with respect to alpha and beta, which are given by the so-called Hessian matrix (matrix of second derivatives). (I spare the mathematical derivation.)

The Hessian matrix has to be positive definite (for a 2x2 matrix: its upper-left element and its determinant must both be larger than 0) so that $\hat\alpha$ and $\hat\beta$ globally minimize the sum of squared residuals. Only in this case are alpha and beta optimal estimates for the relationship between the dependent variable y and the independent variable x.
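The slides skip the derivation, but the check is short for simple regression. Differentiating the sum of squared residuals twice gives the Hessian [[2n, 2*sum(x)], [2*sum(x), 2*sum(x^2)]], which depends only on x. This is a standard result that is not shown in the slides. The Python sketch below evaluates it for the example x values 0 to 9, with illustrative variable names.

```python
# x values from the example in the slides.
x = list(range(10))

n = len(x)
sum_x = sum(x)
sum_xx = sum(xi * xi for xi in x)

# Hessian of S(alpha, beta) = sum (y_i - alpha - beta * x_i)^2:
#   d2S/dalpha2 = 2n, d2S/dalpha dbeta = 2*sum(x), d2S/dbeta2 = 2*sum(x^2)
hessian = [
    [2 * n, 2 * sum_x],
    [2 * sum_x, 2 * sum_xx],
]

# A symmetric 2x2 matrix is positive definite if its upper-left element
# and its determinant are both positive.
leading_minor = hessian[0][0]
determinant = hessian[0][0] * hessian[1][1] - hessian[0][1] * hessian[1][0]
is_positive_definite = leading_minor > 0 and determinant > 0

print(hessian, determinant, is_positive_definite)
```

The determinant equals 4n times the sum of squared deviations of x from its mean, so the condition fails only when all x values are identical.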

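Regression coefficient:

$$\hat\beta = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}$$

Beta equals the covariance between y and x divided by the variance of x.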
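Applied to the example data from earlier in the transcript, the covariance-over-variance formula can be sketched in Python as follows. The variable names are illustrative. Because the observed values in the slides are rounded to two decimals, the fitted values computed here may differ from the listed ones in the last digit.

```python
# Example data from the slides (observed values).
x = list(range(10))
y = [6.95, 5.22, 6.36, 7.03, 9.71, 9.67, 10.69, 13.85, 13.21, 14.82]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Sample covariance of x and y and sample variance of x.
# The divisor (n - 1) cancels in the ratio.
cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)

beta_hat = cov_xy / var_x
alpha_hat = y_bar - beta_hat * x_bar  # from the first normal equation

# Predicted values on the regression line.
fitted = [alpha_hat + beta_hat * xi for xi in x]

print(round(alpha_hat, 2), round(beta_hat, 2))
print([round(f, 2) for f in fitted])
```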

Interpretation of regression results:

    . reg y x

          Source |       SS       df       MS              Number of obs =     100
    -------------+------------------------------           F(  1,    98) =   89.78
           Model |  1248.96129     1  1248.96129           Prob > F      =  0.0000
        Residual |   1363.2539    98  13.9107541           R-squared     =  0.4781
    -------------+------------------------------           Adj R-squared =  0.4728
           Total |  2612.21519    99   26.386012           Root MSE      =  3.7297

    ------------------------------------------------------------------------------
               y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
               x |   1.941914   .2049419     9.48   0.000     1.535213    2.348614
           _cons |   .8609647   .4127188     2.09   0.040     .0419377    1.679992
    ------------------------------------------------------------------------------

If x increases by 1 unit, y increases by 1.94 units: the interpretation is linear and straightforward.
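Most of the summary statistics in this output follow directly from the sums of squares and the coefficient row. The Python sketch below recomputes them from the printed numbers. The variable names are illustrative, and the formulas are the usual textbook definitions rather than anything stated in the slides.

```python
import math

# Numbers copied from the regression output.
ss_model = 1248.96129
ss_residual = 1363.2539
df_model = 1
df_residual = 98
coef_x = 1.941914
se_x = 0.2049419

ss_total = ss_model + ss_residual
ms_residual = ss_residual / df_residual

# Share of the total variation in y explained by the model.
r_squared = ss_model / ss_total
# Adjusted for the number of estimated parameters.
adj_r_squared = 1 - (1 - r_squared) * (df_model + df_residual) / df_residual
# F statistic: explained mean square over residual mean square.
f_stat = (ss_model / df_model) / ms_residual
# Root MSE: estimated standard deviation of the residuals.
root_mse = math.sqrt(ms_residual)
# t statistic of the slope: coefficient divided by its standard error.
t_x = coef_x / se_x

print(round(r_squared, 4), round(adj_r_squared, 4), round(f_stat, 2))
print(round(root_mse, 4), round(t_x, 2))
```

With a single explanatory variable the F statistic is the square of the slope's t statistic, which gives a quick consistency check on the table.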

Interpretation: example

alpha = 4.97, beta = 1.06

Education and earnings: no education gives you a minimal hourly wage of around 5 pounds. Each additional year of education increases the hourly wage by approx. 1 pound.

[Figure: fitted values plotted against x, with the intercept alpha and the slope beta = 1.06 marked]
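Reading the estimates as a prediction rule can be sketched in a few lines of Python. The function name `predicted_wage` is hypothetical, and the defaults are simply the alpha and beta from this example.

```python
def predicted_wage(years_of_education, alpha=4.97, beta=1.06):
    """Predicted hourly wage in pounds: alpha + beta * years of education."""
    return alpha + beta * years_of_education


# With no education the prediction is the intercept.
wage_none = predicted_wage(0)

# One more year of education changes the prediction by the slope,
# whatever the starting level.
gain_per_year = predicted_wage(11) - predicted_wage(10)

print(round(wage_none, 2), round(gain_per_year, 2))
```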

Properties of the OLS estimator:

Since alpha and beta are estimates of the unknown parameters, $\hat{y}_i = \hat\alpha + \hat\beta x_i$ estimates the mean function or the systematic part of the regression equation. Since a random variable can be predicted best by the mean function (under the mean squared error criterion), yhat can be interpreted as the best prediction of y. The difference between the dependent variable y and its least squares prediction is the least squares residual: e = y - yhat = y - (alpha + beta*x).

A large residual e can be due either to a poor estimation of the parameters of the model or to a large unsystematic part of the regression equation.

For the OLS model to be the best estimator of the relationship between x and y, several conditions (full ideal conditions, Gauss-Markov conditions) have to be met.

If the full ideal conditions are met, one can argue that the OLS estimator imitates the properties of the unknown model of the population. This means e.g. that the explanatory variables and the error term are uncorrelated.
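The sample counterpart of the last point can be checked on the example data from earlier in the transcript: OLS residuals sum to zero and are uncorrelated with x by construction. The Python sketch below shows this, with illustrative variable names. It describes the fitted residuals only; whether the unobserved error term is uncorrelated with x is an assumption about the population that this calculation cannot test.

```python
# Example data from the slides (observed values).
x = list(range(10))
y = [6.95, 5.22, 6.36, 7.03, 9.71, 9.67, 10.69, 13.85, 13.21, 14.82]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# OLS estimates via the covariance-over-variance formula.
beta_hat = (
    sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    / sum((xi - x_bar) ** 2 for xi in x)
)
alpha_hat = y_bar - beta_hat * x_bar

# Least squares residuals e_i = y_i - yhat_i.
residuals = [yi - (alpha_hat + beta_hat * xi) for xi, yi in zip(x, y)]

# Both quantities are zero up to floating-point error.
residual_sum = sum(residuals)
residual_x_cross = sum(e * (xi - x_bar) for e, xi in zip(residuals, x))

print(abs(residual_sum) < 1e-9, abs(residual_x_cross) < 1e-9)
```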