Lecture 6: Regression Analysis
MIT 18.S096, Dr. Kempthorne, Fall 2013
Outline
1. Regression Analysis
Multiple Linear Regression: Setup

Data set: n cases, i = 1, 2, ..., n
- Response (dependent) variable: $y_i$, i = 1, 2, ..., n
- p explanatory (independent) variables: $\mathbf{x}_i = (x_{i,1}, x_{i,2}, \ldots, x_{i,p})^T$, i = 1, 2, ..., n

Goal of regression analysis: extract/exploit the relationship between $y_i$ and $\mathbf{x}_i$.

Examples:
- Prediction
- Causal inference
- Approximation
- Functional relationships
General Linear Model

For each case i, the conditional distribution $[y_i \mid \mathbf{x}_i]$ is given by

$$ y_i = \hat{y}_i + \epsilon_i, \qquad \hat{y}_i = \beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_p x_{i,p} $$

where
- $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$ are the p regression parameters (constant over all cases), and
- $\epsilon_i$ is the residual (error) variable (varies over the cases).

Extensive breadth of possible models:
- Polynomial approximation: $x_{i,j} = (x_i)^j$, i.e., the explanatory variables are different powers of the same variable $x_i$.
- Fourier series: $x_{i,j} = \sin(j x_i)$ or $\cos(j x_i)$, i.e., the explanatory variables are different sine/cosine terms of a Fourier series expansion.
- Time series regressions: time is indexed by i, and the explanatory variables include lagged response values.

Note: linearity of $\hat{y}_i$ (in the regression parameters) is maintained even when the x's are nonlinear, as the simulation sketch below illustrates.
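As a concrete illustration (not part of the original slides; all names are illustrative), a minimal Python/NumPy sketch that simulates data from a polynomial regression, a model that is nonlinear in x but linear in the parameters β:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 100, 3

# Polynomial design matrix: columns are x, x^2, x^3 of the same variable x.
x = rng.uniform(-1.0, 1.0, size=n)
X = np.column_stack([x**j for j in range(1, p + 1)])

beta_true = np.array([2.0, -1.0, 0.5])   # regression parameters (constant over cases)
eps = rng.normal(0.0, 0.3, size=n)       # residual variables (vary over cases)
y = X @ beta_true + eps                  # y_i = yhat_i + eps_i
```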
Steps for Fitting a Model

(1) Propose a model in terms of:
- the response variable Y (specify the scale),
- the explanatory variables $X_1, X_2, \ldots, X_p$ (include different functions of the explanatory variables if appropriate), and
- assumptions about the distribution of ε over the cases.
(2) Specify/define a criterion for judging different estimators.
(3) Characterize the best estimator and apply it to the given data.
(4) Check the assumptions in (1).
(5) If necessary, modify the model and/or assumptions and go to (1).
Specifying Assumptions in (1) for the Residual Distribution

- Gauss-Markov: zero mean, constant variance, uncorrelated.
- Normal-linear models: the $\epsilon_i$ are i.i.d. $N(0, \sigma^2)$ random variables.
- Generalized Gauss-Markov: zero mean and a general covariance matrix (possibly correlated, possibly heteroscedastic).
- Non-normal/non-Gaussian distributions (e.g., Laplace, Pareto; contaminated normal: some fraction $(1 - \delta)$ of the $\epsilon_i$ are i.i.d. $N(0, \sigma^2)$ r.v.'s and the remaining fraction $\delta$ follows some contamination distribution).
Specifying the Estimator Criterion in (2)

- Least squares
- Maximum likelihood
- Robust (contamination-resistant)
- Bayes (assume the $\beta_j$ are r.v.'s with a known prior distribution)
- Accommodating incomplete/missing data

Case Analyses for (4): Checking Assumptions

- Residual analysis. The model errors $\epsilon_i$ are unobservable; the model residuals for the fitted regression parameters $\hat\beta_j$ are
$$ e_i = y_i - [\hat\beta_1 x_{i,1} + \hat\beta_2 x_{i,2} + \cdots + \hat\beta_p x_{i,p}] $$
- Influence diagnostics (identify cases which are highly influential)
- Outlier detection
Ordinary Least Squares
Ordinary Least Squares Estimates

Least squares criterion: for $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$, define

$$ Q(\beta) = \sum_{i=1}^n [y_i - \hat{y}_i]^2 = \sum_{i=1}^n [y_i - (\beta_1 x_{i,1} + \beta_2 x_{i,2} + \cdots + \beta_p x_{i,p})]^2 $$

The ordinary least-squares (OLS) estimate $\hat\beta$ minimizes $Q(\beta)$.

Matrix notation:

$$ \mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix}, \qquad \mathbf{X} = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\ \vdots & & & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,p} \end{pmatrix}, \qquad \beta = \begin{pmatrix} \beta_1 \\ \vdots \\ \beta_p \end{pmatrix} $$
Solving for the OLS Estimate $\hat\beta$

$$ \hat{\mathbf{y}} = \begin{pmatrix} \hat y_1 \\ \hat y_2 \\ \vdots \\ \hat y_n \end{pmatrix} = \mathbf{X}\beta \quad\text{and}\quad Q(\beta) = \sum_{i=1}^n (y_i - \hat y_i)^2 = (\mathbf{y} - \mathbf{X}\beta)^T(\mathbf{y} - \mathbf{X}\beta) $$

The OLS estimate $\hat\beta$ solves $\partial Q(\beta)/\partial\beta_j = 0$, j = 1, 2, ..., p:

$$ \frac{\partial Q(\beta)}{\partial \beta_j} = \frac{\partial}{\partial \beta_j}\Big( \sum_{i=1}^n [y_i - (x_{i,1}\beta_1 + x_{i,2}\beta_2 + \cdots + x_{i,p}\beta_p)]^2 \Big) = \sum_{i=1}^n 2(-x_{i,j})[y_i - (x_{i,1}\beta_1 + \cdots + x_{i,p}\beta_p)] = -2\,\mathbf{X}_{[j]}^T(\mathbf{y} - \mathbf{X}\beta) $$

where $\mathbf{X}_{[j]}$ is the jth column of X.
$$ \frac{\partial Q}{\partial \beta} = \begin{pmatrix} \partial Q/\partial\beta_1 \\ \partial Q/\partial\beta_2 \\ \vdots \\ \partial Q/\partial\beta_p \end{pmatrix} = -2 \begin{pmatrix} \mathbf{X}_{[1]}^T(\mathbf{y} - \mathbf{X}\beta) \\ \mathbf{X}_{[2]}^T(\mathbf{y} - \mathbf{X}\beta) \\ \vdots \\ \mathbf{X}_{[p]}^T(\mathbf{y} - \mathbf{X}\beta) \end{pmatrix} = -2\,\mathbf{X}^T(\mathbf{y} - \mathbf{X}\beta) $$

So the OLS estimate $\hat\beta$ solves the Normal Equations:

$$ \mathbf{X}^T(\mathbf{y} - \mathbf{X}\hat\beta) = 0 \;\iff\; \mathbf{X}^T\mathbf{X}\hat\beta = \mathbf{X}^T\mathbf{y} \;\implies\; \hat\beta = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} $$

N.B. For $\hat\beta$ to exist (uniquely), $(\mathbf{X}^T\mathbf{X})$ must be invertible, i.e., X must have full column rank.
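A minimal numerical sketch of the normal-equations solution, continuing the simulated data above (in practice one solves the linear system rather than forming the explicit inverse):

```python
# Check the full-column-rank condition required for uniqueness.
assert np.linalg.matrix_rank(X) == X.shape[1]

# Solve X^T X beta_hat = X^T y directly, avoiding an explicit matrix inverse.
beta_hat = np.linalg.solve(X.T @ X, X.T @ y)
```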
(Ordinary) Least Squares Fit

OLS estimate: $\hat\beta = (\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_p)^T = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$

Fitted values:

$$ \hat{\mathbf{y}} = \begin{pmatrix} \hat y_1 \\ \hat y_2 \\ \vdots \\ \hat y_n \end{pmatrix} = \begin{pmatrix} x_{1,1}\hat\beta_1 + \cdots + x_{1,p}\hat\beta_p \\ x_{2,1}\hat\beta_1 + \cdots + x_{2,p}\hat\beta_p \\ \vdots \\ x_{n,1}\hat\beta_1 + \cdots + x_{n,p}\hat\beta_p \end{pmatrix} = \mathbf{X}\hat\beta = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} = \mathbf{H}\mathbf{y} $$

where $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ is the (n × n) Hat Matrix.
The Hat Matrix H projects $R^n$ onto the column space of X.

Residuals: $\hat\epsilon_i = y_i - \hat y_i$, i = 1, 2, ..., n:

$$ \hat\epsilon = \begin{pmatrix} \hat\epsilon_1 \\ \hat\epsilon_2 \\ \vdots \\ \hat\epsilon_n \end{pmatrix} = \mathbf{y} - \hat{\mathbf{y}} = (\mathbf{I}_n - \mathbf{H})\mathbf{y} $$

Normal equations: $\mathbf{X}^T(\mathbf{y} - \mathbf{X}\hat\beta) = \mathbf{X}^T\hat\epsilon = \mathbf{0}_p$

N.B. The least-squares residual vector $\hat\epsilon$ is orthogonal to the column space of X.
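Both the projection property of H and the orthogonality of the residuals can be checked numerically; a sketch continuing the example above (H is formed explicitly only for illustration, which is wasteful for large n):

```python
# Hat matrix H = X (X^T X)^{-1} X^T.
H = X @ np.linalg.solve(X.T @ X, X.T)
y_hat = H @ y
resid = y - y_hat

assert np.allclose(H @ H, H)        # H is idempotent: a projection
assert np.allclose(X.T @ resid, 0)  # residuals orthogonal to the column space of X
```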
The Gauss-Markov Theorem
The Gauss-Markov Theorem: Assumptions

Data

$$ \mathbf{y} = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{pmatrix} \quad\text{and}\quad \mathbf{X} = \begin{pmatrix} x_{1,1} & x_{1,2} & \cdots & x_{1,p} \\ x_{2,1} & x_{2,2} & \cdots & x_{2,p} \\ \vdots & & & \vdots \\ x_{n,1} & x_{n,2} & \cdots & x_{n,p} \end{pmatrix} $$

follow a linear model satisfying the Gauss-Markov assumptions if y is an observation of the random vector $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)^T$ such that
- $E(\mathbf{Y} \mid \mathbf{X}, \beta) = \mathbf{X}\beta$, where $\beta = (\beta_1, \beta_2, \ldots, \beta_p)^T$ is the p-vector of regression parameters, and
- $Cov(\mathbf{Y} \mid \mathbf{X}, \beta) = \sigma^2\mathbf{I}_n$, for some $\sigma^2 > 0$.

I.e., the random variables generating the observations are uncorrelated and have constant variance $\sigma^2$ (conditional on X and β).
For known constants $c_1, c_2, \ldots, c_p, c_{p+1}$, consider the problem of estimating

$$ \theta = c_1\beta_1 + c_2\beta_2 + \cdots + c_p\beta_p + c_{p+1} $$

Under the Gauss-Markov assumptions, the estimator

$$ \hat\theta = c_1\hat\beta_1 + c_2\hat\beta_2 + \cdots + c_p\hat\beta_p + c_{p+1} $$

where $\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_p$ are the least squares estimates, is:
1) an Unbiased Estimator of θ, and
2) a Linear Estimator of θ, that is, $\hat\theta = \sum_{i=1}^n b_i y_i$ for some known (given X) constants $b_i$.

Theorem: Under the Gauss-Markov assumptions, the estimator $\hat\theta$ has the smallest ("Best") variance among all Linear Unbiased Estimators of θ; i.e., $\hat\theta$ is BLUE.
The Gauss-Markov Theorem: Proof

Proof: Without loss of generality, assume $c_{p+1} = 0$ and define $\mathbf{c} = (c_1, c_2, \ldots, c_p)^T$. The least squares estimate of $\theta = \mathbf{c}^T\beta$ is

$$ \hat\theta = \mathbf{c}^T\hat\beta = \mathbf{c}^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} \equiv \mathbf{d}^T\mathbf{y} $$

a linear estimate in y given by the coefficient vector $\mathbf{d} = (d_1, d_2, \ldots, d_n)^T$.

Consider an alternative linear estimate of θ: $\tilde\theta = \mathbf{b}^T\mathbf{y}$ with fixed coefficients $\mathbf{b} = (b_1, \ldots, b_n)^T$. Define $\mathbf{f} = \mathbf{b} - \mathbf{d}$ and note that

$$ \tilde\theta = \mathbf{b}^T\mathbf{y} = (\mathbf{d} + \mathbf{f})^T\mathbf{y} = \hat\theta + \mathbf{f}^T\mathbf{y} $$

If $\tilde\theta$ is unbiased then, because $\hat\theta$ is unbiased,

$$ 0 = E(\mathbf{f}^T\mathbf{y}) = \mathbf{f}^T E(\mathbf{y}) = \mathbf{f}^T(\mathbf{X}\beta) \quad\text{for all } \beta \in R^p $$

⟹ f is orthogonal to the column space of X
⟹ f is orthogonal to $\mathbf{d} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{c}$.
If $\tilde\theta$ is unbiased, then the orthogonality of f to d implies

$$\begin{aligned} Var(\tilde\theta) &= Var(\mathbf{b}^T\mathbf{y}) = Var(\mathbf{d}^T\mathbf{y} + \mathbf{f}^T\mathbf{y}) \\ &= Var(\mathbf{d}^T\mathbf{y}) + Var(\mathbf{f}^T\mathbf{y}) + 2\,Cov(\mathbf{d}^T\mathbf{y}, \mathbf{f}^T\mathbf{y}) \\ &= Var(\hat\theta) + Var(\mathbf{f}^T\mathbf{y}) + 2\,\mathbf{d}^T Cov(\mathbf{y})\,\mathbf{f} \\ &= Var(\hat\theta) + Var(\mathbf{f}^T\mathbf{y}) + 2\,\mathbf{d}^T(\sigma^2\mathbf{I}_n)\mathbf{f} \\ &= Var(\hat\theta) + Var(\mathbf{f}^T\mathbf{y}) + 2\sigma^2\,\mathbf{d}^T\mathbf{f} \\ &= Var(\hat\theta) + Var(\mathbf{f}^T\mathbf{y}) + 2\sigma^2 \cdot 0 \\ &\ge Var(\hat\theta) \end{aligned}$$
Generalized Least Squares
Generalized Least Squares (GLS) Estimates

Consider generalizing the Gauss-Markov assumptions for the linear regression model to $\mathbf{Y} = \mathbf{X}\beta + \epsilon$, where the random n-vector ε satisfies $E[\epsilon] = \mathbf{0}_n$ and $E[\epsilon\epsilon^T] = \sigma^2\Sigma$, with
- $\sigma^2$ an unknown scale parameter, and
- Σ a known (n × n) positive definite matrix specifying the relative variances and correlations of the component observations.

Transform the data (Y, X) to $\mathbf{Y}^* = \Sigma^{-1/2}\mathbf{Y}$ and $\mathbf{X}^* = \Sigma^{-1/2}\mathbf{X}$; the model becomes

$$ \mathbf{Y}^* = \mathbf{X}^*\beta + \epsilon^*, \quad\text{where } E[\epsilon^*] = \mathbf{0}_n \text{ and } E[\epsilon^*(\epsilon^*)^T] = \sigma^2\mathbf{I}_n $$

By the Gauss-Markov Theorem, the BLUE ("GLS" estimate) of β is

$$ \hat\beta = [(\mathbf{X}^*)^T\mathbf{X}^*]^{-1}(\mathbf{X}^*)^T\mathbf{Y}^* = [\mathbf{X}^T\Sigma^{-1}\mathbf{X}]^{-1}\mathbf{X}^T\Sigma^{-1}\mathbf{Y} $$
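A sketch of the GLS computation, whitening with a Cholesky factor $\Sigma = LL^T$ so that $L^{-1}$ plays the role of $\Sigma^{-1/2}$ (the AR(1)-style Σ below is an illustrative assumption, not from the slides):

```python
# Illustrative known covariance structure: Sigma_{i,j} = rho^{|i-j|}.
rho = 0.5
idx = np.arange(n)
Sigma = rho ** np.abs(np.subtract.outer(idx, idx))

# Whiten the data: with Sigma = L L^T, applying L^{-1} gives errors with
# covariance sigma^2 I_n, since L^{-1} Sigma L^{-T} = I_n.
L = np.linalg.cholesky(Sigma)
X_star = np.linalg.solve(L, X)
y_star = np.linalg.solve(L, y)

# OLS on the transformed data is the GLS estimate for the original data.
beta_gls = np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)
```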
Normal Regression Models: Distribution Theory
Normal Linear Regression Models

$$ Y_i = x_{i,1}\beta_1 + x_{i,2}\beta_2 + \cdots + x_{i,p}\beta_p + \epsilon_i = \mu_i + \epsilon_i $$

Assume $\{\epsilon_1, \epsilon_2, \ldots, \epsilon_n\}$ are i.i.d. $N(0, \sigma^2)$. Then

$$ [Y_i \mid x_{i,1}, x_{i,2}, \ldots, x_{i,p}, \beta, \sigma^2] \sim N(\mu_i, \sigma^2), \quad\text{independent over } i = 1, 2, \ldots, n $$

Conditioning on X, β, and $\sigma^2$:

$$ \mathbf{Y} = \mathbf{X}\beta + \epsilon, \quad\text{where } \epsilon = (\epsilon_1, \epsilon_2, \ldots, \epsilon_n)^T \sim N_n(\mathbf{0}_n, \sigma^2\mathbf{I}_n) $$
Distribution Theory

$$ \mu = \begin{pmatrix} \mu_1 \\ \vdots \\ \mu_n \end{pmatrix} = E(\mathbf{Y} \mid \mathbf{X}, \beta, \sigma^2) = \mathbf{X}\beta $$
$$ \Sigma = Cov(\mathbf{Y} \mid \mathbf{X}, \beta, \sigma^2) = \begin{pmatrix} \sigma^2 & 0 & \cdots & 0 \\ 0 & \sigma^2 & & \vdots \\ \vdots & & \ddots & 0 \\ 0 & \cdots & 0 & \sigma^2 \end{pmatrix} = \sigma^2\mathbf{I}_n $$

That is, $\Sigma_{i,j} = Cov(Y_i, Y_j \mid \mathbf{X}, \beta, \sigma^2) = \sigma^2\delta_{i,j}$.

Apply moment-generating functions (MGFs) to derive:
- the joint distribution of $\mathbf{Y} = (Y_1, Y_2, \ldots, Y_n)^T$, and
- the joint distribution of $\hat\beta = (\hat\beta_1, \hat\beta_2, \ldots, \hat\beta_p)^T$.
MGF of Y

For the n-variate r.v. Y and a constant n-vector $\mathbf{t} = (t_1, \ldots, t_n)^T$,

$$\begin{aligned} M_{\mathbf{Y}}(\mathbf{t}) &= E(e^{\mathbf{t}^T\mathbf{Y}}) = E(e^{t_1Y_1 + t_2Y_2 + \cdots + t_nY_n}) \\ &= E(e^{t_1Y_1})\,E(e^{t_2Y_2})\cdots E(e^{t_nY_n}) \quad\text{(by independence)} \\ &= M_{Y_1}(t_1)\,M_{Y_2}(t_2)\cdots M_{Y_n}(t_n) \\ &= \prod_{i=1}^n e^{t_i\mu_i + \frac{1}{2}t_i^2\sigma^2} = e^{\sum_{i=1}^n t_i\mu_i + \frac{1}{2}\sum_{i,k=1}^n t_i\Sigma_{i,k}t_k} = e^{\mathbf{t}^T\mu + \frac{1}{2}\mathbf{t}^T\Sigma\mathbf{t}} \end{aligned}$$

⟹ $\mathbf{Y} \sim N_n(\mu, \Sigma)$, multivariate normal with mean μ and covariance Σ.
MGF of $\hat\beta$

For the p-variate r.v. $\hat\beta$ and a constant p-vector $\tau = (\tau_1, \ldots, \tau_p)^T$,

$$ M_{\hat\beta}(\tau) = E(e^{\tau^T\hat\beta}) = E(e^{\tau_1\hat\beta_1 + \tau_2\hat\beta_2 + \cdots + \tau_p\hat\beta_p}) $$

Defining $A = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$, we can express $\hat\beta = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} = A\mathbf{Y}$, so

$$ M_{\hat\beta}(\tau) = E(e^{\tau^T A\mathbf{Y}}) = E(e^{\mathbf{t}^T\mathbf{Y}}) = M_{\mathbf{Y}}(\mathbf{t}) = e^{\mathbf{t}^T\mu + \frac{1}{2}\mathbf{t}^T\Sigma\mathbf{t}}, \quad\text{with } \mathbf{t} = A^T\tau $$
MGF of $\hat\beta$ (continued)

For $M_{\hat\beta}(\tau) = e^{\mathbf{t}^T\mu + \frac{1}{2}\mathbf{t}^T\Sigma\mathbf{t}}$, plug in:

$$ \mathbf{t} = A^T\tau = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\tau, \qquad \mu = \mathbf{X}\beta, \qquad \Sigma = \sigma^2\mathbf{I}_n $$

This gives:

$$ \mathbf{t}^T\mu = \tau^T\beta \qquad\text{and}\qquad \mathbf{t}^T\Sigma\mathbf{t} = \tau^T(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T[\sigma^2\mathbf{I}_n]\mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\tau = \tau^T[\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}]\tau $$

So the MGF of $\hat\beta$ is

$$ M_{\hat\beta}(\tau) = e^{\tau^T\beta + \frac{1}{2}\tau^T[\sigma^2(\mathbf{X}^T\mathbf{X})^{-1}]\tau} \implies \hat\beta \sim N_p(\beta, \sigma^2(\mathbf{X}^T\mathbf{X})^{-1}) $$
Marginal Distributions of the Least Squares Estimates

Because $\hat\beta \sim N_p(\beta, \sigma^2(\mathbf{X}^T\mathbf{X})^{-1})$, the marginal distribution of each $\hat\beta_j$ is

$$ \hat\beta_j \sim N(\beta_j, \sigma^2 C_{j,j}) $$

where $C_{j,j}$ is the jth diagonal element of $(\mathbf{X}^T\mathbf{X})^{-1}$.
The Q-R Decomposition of X

Consider expressing the (n × p) matrix X of explanatory variables as X = QR, where
- Q is an (n × p) orthonormal matrix, i.e., $Q^TQ = \mathbf{I}_p$, and
- R is a (p × p) upper-triangular matrix.

The columns of $Q = [Q_{[1]}, Q_{[2]}, \ldots, Q_{[p]}]$ can be constructed by performing the Gram-Schmidt orthonormalization procedure on the columns of $\mathbf{X} = [\mathbf{X}_{[1]}, \mathbf{X}_{[2]}, \ldots, \mathbf{X}_{[p]}]$.
If

$$ R = \begin{pmatrix} r_{1,1} & r_{1,2} & \cdots & r_{1,p-1} & r_{1,p} \\ 0 & r_{2,2} & \cdots & r_{2,p-1} & r_{2,p} \\ \vdots & & \ddots & & \vdots \\ 0 & 0 & \cdots & r_{p-1,p-1} & r_{p-1,p} \\ 0 & 0 & \cdots & 0 & r_{p,p} \end{pmatrix} $$

then

$$ \mathbf{X}_{[1]} = Q_{[1]}r_{1,1} \implies r_{1,1}^2 = \mathbf{X}_{[1]}^T\mathbf{X}_{[1]} \text{ and } Q_{[1]} = \mathbf{X}_{[1]}/r_{1,1} $$

$$ \mathbf{X}_{[2]} = Q_{[1]}r_{1,2} + Q_{[2]}r_{2,2} \implies Q_{[1]}^T\mathbf{X}_{[2]} = Q_{[1]}^TQ_{[1]}r_{1,2} + Q_{[1]}^TQ_{[2]}r_{2,2} = 1\cdot r_{1,2} + 0\cdot r_{2,2} = r_{1,2} $$

(known since $Q_{[1]}$ is specified).
With $r_{1,2}$ and $Q_{[1]}$ specified, we can solve for $r_{2,2}$:

$$ Q_{[2]}r_{2,2} = \mathbf{X}_{[2]} - Q_{[1]}r_{1,2} $$

Take the squared norm of both sides:

$$ r_{2,2}^2 = \mathbf{X}_{[2]}^T\mathbf{X}_{[2]} - 2r_{1,2}Q_{[1]}^T\mathbf{X}_{[2]} + r_{1,2}^2 $$

(all terms on the RHS are known). With $r_{2,2}$ specified,

$$ Q_{[2]} = \frac{1}{r_{2,2}}\left[\mathbf{X}_{[2]} - r_{1,2}Q_{[1]}\right] $$

Etc. (solve successively for the elements of R and the columns of Q).
With the Q-R decomposition X = QR ($Q^TQ = \mathbf{I}_p$, and R is (p × p) upper-triangular):

- $\hat\beta = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y} = R^{-1}Q^T\mathbf{y}$ (plug in X = QR and simplify)
- $Cov(\hat\beta) = \sigma^2(\mathbf{X}^T\mathbf{X})^{-1} = \sigma^2 R^{-1}(R^{-1})^T$
- $\mathbf{H} = \mathbf{X}(\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T = QQ^T$ (giving $\hat{\mathbf{y}} = \mathbf{H}\mathbf{y}$ and $\hat\epsilon = (\mathbf{I}_n - \mathbf{H})\mathbf{y}$)

A numerical sketch follows.
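This is also how OLS is typically computed in practice, since the Q-R route avoids forming $\mathbf{X}^T\mathbf{X}$ (which squares the condition number of the problem). A NumPy sketch, continuing the earlier example:

```python
# Thin QR: Q is (n x p) with orthonormal columns, R is (p x p) upper-triangular.
Q, R = np.linalg.qr(X)

# beta_hat = R^{-1} Q^T y; solve the triangular system instead of inverting R.
beta_qr = np.linalg.solve(R, Q.T @ y)

# Fitted values via the hat matrix H = Q Q^T, without ever forming H explicitly.
assert np.allclose(Q @ (Q.T @ y), X @ beta_qr)
```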
More Distribution Theory

Assume $\mathbf{y} = \mathbf{X}\beta + \epsilon$, where the $\{\epsilon_i\}$ are i.i.d. $N(0, \sigma^2)$; i.e., $\epsilon \sim N_n(\mathbf{0}_n, \sigma^2\mathbf{I}_n)$, or $\mathbf{y} \sim N_n(\mathbf{X}\beta, \sigma^2\mathbf{I}_n)$.

Theorem*: For any (m × n) matrix A of rank m ≤ n, the random normal vector y transformed by A, $\mathbf{z} = A\mathbf{y}$, is also a random normal vector:

$$ \mathbf{z} \sim N_m(\mu_z, \Sigma_z), \quad\text{where } \mu_z = A\,E(\mathbf{y}) = A\mathbf{X}\beta \text{ and } \Sigma_z = A\,Cov(\mathbf{y})\,A^T = \sigma^2 AA^T $$

Earlier, $A = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T$ yielded the distribution of $\hat\beta = A\mathbf{y}$. With a different definition of A (and z) we give an easy proof of the following theorem.
Theorem: For the normal linear regression model $\mathbf{y} = \mathbf{X}\beta + \epsilon$, where X (n × p) has rank p and $\epsilon \sim N_n(\mathbf{0}_n, \sigma^2\mathbf{I}_n)$:

(a) $\hat\beta = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$ and $\hat\epsilon = \mathbf{y} - \mathbf{X}\hat\beta$ are independent r.v.'s.
(b) $\hat\beta \sim N_p(\beta, \sigma^2(\mathbf{X}^T\mathbf{X})^{-1})$.
(c) $\sum_{i=1}^n \hat\epsilon_i^2 = \hat\epsilon^T\hat\epsilon \sim \sigma^2\chi^2_{n-p}$ (a scaled chi-squared r.v.).
(d) For each j = 1, 2, ..., p,

$$ t_j = \frac{\hat\beta_j - \beta_j}{\hat\sigma\sqrt{C_{j,j}}} \sim t_{n-p} \text{ (t distribution)}, \quad\text{where } \hat\sigma^2 = \frac{1}{n-p}\sum_{i=1}^n \hat\epsilon_i^2 \text{ and } C_{j,j} = [(\mathbf{X}^T\mathbf{X})^{-1}]_{j,j} $$
Proof: Note that (d) follows immediately from (a), (b), (c).

Define $A = \begin{pmatrix} Q^T \\ W^T \end{pmatrix}$, where
- A is an (n × n) orthogonal matrix (i.e., $A^T = A^{-1}$), and
- Q is the column-orthonormal matrix in a Q-R decomposition of X.

Note: W can be constructed by continuing the Gram-Schmidt orthonormalization process (which was used to construct Q from X) with the augmented matrix $[\mathbf{X}, \mathbf{I}_n]$. Then consider

$$ \mathbf{z} = A\mathbf{y} = \begin{pmatrix} Q^T\mathbf{y} \\ W^T\mathbf{y} \end{pmatrix} = \begin{pmatrix} \mathbf{z}_Q \\ \mathbf{z}_W \end{pmatrix}, \quad \mathbf{z}_Q\ (p \times 1),\ \mathbf{z}_W\ ((n-p) \times 1) $$
The distribution of $\mathbf{z} = A\mathbf{y}$ is $N_n(\mu_z, \Sigma_z)$, where

$$ \mu_z = A[\mathbf{X}\beta] = \begin{pmatrix} Q^T \\ W^T \end{pmatrix}[QR\beta] = \begin{pmatrix} Q^TQ \\ W^TQ \end{pmatrix}[R\beta] = \begin{pmatrix} \mathbf{I}_p \\ \mathbf{0}_{(n-p)\times p} \end{pmatrix}[R\beta] = \begin{pmatrix} R\beta \\ \mathbf{0}_{n-p} \end{pmatrix} $$

$$ \Sigma_z = A[\sigma^2\mathbf{I}_n]A^T = \sigma^2[AA^T] = \sigma^2\mathbf{I}_n \quad\text{(since } A^T = A^{-1}\text{)} $$
Thus

$$ \mathbf{z} = \begin{pmatrix} \mathbf{z}_Q \\ \mathbf{z}_W \end{pmatrix} \sim N_n\left(\begin{pmatrix} R\beta \\ \mathbf{0}_{n-p} \end{pmatrix}, \sigma^2\mathbf{I}_n\right) \implies \begin{cases} \mathbf{z}_Q \sim N_p(R\beta, \sigma^2\mathbf{I}_p) \\ \mathbf{z}_W \sim N_{n-p}(\mathbf{0}_{n-p}, \sigma^2\mathbf{I}_{n-p}) \end{cases} $$

and $\mathbf{z}_Q$ and $\mathbf{z}_W$ are independent. The Theorem follows by showing:

(a*) $\hat\beta = R^{-1}\mathbf{z}_Q$ and $\hat\epsilon = W\mathbf{z}_W$ (i.e., $\hat\beta$ and $\hat\epsilon$ are functions of different, independent vectors).
(b*) Deducing the distribution of $\hat\beta = R^{-1}\mathbf{z}_Q$ by applying Theorem* with $A = R^{-1}$ and $\mathbf{y} = \mathbf{z}_Q$.
(c*) $\hat\epsilon^T\hat\epsilon = \mathbf{z}_W^T\mathbf{z}_W$ = sum of (n - p) squared r.v.'s which are i.i.d. $N(0, \sigma^2)$, i.e., a scaled chi-squared r.v. $\sigma^2\chi^2_{n-p}$.
Proof of (a*): $\hat\beta = R^{-1}\mathbf{z}_Q$ follows from $\hat\beta = (\mathbf{X}^T\mathbf{X})^{-1}\mathbf{X}^T\mathbf{y}$ and X = QR with $Q^TQ = \mathbf{I}_p$. For the residuals,

$$\begin{aligned} \hat\epsilon &= \mathbf{y} - \hat{\mathbf{y}} = \mathbf{y} - \mathbf{X}\hat\beta = \mathbf{y} - (QR)(R^{-1}\mathbf{z}_Q) = \mathbf{y} - Q\mathbf{z}_Q \\ &= \mathbf{y} - QQ^T\mathbf{y} = (\mathbf{I}_n - QQ^T)\mathbf{y} = WW^T\mathbf{y} \quad\text{(since } \mathbf{I}_n = A^TA = QQ^T + WW^T\text{)} \\ &= W\mathbf{z}_W \end{aligned}$$
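Parts (b)-(d) are what standard regression output is built from. A sketch of the coefficient t-statistics for testing $\beta_j = 0$, continuing the earlier simulated example (the scipy usage is illustrative):

```python
from scipy import stats

n_obs, p_dim = X.shape
resid = y - X @ beta_hat

sigma2_hat = resid @ resid / (n_obs - p_dim)   # unbiased estimate of sigma^2
C = np.linalg.inv(X.T @ X)                     # C_{j,j} on the diagonal
se = np.sqrt(sigma2_hat * np.diag(C))          # standard errors of beta_hat_j

t_stats = beta_hat / se                        # t_j under H0: beta_j = 0
p_values = 2 * stats.t.sf(np.abs(t_stats), df=n_obs - p_dim)
```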
Maximum Likelihood Estimation
Maximum-Likelihood Estimation

Consider the normal linear regression model: $\mathbf{y} = \mathbf{X}\beta + \epsilon$, where the $\{\epsilon_i\}$ are i.i.d. $N(0, \sigma^2)$, i.e., $\epsilon \sim N_n(\mathbf{0}_n, \sigma^2\mathbf{I}_n)$, or $\mathbf{y} \sim N_n(\mathbf{X}\beta, \sigma^2\mathbf{I}_n)$.

Definitions:
- The likelihood function is $L(\beta, \sigma^2) = p(\mathbf{y} \mid \mathbf{X}, \beta, \sigma^2)$, where $p(\mathbf{y} \mid \mathbf{X}, \beta, \sigma^2)$ is the joint probability density function (pdf) of the conditional distribution of y given the data X (known) and the parameters $(\beta, \sigma^2)$ (unknown).
- The maximum likelihood estimates of $(\beta, \sigma^2)$ are the values maximizing $L(\beta, \sigma^2)$, i.e., those which make the observed data y most likely in terms of its pdf.
Because the $y_i$ are independent r.v.'s with $y_i \sim N(\mu_i, \sigma^2)$, where $\mu_i = \sum_{j=1}^p \beta_j x_{i,j}$,

$$ L(\beta, \sigma^2) = \prod_{i=1}^n p(y_i \mid \beta, \sigma^2) = \prod_{i=1}^n \frac{1}{\sqrt{2\pi\sigma^2}}\, e^{-\frac{1}{2\sigma^2}\left(y_i - \sum_{j=1}^p \beta_j x_{i,j}\right)^2} = \frac{1}{(2\pi\sigma^2)^{n/2}}\, e^{-\frac{1}{2}(\mathbf{y}-\mathbf{X}\beta)^T(\sigma^2\mathbf{I}_n)^{-1}(\mathbf{y}-\mathbf{X}\beta)} $$

The maximum likelihood estimates $(\hat\beta, \hat\sigma^2)$ maximize the log-likelihood function (dropping constant terms):

$$ \log L(\beta, \sigma^2) = -\frac{n}{2}\log(\sigma^2) - \frac{1}{2}(\mathbf{y}-\mathbf{X}\beta)^T(\sigma^2\mathbf{I}_n)^{-1}(\mathbf{y}-\mathbf{X}\beta) = -\frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}Q(\beta) $$

where $Q(\beta) = (\mathbf{y}-\mathbf{X}\beta)^T(\mathbf{y}-\mathbf{X}\beta)$ is exactly the least-squares criterion! So the OLS estimate $\hat\beta$ is also the ML estimate.

The ML estimate of $\sigma^2$ solves $\partial\log L(\hat\beta, \sigma^2)/\partial(\sigma^2) = 0$, i.e.,

$$ -\frac{n}{2}\frac{1}{\sigma^2} - \frac{1}{2}(-1)(\sigma^2)^{-2}Q(\hat\beta) = 0 \implies \hat\sigma^2_{ML} = Q(\hat\beta)/n = \Big(\sum_{i=1}^n \hat\epsilon_i^2\Big)/n \quad\text{(biased!)} $$
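The bias is easy to see: by part (c) of the distribution theorem above, $E[Q(\hat\beta)] = \sigma^2(n-p)$, so $Q(\hat\beta)/n$ underestimates $\sigma^2$ by the factor $(n-p)/n$. A short numerical comparison, continuing the earlier example:

```python
Q_min = resid @ resid                      # Q(beta_hat) = sum of squared residuals

sigma2_ml = Q_min / n_obs                  # ML estimate: biased low by (n - p)/n
sigma2_unbiased = Q_min / (n_obs - p_dim)  # degrees-of-freedom-corrected estimate
```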
Generalized M-Estimators
For data (y, X), fit the linear regression model

$$ y_i = \mathbf{x}_i^T\beta + \epsilon_i, \quad i = 1, 2, \ldots, n $$

by specifying $\beta = \hat\beta$ to minimize $Q(\beta) = \sum_{i=1}^n h(y_i, \mathbf{x}_i, \beta, \sigma^2)$. The choice of the function h(·) distinguishes different estimators:

(1) Least squares: $h(y_i, \mathbf{x}_i, \beta, \sigma^2) = (y_i - \mathbf{x}_i^T\beta)^2$
(2) Mean absolute deviation (MAD): $h(y_i, \mathbf{x}_i, \beta, \sigma^2) = |y_i - \mathbf{x}_i^T\beta|$
(3) Maximum likelihood (ML): assume the $y_i$ are independent with pdfs $p(y_i \mid \beta, \mathbf{x}_i, \sigma^2)$; then $h(y_i, \mathbf{x}_i, \beta, \sigma^2) = -\log p(y_i \mid \beta, \mathbf{x}_i, \sigma^2)$
(4) Robust M-estimator: $h(y_i, \mathbf{x}_i, \beta, \sigma^2) = \chi(y_i - \mathbf{x}_i^T\beta)$, where χ(·) is even and monotone increasing on (0, ∞).
(5) Quantile estimator: for a fixed quantile τ, 0 < τ < 1,

$$ h(y_i, \mathbf{x}_i, \beta, \sigma^2) = \begin{cases} \tau\,|y_i - \mathbf{x}_i^T\beta|, & \text{if } y_i \ge \mathbf{x}_i^T\beta \\ (1-\tau)\,|y_i - \mathbf{x}_i^T\beta|, & \text{if } y_i < \mathbf{x}_i^T\beta \end{cases} $$

E.g., τ = 0.90 corresponds to the 90th quantile / upper decile, and τ = 0.50 corresponds to the MAD estimator.
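The asymmetric "check" loss is straightforward to code; below is a sketch of h(·) summed over the cases (a full quantile-regression fit would minimize this criterion over β, e.g., by linear programming):

```python
def quantile_criterion(y, X, beta, tau):
    """Q(beta) for the quantile estimator: tau-weighted loss above the fit,
    (1 - tau)-weighted loss below it."""
    r = y - X @ beta
    return np.sum(np.where(r >= 0, tau, 1.0 - tau) * np.abs(r))

# tau = 0.5 recovers the MAD criterion up to a factor of 1/2.
```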
MIT OpenCourseWare
18.S096 Topics in Mathematics with Applications in Finance, Fall 2013
For information about citing these materials or our Terms of Use, visit: