

INTRODUCTION TO VECTOR AND MATRIX DIFFERENTIATION

Econometrics 2
Heino Bohn Nielsen
September 2, 2005

This note expands on appendix A.7 in Verbeek (2004) on matrix differentiation. We first present the conventions for derivatives of scalar and vector functions; then we present the derivatives of a number of special functions particularly useful in econometrics; and, finally, we apply the ideas to derive the ordinary least squares (OLS) estimator in the linear regression model. We should emphasize that this note is cursory reading; the rules for specific functions needed in this course are indicated with a $(\ast)$.

1 Conventions for Scalar Functions

Let $\beta = (\beta_1, \ldots, \beta_k)'$ be a $k \times 1$ vector and let $f(\beta) = f(\beta_1, \ldots, \beta_k)$ be a real-valued function that depends on $\beta$, i.e. $f(\cdot): \mathbb{R}^k \mapsto \mathbb{R}$ maps the vector $\beta$ into a single number, $f(\beta)$. Then the derivative of $f(\cdot)$ with respect to $\beta$ is defined as

$$\frac{\partial f(\beta)}{\partial \beta} = \begin{pmatrix} \frac{\partial f(\beta)}{\partial \beta_1} \\ \vdots \\ \frac{\partial f(\beta)}{\partial \beta_k} \end{pmatrix}. \qquad (1)$$

This is a $k \times 1$ column vector with typical element $i$ given by the partial derivative $\partial f(\beta)/\partial \beta_i$. Sometimes this vector is referred to as the gradient. It is useful to remember that the derivative of a scalar function with respect to a column vector gives a column vector as the result. We can note that Wooldridge (2003, p. 783) does not follow this convention, and lets $\partial f(\beta)/\partial \beta$ be a $1 \times k$ row vector.
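As a quick illustration of the convention in (1), the sketch below approximates the gradient of a simple scalar function by central finite differences and stacks the result as a $k \times 1$ column vector. This is a minimal sketch assuming NumPy is available; the example function f and the evaluation point are arbitrary choices made here for illustration, not part of the note.

    import numpy as np

    def f(beta):
        # example scalar function f(beta) = beta_1^2 + 3*beta_2
        return beta[0]**2 + 3.0 * beta[1]

    def gradient(f, beta, h=1e-6):
        # central finite differences, stacked as a k x 1 column vector
        # to match the convention that df/d(beta) is a column
        k = beta.size
        g = np.zeros((k, 1))
        for i in range(k):
            e = np.zeros(k)
            e[i] = h
            g[i, 0] = (f(beta + e) - f(beta - e)) / (2.0 * h)
        return g

    print(gradient(f, np.array([1.0, 2.0])))   # approximately [[2.], [3.]]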

Similarly, the derivative of a scalar function with respect to a row vector yields the $1 \times k$ row vector

$$\frac{\partial f(\beta)}{\partial \beta'} = \left( \frac{\partial f(\beta)}{\partial \beta_1}, \ldots, \frac{\partial f(\beta)}{\partial \beta_k} \right).$$

2 Conventions for Vector Functions

Now let

$$g(\beta) = \begin{pmatrix} g_1(\beta) \\ \vdots \\ g_n(\beta) \end{pmatrix}$$

be a vector function depending on $\beta = (\beta_1, \ldots, \beta_k)'$, i.e. $g(\cdot): \mathbb{R}^k \mapsto \mathbb{R}^n$ maps the $k \times 1$ vector into an $n \times 1$ vector, where $g_i(\beta) = g_i(\beta_1, \ldots, \beta_k)$, $i = 1, 2, \ldots, n$, is a real-valued function. Since $g(\cdot)$ is a column vector, it is natural to consider the derivatives with respect to a row vector, $\beta'$, i.e.

$$\frac{\partial g(\beta)}{\partial \beta'} = \begin{pmatrix} \frac{\partial g_1(\beta)}{\partial \beta_1} & \cdots & \frac{\partial g_1(\beta)}{\partial \beta_k} \\ \vdots & & \vdots \\ \frac{\partial g_n(\beta)}{\partial \beta_1} & \cdots & \frac{\partial g_n(\beta)}{\partial \beta_k} \end{pmatrix}, \qquad (2)$$

where each row, $i = 1, 2, \ldots, n$, contains the derivative of the scalar function $g_i(\cdot)$ with respect to the elements in $\beta$. The result is therefore an $n \times k$ matrix of derivatives with typical element $(i, j)$ given by $\partial g_i(\beta)/\partial \beta_j$. If the vector function is defined as a row vector, it is natural to take the derivative with respect to the column vector, $\beta$. We can note that it holds in general that

$$\frac{\partial g(\beta)'}{\partial \beta} = \left( \frac{\partial g(\beta)}{\partial \beta'} \right)', \qquad (3)$$

which in the case above is a $k \times n$ matrix.

Applying the conventions in (1) and (2) we can define the Hessian matrix of second derivatives of a scalar function $f(\beta)$ as

$$\frac{\partial^2 f(\beta)}{\partial \beta \partial \beta'} = \frac{\partial}{\partial \beta'} \left( \frac{\partial f(\beta)}{\partial \beta} \right) = \begin{pmatrix} \frac{\partial^2 f(\beta)}{\partial \beta_1 \partial \beta_1} & \cdots & \frac{\partial^2 f(\beta)}{\partial \beta_1 \partial \beta_k} \\ \vdots & & \vdots \\ \frac{\partial^2 f(\beta)}{\partial \beta_k \partial \beta_1} & \cdots & \frac{\partial^2 f(\beta)}{\partial \beta_k \partial \beta_k} \end{pmatrix},$$

which is a $k \times k$ matrix with typical element $(i, j)$ given by the second derivative $\partial^2 f(\beta)/\partial \beta_i \partial \beta_j$. Note that it does not matter if we first take the derivative with respect to the column or the row.
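The sketch below illustrates these conventions numerically: the derivative of a column-vector function with respect to $\beta'$ is built row by row as an $n \times k$ matrix, and a numerical Hessian of a scalar function comes out symmetric, so the order of differentiation does not matter. A minimal sketch assuming NumPy; the example functions g and f are arbitrary choices, not taken from the note.

    import numpy as np

    def g(beta):
        # example vector function g : R^2 -> R^3
        b1, b2 = beta
        return np.array([b1 + b2, b1 * b2, b2**2])

    def jacobian(g, beta, h=1e-6):
        # n x k matrix: row i holds the partials of g_i w.r.t. beta_1, ..., beta_k
        n, k = g(beta).size, beta.size
        J = np.zeros((n, k))
        for j in range(k):
            e = np.zeros(k)
            e[j] = h
            J[:, j] = (g(beta + e) - g(beta - e)) / (2.0 * h)
        return J

    def hessian(f, beta, h=1e-4):
        # k x k matrix of second derivatives by nested central differences
        k = beta.size
        H = np.zeros((k, k))
        for i in range(k):
            for j in range(k):
                ei, ej = np.zeros(k), np.zeros(k)
                ei[i], ej[j] = h, h
                H[i, j] = (f(beta + ei + ej) - f(beta + ei - ej)
                           - f(beta - ei + ej) + f(beta - ei - ej)) / (4.0 * h**2)
        return H

    beta = np.array([1.0, 2.0])
    print(jacobian(g, beta).shape)               # (3, 2): an n x k matrix
    f = lambda b: b[0]**2 * b[1] + 3.0 * b[1]
    H = hessian(f, beta)
    print(np.allclose(H, H.T))                   # True: same Hessian either way round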

3 Some Special Functions

First, let $c$ be a $k \times 1$ vector and let $\beta$ be a $k \times 1$ vector of parameters. Next define the scalar function $f(\beta) = c'\beta$, which maps the $k$ parameters into a single number. It holds that

$$\frac{\partial (c'\beta)}{\partial \beta} = c. \qquad (\ast)$$

To see this, we can write the function as $f(\beta) = c'\beta = c_1 \beta_1 + c_2 \beta_2 + \cdots + c_k \beta_k$. Taking the derivative with respect to $\beta$ yields

$$\frac{\partial f(\beta)}{\partial \beta} = \begin{pmatrix} \frac{\partial (c_1 \beta_1 + c_2 \beta_2 + \cdots + c_k \beta_k)}{\partial \beta_1} \\ \vdots \\ \frac{\partial (c_1 \beta_1 + c_2 \beta_2 + \cdots + c_k \beta_k)}{\partial \beta_k} \end{pmatrix} = \begin{pmatrix} c_1 \\ \vdots \\ c_k \end{pmatrix} = c,$$

which is a $k \times 1$ vector as expected. Also note that since $\beta'c = c'\beta$, it holds that

$$\frac{\partial (\beta'c)}{\partial \beta} = c. \qquad (\ast)$$

Now, let $A$ be an $n \times k$ matrix and let $\beta$ be a $k \times 1$ vector of parameters. Furthermore define the vector function $g(\beta) = A\beta$, which maps the $k$ parameters into $n$ function values. $g(\beta)$ is an $n \times 1$ vector and the derivative with respect to $\beta'$ is an $n \times k$ matrix given by

$$\frac{\partial (A\beta)}{\partial \beta'} = A. \qquad (\ast)$$

To see this, write the function as

$$g(\beta) = A\beta = \begin{pmatrix} A_{11}\beta_1 + A_{12}\beta_2 + \cdots + A_{1k}\beta_k \\ \vdots \\ A_{n1}\beta_1 + A_{n2}\beta_2 + \cdots + A_{nk}\beta_k \end{pmatrix},$$

and find the derivative

$$\frac{\partial g(\beta)}{\partial \beta'} = \begin{pmatrix} \frac{\partial (A_{11}\beta_1 + \cdots + A_{1k}\beta_k)}{\partial \beta_1} & \cdots & \frac{\partial (A_{11}\beta_1 + \cdots + A_{1k}\beta_k)}{\partial \beta_k} \\ \vdots & & \vdots \\ \frac{\partial (A_{n1}\beta_1 + \cdots + A_{nk}\beta_k)}{\partial \beta_1} & \cdots & \frac{\partial (A_{n1}\beta_1 + \cdots + A_{nk}\beta_k)}{\partial \beta_k} \end{pmatrix} = \begin{pmatrix} A_{11} & \cdots & A_{1k} \\ \vdots & & \vdots \\ A_{n1} & \cdots & A_{nk} \end{pmatrix} = A.$$

Similarly, if we consider the transposed function, $g(\beta)' = \beta'A'$, which is a $1 \times n$ row vector, we can find the $k \times n$ matrix of derivatives as

$$\frac{\partial (\beta'A')}{\partial \beta} = A'. \qquad (\ast)$$

This is just an application of the result in (3).
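A minimal numerical check of the two rules marked $(\ast)$ above, assuming NumPy; the values of c, A and $\beta$ are arbitrary examples. The finite-difference gradient of $c'\beta$ recovers $c$, and the finite-difference Jacobian of $A\beta$ recovers $A$.

    import numpy as np

    rng = np.random.default_rng(0)
    k, n, h = 4, 3, 1e-6
    c = rng.normal(size=k)
    A = rng.normal(size=(n, k))
    beta = rng.normal(size=k)
    I = np.eye(k)

    # d(c'beta)/d(beta) should equal c (a k x 1 vector)
    grad = np.array([(c @ (beta + h * I[i]) - c @ (beta - h * I[i])) / (2 * h)
                     for i in range(k)])
    print(np.allclose(grad, c))        # True

    # d(A beta)/d(beta') should equal A (an n x k matrix)
    J = np.column_stack([(A @ (beta + h * I[j]) - A @ (beta - h * I[j])) / (2 * h)
                         for j in range(k)])
    print(np.allclose(J, A))           # True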

Now consider a quadratic function $f(\beta) = \beta'V\beta$ for some $k \times k$ matrix $V$. This function maps the $k$ parameters into a single number. Here we find the derivatives as the $k \times 1$ column vector

$$\frac{\partial (\beta'V\beta)}{\partial \beta} = (V + V')\beta, \qquad (\ast)$$

or the row variant

$$\frac{\partial (\beta'V\beta)}{\partial \beta'} = \beta'(V + V'). \qquad (\ast)$$

If $V$ is symmetric this reduces to $2V\beta$ and $2\beta'V$, respectively. To see how this works, consider the simple case $k = 3$ and write the function as

$$\beta'V\beta = \begin{pmatrix} \beta_1 & \beta_2 & \beta_3 \end{pmatrix} \begin{pmatrix} V_{11} & V_{12} & V_{13} \\ V_{21} & V_{22} & V_{23} \\ V_{31} & V_{32} & V_{33} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} = V_{11}\beta_1^2 + V_{22}\beta_2^2 + V_{33}\beta_3^2 + (V_{12} + V_{21})\beta_1\beta_2 + (V_{13} + V_{31})\beta_1\beta_3 + (V_{23} + V_{32})\beta_2\beta_3.$$

Taking the derivative with respect to $\beta$, we get

$$\frac{\partial (\beta'V\beta)}{\partial \beta} = \begin{pmatrix} 2V_{11}\beta_1 + (V_{12} + V_{21})\beta_2 + (V_{13} + V_{31})\beta_3 \\ 2V_{22}\beta_2 + (V_{12} + V_{21})\beta_1 + (V_{23} + V_{32})\beta_3 \\ 2V_{33}\beta_3 + (V_{13} + V_{31})\beta_1 + (V_{23} + V_{32})\beta_2 \end{pmatrix} = \begin{pmatrix} 2V_{11} & V_{12} + V_{21} & V_{13} + V_{31} \\ V_{12} + V_{21} & 2V_{22} & V_{23} + V_{32} \\ V_{13} + V_{31} & V_{23} + V_{32} & 2V_{33} \end{pmatrix} \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix}$$

$$= \left[ \begin{pmatrix} V_{11} & V_{12} & V_{13} \\ V_{21} & V_{22} & V_{23} \\ V_{31} & V_{32} & V_{33} \end{pmatrix} + \begin{pmatrix} V_{11} & V_{21} & V_{31} \\ V_{12} & V_{22} & V_{32} \\ V_{13} & V_{23} & V_{33} \end{pmatrix} \right] \begin{pmatrix} \beta_1 \\ \beta_2 \\ \beta_3 \end{pmatrix} = (V + V')\beta.$$
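The quadratic-form rule can be checked the same way. A minimal sketch assuming NumPy; V is an arbitrary example matrix, drawn deliberately non-symmetric so that the general rule $(V + V')\beta$, rather than $2V\beta$, is what the finite differences recover.

    import numpy as np

    rng = np.random.default_rng(1)
    k, h = 3, 1e-6
    V = rng.normal(size=(k, k))            # not symmetric in general
    beta = rng.normal(size=k)
    I = np.eye(k)

    f = lambda b: b @ V @ b                # the scalar quadratic form beta' V beta
    grad = np.array([(f(beta + h * I[i]) - f(beta - h * I[i])) / (2 * h)
                     for i in range(k)])

    print(np.allclose(grad, (V + V.T) @ beta))           # True: the rule (V + V') beta
    S = 0.5 * (V + V.T)                                  # a symmetric matrix
    print(np.allclose((S + S.T) @ beta, 2 * S @ beta))   # True: reduces to 2 V beta when V is symmetric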

4 The Linear Regression Model

To illustrate the use of matrix differentiation, consider the linear regression model in matrix notation,

$$Y = X\beta + \epsilon,$$

where $Y$ is a $T \times 1$ vector of stacked left-hand-side variables, $X$ is a $T \times k$ matrix of explanatory variables, $\beta$ is a $k \times 1$ vector of parameters to be estimated, and $\epsilon$ is a $T \times 1$ vector of error terms. Here $k$ is the number of explanatory variables and $T$ is the number of observations.

One way to motivate the ordinary least squares (OLS) principle is to choose the estimator, $\widehat{\beta}_{OLS}$, of $\beta$ as the value that minimizes the sum of squared residuals, i.e.

$$\widehat{\beta}_{OLS} = \arg\min_{\beta} \sum_{t=1}^{T} \epsilon_t^2 = \arg\min_{\beta} \epsilon'\epsilon.$$

Looking at the function to be minimized, we find that

$$\epsilon'\epsilon = (Y - X\beta)'(Y - X\beta) = (Y' - \beta'X')(Y - X\beta) = Y'Y - Y'X\beta - \beta'X'Y + \beta'X'X\beta = Y'Y - 2Y'X\beta + \beta'X'X\beta,$$

where the last step uses the fact that $Y'X\beta$ and $\beta'X'Y$ are identical scalar variables. Note that $\epsilon'\epsilon$ is a scalar function, and taking the first derivative with respect to $\beta$ (applying the rule for $c'\beta$ with $c = X'Y$, since $Y'X\beta = (X'Y)'\beta$, and the rule for the quadratic form with the symmetric matrix $V = X'X$) yields the $k \times 1$ vector

$$\frac{\partial (\epsilon'\epsilon)}{\partial \beta} = \frac{\partial (Y'Y - 2Y'X\beta + \beta'X'X\beta)}{\partial \beta} = -2X'Y + 2X'X\beta.$$

Solving the $k$ equations, $\partial(\epsilon'\epsilon)/\partial \beta = 0$, yields the OLS estimator

$$\widehat{\beta}_{OLS} = (X'X)^{-1}X'Y,$$

provided that $X'X$ is non-singular. To make sure that $\widehat{\beta}_{OLS}$ is a minimum of $\epsilon'\epsilon$ and not a maximum, we should formally take the second derivative and make sure that it is positive definite. The $k \times k$ Hessian matrix of second derivatives is given by

$$\frac{\partial^2 (\epsilon'\epsilon)}{\partial \beta \partial \beta'} = \frac{\partial (-2X'Y + 2X'X\beta)}{\partial \beta'} = 2X'X,$$

which is a positive definite matrix by construction.

References

[1] Verbeek, Marno (2004): A Guide to Modern Econometrics, Second edition, John Wiley and Sons.

[2] Wooldridge, Jeffrey M. (2003): Introductory Econometrics: A Modern Approach, 2nd edition, South Western College Publishing.
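As a closing numerical illustration of the formula $\widehat{\beta}_{OLS} = (X'X)^{-1}X'Y$ derived in section 4, the sketch below computes the estimator on simulated data and compares it with NumPy's standard least-squares routine. This is a minimal sketch assuming NumPy; the simulated design and parameter values are arbitrary example choices, not taken from the note.

    import numpy as np

    rng = np.random.default_rng(2)
    T, k = 100, 3
    X = np.column_stack([np.ones(T), rng.normal(size=(T, k - 1))])   # T x k regressor matrix
    beta_true = np.array([1.0, 0.5, -2.0])
    Y = X @ beta_true + rng.normal(size=T)                           # T x 1 left-hand-side vector

    # beta_OLS = (X'X)^{-1} X'Y, computed by solving the k normal equations
    beta_ols = np.linalg.solve(X.T @ X, X.T @ Y)
    print(beta_ols)

    # cross-check against NumPy's least-squares solver
    print(np.allclose(beta_ols, np.linalg.lstsq(X, Y, rcond=None)[0]))   # True

    # the Hessian 2 X'X is positive definite: all eigenvalues strictly positive
    print(np.all(np.linalg.eigvalsh(2 * X.T @ X) > 0))                   # True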