Some Properties of the Gaussian Distribution
Jianxin Wu
GVU Center and College of Computing, Georgia Institute of Technology
April 2004

Contents

1 Introduction
2 Definition
  2.1 Univariate Gaussian
  2.2 Multivariate Gaussian
3 Notation and Parameterization
4 Linear Operation and Summation
  4.1 Univariate Case
  4.2 Multivariate Case
5 Geometry and Mahalanobis Distance
6 Conditioning
7 Product of Gaussians
8 Application I: Parameter Estimation
  8.1 Maximum Likelihood Estimation
  8.2 Bayesian Parameter Estimation
9 Application II: Kalman Filter
  9.1 The Model
  9.2 The Estimation
A Gaussian Integral
B Characteristic Functions
C Schur Complement and the Matrix Inversion Lemma
D Vector and Matrix Derivatives
1 Introduction

The Gaussian distribution is the most widely used probability distribution in statistical pattern recognition and machine learning, and its many convenient properties are probably the main reason for this popularity. In this short paper I try to organize the basic facts about the Gaussian distribution. (I originally planned to write a short note listing some properties of the Gaussian; somehow I decided to keep the paper self-contained, and the result is that it became rather fat.) There is no advanced theory in this paper. However, understanding these facts requires some linear algebra and multivariate analysis, which are not always covered sufficiently in undergraduate texts. This paper pools these facts together, in the hope that it will be useful for new researchers entering this area.

2 Definition

2.1 Univariate Gaussian

The probability density function of a univariate Gaussian distribution has the form

$$ p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right), $$

in which $\mu$ is the expected value of $x$ and $\sigma^2$ is the variance. We assume that $\sigma > 0$.

We first have to verify that this is a valid density. It is obvious that $p(x) \geq 0$ always holds for $x \in \mathbb{R}$. From the Gaussian integral derived in Appendix A we know that $\int_{-\infty}^{\infty} \exp(-x^2/t)\,dx = \sqrt{\pi t}$ for $t > 0$. Applying this equation with $t = 2\sigma^2$ (after shifting $x$ by $\mu$), we have

$$ \int_{-\infty}^{\infty} p(x)\,dx = \frac{1}{\sqrt{2\pi}\,\sigma} \int_{-\infty}^{\infty} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right) dx = \frac{\sqrt{2\pi\sigma^2}}{\sqrt{2\pi}\,\sigma} = 1, $$

which means that $p(x)$ is a valid density. The density

$$ p(x) = \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{x^2}{2} \right) $$

is called the standard normal density. In Appendix A it is shown that the mean value and standard deviation of the standard normal distribution are 0 and 1, respectively. By a change of variables, it is then easy to show that $\int x\,p(x)\,dx = \mu$ and $\int (x-\mu)^2\,p(x)\,dx = \sigma^2$ for the general density above.
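As a quick numerical illustration (my addition, not part of the original derivation), the following Python/NumPy sketch checks the normalization, mean, and variance of the univariate density by a simple Riemann sum; the values of `mu` and `sigma` are arbitrary examples.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma):
    """Univariate Gaussian density, exactly as defined above."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (np.sqrt(2 * np.pi) * sigma)

mu, sigma = 1.5, 2.0                          # arbitrary example parameters
x = np.linspace(mu - 10 * sigma, mu + 10 * sigma, 200001)
dx = x[1] - x[0]
p = gaussian_pdf(x, mu, sigma)

print(np.sum(p) * dx)                         # ~1.0, so p is a valid density
print(np.sum(x * p) * dx)                     # ~1.5, the mean mu
print(np.sum((x - mu) ** 2 * p) * dx)         # ~4.0, the variance sigma^2
```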
2.2 Multivariate Gaussian

The probability density function of a multivariate Gaussian distribution has the form

$$ p(x) = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} (x-\mu)^T \Sigma^{-1} (x-\mu) \right), $$

in which $x$ is a $d$-dimensional vector, $\mu$ is the $d$-dimensional mean vector, and $\Sigma$ is the $d \times d$ covariance matrix. We assume that $\Sigma$ is a symmetric, positive definite matrix.

We first have to verify that this is a valid probability density function. It is obvious that $p(x) \geq 0$ always holds for $x \in \mathbb{R}^d$. Next we diagonalize $\Sigma$ as $\Sigma = U^T \Lambda U$, in which $U$ is an orthogonal matrix containing the eigenvectors of $\Sigma$, $\Lambda = \operatorname{diag}(\lambda_1, \dots, \lambda_d)$ is a diagonal matrix containing the eigenvalues of $\Sigma$ in its diagonal entries, and $|\Sigma| = |\Lambda|$. Let us define a new random vector

$$ y = \Lambda^{-1/2} U (x - \mu). $$

If we treat this as a change of variables, the absolute value of the determinant of its Jacobian matrix is $|\Lambda|^{-1/2} = |\Sigma|^{-1/2}$, so that $dy = |\Sigma|^{-1/2}\,dx$. Now we are ready to calculate the integral

$$ \int p(x)\,dx = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \int \exp\left( -\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu) \right) dx = \frac{1}{(2\pi)^{d/2}} \int \exp\left( -\frac{1}{2} y^T y \right) dy = \prod_{i=1}^{d} \int \frac{1}{\sqrt{2\pi}} \exp\left( -\frac{y_i^2}{2} \right) dy_i = 1, $$

in which $y_i$ is the $i$-th component of $y$, i.e. $y = (y_1, \dots, y_d)^T$. This establishes the validity of the multivariate Gaussian density.

Since $y$ is a random vector, it has a density, denoted $p_y(y)$. Using the inverse transform method with $x = \mu + U^T \Lambda^{1/2} y$, we get

$$ p_y(y) = p\left( \mu + U^T \Lambda^{1/2} y \right) \left| U^T \Lambda^{1/2} \right| = \frac{1}{(2\pi)^{d/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} y^T y \right) |\Sigma|^{1/2} = \frac{1}{(2\pi)^{d/2}} \exp\left( -\frac{1}{2} y^T y \right). $$

The density

$$ p_y(y) = \frac{1}{(2\pi)^{d/2}} \exp\left( -\frac{1}{2} y^T y \right) $$
is called a spherical Gaussian distribution. Let $z$ be a random vector formed by any subset of the components of $y$. By marginalization it is clear that $z$ is also spherical Gaussian, $p(z) = (2\pi)^{-\dim(z)/2} \exp(-\frac{1}{2} z^T z)$, and specifically $p(y_i) = \frac{1}{\sqrt{2\pi}} \exp(-y_i^2/2)$. Using this fact, it is straightforward to show that the mean vector and covariance matrix of a spherical Gaussian are $0$ and $I$, respectively.

Using the inverse transform $x = \mu + U^T \Lambda^{1/2} y$, we can easily calculate the mean vector and covariance matrix of the density $p(x)$:

$$ E[x] = E\left[ \mu + U^T \Lambda^{1/2} y \right] = \mu + U^T \Lambda^{1/2} E[y] = \mu, $$

$$ E\left[ (x-\mu)(x-\mu)^T \right] = E\left[ \left( U^T \Lambda^{1/2} y \right)\left( U^T \Lambda^{1/2} y \right)^T \right] = U^T \Lambda^{1/2} E\left[ y y^T \right] \Lambda^{1/2} U = U^T \Lambda U = \Sigma. $$

3 Notation and Parameterization

When we have a density of the multivariate Gaussian form, it is often written as

$$ x \sim N(\mu, \Sigma) \qquad \text{or} \qquad p(x) = N(x; \mu, \Sigma). $$

In most cases we will use the mean vector $\mu$ and covariance matrix $\Sigma$ to express a Gaussian density. This is called the moment parameterization. There is another parameterization of a Gaussian density, called the canonical parameterization, in which the density is expressed as

$$ p(x) = \exp\left( \alpha + \eta^T x - \frac{1}{2} x^T \Lambda x \right), $$

in which $\alpha = -\frac{1}{2}\left( d \log 2\pi - \log|\Lambda| + \eta^T \Lambda^{-1} \eta \right)$ is a normalization constant. The parameters of the two representations are related by

$$ \Lambda = \Sigma^{-1}, \qquad \eta = \Sigma^{-1} \mu, \qquad \Sigma = \Lambda^{-1}, \qquad \mu = \Lambda^{-1} \eta. $$

Notice that there is a clash in our notation: $\Lambda$ has different meanings in the canonical parameterization and in Section 2.2. Here $\Lambda$ is a parameter of the canonical parameterization (the inverse covariance, or precision, matrix), which is not necessarily diagonal; in Section 2.2, $\Lambda$ is the diagonal matrix formed by the eigenvalues of $\Sigma$. It is straightforward to show that the moment parameterization and the canonical parameterization of the Gaussian distribution are equivalent. In some cases the canonical parameterization is more convenient to use than the moment parameterization.
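To make the two parameterizations concrete, here is a small conversion sketch in Python/NumPy (my addition; the function names and example values are mine, not from the text).

```python
import numpy as np

def moment_to_canonical(mu, Sigma):
    """Return (alpha, eta, Lam) such that p(x) = exp(alpha + eta^T x - 0.5 x^T Lam x)."""
    Lam = np.linalg.inv(Sigma)                      # precision matrix
    eta = Lam @ mu
    d = mu.shape[0]
    _, logdet_Lam = np.linalg.slogdet(Lam)
    alpha = -0.5 * (d * np.log(2 * np.pi) - logdet_Lam + eta @ mu)  # note Lam^{-1} eta = mu
    return alpha, eta, Lam

def canonical_to_moment(eta, Lam):
    Sigma = np.linalg.inv(Lam)
    return Sigma @ eta, Sigma

mu = np.array([1.0, -2.0])
Sigma = np.array([[2.0, 0.3], [0.3, 1.0]])
alpha, eta, Lam = moment_to_canonical(mu, Sigma)
mu2, Sigma2 = canonical_to_moment(eta, Lam)
print(np.allclose(mu, mu2), np.allclose(Sigma, Sigma2))   # True True
```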
4 Linear Operation and Summation

4.1 Univariate Case

Suppose $x_1 \sim N(\mu_1, \sigma_1^2)$ and $x_2 \sim N(\mu_2, \sigma_2^2)$ are two independent univariate Gaussian random variables. It is obvious that $a x_1 + b \sim N(a\mu_1 + b,\, a^2\sigma_1^2)$, in which $a$ and $b$ are two scalars.

Now consider the random variable $z = x_1 + x_2$. Its density is the convolution

$$ p_z(z) = \int p_1(x_1)\, p_2(z - x_1)\, dx_1 = \int \frac{1}{2\pi\sigma_1\sigma_2} \exp\left( -\frac{(x_1-\mu_1)^2}{2\sigma_1^2} - \frac{(z - x_1 - \mu_2)^2}{2\sigma_2^2} \right) dx_1. $$

Completing the square in $x_1$ inside the exponent and applying the Gaussian integral of Appendix A to the remaining integral gives

$$ p_z(z) = \frac{1}{\sqrt{2\pi(\sigma_1^2+\sigma_2^2)}} \exp\left( -\frac{(z-\mu_1-\mu_2)^2}{2(\sigma_1^2+\sigma_2^2)} \right). $$

The sum of two independent univariate Gaussian random variables is again a Gaussian random variable, with the mean values and variances summed up respectively, i.e. $z \sim N(\mu_1 + \mu_2,\, \sigma_1^2 + \sigma_2^2)$. The summation rule is easily generalized to $n$ Gaussians.

4.2 Multivariate Case

Suppose $x \sim N(\mu, \Sigma)$ is a $d$-dimensional Gaussian random vector, $A$ is a $q \times d$ matrix, and $b$ is a $q$-dimensional vector. Then $z = Ax + b$ is a $q$-dimensional Gaussian random vector,

$$ z \sim N(A\mu + b,\; A \Sigma A^T). $$

This fact is proved using the characteristic function tool (see Appendix B).
The characteristic function of $z = Ax + b$ is

$$ \phi_z(t) = E\left[ e^{i t^T z} \right] = E\left[ e^{i t^T (Ax + b)} \right] = e^{i t^T b}\, E\left[ e^{i (A^T t)^T x} \right] = e^{i t^T b} \exp\left( i (A^T t)^T \mu - \frac{1}{2} (A^T t)^T \Sigma (A^T t) \right) = \exp\left( i t^T (A\mu + b) - \frac{1}{2} t^T A \Sigma A^T t \right), $$

in which the second-to-last step used the characteristic function of a Gaussian derived in Appendix B. Appendix B also states that if a characteristic function is of the form $\exp(i t^T \mu - \frac{1}{2} t^T \Sigma t)$, then the underlying density $p(x)$ is a Gaussian with mean vector $\mu$ and covariance matrix $\Sigma$. Applying this fact, we immediately get

$$ z = Ax + b \sim N(A\mu + b,\; A \Sigma A^T). $$

Suppose $x \sim N(\mu_1, \Sigma_1)$ and $y \sim N(\mu_2, \Sigma_2)$ are two independent $d$-dimensional Gaussian random vectors, and define a new random vector $z = x + y$. We could calculate the density $p(z)$ by the same convolution method used in the univariate case; however, the calculation is complex and we would have to apply the matrix inversion lemma of Appendix C. The characteristic function simplifies the calculation. Using the convolution property from Appendix B,

$$ \phi_z(t) = \phi_x(t)\, \phi_y(t) = \exp\left( i t^T \mu_1 - \frac{1}{2} t^T \Sigma_1 t \right) \exp\left( i t^T \mu_2 - \frac{1}{2} t^T \Sigma_2 t \right) = \exp\left( i t^T (\mu_1 + \mu_2) - \frac{1}{2} t^T (\Sigma_1 + \Sigma_2) t \right), $$

which immediately gives $z \sim N(\mu_1 + \mu_2,\, \Sigma_1 + \Sigma_2)$. The sum of two multivariate Gaussian random vectors is as easy to compute as in the univariate case: sum up the mean vectors and covariance matrices. The rule is the same for summing several multivariate Gaussians.

Now that we have the linear-transformation tool, let us revisit the transformation of Section 2.2,

$$ x \sim N(\mu, \Sigma), \qquad y = \Lambda^{-1/2} U (x - \mu). $$

Using the properties of linear transformations of a Gaussian density, $y$ is indeed Gaussian (in Section 2.2 we painfully calculated $p_y(y)$ using the inverse transform method), with mean vector $\Lambda^{-1/2} U (\mu - \mu) = 0$ and covariance matrix $\Lambda^{-1/2} U \Sigma U^T \Lambda^{-1/2} = I$. This transformation is called the whitening transformation, since the transformed density has an identity covariance matrix.
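The following Monte Carlo sketch (my addition; it assumes NumPy, and all example matrices are arbitrary illustrative choices) checks both results of this section: the linear-transformation rule and the whitening transformation.

```python
import numpy as np

rng = np.random.default_rng(0)

mu = np.array([1.0, -1.0, 0.5])
B = rng.normal(size=(3, 3))
Sigma = B @ B.T + 3 * np.eye(3)            # an arbitrary symmetric positive definite covariance
X = rng.multivariate_normal(mu, Sigma, size=200_000)

# Linear operation: z = A x + b should be N(A mu + b, A Sigma A^T).
A = np.array([[1.0, 2.0, 0.0], [0.0, -1.0, 3.0]])
b = np.array([0.5, -2.0])
Z = X @ A.T + b
print(np.allclose(Z.mean(axis=0), A @ mu + b, atol=0.1))                  # True (up to sampling error)
print(np.allclose(np.cov(Z.T), A @ Sigma @ A.T, rtol=0.05, atol=0.2))     # True (up to sampling error)

# Whitening: y = Lambda^{-1/2} U (x - mu) has zero mean and identity covariance.
eigvals, eigvecs = np.linalg.eigh(Sigma)   # Sigma = U^T Lambda U with U = eigvecs.T
W = np.diag(eigvals ** -0.5) @ eigvecs.T
Y = (X - mu) @ W.T
print(np.allclose(Y.mean(axis=0), 0.0, atol=0.05))                        # True (up to sampling error)
print(np.allclose(np.cov(Y.T), np.eye(3), atol=0.05))                     # True (up to sampling error)
```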
5 Geometry and Mahalanobis Distance

[Figure 1: a bivariate Gaussian density function.]

Figure 1 shows a bivariate Gaussian density function. A Gaussian density has only one mode, located at the mean vector, and the shape of the density is determined by the covariance matrix. Figure 2 shows the equal-probability contours of a bivariate Gaussian density. All points on a given equal-probability contour have the same value of the term

$$ r^2(x, \mu) = (x - \mu)^T \Sigma^{-1} (x - \mu) = c^2. $$

$r(x, \mu)$ is called the Mahalanobis distance from $x$ to $\mu$, given the covariance matrix $\Sigma$. The equation above defines a hyperellipsoid in $d$ dimensions, which means that the equal-probability contours of a Gaussian density are hyperellipsoids. The principal axes of such a hyperellipsoid are given by the eigenvectors of $\Sigma$, and the lengths of these axes are proportional to the square roots of the eigenvalues associated with these eigenvectors.

[Figure 2: equal-probability contours of a bivariate Gaussian density.]
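A small numerical illustration of this geometry (my addition, assuming NumPy): stepping a distance of one scaled eigenvector along either principal axis lands on the same Mahalanobis contour, and therefore on the same equal-probability contour.

```python
import numpy as np

def mahalanobis_sq(x, mu, Sigma):
    """Squared Mahalanobis distance r^2(x, mu) = (x - mu)^T Sigma^{-1} (x - mu)."""
    d = x - mu
    return d @ np.linalg.solve(Sigma, d)

mu = np.array([0.0, 0.0])
Sigma = np.array([[4.0, 1.0], [1.0, 1.0]])

eigvals, eigvecs = np.linalg.eigh(Sigma)
x1 = mu + np.sqrt(eigvals[0]) * eigvecs[:, 0]   # sqrt(eigenvalue) along the first principal axis
x2 = mu + np.sqrt(eigvals[1]) * eigvecs[:, 1]   # sqrt(eigenvalue) along the second principal axis

# Both points lie on the contour r^2 = 1, hence have the same density value.
print(mahalanobis_sq(x1, mu, Sigma), mahalanobis_sq(x2, mu, Sigma))   # 1.0 1.0
```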
6 Conditioning

Suppose $x_1$ and $x_2$ are two jointly Gaussian random vectors with joint density

$$ p(x_1, x_2) = \frac{1}{(2\pi)^{(d_1+d_2)/2} |\Sigma|^{1/2}} \exp\left( -\frac{1}{2} \begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix}^T \Sigma^{-1} \begin{bmatrix} x_1-\mu_1 \\ x_2-\mu_2 \end{bmatrix} \right), $$

in which $d_1$ and $d_2$ are the dimensionalities of $x_1$ and $x_2$ respectively, and

$$ \Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ \Sigma_{21} & \Sigma_{22} \end{bmatrix}. $$

The matrices $\Sigma_{12}$ and $\Sigma_{21}$ are cross-covariance matrices between $x_1$ and $x_2$, satisfying $\Sigma_{12} = \Sigma_{21}^T$.

The marginal distributions $x_1 \sim N(\mu_1, \Sigma_{11})$ and $x_2 \sim N(\mu_2, \Sigma_{22})$ are easy to get from the joint distribution. We are interested in computing the conditional probability density $p(x_1 \mid x_2)$. We will need the inverse of $\Sigma$, and this task is completed using the Schur complement (see Appendix C). For notational simplicity, denote the Schur complement of $\Sigma_{11}$ by $S_{11} = \Sigma_{22} - \Sigma_{21}\Sigma_{11}^{-1}\Sigma_{12}$, and similarly the Schur complement of $\Sigma_{22}$ by $S_{22} = \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21}$.

Applying the block-inversion formula of Appendix C and noticing that $\Sigma_{12} = \Sigma_{21}^T$, the quadratic form in the exponent splits (writing $x_1 - \mu_1$ as $x_1'$ and $x_2 - \mu_2$ as $x_2'$) as

$$ \begin{bmatrix} x_1' \\ x_2' \end{bmatrix}^T \Sigma^{-1} \begin{bmatrix} x_1' \\ x_2' \end{bmatrix} = \left( x_1' - \Sigma_{12}\Sigma_{22}^{-1} x_2' \right)^T S_{22}^{-1} \left( x_1' - \Sigma_{12}\Sigma_{22}^{-1} x_2' \right) + x_2'^T \Sigma_{22}^{-1} x_2'. $$

Thus we can split the joint density as

$$ p(x_1, x_2) = \frac{1}{(2\pi)^{d_1/2} |S_{22}|^{1/2}} \exp\left( -\frac{1}{2} \left( x_1' - \Sigma_{12}\Sigma_{22}^{-1} x_2' \right)^T S_{22}^{-1} \left( x_1' - \Sigma_{12}\Sigma_{22}^{-1} x_2' \right) \right) \cdot \frac{1}{(2\pi)^{d_2/2} |\Sigma_{22}|^{1/2}} \exp\left( -\frac{1}{2} x_2'^T \Sigma_{22}^{-1} x_2' \right), $$

in which we used the fact that $|\Sigma| = |\Sigma_{22}|\,|S_{22}|$ (the determinant identity of Appendix C).

Since the second factor on the right-hand side is the marginal $p(x_2)$, and $p(x_1, x_2) = p(x_1 \mid x_2)\, p(x_2)$, we get the conditional probability

$$ p(x_1 \mid x_2) = \frac{1}{(2\pi)^{d_1/2} |S_{22}|^{1/2}} \exp\left( -\frac{1}{2} \left( x_1 - \mu_1 - \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2) \right)^T S_{22}^{-1} \left( x_1 - \mu_1 - \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2) \right) \right), $$

or

$$ x_1 \mid x_2 \;\sim\; N\left( \mu_1 + \Sigma_{12}\Sigma_{22}^{-1}(x_2 - \mu_2),\;\; \Sigma_{11} - \Sigma_{12}\Sigma_{22}^{-1}\Sigma_{21} \right). $$
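Here is a minimal check of the conditioning formula in Python/NumPy (my addition; the function `condition` and the example numbers are mine): the conditional density computed from the formula must agree with the ratio $p(x_1, x_2) / p(x_2)$.

```python
import numpy as np

def gaussian_logpdf(x, mu, Sigma):
    d = x.shape[0]
    diff = x - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (d * np.log(2 * np.pi) + logdet + diff @ np.linalg.solve(Sigma, diff))

def condition(mu, Sigma, d1, x2):
    """Parameters of p(x1 | x2) when (x1, x2) ~ N(mu, Sigma) and x1 has dimension d1."""
    mu1, mu2 = mu[:d1], mu[d1:]
    S11, S12 = Sigma[:d1, :d1], Sigma[:d1, d1:]
    S21, S22 = Sigma[d1:, :d1], Sigma[d1:, d1:]
    mu_c = mu1 + S12 @ np.linalg.solve(S22, x2 - mu2)
    Sigma_c = S11 - S12 @ np.linalg.solve(S22, S21)
    return mu_c, Sigma_c

mu = np.array([1.0, 2.0, -1.0])
Sigma = np.array([[2.0, 0.5, 0.3],
                  [0.5, 1.5, 0.2],
                  [0.3, 0.2, 1.0]])
d1, x2 = 2, np.array([0.5])
mu_c, Sigma_c = condition(mu, Sigma, d1, x2)

# Check: log p(x1, x2) - log p(x2) should equal log p(x1 | x2) at any x1.
x1 = np.array([0.7, -0.4])
lhs = gaussian_logpdf(np.concatenate([x1, x2]), mu, Sigma) \
      - gaussian_logpdf(x2, mu[d1:], Sigma[d1:, d1:])
rhs = gaussian_logpdf(x1, mu_c, Sigma_c)
print(np.isclose(lhs, rhs))   # True
```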
7 Product of Gaussians

Suppose $p_1(x) = N(x; \mu_1, \Sigma_1)$ and $p_2(x) = N(x; \mu_2, \Sigma_2)$ are two $d$-dimensional Gaussian densities. Sometimes we want to compute the density that is proportional to their product, i.e. $p(x) = \alpha\, p_1(x)\, p_2(x)$, in which $\alpha$ is the normalization constant that makes $p(x)$ a valid density function. For this task the canonical parameterization (see Section 3) is extremely helpful. Writing the two Gaussians in canonical form,

$$ p_1(x) = \exp\left( \alpha_1 + \eta_1^T x - \frac{1}{2} x^T \Lambda_1 x \right), \qquad p_2(x) = \exp\left( \alpha_2 + \eta_2^T x - \frac{1}{2} x^T \Lambda_2 x \right), $$

the product is easy to compute:

$$ p(x) = \alpha\, p_1(x)\, p_2(x) = \exp\left( \alpha' + (\eta_1 + \eta_2)^T x - \frac{1}{2} x^T (\Lambda_1 + \Lambda_2) x \right). $$

This equation states that in the canonical parameterization, in order to compute a product of Gaussians, we just sum the parameters. The result readily extends to the product of $n$ Gaussians: if the canonical parameters of $p_i(x)$ are $\eta_i$ and $\Lambda_i$, $i = 1, \dots, n$, then $p(x) \propto \prod_{i=1}^n p_i(x)$ is also a Gaussian density, given by

$$ p(x) = \exp\left( \alpha' + \Bigl( \sum_{i=1}^n \eta_i \Bigr)^T x - \frac{1}{2} x^T \Bigl( \sum_{i=1}^n \Lambda_i \Bigr) x \right). $$

Now let us go back to the moment parameterization. Suppose we have $n$ Gaussian densities $p_i(x) = N(x; \mu_i, \Sigma_i)$, $i = 1, \dots, n$. Then $p(x) \propto \prod_{i=1}^n p_i(x)$ is Gaussian, $p(x) = N(x; \mu, \Sigma)$, where

$$ \Sigma^{-1} = \Sigma_1^{-1} + \Sigma_2^{-1} + \dots + \Sigma_n^{-1}, \qquad \Sigma^{-1}\mu = \Sigma_1^{-1}\mu_1 + \Sigma_2^{-1}\mu_2 + \dots + \Sigma_n^{-1}\mu_n. $$
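A short sketch of this rule in Python/NumPy (my addition; the helper name and example parameters are mine): convert each factor to its precision form, sum the parameters, and convert back.

```python
import numpy as np

def product_of_gaussians(mus, Sigmas):
    """Mean and covariance of the density proportional to the product of N(mu_i, Sigma_i)."""
    precisions = [np.linalg.inv(S) for S in Sigmas]
    Lam = sum(precisions)                               # precisions add up
    eta = sum(P @ m for P, m in zip(precisions, mus))   # precision-weighted means add up
    Sigma = np.linalg.inv(Lam)
    mu = Sigma @ eta
    return mu, Sigma

mus = [np.array([0.0, 0.0]), np.array([2.0, 1.0]), np.array([1.0, -1.0])]
Sigmas = [np.eye(2), 2 * np.eye(2), np.array([[1.0, 0.3], [0.3, 0.5]])]
mu, Sigma = product_of_gaussians(mus, Sigmas)
print(mu)
print(Sigma)
```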
Now we have listed some properties of the Gaussian distribution. Next let us see how these properties are applied.

8 Application I: Parameter Estimation

8.1 Maximum Likelihood Estimation

Suppose we have a $d$-dimensional multivariate Gaussian density $N(\mu, \Sigma)$ and $n$ i.i.d. (independently, identically distributed) samples $D = \{x_1, x_2, \dots, x_n\}$ drawn from this distribution. The task is to estimate the parameters $\mu$ and $\Sigma$. The log-likelihood of observing the data set $D$ given parameters $\mu$ and $\Sigma$ is

$$ \ell(\mu, \Sigma \mid D) = \log \prod_{i=1}^n p(x_i) = -\frac{nd}{2}\log 2\pi + \frac{n}{2}\log\left|\Sigma^{-1}\right| - \frac{1}{2}\sum_{i=1}^n (x_i - \mu)^T \Sigma^{-1} (x_i - \mu). $$

Taking derivatives of the log-likelihood with respect to $\mu$ and $\Sigma^{-1}$ gives (see Appendix D)

$$ \frac{\partial \ell}{\partial \mu} = \sum_{i=1}^n \Sigma^{-1} (x_i - \mu), \qquad \frac{\partial \ell}{\partial \Sigma^{-1}} = \frac{n}{2}\Sigma - \frac{1}{2}\sum_{i=1}^n (x_i - \mu)(x_i - \mu)^T, $$

in which the first derivative uses the rule $\partial(x^T y)/\partial x = y$ from Appendix D together with the chain rule, and the second uses the derivatives of the log-determinant and of the quadratic form from Appendix D, plus the fact that $\Sigma^T = \Sigma$. (The notation in the second equation is a little confusing: on its right-hand side, the symbol $\Sigma$ appears both as a summation sign and as the covariance matrix.)

In order to find the maximum likelihood solution, we want to find the maximum of the likelihood function. Setting both derivatives to $0$ gives us the solution:
$$ \mu_{\mathrm{ML}} = \frac{1}{n}\sum_{i=1}^n x_i, \qquad \Sigma_{\mathrm{ML}} = \frac{1}{n}\sum_{i=1}^n (x_i - \mu_{\mathrm{ML}})(x_i - \mu_{\mathrm{ML}})^T. $$

These equations clearly state that the maximum likelihood estimates of the mean vector and covariance matrix are just the sample mean and the sample covariance matrix, respectively.

8.2 Bayesian Parameter Estimation

In Bayesian estimation we assume that the covariance matrix is known. Suppose we have a $d$-dimensional multivariate Gaussian density $N(\mu, \Sigma)$ and $n$ i.i.d. samples $D = \{x_1, \dots, x_n\}$ drawn from this distribution. We also need a prior on the parameter $\mu$; let us assume the Gaussian prior $\mu \sim N(\mu_0, \Sigma_0)$. Note that $\mu_0$, $\Sigma_0$, and $\Sigma$ are all assumed known; the only parameter to be estimated is the mean vector $\mu$.

In Bayesian estimation, instead of finding a point estimate $\hat{\mu}$ in the parameter space that maximizes the likelihood, we calculate the posterior density $p(\mu \mid D)$ and use the entire distribution of $\mu$ as our estimate of this parameter. Applying Bayes' rule,

$$ p(\mu \mid D) = \alpha\, p(D \mid \mu)\, p(\mu) = \alpha \prod_{i=1}^n p(x_i \mid \mu)\, p_0(\mu), $$

in which $\alpha$ is a normalization constant that does not depend on $\mu$. Applying the result of Section 7, we know that $p(\mu \mid D)$ is also Gaussian,

$$ p(\mu \mid D) = N(\mu; \mu_n, \Sigma_n), \qquad \Sigma_n^{-1} = n\Sigma^{-1} + \Sigma_0^{-1}, \qquad \mu_n = \Sigma_n \left( \Sigma^{-1} \sum_{i=1}^n x_i + \Sigma_0^{-1} \mu_0 \right). $$

Both $\mu_n$ and $\Sigma_n$ can be calculated from known quantities, so we have determined the posterior distribution $p(\mu \mid D)$ of $\mu$ given the data set $D$.

We chose a Gaussian family for the prior. Usually the prior distribution is chosen such that the posterior belongs to the same functional form as the prior; a prior and posterior chosen in this way are said to be conjugate. We have just seen that the Gaussian has the nice property that both the prior and the posterior are Gaussian, i.e. the Gaussian is auto-conjugate.

After $p(\mu \mid D)$ is determined, a new sample $x$ is classified by calculating the probability

$$ p(x \mid D) = \int p(x \mid \mu)\, p(\mu \mid D)\, d\mu. $$

This integral has the same form as the convolution used for summing two independent Gaussians in Section 4.1. Thus we can guess that $p(x \mid D)$ is a Gaussian again,

$$ p(x \mid D) = N(x; \mu_n, \Sigma + \Sigma_n), $$

and the guess is correct; it is easy to verify by repeating the steps of the convolution calculation in Section 4.1.
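The following sketch (my addition; it assumes NumPy, and the true parameters and prior are arbitrary example values) computes both the maximum likelihood estimates and the Bayesian posterior for the mean on simulated data.

```python
import numpy as np

rng = np.random.default_rng(1)
true_mu = np.array([1.0, -2.0])
true_Sigma = np.array([[1.0, 0.4], [0.4, 2.0]])
X = rng.multivariate_normal(true_mu, true_Sigma, size=5000)   # i.i.d. samples
n = X.shape[0]

# Maximum likelihood estimates: sample mean and (biased) sample covariance.
mu_ml = X.mean(axis=0)
Sigma_ml = (X - mu_ml).T @ (X - mu_ml) / n
print(mu_ml)
print(Sigma_ml)

# Bayesian estimate of the mean (Sigma assumed known), with prior mu ~ N(mu0, Sigma0).
mu0, Sigma0 = np.zeros(2), 10.0 * np.eye(2)
Sigma_inv = np.linalg.inv(true_Sigma)
Sigma_n = np.linalg.inv(n * Sigma_inv + np.linalg.inv(Sigma0))
mu_n = Sigma_n @ (Sigma_inv @ X.sum(axis=0) + np.linalg.inv(Sigma0) @ mu0)
print(mu_n)       # posterior mean, close to mu_ml for large n
print(Sigma_n)    # posterior covariance, which shrinks roughly like Sigma / n
```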
9 Application II: Kalman Filter

9.1 The Model

The Kalman filter addresses the problem of estimating a state vector $x$ in a discrete-time process, given a linear dynamic model

$$ x_k = A x_{k-1} + w_{k-1} $$

and a linear measurement model

$$ z_k = H x_k + v_k. $$

The process noise $w_k$ and measurement noise $v_k$ are assumed to be Gaussian:

$$ w \sim N(0, Q), \qquad v \sim N(0, R). $$

At time $k$, assuming we know the distribution of $x_{k-1}$, the task is to estimate the posterior probability of $x_k$ given the current observation $z_k$ and the previous state estimate $p(x_{k-1})$. From a broader point of view, the task can be formulated as estimating the posterior probability of $x_k$ given all the previous state estimates and all the observations up to time step $k$. Under certain Markovian assumptions, it is not hard to prove that the two problem formulations are equivalent.

In the Kalman filter setup, we assume the prior is Gaussian, i.e. at time $t = 0$, $p(x_0) = N(x; \mu_0, P_0)$. Instead of $\Sigma$, here we use $P$ to represent a covariance matrix, in order to match the notation of the Kalman filter literature.

9.2 The Estimation

I will show that, with the help of the properties of Gaussians we have obtained, it is quite easy to derive the Kalman filter equations. The Kalman filter can be separated into two (related) steps. In the first step, based on the estimate $p(x_{k-1})$ and the dynamic model, we get an estimate $p(x_k^-)$; the minus sign means that this estimate is made before we take the measurement into account. In the second step, based on $p(x_k^-)$ and the measurement model, we get the final estimate $p(x_k)$.

First, let us estimate $p(x_k^-)$. Assume that at time $k-1$ the estimate we already have is Gaussian,

$$ p(x_{k-1}) = N(x; \mu_{k-1}, P_{k-1}). $$
This assumption coincides with the form of the prior $p(x_0)$. We will show that, under this assumption, $p(x_k)$ after the Kalman filter update is again Gaussian, which makes the assumption reasonable.

Applying the linear-operation result of Section 4.2 to the dynamic model, we immediately get the estimate for $x_k^-$:

$$ x_k^- \sim N(\mu_k^-, P_k^-), \qquad \mu_k^- = A \mu_{k-1}, \qquad P_k^- = A P_{k-1} A^T + Q. $$

The estimate $p(x_k^-)$ conditioned on the observation $z_k$ gives $p(x_k)$, the estimate we want; thus the conditioning result of Section 6 can be used. In order to use it, we first build the joint covariance matrix. Since $\operatorname{Cov}(z_k) = H P_k^- H^T + R$ (applying the linear-operation result to the measurement model) and

$$ \operatorname{Cov}(z_k, x_k^-) = \operatorname{Cov}(H x_k^- + v_k,\; x_k^-) = \operatorname{Cov}(H x_k^-,\; x_k^-) = H P_k^-, $$

the joint covariance matrix of $(x_k^-, z_k)$ is

$$ \begin{bmatrix} P_k^- & P_k^- H^T \\ H P_k^- & H P_k^- H^T + R \end{bmatrix}. $$

Applying the conditioning formula of Section 6, we get

$$ p(x_k) = p(x_k^- \mid z_k) = N(\mu_k, P_k), $$
$$ P_k = P_k^- - P_k^- H^T \left( H P_k^- H^T + R \right)^{-1} H P_k^-, $$
$$ \mu_k = \mu_k^- + P_k^- H^T \left( H P_k^- H^T + R \right)^{-1} \left( z_k - H \mu_k^- \right). $$

The prediction equations and the two update equations above are the Kalman filter updating rules. The term $P_k^- H^T (H P_k^- H^T + R)^{-1}$ appears in both update equations. Defining

$$ K_k = P_k^- H^T \left( H P_k^- H^T + R \right)^{-1}, $$

the equations simplify to

$$ P_k = (I - K_k H)\, P_k^-, \qquad \mu_k = \mu_k^- + K_k \left( z_k - H \mu_k^- \right). $$

The term $K_k$ is called the Kalman gain matrix, and the term $z_k - H \mu_k^-$ is called the innovation.
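Below is a minimal sketch of the predict/update recursion just derived (my addition; it assumes NumPy, and the constant-velocity model, the function name `kalman_step`, and all numeric values are my own illustrative choices, not from the text).

```python
import numpy as np

def kalman_step(mu, P, z, A, H, Q, R):
    """One Kalman filter step: predict with the dynamic model, then update with measurement z."""
    # Predict (time update).
    mu_pred = A @ mu
    P_pred = A @ P @ A.T + Q
    # Update (measurement update).
    S = H @ P_pred @ H.T + R                       # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)            # Kalman gain
    mu_new = mu_pred + K @ (z - H @ mu_pred)       # z - H mu_pred is the innovation
    P_new = (np.eye(P.shape[0]) - K @ H) @ P_pred
    return mu_new, P_new

# Toy example: constant-velocity model observed through noisy position measurements.
dt = 1.0
A = np.array([[1.0, dt], [0.0, 1.0]])              # state: [position, velocity]
H = np.array([[1.0, 0.0]])                         # we only measure position
Q = 0.01 * np.eye(2)
R = np.array([[0.5]])

rng = np.random.default_rng(2)
mu, P = np.zeros(2), np.eye(2)                     # prior N(mu_0, P_0)
x_true = np.array([0.0, 1.0])
for _ in range(50):
    x_true = A @ x_true + rng.multivariate_normal(np.zeros(2), Q)
    z = H @ x_true + rng.multivariate_normal(np.zeros(1), R)
    mu, P = kalman_step(mu, P, z, A, H, Q, R)
print(x_true, mu)   # the filtered estimate should track the true state
```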
A Gaussian Integral

We compute the integral of the univariate Gaussian density in this appendix. The trick is to consider two independent univariate Gaussians at the same time and to switch to polar coordinates:

$$ \int_{-\infty}^{\infty} e^{-x^2/2}\,dx = \sqrt{ \int_{-\infty}^{\infty} e^{-x^2/2}\,dx \int_{-\infty}^{\infty} e^{-y^2/2}\,dy } = \sqrt{ \iint e^{-(x^2+y^2)/2}\,dx\,dy } = \sqrt{ \int_0^{2\pi}\!\!\int_0^{\infty} r\, e^{-r^2/2}\,dr\,d\theta } = \sqrt{ 2\pi \left[ -e^{-r^2/2} \right]_0^{\infty} } = \sqrt{2\pi}, $$

in which the extra $r$ that appears inside the integral is the determinant of the Jacobian matrix of the change to polar coordinates.

The above integral is easily extended to

$$ f(t) = \int_{-\infty}^{\infty} \exp\left( -\frac{x^2}{t} \right) dx = \sqrt{\pi t}, $$

in which we assume $t > 0$. Then we have

$$ \frac{df}{dt} = \int_{-\infty}^{\infty} \frac{x^2}{t^2} \exp\left( -\frac{x^2}{t} \right) dx \qquad \text{and} \qquad \frac{df}{dt} = \frac{d}{dt}\sqrt{\pi t} = \frac{1}{2}\sqrt{\frac{\pi}{t}}. $$

The above two equations give us

$$ \int_{-\infty}^{\infty} x^2 \exp\left( -\frac{x^2}{t} \right) dx = \frac{t}{2}\sqrt{\pi t}. $$

Applying this with $t = 2$, we have

$$ \int_{-\infty}^{\infty} x^2\, \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx = \frac{1}{\sqrt{2\pi}} \cdot \frac{2}{2}\sqrt{2\pi} = 1. $$

It is also obvious that

$$ \int_{-\infty}^{\infty} x\, \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\,dx = 0, $$

since the integrand is an odd function. These two results prove that the mean and standard deviation of the standard normal distribution are $0$ and $1$, respectively.
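For readers who want a quick numerical confirmation of the two integral identities above, here is a small quadrature sketch (my addition, assuming NumPy; the value of $t$ is arbitrary).

```python
import numpy as np

t = 3.0
x = np.linspace(-60.0, 60.0, 400001)
dx = x[1] - x[0]
# Both pairs below should agree to several decimal places.
print(np.sum(np.exp(-x ** 2 / t)) * dx, np.sqrt(np.pi * t))                     # ~3.070
print(np.sum(x ** 2 * np.exp(-x ** 2 / t)) * dx, (t / 2) * np.sqrt(np.pi * t))  # ~4.605
```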
B Characteristic Functions

The characteristic function of a probability density $p(x)$ is defined as its Fourier transform

$$ \phi(t) = E\left[ e^{i t^T x} \right], $$

in which $i = \sqrt{-1}$. Let us compute the characteristic function of a Gaussian density:

$$ \phi(t) = E\left[ e^{i t^T x} \right] = \int \frac{1}{(2\pi)^{d/2}|\Sigma|^{1/2}} \exp\left( -\frac{1}{2}(x-\mu)^T \Sigma^{-1} (x-\mu) + i t^T x \right) dx = \exp\left( i t^T \mu - \frac{1}{2} t^T \Sigma t \right), $$

where the last step follows by completing the square in the exponent; the remaining integral is that of a Gaussian density and hence equals 1.

Since the characteristic function is defined as a Fourier transform, the inverse Fourier transform of $\phi(t)$ is exactly $p(x)$; i.e., a density is completely determined by its characteristic function. Therefore, when we see a characteristic function of the form $\exp(i t^T \mu - \frac{1}{2} t^T \Sigma t)$, we know that the underlying density is a Gaussian with mean vector $\mu$ and covariance matrix $\Sigma$.

Suppose $x$ and $y$ are two independent random vectors with the same dimension, and define a new random vector $z = x + y$. Then

$$ p_z(z) = \iint_{x+y=z} p_x(x)\, p_y(y)\, dx\, dy = \int p_x(x)\, p_y(z - x)\, dx. $$

Since convolution in the original space becomes a product in the Fourier space, we have

$$ \phi_z(t) = \phi_x(t)\, \phi_y(t), $$

which means that the characteristic function of the sum of two independent random variables is just the product of the characteristic functions of the summands.
C Schur Complement and the Matrix Inversion Lemma

The Schur complement is very useful in computing the inverse of a block matrix. Suppose $M$ is a block matrix

$$ M = \begin{bmatrix} A & B \\ C & D \end{bmatrix}, $$

in which $A$ and $D$ are non-singular square matrices. We want to compute $M^{-1}$. Some algebraic manipulation gives the factorization

$$ \begin{bmatrix} I & 0 \\ -CA^{-1} & I \end{bmatrix} \begin{bmatrix} A & B \\ C & D \end{bmatrix} \begin{bmatrix} I & -A^{-1}B \\ 0 & I \end{bmatrix} = \begin{bmatrix} A & 0 \\ 0 & D - CA^{-1}B \end{bmatrix}, $$

in which the term $S_A = D - CA^{-1}B$ is called the Schur complement of $A$. An equation of the form $XMY = W$ implies $M^{-1} = Y W^{-1} X$, hence

$$ M^{-1} = \begin{bmatrix} I & -A^{-1}B \\ 0 & I \end{bmatrix} \begin{bmatrix} A^{-1} & 0 \\ 0 & S_A^{-1} \end{bmatrix} \begin{bmatrix} I & 0 \\ -CA^{-1} & I \end{bmatrix} = \begin{bmatrix} A^{-1} + A^{-1}B S_A^{-1} C A^{-1} & -A^{-1}B S_A^{-1} \\ -S_A^{-1} C A^{-1} & S_A^{-1} \end{bmatrix}. $$

Taking the determinant of both sides of the factorization gives

$$ |M| = |A|\,|S_A|. $$

We can also compute $M^{-1}$ using the Schur complement of $D$, $S_D = A - BD^{-1}C$, in the same way:

$$ M^{-1} = \begin{bmatrix} S_D^{-1} & -S_D^{-1} B D^{-1} \\ -D^{-1} C S_D^{-1} & D^{-1} + D^{-1} C S_D^{-1} B D^{-1} \end{bmatrix}, \qquad |M| = |D|\,|S_D|. $$
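A numerical check of the block-inverse and determinant formulas above (my addition; it assumes NumPy, and the test matrix is an arbitrary well-conditioned example).

```python
import numpy as np

rng = np.random.default_rng(3)
n1, n2 = 3, 2
M = rng.normal(size=(n1 + n2, n1 + n2))
M = M @ M.T + (n1 + n2) * np.eye(n1 + n2)        # a well-conditioned test matrix

A, B = M[:n1, :n1], M[:n1, n1:]
C, D = M[n1:, :n1], M[n1:, n1:]

S_A = D - C @ np.linalg.solve(A, B)              # Schur complement of A
A_inv = np.linalg.inv(A)
S_A_inv = np.linalg.inv(S_A)

# Block inverse of M expressed with the Schur complement of A.
top = np.hstack([A_inv + A_inv @ B @ S_A_inv @ C @ A_inv, -A_inv @ B @ S_A_inv])
bottom = np.hstack([-S_A_inv @ C @ A_inv, S_A_inv])
M_inv_blocks = np.vstack([top, bottom])

print(np.allclose(M_inv_blocks, np.linalg.inv(M)))                          # True
print(np.isclose(np.linalg.det(M), np.linalg.det(A) * np.linalg.det(S_A)))  # True
```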
The two block forms above are two different representations of the same matrix $M^{-1}$, which means the corresponding blocks in the two expressions must be equal, e.g. $S_D^{-1} = A^{-1} + A^{-1}B S_A^{-1} C A^{-1}$. This result is known as the matrix inversion lemma:

$$ \left( A - BD^{-1}C \right)^{-1} = A^{-1} + A^{-1} B \left( D - CA^{-1}B \right)^{-1} C A^{-1}. $$

The following result, which comes from equating the upper-right blocks, is also useful:

$$ A^{-1} B \left( D - CA^{-1}B \right)^{-1} = \left( A - BD^{-1}C \right)^{-1} B D^{-1}. $$

This formula and the matrix inversion lemma are useful in the derivation of the Kalman filter equations.

D Vector and Matrix Derivatives

Suppose $y$ is a scalar, $A$ is a matrix, and $x$ and $y$ are vectors. The partial derivative of a scalar $y$ with respect to $A$ is defined componentwise as

$$ \left( \frac{\partial y}{\partial A} \right)_{ij} = \frac{\partial y}{\partial a_{ij}}, $$

where $a_{ij}$ is the $(i,j)$-th component of the matrix $A$. From this definition it is easy to get the rule

$$ \frac{\partial}{\partial x}\left( x^T y \right) = \frac{\partial}{\partial x}\left( y^T x \right) = y. $$

For an $n \times n$ square matrix $A$, the determinant $M_{ij}$ of the matrix obtained by removing from $A$ the $i$-th row and $j$-th column is called a minor of $A$. The scalar $c_{ij} = (-1)^{i+j} M_{ij}$ is called a cofactor of $A$. The matrix $A_{\mathrm{cof}}$ with $c_{ij}$ in its $(i,j)$-th entry is called the cofactor matrix of $A$. Finally, the adjoint matrix of $A$ is defined as the transpose of the cofactor matrix,

$$ A_{\mathrm{adj}} = A_{\mathrm{cof}}^T. $$

There are some well-known facts about minors, determinants, and adjoints:

$$ |A| = \sum_j a_{ij} c_{ij} \;\;\text{(for any fixed row } i\text{)}, \qquad A^{-1} = \frac{1}{|A|}\, A_{\mathrm{adj}}. $$

Since $M_{ij}$ has the $i$-th row removed, it does not depend on $a_{ij}$, and neither does $c_{ij}$. Thus

$$ \frac{\partial |A|}{\partial a_{ij}} = c_{ij}, \qquad \text{i.e.} \qquad \frac{\partial |A|}{\partial A} = A_{\mathrm{cof}}, $$

which in turn shows that

$$ \frac{\partial |A|}{\partial A} = A_{\mathrm{adj}}^T = |A| \left( A^{-1} \right)^T. $$

Using the chain rule, we immediately get that for a positive definite matrix $A$,

$$ \frac{\partial}{\partial A} \log|A| = \left( A^{-1} \right)^T. $$

Applying the definition of the matrix derivative, it is also easy to show that for a square matrix $A$,

$$ \frac{\partial}{\partial A}\left( x^T A x \right) = x x^T, $$

since $x^T A x = \sum_i \sum_j a_{ij} x_i x_j$, where $x = (x_1, x_2, \dots, x_n)^T$.
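A finite-difference check of the last two derivative rules (my addition; it assumes NumPy, and the helper `num_grad` is mine): the numerical gradients should match $(A^{-1})^T$ and $x x^T$.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 4
B = rng.normal(size=(n, n))
A = B @ B.T + np.eye(n)          # positive definite, so |A| > 0 and log|A| is well defined
x = rng.normal(size=n)
eps = 1e-6

def num_grad(f, A):
    """Finite-difference gradient of a scalar function f with respect to the matrix A."""
    G = np.zeros_like(A)
    for i in range(n):
        for j in range(n):
            E = np.zeros_like(A)
            E[i, j] = eps
            G[i, j] = (f(A + E) - f(A - E)) / (2 * eps)
    return G

# d log|A| / dA = (A^{-1})^T
print(np.allclose(num_grad(lambda A: np.log(np.linalg.det(A)), A),
                  np.linalg.inv(A).T, atol=1e-5))       # True

# d (x^T A x) / dA = x x^T
print(np.allclose(num_grad(lambda A: x @ A @ x, A),
                  np.outer(x, x), atol=1e-5))           # True
```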
References

[1] C. M. Bishop. Neural Networks for Pattern Recognition. Oxford University Press, Oxford, UK, 1996.
[2] R. O. Duda, P. E. Hart, and D. G. Stork. Pattern Classification, second edition. Wiley-Interscience, New York, NY, USA, 2001.
[3] M. I. Jordan. An Introduction to Probabilistic Graphical Models, chapter 13, draft.
[4] G. Welch and G. Bishop. An Introduction to the Kalman Filter. TR 95-041, Department of Computer Science, University of North Carolina at Chapel Hill.