2. Linear Minimum Mean Squared Error Estimation

2.1. Linear minimum mean squared error estimators

Situation considered:
- A random sequence X(1), …, X(M) whose realizations can be observed.
- A random variable Y which has to be estimated.
- We seek an estimate of Y with a linear estimator of the form:

  Ŷ = h_0 + Σ_{m=1}^{M} h_m X(m)

- A measure of the goodness of Ŷ is the mean squared error (MSE): E[(Ŷ − Y)²].

Covariance and variance of random variables:
Let U and V denote two random variables with expectations µ_U ≜ E[U] and µ_V ≜ E[V].
- The covariance of U and V is defined to be:

  Σ_UV ≜ E[(U − µ_U)(V − µ_V)] = E[UV] − µ_U µ_V.

- The variance of U is defined to be:

  σ_U² ≜ E[(U − µ_U)²] = E[U²] − µ_U² = Σ_UU.

Let U ≜ [U(1), …, U(M)]^T and V ≜ [V(1), …, V(M′)]^T denote two random vectors. The covariance matrix of U and V is defined as

  Σ_UV ≜ [ Σ_{U(1)V(1)}   …   Σ_{U(1)V(M′)} ]
         [      ⋮                  ⋮        ]
         [ Σ_{U(M)V(1)}   …   Σ_{U(M)V(M′)} ]

A direct way to obtain Σ_UV:

  Σ_UV = E[(U − µ_U)(V − µ_V)^T] = E[U V^T] − µ_U (µ_V)^T

where µ_U ≜ E[U] = [E[U(1)], …, E[U(M)]]^T and µ_V ≜ E[V].

Examples: U = X ≜ [X(1), …, X(M)]^T and V = Y. In the sequel we shall frequently make use of the following covariance matrix and vector:

(i)  Σ_XX = E[(X − µ_X)(X − µ_X)^T] = [ σ_{X(1)}²       …   Σ_{X(1)X(M)} ]
                                      [      ⋮                   ⋮       ]
                                      [ Σ_{X(M)X(1)}    …   σ_{X(M)}²    ]

(ii) Σ_XY = E[(X − µ_X)(Y − µ_Y)] = [ Σ_{X(1)Y}, …, Σ_{X(M)Y} ]^T

Linear minimum mean squared error estimator (LMMSEE):
A LMMSEE of Y is a linear estimator, i.e. an estimator of the form Ŷ = h_0 + Σ_{m=1}^{M} h_m X(m), which minimizes the MSE E[(Ŷ − Y)²].
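As a small numerical illustration (not part of the original notes), the covariance quantities Σ_XX and Σ_XY defined above can be estimated from observed realizations with a few lines of NumPy. The data model, sizes and variable names below are assumptions made purely for this sketch:

import numpy as np

rng = np.random.default_rng(0)

# Assumed toy data: N realizations of a 3-dimensional X and of a scalar Y.
N, M = 100_000, 3
A = np.array([[1.0, 0.3, 0.0],
              [0.0, 1.0, 0.5],
              [0.0, 0.0, 1.0]])
X = rng.standard_normal((N, M)) @ A
Y = X @ np.array([0.8, -0.2, 0.4]) + 0.5 * rng.standard_normal(N)

mu_X, mu_Y = X.mean(axis=0), Y.mean()      # sample estimates of mu_X and mu_Y

# Sigma_XX = E[(X - mu_X)(X - mu_X)^T] and Sigma_XY = E[(X - mu_X)(Y - mu_Y)]
Xc, Yc = X - mu_X, Y - mu_Y
Sigma_XX = Xc.T @ Xc / N
Sigma_XY = Xc.T @ Yc / N

print(Sigma_XX)        # M x M covariance matrix of X (compare with np.cov(X.T))
print(Sigma_XY)        # length-M covariance vector between X and Y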

A linear estimator is entirely determined by the (M+1)-dimensional vector h ≜ [h_0, …, h_M]^T.

Orthogonality principle:
A necessary condition for h ≜ [h_0, …, h_M]^T to be the coefficient vector of the LMMSEE is that its entries fulfil the (M+1) identities:

  E[Y − Ŷ] = E[Y − (h_0 + Σ_{m=1}^{M} h_m X(m))] = 0                                (2.1a)
  E[(Y − Ŷ) X(j)] = E[(Y − (h_0 + Σ_{m=1}^{M} h_m X(m))) X(j)] = 0,  j = 1, …, M     (2.1b)

Because the coefficient vector of the LMMSEE minimizes E[(Ŷ − Y)²], its components must satisfy the set of equations:

  ∂/∂h_j E[(Ŷ − Y)²] = 0,  j = 0, …, M.

Notice that the two expressions in (2.1) can be rewritten as:

  E[Y − Ŷ] = 0                           (2.2a)
  E[(Y − Ŷ) X(j)] = 0,  j = 1, …, M      (2.2b)

Important consequences of the orthogonality principle:

  E[(Y − Ŷ) Ŷ] = 0                                   (2.3a)
  E[(Y − Ŷ)²] = E[Y²] − E[Ŷ²] = E[(Y − Ŷ) Y]         (2.3b)

Geometrical interpretation:
Let U and V denote two random variables with finite second moment, i.e. E[U²] < ∞ and E[V²] < ∞. Then, the expectation E[UV] can be viewed as the scalar or inner product of U and V. Within this interpretation:
- U and V are uncorrelated, i.e. E[UV] = 0, if and only if they are orthogonal,
- √(E[U²]) is the norm (length) of U.

[Figure: right triangle with legs Ŷ and Y − Ŷ and hypotenuse Y, illustrating (2.3a) and (2.3b).]

Interpretation of both equations in (2.3):
- (2.3a): the estimation error Y − Ŷ and the estimate Ŷ are orthogonal.
- (2.3b): results from Pythagoras' theorem.

Computation of the coefficient vector of the LMMSEE:
The coefficients of the LMMSEE satisfy the relationships:

  µ_Y = h_0 + Σ_{m=1}^{M} h_m µ_{X(m)} = h_0 + (h₋)^T µ_X
  Σ_XY = Σ_XX h₋

where h₋ ≜ [h_1, …, h_M]^T and X ≜ [X(1), …, X(M)]^T. Both identities follow by appropriately reformulating relations (2.1a) and (2.1b) and using a matrix notation for the latter one.
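These two relations can be solved numerically and the orthogonality conditions (2.2) checked on samples. The following is only a sketch with an assumed toy model (a noisy linear dependence of Y on a 3-dimensional X); all names and values are illustrative:

import numpy as np

rng = np.random.default_rng(1)

# Assumed toy model: Y is a noisy linear function of a 3-dimensional X.
N, M = 100_000, 3
X = rng.standard_normal((N, M))
Y = 1.0 + X @ np.array([0.8, -0.2, 0.4]) + 0.5 * rng.standard_normal(N)

mu_X, mu_Y = X.mean(axis=0), Y.mean()
Xc, Yc = X - mu_X, Y - mu_Y
Sigma_XX, Sigma_XY = Xc.T @ Xc / N, Xc.T @ Yc / N

# Relations above: Sigma_XY = Sigma_XX h_minus and mu_Y = h0 + h_minus^T mu_X
h_minus = np.linalg.solve(Sigma_XX, Sigma_XY)
h0 = mu_Y - h_minus @ mu_X

err = Y - (h0 + X @ h_minus)
print(err.mean())                  # E[Y - Yhat]        ~ 0, cf. (2.2a)
print(X.T @ err / N)               # E[(Y - Yhat) X(j)] ~ 0, cf. (2.2b)
print(err.var(), Y.var() - h_minus @ Sigma_XY)   # sample residual MSE vs. sigma_Y^2 - h_minus^T Sigma_XY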

Thus, provided (Σ_XX)^{−1} exists, the coefficients of the LMMSEE are given by:

  h₋ = (Σ_XX)^{−1} Σ_XY                                         (2.4a)
  h_0 = µ_Y − (h₋)^T µ_X = µ_Y − (Σ_XY)^T (Σ_XX)^{−1} µ_X       (2.4b)

Residual error using a LMMSEE:
The MSE resulting when using a LMMSEE is

  E[(Ŷ − Y)²] = σ_Y² − (h₋)^T Σ_XY                              (2.5)

since E[(Y − Ŷ)²] = E[(Y − Ŷ) Y] = E[Y²] − E[Ŷ Y].

Example: Linear prediction of a WSS process
Let Y(n) denote a WSS process with
- zero mean, i.e. E[Y(n)] = 0,
- autocorrelation function E[Y(n) Y(n+k)] = R_YY(k).
We seek the LMMSEE for the present value of Y(n) based on the M past observations Y(n−1), …, Y(n−M) of the process. Hence,
- Y = Y(n),
- X(m) = Y(n−m), m = 1, …, M, i.e. X = [Y(n−1), …, Y(n−M)]^T.
Because µ_Y = 0 and µ_X = 0, it follows from (2.4b) that h_0 = 0.

Computation of Σ_XY and Σ_XX:

  Σ_XY = [E[Y(n−1) Y(n)], …, E[Y(n−M) Y(n)]]^T = [R_YY(1), …, R_YY(M)]^T

  Σ_XX = [ E[Y(n−1)²]        E[Y(n−1)Y(n−2)]   …   E[Y(n−1)Y(n−M)] ]
         [ E[Y(n−2)Y(n−1)]   E[Y(n−2)²]        …   E[Y(n−2)Y(n−M)] ]
         [       ⋮                  ⋮                     ⋮        ]
         [ E[Y(n−M)Y(n−1)]   E[Y(n−M)Y(n−2)]   …   E[Y(n−M)²]      ]

       = [ R_YY(0)     R_YY(1)     R_YY(2)     …   R_YY(M−1) ]
         [ R_YY(1)     R_YY(0)     R_YY(1)     …   R_YY(M−2) ]
         [ R_YY(2)     R_YY(1)     R_YY(0)     …   R_YY(M−3) ]
         [     ⋮           ⋮           ⋮                ⋮     ]
         [ R_YY(M−1)   R_YY(M−2)   R_YY(M−3)   …   R_YY(0)   ]
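As a check on this example, the predictor can be computed for a process whose autocorrelation is known in closed form. The sketch below (not part of the original notes) assumes a first-order autoregressive process Y(n) = a·Y(n−1) + W(n) with unit-variance white W(n), for which R_YY(k) = a^{|k|}/(1 − a²); the optimal one-step predictor is then h = [a, 0, …, 0]^T and the residual MSE (2.5) equals the driving-noise variance:

import numpy as np

a, M = 0.9, 4                           # assumed AR(1) coefficient and predictor order

# Autocorrelation of the AR(1) process with unit driving-noise variance.
R = lambda k: a ** abs(k) / (1.0 - a ** 2)

# Sigma_XX is the M x M Toeplitz matrix with entries R_YY(|i-j|),
# Sigma_XY is the vector [R_YY(1), ..., R_YY(M)]^T (cf. the example above).
Sigma_XX = np.array([[R(i - j) for j in range(M)] for i in range(M)])
Sigma_XY = np.array([R(k) for k in range(1, M + 1)])

h = np.linalg.solve(Sigma_XX, Sigma_XY)   # (2.4a); h0 = 0 since the means vanish
print(h)                                  # approx. [a, 0, 0, 0]

# Residual MSE (2.5): sigma_Y^2 - h^T Sigma_XY; for this AR(1) model it equals 1.
print(R(0) - h @ Sigma_XY)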

2.2. Minimum mean squared error estimators

Conditional expectation:
Let U and V denote two random variables. The conditional expectation of V given that U = u is observed is defined to be

  E[V | u] ≜ ∫ v p(v | u) dv.

Notice that E[V | U] is a random variable. In the sequel we shall make use of the following important property of conditional expectations:

  E[E[V | U]] = E[V].

Minimum mean squared error estimator (MMSEE):
The MMSEE of Y based on the observation of X(1), …, X(M) is of the form (written here as Ỹ):

  Ỹ(X(1), …, X(M)) = E[Y | X(1), …, X(M)].

Hence if X(1) = x(1), …, X(M) = x(M) is observed, then

  Ỹ(x(1), …, x(M)) = E[Y | x(1), …, x(M)] = ∫ y p(y | x(1), …, x(M)) dy.

Proof: Let Ŷ denote an arbitrary estimator. Then,

  E[(Ŷ − Y)²] = E[((Ŷ − Ỹ) + (Ỹ − Y))²]
              = E[(Ŷ − Ỹ)²] + 2 E[(Ŷ − Ỹ)(Ỹ − Y)] + E[(Ỹ − Y)²]
              = E[(Ŷ − Ỹ)²] + E[(Ỹ − Y)²]
              ≥ E[(Ỹ − Y)²]

with equality if, and only if, Ŷ = Ỹ. We still have to prove that E[(Ŷ − Ỹ)(Ỹ − Y)] = 0. Since both Ŷ and Ỹ are functions of the observations, conditioning on X(1), …, X(M) and using E[E[V | U]] = E[V] gives

  E[(Ŷ − Ỹ)(Ỹ − Y)] = E[ (Ŷ − Ỹ) E[Ỹ − Y | X(1), …, X(M)] ] = E[ (Ŷ − Ỹ)(Ỹ − Ỹ) ] = 0.

Example: Multivariate Gaussian variables:

  [Y, X(1), …, X(M)]^T ~ N(µ, Σ)

with
- µ ≜ [µ_Y, µ_{X(1)}, …, µ_{X(M)}]^T
- Σ = [ σ_Y²      (Σ_XY)^T ]
      [ Σ_XY       Σ_XX    ]

From Equation (6.) in [Shanmugan] it follows that

  Ỹ = E[Y | X] = µ_Y + (Σ_XY)^T (Σ_XX)^{−1} (X − µ_X).

Bivariate case: M = 1, X(1) = X
- Σ_XX = σ_X²,
- Σ_XY = ρ σ_X σ_Y, where ρ ≜ Σ_XY / (σ_X σ_Y) is the correlation coefficient of Y and X.
In this case,

  Ỹ = E[Y | X] = µ_Y + (ρ σ_Y / σ_X)(X − µ_X)
    = (µ_Y − (ρ σ_Y / σ_X) µ_X) + (ρ σ_Y / σ_X) X
    =            h_0            +      h_1      X.

We can observe that Ỹ is linear, i.e. it is the LMMSEE: Ŷ = Ỹ in the bivariate case. This is also true in the general multivariate Gaussian case. In fact, Ŷ = Ỹ if, and only if, [Y, X(1), …, X(M)]^T is a Gaussian random vector.
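The bivariate Gaussian case is easy to verify numerically: the conditional mean E[Y | X] uses exactly the coefficients h_1 = ρσ_Y/σ_X and h_0 = µ_Y − h_1 µ_X, and the linear coefficients estimated from sample covariances converge to the same values. The parameter values in this sketch are assumptions, not taken from the notes:

import numpy as np

rng = np.random.default_rng(2)

mu_X, mu_Y, sigma_X, sigma_Y, rho = 1.0, -2.0, 2.0, 3.0, 0.7   # assumed parameters
N = 200_000

# Draw jointly Gaussian (X, Y) with the prescribed means, variances and correlation.
X = mu_X + sigma_X * rng.standard_normal(N)
Y = (mu_Y + rho * sigma_Y / sigma_X * (X - mu_X)
     + sigma_Y * np.sqrt(1 - rho ** 2) * rng.standard_normal(N))

# MMSE estimator E[Y|X] = mu_Y + rho*sigma_Y/sigma_X * (X - mu_X) -- a linear function of X.
h1 = rho * sigma_Y / sigma_X
h0 = mu_Y - h1 * mu_X
Y_mmse = h0 + h1 * X

# LMMSE coefficient estimated from the samples coincides with h1, and the
# residual MSE matches sigma_Y^2 (1 - rho^2).
c = np.cov(X, Y)
print(h1, c[0, 1] / c[0, 0])
print(np.mean((Y - Y_mmse) ** 2), sigma_Y ** 2 * (1 - rho ** 2))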

2.3. Time-discrete Wiener filters

Problem: Estimation of a WSS random sequence Y(n) based on the observation of another sequence X(n). Without loss of generality we assume that E[Y(n)] = E[X(n)] = 0. The goodness of the estimator Ŷ(n) is described by the MSE E[(Ŷ(n) − Y(n))²].

We distinguish between two cases:
- Prediction: Ŷ(n) depends on one or several past observations of X(n) only, i.e. Ŷ(n) = Ŷ(X(n_1), X(n_2), …) with n_1, n_2, … < n.
- Filtering: Ŷ(n) depends on the present observation and/or one or many future observation(s) of X(n), i.e. Ŷ(n) = Ŷ(X(n_1), X(n_2), …) where at least one n_i ≥ n. If all n_i ≤ n, the filter is causal, otherwise it is noncausal.

[Figure: sample sequences x(n) and y(n) plotted over n = −1, 0, 1, 2, 3, 4, illustrating prediction versus filtering.]

2.3.1. Noncausal Wiener filters

Linear Minimum Mean Squared Error Filter:
We seek a linear filter

  Ŷ(n) = Σ_{m=−∞}^{∞} h(m) X(n−m) = h(n) * X(n)

which minimizes the MSE E[(Ŷ(n) − Y(n))²]. Such a filter exists. It is called a Wiener filter in honour of its inventor.

Orthogonality principle (time domain):
The coefficients of a Wiener filter satisfy the conditions:

  E[(Y(n) − Ŷ(n)) X(n−k)] = E[(Y(n) − Σ_{m=−∞}^{∞} h(m) X(n−m)) X(n−k)] = 0,  k = …, −1, 0, 1, …

It follows from these identities (see also (2.3)) that

  E[(Y(n) − Ŷ(n)) Ŷ(n)] = 0
  E[(Y(n) − Ŷ(n))²] = E[Y(n)²] − E[Ŷ(n)²] = E[(Y(n) − Ŷ(n)) Y(n)]

With the definitions R_XX(k) ≜ E[X(n) X(n+k)] and R_XY(k) ≜ E[X(n) Y(n+k)] we can recast the orthogonality conditions as follows:

  R_XY(k) = Σ_{m=−∞}^{∞} h(m) R_XX(k−m),  k = …, −1, 0, 1, …

i.e.

  R_XY(k) = h(k) * R_XX(k)        (Wiener-Hopf equation)

Typical application: WSS signal embedded in additive white noise,

  X(n) = Y(n) + W(n),

where
- W(n) is a white noise sequence,
- Y(n) is a WSS process,
- Y(n) and W(n) are uncorrelated.
However, the theoretical treatment is more general, as shown below.
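For this typical application, the correlation functions entering the Wiener-Hopf equation take the simple form R_XX(k) = R_YY(k) + σ_W² δ(k) and R_XY(k) = R_YY(k). The sketch below (assumed AR(1) signal, plain sample averages for the correlation estimates; none of it is from the original notes) checks this empirically:

import numpy as np

rng = np.random.default_rng(3)
a, sigma_w2, N = 0.8, 0.5, 100_000      # assumed AR(1) coefficient and noise variance

# AR(1) signal Y(n) and noisy observation X(n) = Y(n) + W(n).
e = rng.standard_normal(N)
Y = np.zeros(N)
for n in range(1, N):
    Y[n] = a * Y[n - 1] + e[n]
W = np.sqrt(sigma_w2) * rng.standard_normal(N)
X = Y + W

def xcorr(u, v, k):
    """Sample estimate of E[u(n) v(n+k)] for k >= 0."""
    return np.mean(u[: len(u) - k] * v[k:])

# Columns: lag k, R_XX(k), R_YY(k) + sigma_w2*delta(k), R_XY(k), R_YY(k)
for k in range(4):
    print(k, xcorr(X, X, k), xcorr(Y, Y, k) + (sigma_w2 if k == 0 else 0.0),
          xcorr(X, Y, k), xcorr(Y, Y, k))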

Orthogonality principle (frequency domain):

  S_XY(f) = H(f) S_XX(f)

where S_XY(f) ≜ F{R_XY(k)}.

Transfer function of the Wiener filter:

  H(f) = S_XY(f) / S_XX(f)

MSE of the Wiener filter (time-domain formulation):

  E[(Y(n) − Ŷ(n))²] = σ_Y² − Σ_{m=−∞}^{∞} h(m) R_XY(m)

since E[(Y(n) − Ŷ(n))²] = E[Y(n)²] − E[Ŷ(n) Y(n)].

MSE of the Wiener filter (frequency-domain formulation):
We can rewrite the above identity as

  E[(Y(n) − Ŷ(n))²] = R_YY(0) − Σ_{m=−∞}^{∞} h(m) R_YX(−m),

i.e. E[(Y(n) − Ŷ(n))²] is the value p(0) of the sequence

  p(k) = R_YY(k) − Σ_{m=−∞}^{∞} h(m) R_YX(k−m) = R_YY(k) − h(k) * R_YX(k).

Hence,

  P(f) = S_YY(f) − H(f) S_YX(f) = S_YY(f) − |S_XY(f)|² / S_XX(f)

and

  E[(Y(n) − Ŷ(n))²] = p(0) = ∫_{−1/2}^{1/2} P(f) df = ∫_{−1/2}^{1/2} [ S_YY(f) − |S_XY(f)|² / S_XX(f) ] df.
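For the signal-in-white-noise model these frequency-domain formulas are easy to evaluate on a grid: S_XY(f) = S_YY(f) and S_XX(f) = S_YY(f) + σ_W², so H(f) = S_YY(f)/(S_YY(f) + σ_W²) and the residual MSE is the integral of S_YY(f)σ_W²/(S_YY(f) + σ_W²). The sketch below assumes an AR(1) signal spectrum; all values are illustrative:

import numpy as np

a, sigma_w2 = 0.8, 0.5              # assumed AR(1) coefficient and observation-noise variance
f = np.linspace(-0.5, 0.5, 4001)    # grid of normalized frequencies

# Assumed signal spectrum: AR(1) with unit driving-noise variance.
S_YY = 1.0 / np.abs(1.0 - a * np.exp(-2j * np.pi * f)) ** 2
S_XX = S_YY + sigma_w2              # X(n) = Y(n) + W(n), W white and uncorrelated with Y
S_XY = S_YY

H = S_XY / S_XX                     # noncausal Wiener filter transfer function H(f)
P = S_YY - np.abs(S_XY) ** 2 / S_XX # integrand of the frequency-domain MSE formula

mse = np.sum(P) * (f[1] - f[0])     # numerical approximation of the integral over [-1/2, 1/2]
print(mse)                          # residual MSE of the noncausal Wiener filter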

2.3.2. Causal Wiener filters

A. X(n) is a white noise

We first assume that X(n) is a white noise with unit variance, i.e. E[X(n) X(n+k)] = δ(k).

Derivation of the causal Wiener filter from the noncausal Wiener filter:
Let us consider the noncausal Wiener filter

  Ŷ(n) = Σ_{m=−∞}^{∞} h(m) X(n−m)

whose transfer function is given by

  H(f) = S_XY(f) / S_XX(f) = S_XY(f).

Then, the causal Wiener filter Ŷ_c(n) results by cancelling the noncausal part of the noncausal Wiener filter:

  Ŷ_c(n) = Σ_{m=0}^{∞} h(m) X(n−m)

Sketch of the proof: Ŷ(n) can be written as

  Ŷ(n) = Σ_{m=−∞}^{−1} h(m) X(n−m) + Σ_{m=0}^{∞} h(m) X(n−m).

Because X(n) is a white noise, the causal part V = Ŷ_c(n) and the noncausal part U = Ŷ(n) − Ŷ_c(n) of Ŷ(n) are orthogonal. It follows from this property that Y(n) − Ŷ_c(n) and Ŷ_c(n) are orthogonal, i.e. that Ŷ_c(n) minimizes the MSE within the class of linear causal estimators.

[Figure: orthogonal decomposition of Ŷ(n) into U = Ŷ(n) − Ŷ_c(n) and V = Ŷ_c(n), together with Y(n) and the errors Y(n) − Ŷ(n) and Y(n) − Ŷ_c(n).]

B. X(n) is an arbitrary WSS process whose spectrum satisfies the Paley-Wiener condition

Usually, the above truncation procedure to obtain Ŷ_c(n) does not apply because U and V are correlated and therefore not orthogonal in the general case.

Causal whitening filter:
However, we can show (see the Spectral Decomposition Theorem below) that, provided S_XX(f) satisfies the Paley-Wiener condition (see below), X(n) can be converted into an equivalent white noise sequence Z(n) with unit variance by filtering it with an appropriate causal filter g(n):

  X(n) → [ Causal Whitening Filter g(n) ] → Z(n),   E[Z(n) Z(n+k)] = δ(k)

This operation is called whitening and the filter g(n) is called a whitening filter. The operation is invertible: there exists another causal filter g̃(n) so that X(n) = g̃(n) * Z(n):

  Z(n) → [ g̃(n) ] → X(n)

Notice that if G(f) ≜ F{g(n)} and G̃(f) ≜ F{g̃(n)}, then

  |G(f)|² = 1 / S_XX(f),   |G̃(f)|² = S_XX(f).

We shall see that a whitening filter exists such that

  G̃(f) = G(f)^{−1}.        (2.6)

Causal Wiener filter:
Making use of the result in Part A, the block diagram of the causal Wiener filter is

  X(n) → [ Whitening Filter g(n) ] → Z(n) → [ Causal part of F^{−1}{S_ZY(f)} ] → Ŷ_c(n)

where S_ZY(f) is obtained from S_XY(f) according to

  S_ZY(f) = G(f) S_XY(f).
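For an AR(1) observation process the spectral factorization, and hence the whitening filter, is available in closed form: with S_XX(f) = σ²/|1 − a e^{−j2πf}|² one may take G(f)⁺ = σ/(1 − a e^{−j2πf}), so that 1/G(f)⁺ corresponds to the two-tap causal FIR filter with coefficients [1/σ, −a/σ]. The sketch below, under these assumed parameters, checks that the filtered sequence Z(n) is approximately white with unit variance:

import numpy as np

rng = np.random.default_rng(4)
a, sigma, N = 0.8, 1.0, 100_000     # assumed AR(1) coefficient and driving-noise std

# AR(1) observation process X(n) = a X(n-1) + sigma * e(n), e white with unit variance.
e = rng.standard_normal(N)
X = np.zeros(N)
for n in range(1, N):
    X[n] = a * X[n - 1] + sigma * e[n]

# Causal whitening filter g(n): FIR taps [1/sigma, -a/sigma] (the inverse of G^+(f)).
Z = (X[1:] - a * X[:-1]) / sigma

# Z(n) should be approximately white with unit variance: E[Z(n)Z(n+k)] = delta(k).
for k in range(4):
    print(k, np.mean(Z[: len(Z) - k] * Z[k:]))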

Hence, the block diagram of the causal Wiener filter is:

  X(n) → [ Whitening Filter g(n) ] → Z(n) → [ Causal part of F^{−1}{G(f) S_XY(f)} ] → Ŷ_c(n)

Spectral Decomposition Theorem:
Let S_XX(f) satisfy the so-called Paley-Wiener condition:

  ∫_{−1/2}^{1/2} log(S_XX(f)) df > −∞.

Then S_XX(f) can be written as

  S_XX(f) = G(f)⁺ G(f)⁻

with G(f)⁺ and G(f)⁻ satisfying

  |G(f)⁺|² = |G(f)⁻|² = S_XX(f).

Moreover, the sequences

  g(n)⁺ ≜ F^{−1}{G(f)⁺},        g(n)⁻ ≜ F^{−1}{G(f)⁻},
  g_1(n)⁺ ≜ F^{−1}{1 / G(f)⁺},  g_1(n)⁻ ≜ F^{−1}{1 / G(f)⁻}

satisfy

  g(n)⁺ = g_1(n)⁺ = 0 for n < 0    (causal sequences)
  g(n)⁻ = g_1(n)⁻ = 0 for n > 0    (anticausal sequences)

Whitening filter (cont'd):
The sought whitening filter used to obtain Z(n) is g(n) = g_1(n)⁺, and g̃(n) = g(n)⁺. It can be easily verified that both sequences satisfy the identities in (2.6).

2.3.3. Finite Wiener filters

Finite linear filter:

  Ŷ(n) = Σ_{m=1}^{M} h(m) X(n−m)

Wiener-Hopf equation:
By applying the orthogonality principle we obtain the Wiener-Hopf system of equations:

  Σ_XY = Σ_XX h

where h ≜ [h(1), …, h(M)]^T, Σ_XY ≜ [R_XY(1), …, R_XY(M)]^T and

  Σ_XX = [ R_XX(0)     R_XX(1)     R_XX(2)     …   R_XX(M−1) ]
         [ R_XX(1)     R_XX(0)     R_XX(1)     …   R_XX(M−2) ]
         [ R_XX(2)     R_XX(1)     R_XX(0)     …   R_XX(M−3) ]
         [     ⋮           ⋮           ⋮                ⋮     ]
         [ R_XX(M−1)   R_XX(M−2)   R_XX(M−3)   …   R_XX(0)   ]

Coefficient vector of the finite Wiener filter:

  h = (Σ_XX)^{−1} Σ_XY,

provided Σ_XX is invertible.
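A minimal numerical sketch of the finite Wiener filter, assuming the signal-in-white-noise model used earlier (Y(n) an AR(1) process, X(n) = Y(n) + W(n)), so that R_XX(k) = R_YY(k) + σ_W² δ(k) and R_XY(k) = R_YY(k); the Toeplitz system is built from these closed-form correlations and solved for h:

import numpy as np

a, sigma_w2, M = 0.8, 0.5, 8       # assumed AR(1) coefficient, noise variance, filter length

R_YY = lambda k: a ** abs(k) / (1.0 - a ** 2)           # AR(1) autocorrelation (unit driving noise)
R_XX = lambda k: R_YY(k) + (sigma_w2 if k == 0 else 0.0)
R_XY = lambda k: R_YY(k)                                # W(n) uncorrelated with Y(n)

# Wiener-Hopf system: Sigma_XY = Sigma_XX h, with Toeplitz Sigma_XX of entries R_XX(|i-j|).
Sigma_XX = np.array([[R_XX(i - j) for j in range(M)] for i in range(M)])
Sigma_XY = np.array([R_XY(k) for k in range(1, M + 1)])

h = np.linalg.solve(Sigma_XX, Sigma_XY)
print(h)                                                # finite Wiener filter coefficients

# Residual MSE, cf. (2.5): sigma_Y^2 - h^T Sigma_XY.
print(R_YY(0) - h @ Sigma_XY)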