Lecture 2: The Least Mean Square (LMS) Algorithm




Lecture 2 (slide 1)

During this lecture you will learn about:
- The Least Mean Squares (LMS) algorithm
- Convergence analysis of the LMS
- The equalizer (channel equalization)

Background (slide 2)

The method of Steepest Descent (SD), studied in the last lecture, is a recursive algorithm for calculating the Wiener filter when the statistics of the signals are known (knowledge of $R$ and $p$). The problem is that this information is often unknown!

LMS is a method based on the same principles as the method of Steepest Descent, but where the statistics are estimated continuously. Since the statistics are estimated continuously, the LMS algorithm can adapt to changes in the signal statistics; the LMS algorithm is thus an adaptive filter.

Because the statistics are estimated, the gradient becomes noisy. The LMS algorithm belongs to a group of methods referred to as stochastic gradient methods, while the method of Steepest Descent belongs to the group of deterministic gradient methods.

The Least Mean Square (LMS) algorithm (slide 3)

We want to create an algorithm that minimizes $E\{|e(n)|^2\}$, just like the SD, but based on unknown statistics. A strategy that can then be used is to use estimates of the autocorrelation matrix $R$ and the cross-correlation vector $p$. If instantaneous estimates are chosen,
$$\hat{R}(n) = u(n)u^H(n), \qquad \hat{p}(n) = u(n)d^*(n),$$
the resulting method is the Least Mean Squares algorithm.

The Least Mean Square (LMS) algorithm (slide 4)

For the SD, the update of the filter weights is given by
$$w(n+1) = w(n) + \tfrac{1}{2}\mu[-\nabla J(n)]$$
where $\nabla J(n) = -2p + 2Rw(n)$. In the LMS we use the estimates $\hat{R}(n)$ and $\hat{p}(n)$ to calculate $\hat{\nabla} J(n)$. Thus, the updated filter vector also becomes an estimate; it is therefore denoted $\hat{w}(n)$:
$$\hat{\nabla} J(n) = -2\hat{p}(n) + 2\hat{R}(n)\hat{w}(n)$$
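The instantaneous-estimate construction can be checked numerically. The following is a minimal sketch (Python/NumPy; the tap vector, desired sample, initial weights and step size are made-up values, not from the lecture) showing that an SD-style step built from $\hat{R}(n)$ and $\hat{p}(n)$ collapses to the familiar LMS update $\hat{w}(n) + \mu u(n)e^*(n)$.

```python
import numpy as np

# Minimal sketch (assumed shapes and values): one gradient step built from the
# instantaneous estimates R_hat(n) and p_hat(n).
rng = np.random.default_rng(0)
M = 4                                                        # number of filter taps (arbitrary)
u = rng.standard_normal(M) + 1j * rng.standard_normal(M)     # tap-input vector u(n)
d = rng.standard_normal() + 1j * rng.standard_normal()       # desired signal d(n)
w_hat = rng.standard_normal(M) + 1j * rng.standard_normal(M) # current estimate w_hat(n)

R_hat = np.outer(u, u.conj())                # R_hat(n) = u(n) u^H(n)
p_hat = u * np.conj(d)                       # p_hat(n) = u(n) d*(n)
grad_hat = -2 * p_hat + 2 * R_hat @ w_hat    # instantaneous gradient estimate

mu = 0.1
w_next = w_hat + 0.5 * mu * (-grad_hat)      # same form as the SD update
# This reduces to the usual LMS update w_hat + mu * u(n) * e*(n):
e = d - np.vdot(w_hat, u)                    # e(n) = d(n) - w_hat^H(n) u(n)
assert np.allclose(w_next, w_hat + mu * u * np.conj(e))
```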

The Least Mean Square (LMS) algorithm (slide 5)

For the LMS algorithm, the update equation of the filter weights becomes
$$\hat{w}(n+1) = \hat{w}(n) + \mu u(n)e^*(n)$$
where $e^*(n) = d^*(n) - u^H(n)\hat{w}(n)$.

Compare with the corresponding equation for the SD:
$$w(n+1) = w(n) + \mu E\{u(n)e^*(n)\}$$
where $e^*(n) = d^*(n) - u^H(n)w(n)$.

The difference is thus that $E\{u(n)e^*(n)\}$ is estimated as $u(n)e^*(n)$. This leads to gradient noise.

The Least Mean Square (LMS) algorithm (slide 6)

Illustration of gradient noise: $R$ and $p$ from the example in Lecture 1, $\mu = 0.1$.

[Figure: contour plots of the cost function in the $(w_0, w_1)$ plane, with the SD weight trajectory (left) and the noisy LMS weight trajectory (right).]

Because of the noise in the gradient, the LMS never reaches $J_{\min}$, which the SD does.

Convergence analysis of the LMS, overview (slide 7)

Consider the update equation $\hat{w}(n+1) = \hat{w}(n) + \mu u(n)e^*(n)$. We want to know how fast and towards what the algorithm converges, in terms of $J(n)$ and $\hat{w}(n)$.

Strategy:
1. Introduce the filter weight error vector $\epsilon(n) = \hat{w}(n) - w_o$.
2. Express the update equation in $\epsilon(n)$.
3. Express $J(n)$ in $\epsilon(n)$. This leads to $K(n) = E\{\epsilon(n)\epsilon^H(n)\}$.
4. Calculate the difference equation in $K(n)$, which controls the convergence.
5. Make a change of variables from $K(n)$ to $X(n)$, which converges equally fast.

1. The filter weight error vector (slide 8)

The LMS contains feedback, and just like for the SD there is a risk that the algorithm diverges if the step size $\mu$ is not chosen appropriately. In order to investigate the stability, the filter weight error vector $\epsilon(n)$ is introduced:
$$\epsilon(n) = \hat{w}(n) - w_o \qquad (w_o = R^{-1}p, \ \text{Wiener-Hopf})$$
Note that $\epsilon(n)$ corresponds to $c(n) = w(n) - w_o$ for the Steepest Descent, but that $\epsilon(n)$ is stochastic because of $\hat{w}(n)$.
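To make the gradient-noise comparison concrete, here is a small sketch that runs the SD (using the true $R$ and $p$) and the LMS (using instantaneous estimates) side by side. The AR(1) input, the 2-tap system and the noise level are assumptions for illustration only; they are not the Lecture 1 example.

```python
import numpy as np

# Illustrative sketch of gradient noise (made-up 2-tap example): SD uses the
# true R and p, LMS their instantaneous estimates, so the LMS trajectory is noisy.
rng = np.random.default_rng(1)
N, mu = 2000, 0.1
a = 0.6                                          # AR(1) coefficient of the input (assumed)
u = np.zeros(N)
for n in range(1, N):
    u[n] = a * u[n - 1] + rng.standard_normal()
w_true = np.array([1.0, -0.5])                   # "unknown" system (assumed)
d = np.convolve(u, w_true)[:N] + 0.05 * rng.standard_normal(N)

r0 = 1.0 / (1.0 - a**2)                          # autocorrelation of the AR(1) input
R = np.array([[r0, a * r0], [a * r0, r0]])
p = R @ w_true                                   # cross-correlation for this model
w_sd = np.zeros(2)
w_lms = np.zeros(2)
for n in range(1, N):
    un = np.array([u[n], u[n - 1]])              # tap-input vector [u(n), u(n-1)]
    w_sd = w_sd + mu * (p - R @ w_sd)            # SD: deterministic gradient
    e = d[n] - w_lms @ un
    w_lms = w_lms + mu * un * e                  # LMS: noisy (stochastic) gradient

print("SD weights :", w_sd)                      # converges smoothly towards w_true
print("LMS weights:", w_lms)                     # fluctuates around w_true
```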

2. Update equation in $\epsilon(n)$ (slide 9)

The update equation for $\epsilon(n)$ at time $n+1$ can be expressed recursively as
$$\epsilon(n+1) = [I - \mu\hat{R}(n)]\epsilon(n) + \mu[\hat{p}(n) - \hat{R}(n)w_o] = [I - \mu u(n)u^H(n)]\epsilon(n) + \mu u(n)e_o^*(n)$$
where $e_o(n) = d(n) - w_o^H u(n)$.

Compare to that of the SD: $c(n+1) = (I - \mu R)c(n)$.

3. Express $J(n)$ in $\epsilon(n)$ (slide 10)

The convergence analysis of the LMS is considerably more complicated than that for the SD. Therefore, two tools (approximations) are needed. These are
- Independence theory
- The direct-averaging method

3. Express $J(n)$ in $\epsilon(n)$, cont. (slide 11)

Independence theory
1. The input vectors at different time instants, $u(1), u(2), \ldots, u(n)$, are independent of each other (and thereby uncorrelated).
2. The input signal vector at time $n$ is independent of the desired signal at all earlier time instants: $u(n)$ is independent of $d(1), d(2), \ldots, d(n-1)$.
3. The desired signal at time $n$, $d(n)$, is independent of all earlier desired signals, $d(1), d(2), \ldots, d(n-1)$, but dependent on the input signal vector at time $n$, $u(n)$.
4. The signals $d(n)$ and $u(n)$ at the same time instant $n$ are jointly normally distributed.

3. Express $J(n)$ in $\epsilon(n)$, cont. (slide 12)

Direct averaging
Direct averaging means that in the update equation for $\epsilon$,
$$\epsilon(n+1) = [I - \mu u(n)u^H(n)]\epsilon(n) + \mu u(n)e_o^*(n),$$
the instantaneous estimate $\hat{R}(n) = u(n)u^H(n)$ is substituted with the expectation over the ensemble, $R = E\{u(n)u^H(n)\}$, such that
$$\epsilon(n+1) = (I - \mu R)\epsilon(n) + \mu u(n)e_o^*(n)$$
results.
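As a quick sanity check of the direct-averaging idea, the sketch below (assumed AR(1) input, not from the lecture) shows that while a single instantaneous estimate $u(n)u^H(n)$ is rank-one and noisy, its sample average approaches $R = E\{u(n)u^H(n)\}$.

```python
import numpy as np

# Direct-averaging intuition (assumed AR(1) input): the instantaneous estimate
# u(n) u^H(n) is noisy, but its average over time approaches R.
rng = np.random.default_rng(2)
a, N, M = 0.6, 50000, 2
u = np.zeros(N)
for n in range(1, N):
    u[n] = a * u[n - 1] + rng.standard_normal()

r0 = 1.0 / (1.0 - a**2)
R_true = np.array([[r0, a * r0], [a * r0, r0]])   # true autocorrelation matrix

R_sum = np.zeros((M, M))
for n in range(1, N):
    un = np.array([u[n], u[n - 1]])
    R_sum += np.outer(un, un)                     # accumulate u(n) u^H(n)
R_avg = R_sum / (N - 1)

print(np.round(R_avg, 2))                         # close to R_true
print(np.round(R_true, 2))
```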

3. Express $J(n)$ in $\epsilon(n)$, cont. (slide 13)

For the SD we could show that
$$J(n) = J_{\min} + [w(n) - w_o]^H R [w(n) - w_o]$$
and that the eigenvalues of $R$ control the convergence speed. For the LMS, with $\epsilon(n) = \hat{w}(n) - w_o$,
$$J(n) = E\{|e(n)|^2\} = E\{|d(n) - \hat{w}^H(n)u(n)|^2\} = E\{|e_o(n) - \epsilon^H(n)u(n)|^2\} \stackrel{(i)}{=} \underbrace{E\{|e_o(n)|^2\}}_{J_{\min}} + E\{\epsilon^H(n)\underbrace{u(n)u^H(n)}_{\hat{R}(n)}\epsilon(n)\}$$
Here, $(i)$ indicates that the independence assumption is used.

3. Express $J(n)$ in $\epsilon(n)$, cont. (slide 14)

Since $\epsilon$ and $\hat{R}$ are estimates, and in addition not independent, the expectation operator cannot simply be moved inside to $\hat{R}$. The behavior of
$$J_{ex}(n) = J(n) - J_{\min} = E\{\epsilon^H(n)\hat{R}(n)\epsilon(n)\}$$
determines the convergence. Since the following holds for the trace of a matrix,
1. $\mathrm{tr}(\text{scalar}) = \text{scalar}$
2. $\mathrm{tr}(AB) = \mathrm{tr}(BA)$
3. $E\{\mathrm{tr}(A)\} = \mathrm{tr}(E\{A\})$
$J_{ex}(n)$ can be written as
$$J_{ex}(n) \stackrel{1,2}{=} E\{\mathrm{tr}[\hat{R}(n)\epsilon(n)\epsilon^H(n)]\} \stackrel{3}{=} \mathrm{tr}[E\{\hat{R}(n)\epsilon(n)\epsilon^H(n)\}] \stackrel{(i)}{=} \mathrm{tr}[E\{\hat{R}(n)\}E\{\epsilon(n)\epsilon^H(n)\}] = \mathrm{tr}[R\,K(n)]$$
The convergence thus depends on $K(n) = E\{\epsilon(n)\epsilon^H(n)\}$.

4. Difference equation of $K(n)$ (slide 15)

The update equation for the filter weight error is, as shown previously,
$$\epsilon(n+1) = [I - \mu u(n)u^H(n)]\epsilon(n) + \mu u(n)e_o^*(n)$$
For small $\mu$, the solution to this stochastic difference equation is close to the solution of
$$\epsilon(n+1) = (I - \mu R)\epsilon(n) + \mu u(n)e_o^*(n),$$
which results from direct averaging. This equation is still difficult to solve, but now the behavior of $K(n)$ can be described by the following difference equation:
$$K(n+1) = (I - \mu R)K(n)(I - \mu R) + \mu^2 J_{\min} R$$

5. Change of variables from $K(n)$ to $X(n)$ (slide 16)

The variable changes
$$Q^H R Q = \Lambda, \qquad Q^H K(n) Q = X(n),$$
where $\Lambda$ is the diagonal eigenvalue matrix of $R$ and where $X(n)$ normally is not diagonal, yield
$$J_{ex}(n) = \mathrm{tr}(R\,K(n)) = \mathrm{tr}(\Lambda X(n))$$
The convergence can be described by
$$X(n+1) = (I - \mu\Lambda)X(n)(I - \mu\Lambda) + \mu^2 J_{\min}\Lambda$$
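The $K(n)$ recursion can be iterated numerically. The sketch below uses an arbitrary $2\times 2$ $R$, $J_{\min}$, $\mu$ and $K(0)$ (all assumed values, not lecture data) and checks that $\mathrm{tr}(R\,K(n))$ settles at the $J_{ex}(\infty)$ value given by the eigenvalue formula on slide 18.

```python
import numpy as np

# Sketch: iterate K(n+1) = (I - mu R) K(n) (I - mu R) + mu^2 J_min R and track
# the excess MSE J_ex(n) = tr(R K(n)).  R, J_min, mu and K(0) are assumptions.
R = np.array([[1.5625, 0.9375],
              [0.9375, 1.5625]])          # example autocorrelation matrix (assumed)
J_min = 0.01                              # minimum MSE (assumed)
mu = 0.05
K = 0.5 * np.eye(2)                       # K(0) = E{eps(0) eps^H(0)} (assumed)

A = np.eye(2) - mu * R
for n in range(2000):
    K = A @ K @ A + mu**2 * J_min * R
J_ex_inf = np.trace(R @ K)

# Steady-state value predicted by the eigenvalue formula (slide 18):
lam = np.linalg.eigvalsh(R)
J_ex_formula = J_min * np.sum(mu * lam / (2 - mu * lam))
print(J_ex_inf, J_ex_formula)             # the two values agree closely
```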

5. Change of variables from $K(n)$ to $X(n)$, cont. (slide 17)

$J_{ex}(n)$ depends only on the diagonal elements of $X(n)$ (because of the trace $\mathrm{tr}[\Lambda X(n)]$). We can therefore write
$$J_{ex}(n) = \sum_{i=1}^{M}\lambda_i x_i(n).$$
The update of the diagonal elements is controlled by
$$x_i(n+1) = (1 - \mu\lambda_i)^2 x_i(n) + \mu^2 J_{\min}\lambda_i.$$
This is an inhomogeneous difference equation. This means that the solution will contain a transient part, and a stationary part that depends on $J_{\min}$ and $\mu$. The cost function can therefore be written
$$J(n) = J_{\min} + J_{ex}(\infty) + J_{trans}(n)$$
where $J_{\min} + J_{ex}(\infty)$ and $J_{trans}(n)$ are the stationary and transient solutions, respectively. At best, the LMS can thus reach $J_{\min} + J_{ex}(\infty)$.

Properties of the LMS (slide 18)

Convergence for the same interval as the SD:
$$0 < \mu < \frac{2}{\lambda_{\max}}$$
$$J_{ex}(\infty) = J_{\min}\sum_{i=1}^{M}\frac{\mu\lambda_i}{2 - \mu\lambda_i}$$
The misadjustment $\mathcal{M} = \dfrac{J_{ex}(\infty)}{J_{\min}}$ is a measure of how close to the optimal solution the LMS comes (in the mean-square sense).

Rules of thumb, LMS (slide 19)

The eigenvalues of $R$ are rarely known. Therefore, a set of rules of thumb is commonly used, based on the tap-input power $\sum_{k=0}^{M-1}E\{|u(n-k)|^2\} = M\,r(0) = \mathrm{tr}(R)$.

Step size:
$$0 < \mu < \frac{2}{\sum_{k=0}^{M-1}E\{|u(n-k)|^2\}}$$
Misadjustment:
$$\mathcal{M} \approx \frac{\mu}{2}\sum_{k=0}^{M-1}E\{|u(n-k)|^2\}$$
Average time constant:
$$\tau_{mse,av} \approx \frac{1}{2\mu\lambda_{av}}, \qquad \lambda_{av} = \frac{1}{M}\sum_{i=1}^{M}\lambda_i$$
When the $\lambda_i$ are unknown, the following approximate relationship between $\mathcal{M}$ and $\tau_{mse,av}$ is useful:
$$\mathcal{M} \approx \frac{M}{4\tau_{mse,av}}$$
Here, $\tau_{mse,av}$ denotes the time it takes for $J(n)$ to decay by a factor of $e^{-1}$.

Summary, LMS (slide 20)

Summary of the iterations (see the sketch below):
1. Set start values for the filter coefficients, $\hat{w}(0)$.
2. Calculate the error $e(n) = d(n) - \hat{w}^H(n)u(n)$.
3. Update the filter coefficients $\hat{w}(n+1) = \hat{w}(n) + \mu u(n)[d^*(n) - u^H(n)\hat{w}(n)]$.
4. Repeat steps 2 and 3.
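The four summary steps translate directly into code. This is a minimal sketch in Python/NumPy (not the course's Matlab computer exercise): an LMS routine following steps 1-4, plus a rule-of-thumb step-size choice from the estimated tap-input power. The identified system and signals below are made up for illustration.

```python
import numpy as np

def lms(u, d, M, mu):
    """Minimal LMS sketch following the summary steps (complex-valued signals).

    u: input signal (N,), d: desired signal (N,), M: number of taps, mu: step size.
    """
    N = len(u)
    w = np.zeros(M, dtype=complex)            # step 1: start values w_hat(0)
    e = np.zeros(N, dtype=complex)
    for n in range(M - 1, N):
        un = u[n - M + 1:n + 1][::-1]         # tap-input vector [u(n), ..., u(n-M+1)]
        e[n] = d[n] - np.vdot(w, un)          # step 2: e(n) = d(n) - w^H(n) u(n)
        w = w + mu * un * np.conj(e[n])       # step 3: w(n+1) = w(n) + mu u(n) e*(n)
    return w, e                               # step 4 is the loop itself

# Rule-of-thumb step size based on the tap-input power M * E{|u(n)|^2} ~ tr(R):
rng = np.random.default_rng(3)
N, M = 5000, 4
u = rng.standard_normal(N)
d = np.convolve(u, [0.5, -0.3, 0.2, 0.1])[:N] + 0.01 * rng.standard_normal(N)  # assumed system
mu_max = 2.0 / (M * np.mean(np.abs(u) ** 2))
w, e = lms(u, d, M, mu=0.1 * mu_max)          # use a fraction of the upper bound
print(np.round(w.real, 3))                    # close to [0.5, -0.3, 0.2, 0.1]
```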

Example: Channel equalizer in training phase (slide 21)

[Block diagram: a pn-sequence $p(n)$ (values $\pm 1$) is sent through the channel $h(n)$; white Gaussian noise $v(n)$ with $\sigma_v^2 = 0.001$ is added, giving $u(n)$; $u(n)$ feeds an adaptive FIR filter with $M = 11$ taps updated by the LMS; the desired signal $d(n)$ is $p(n)$ delayed by $z^{-7}$, and $e(n) = d(n) - \hat{d}(n)$.]

Channel: $h(n) = \{0,\ 0.2798,\ 1.000,\ 0.2798,\ 0\}$.

Sent information in the form of a pn-sequence is affected by a channel $h(n)$. An adaptive inverse filter is used to compensate for the channel, such that the total effect can be approximated by a delay. White Gaussian noise (WGN) is added during the transmission.

Example, cont.: Learning curve (slide 22)

[Figure: learning curves, i.e. the average of the MSE $J(n)$ over 200 realizations (log scale), versus iteration $n$ (0 to 1500), for $\mu = 0.0075$, $\mu = 0.025$ and $\mu = 0.075$.]

The decay is related to $\tau_{mse,av}$. A large $\mu$ gives fast convergence, but also a large $J_{ex}(\infty)$. The curves illustrate learning curves for different values of $\mu$.

What to read? (slide 23)

Least Mean Squares: Haykin, chapters 5.1-5.3, 5.4 (see lecture), 5.5-5.7.
Exercises: 4.1, 4.2, 1.2, 4.3, 4.5, (4.4).
Computer exercise, theme: Implement the LMS algorithm in Matlab, and generate the results in Haykin chapter 5.7.
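The training-phase experiment can be reproduced approximately as follows. The channel $h(n)$, noise variance, filter length $M = 11$, delay of 7, $\mu = 0.025$, 1500 iterations and 200 realizations are taken from the slides; the $\pm 1$ training sequence drawn at random stands in for the pn-sequence, and is an assumption.

```python
import numpy as np

# Sketch of the training-phase equalizer (slide parameters: h, sigma_v^2 = 0.001,
# M = 11, delay 7, mu = 0.025); random +/-1 symbols replace the pn-sequence.
rng = np.random.default_rng(4)
h = np.array([0.0, 0.2798, 1.0, 0.2798, 0.0])    # channel impulse response
M, delay, N, runs = 11, 7, 1500, 200
mu = 0.025

J = np.zeros(N)                                  # averaged learning curve J(n)
for _ in range(runs):
    p = rng.choice([-1.0, 1.0], size=N)          # transmitted +/-1 training sequence
    u = np.convolve(p, h)[:N] + np.sqrt(0.001) * rng.standard_normal(N)
    w = np.zeros(M)
    for n in range(N):
        un = np.array([u[n - k] if n - k >= 0 else 0.0 for k in range(M)])
        d = p[n - delay] if n - delay >= 0 else 0.0   # desired = delayed symbol
        e = d - w @ un
        w = w + mu * un * e                      # LMS update
        J[n] += e**2
J /= runs
print(J[0], J[-1])                               # initial vs converged average MSE
```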