Lecture 5: Variants of the LMS algorithm



Standard LMS Algorithm

FIR filters:

y(n) = w_0(n)u(n) + w_1(n)u(n-1) + ... + w_{M-1}(n)u(n-M+1)
     = \sum_{k=0}^{M-1} w_k(n)u(n-k) = w(n)^T u(n)

Error between the filter output y(n) and a desired signal d(n):

e(n) = d(n) - y(n) = d(n) - w(n)^T u(n)

Change the filter parameters according to

w(n+1) = w(n) + µ u(n) e(n)

1. Normalized LMS Algorithm

Modify at time n the parameter vector from w(n) to w(n+1), n = 0, 1, 2, ..., fulfilling the constraint

w^T(n+1) u(n) = d(n)

with the least modification of w(n), i.e. with the least Euclidean norm of the difference

δw(n+1) = w(n+1) - w(n)

Thus we have to minimize

‖δw(n+1)‖² = \sum_{k=0}^{M-1} (w_k(n+1) - w_k(n))²

under the constraint w^T(n+1) u(n) = d(n). The solution can be obtained by the method of Lagrange multipliers:

J(w(n+1), λ) = ‖δw(n+1)‖² + λ (d(n) - w^T(n+1) u(n))
             = \sum_{k=0}^{M-1} (w_k(n+1) - w_k(n))² + λ (d(n) - \sum_{i=0}^{M-1} w_i(n+1) u(n-i))

To obtain the minimum of J(w(n+1), λ) we check the zeros of the partial derivatives of the criterion:

∂J(w(n+1), λ) / ∂w_j(n+1) = 2(w_j(n+1) - w_j(n)) - λ u(n-j) = 0

We thus have

w_j(n+1) = w_j(n) + (1/2) λ u(n-j)

where λ will result from the constraint

d(n) = \sum_{i=0}^{M-1} w_i(n+1) u(n-i)

Substituting w_j(n+1) into the constraint:

d(n) = \sum_{i=0}^{M-1} (w_i(n) + (1/2) λ u(n-i)) u(n-i)
d(n) = \sum_{i=0}^{M-1} w_i(n) u(n-i) + (1/2) λ \sum_{i=0}^{M-1} (u(n-i))²
λ = 2 (d(n) - \sum_{i=0}^{M-1} w_i(n) u(n-i)) / \sum_{i=0}^{M-1} (u(n-i))²
  = 2 e(n) / \sum_{i=0}^{M-1} (u(n-i))²

Thus, the minimum of the criterion J(w(n+1), λ) will be obtained using the adaptation equation

w_j(n+1) = w_j(n) + e(n) u(n-j) / \sum_{i=0}^{M-1} (u(n-i))²

In order to add an extra degree of freedom to the adaptation strategy, a constant µ controlling the step size is introduced:

w_j(n+1) = w_j(n) + (µ / \sum_{i=0}^{M-1} (u(n-i))²) e(n) u(n-j) = w_j(n) + (µ / ‖u(n)‖²) e(n) u(n-j)

To overcome possible numerical difficulties when ‖u(n)‖ is very close to zero, a constant a > 0 is used:

w_j(n+1) = w_j(n) + (µ / (a + ‖u(n)‖²)) e(n) u(n-j)

This is the updating equation used in the Normalized LMS algorithm.

The principal characteristics of the Normalized LMS algorithm are the following:

* The adaptation constant µ is dimensionless, whereas in LMS the adaptation constant has the dimension of an inverse power.
* Setting
  µ(n) = µ / (a + ‖u(n)‖²)
  we may view the Normalized LMS algorithm as an LMS algorithm with a data-dependent adaptation step size.
* Considering the approximate expression
  µ(n) = µ / (a + M E[u(n)²])
  the normalization is such that:
  * the effect of large fluctuations in the power levels of the input signal is compensated at the adaptation level;
  * the effect of a large input vector length is compensated by reducing the step size of the algorithm.
* This algorithm was derived from an intuitive principle: in the light of new input data, the parameters of an adaptive system should be disturbed only in a minimal fashion.
* The Normalized LMS algorithm is convergent in the mean square sense if 0 < µ < 2.
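
As a concrete illustration, here is a minimal NumPy sketch of the NLMS update derived above (the signal names u and d, the helper name nlms, and the default value of the constant a are assumptions made for this example, not part of the lecture):

```python
import numpy as np

def nlms(u, d, M, mu=0.5, a=1e-6):
    """Normalized LMS: w(n+1) = w(n) + mu/(a + ||u(n)||^2) * e(n) * u(n)."""
    u, d = np.asarray(u, dtype=float), np.asarray(d, dtype=float)
    N = len(u)
    w = np.zeros(M)                       # filter parameters w(n)
    e = np.zeros(N)                       # a priori errors e(n)
    for n in range(M - 1, N):
        u_n = u[n::-1][:M]                # regressor [u(n), u(n-1), ..., u(n-M+1)]
        y = w @ u_n                       # filter output y(n)
        e[n] = d[n] - y                   # error e(n) = d(n) - y(n)
        # plain LMS would use:  w = w + mu * e[n] * u_n
        w = w + (mu / (a + u_n @ u_n)) * e[n] * u_n   # normalized update
    return w, e
```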

Comparison of LMS and NLMS within the example from Lecture 4 (channel equalization)

[Figure: learning curves E[e²(n)] versus time step n (0 to 2500); left panel: LMS with µ = 0.075, 0.025, 0.0075; right panel: Normalized LMS with µ = 1.5, 1.0, 0.5, 0.1.]

* The LMS was run with three different step sizes: µ = [0.075; 0.025; 0.0075].
* The NLMS was run with four different step sizes: µ = [1.5; 1.0; 0.5; 0.1].
* With the step size µ = 1.5, NLMS behaved clearly worse than with µ = 1.0 (slower, and with a higher steady state square error), so µ = 1.5 is ruled out.
* Each of the three step sizes µ = [1.0; 0.5; 0.1] was interesting: on one hand, the larger the step size, the faster the convergence; on the other hand, the smaller the step size, the lower the steady state square error. So each step size may be a useful trade-off between convergence speed and stationary MSE (both cannot be very good simultaneously).
* LMS with µ = 0.0075 and NLMS with µ = 0.1 achieved a similar (very good) average steady state square error; however, NLMS was faster.
* LMS with µ = 0.075 and NLMS with µ = 1.0 had a similar (very good) convergence speed; however, NLMS achieved a lower steady state average square error.

To conclude: NLMS offers better trade-offs than LMS. The computational complexity of NLMS is slightly higher than that of LMS.

2. LMS Algorithm with Time Variable Adaptation Step

Heuristics of the method: we combine the benefits of two different situations:

* The convergence time constant is small for large µ.
* The mean-square error in steady state is low for small µ.

Therefore, in the initial adaptation stages µ is kept large, then it is monotonically reduced, so that in the final adaptation stage it is very small. There are many recipes for cooling down an adaptation process.

Monotonically decreasing step size:

µ(n) = 1 / (n + c)

Disadvantage for non-stationary data: for large values of n the algorithm will no longer react to changes in the optimum solution.

Variable Step algorithm:

w(n+1) = w(n) + M(n) u(n) e(n)

where

M(n) = diag(µ_0(n), µ_1(n), ..., µ_{M-1}(n))

or, componentwise,

w_i(n+1) = w_i(n) + µ_i(n) u(n-i) e(n),   i = 0, 1, ..., M-1

* Each filter parameter w_i(n) is updated using an independent adaptation step µ_i(n).
* The time variation of µ_i(n) is selected ad hoc as follows (a sketch is given below):
  * if m_1 successive identical signs of the gradient estimate e(n)u(n-i) are observed, then µ_i(n) is increased, µ_i(n) = c_1 µ_i(n-m_1) with c_1 > 1 (the algorithm is still far from the optimum, so it is better to accelerate);
  * if m_2 successive changes in the sign of the gradient estimate e(n)u(n-i) are observed, then µ_i(n) is decreased, µ_i(n) = µ_i(n-m_2)/c_2 with c_2 > 1 (the algorithm is near the optimum, so it is better to decelerate; by decreasing the step size, the steady state error will finally decrease).
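
A minimal sketch of this sign-driven step-size rule; the run lengths m1, m2, the factors c1, c2, and the step-size bounds mu_min, mu_max are illustrative assumptions added to keep the sketch stable, not values from the lecture:

```python
import numpy as np

def vs_lms(u, d, M, mu0=0.01, m1=3, m2=3, c1=2.0, c2=2.0, mu_min=1e-4, mu_max=0.1):
    """LMS with a per-coefficient, sign-driven variable step size (sketch)."""
    u, d = np.asarray(u, dtype=float), np.asarray(d, dtype=float)
    N = len(u)
    w = np.zeros(M)
    mu = np.full(M, mu0)                       # mu_i(n)
    same = np.zeros(M, dtype=int)              # runs of identical gradient signs
    flips = np.zeros(M, dtype=int)             # runs of alternating gradient signs
    last_sign = np.zeros(M)
    e = np.zeros(N)
    for n in range(M - 1, N):
        u_n = u[n::-1][:M]
        e[n] = d[n] - w @ u_n
        g = e[n] * u_n                         # gradient estimates e(n)u(n-i)
        s = np.sign(g)
        same = np.where(s == last_sign, same + 1, 1)
        flips = np.where(s == -last_sign, flips + 1, 0)
        mu = np.where(same >= m1, np.minimum(c1 * mu, mu_max), mu)   # accelerate
        mu = np.where(flips >= m2, np.maximum(mu / c2, mu_min), mu)  # decelerate
        same = np.where(same >= m1, 0, same)
        flips = np.where(flips >= m2, 0, flips)
        last_sign = s
        w = w + mu * g                         # componentwise update
    return w, e
```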

Comparison of LMS and variable step-size LMS (µ(n) = µ/(0.01 n + c), with c = [10; 20; 50]) within the example from Lecture 4 (channel equalization)

[Figure: learning curves E[e²(n)] versus time step n (0 to 2500); left panel: LMS with µ = 0.075, 0.025, 0.0075; right panel: variable step-size LMS with c = 10, 20, 50, and LMS with µ = 0.0075 shown for reference.]

3. Sign algorithms

In high-speed communication time is critical, thus faster adaptation processes are needed.

sgn(a) = 1 if a > 0;  0 if a = 0;  -1 if a < 0

* The Sign algorithm (other names: pilot LMS, or Sign-Error):
  w(n+1) = w(n) + µ u(n) sgn(e(n))
* The Clipped LMS (or Signed-Regressor):
  w(n+1) = w(n) + µ sgn(u(n)) e(n)
* The Zero-Forcing LMS (or Sign-Sign):
  w(n+1) = w(n) + µ sgn(u(n)) sgn(e(n))

The Sign algorithm can be derived as an LMS algorithm for minimizing the mean absolute error (MAE) criterion

J(w) = E[|e(n)|] = E[|d(n) - w^T u(n)|]

(I propose this as an exercise for you.)

Properties of sign algorithms:

* Very fast computation: if µ is constrained to the form µ = 2^{-m}, only shifting and addition operations are required.
* Drawback: the update mechanism is degraded, compared to the LMS algorithm, by the crude quantization of the gradient estimates:
  * the steady state error will increase;
  * the convergence rate decreases.
* The fastest of them, Sign-Sign, is used in the CCITT ADPCM standard for the 32000 bps system.
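
The three variants differ only in which factor of the update is quantized to its sign; here is a minimal sketch showing them side by side (the variant switch and the signal names are illustrative assumptions):

```python
import numpy as np

def sign_lms(u, d, M, mu=0.01, variant="sign-error"):
    """Sign variants of LMS: 'sign-error', 'signed-regressor', or 'sign-sign'."""
    u, d = np.asarray(u, dtype=float), np.asarray(d, dtype=float)
    N = len(u)
    w = np.zeros(M)
    e = np.zeros(N)
    for n in range(M - 1, N):
        u_n = u[n::-1][:M]
        e[n] = d[n] - w @ u_n
        if variant == "sign-error":           # w(n+1) = w(n) + mu u(n) sgn(e(n))
            w = w + mu * u_n * np.sign(e[n])
        elif variant == "signed-regressor":   # w(n+1) = w(n) + mu sgn(u(n)) e(n)
            w = w + mu * np.sign(u_n) * e[n]
        else:                                 # sign-sign: w(n+1) = w(n) + mu sgn(u(n)) sgn(e(n))
            w = w + mu * np.sign(u_n) * np.sign(e[n])
    return w, e
```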

Comparison of LMS and Sign LMS within the example from Lecture 4 (channel equalization)

[Figure: learning curves E[e²(n)] versus time step n (0 to 2500); left panel: LMS with µ = 0.075, 0.025, 0.0075; right panel: Sign LMS with µ = 0.075, 0.025, 0.0075, 0.0025.]

The Sign LMS algorithm should be operated at smaller step sizes to obtain behavior similar to that of the standard LMS algorithm.

4. Linear smoothing of LMS gradient estimates

Lowpass filtering the noisy gradient: let us rename the noisy gradient

g(n) = ∇̂_w J = -2 u(n) e(n),   g_i(n) = -2 e(n) u(n-i)

Passing the signals g_i(n) through low pass filters will prevent large fluctuations of direction during the adaptation process:

b_i(n) = LPF(g_i(n))

where LPF denotes a low pass filtering operation. The updating process will use the filtered noisy gradient:

w(n+1) = w(n) - µ b(n)

The following versions are well known:

Averaged LMS algorithm

When LPF is the filter with impulse response h(0) = ... = h(N-1) = 1/N, h(N) = h(N+1) = ... = 0, we obtain simply the average of the gradient components (with the constant factor absorbed into µ):

w(n+1) = w(n) + (µ/N) \sum_{j=n-N+1}^{n} e(j) u(j)
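
A small sketch of the averaged-LMS update, keeping a sliding window of the most recent gradient estimates (the window length N_avg and the function name are assumptions for illustration):

```python
import numpy as np
from collections import deque

def averaged_lms(u, d, M, mu=0.05, N_avg=5):
    """LMS with the gradient estimate averaged over the last N_avg samples."""
    u, d = np.asarray(u, dtype=float), np.asarray(d, dtype=float)
    N = len(u)
    w = np.zeros(M)
    e = np.zeros(N)
    window = deque(maxlen=N_avg)              # recent e(j)u(j) vectors
    for n in range(M - 1, N):
        u_n = u[n::-1][:M]
        e[n] = d[n] - w @ u_n
        window.append(e[n] * u_n)
        w = w + (mu / len(window)) * np.sum(np.asarray(window), axis=0)   # averaged update
    return w, e
```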

Momentum LMS algorithm

When LPF is a first order IIR filter with impulse response h(0) = 1-γ, h(1) = γ h(0), h(2) = γ² h(0), ..., then

b_i(n) = LPF(g_i(n)) = γ b_i(n-1) + (1-γ) g_i(n)
b(n) = γ b(n-1) + (1-γ) g(n)

The resulting algorithm can be written as a second order recursion:

w(n+1) = w(n) - µ b(n)
γ w(n) = γ w(n-1) - γ µ b(n-1)

Subtracting the second equation from the first:

w(n+1) - γ w(n) = w(n) - γ w(n-1) - µ b(n) + γ µ b(n-1)
w(n+1) = w(n) + γ (w(n) - w(n-1)) - µ (b(n) - γ b(n-1))
w(n+1) = w(n) + γ (w(n) - w(n-1)) - µ (1-γ) g(n)
w(n+1) = w(n) + γ (w(n) - w(n-1)) + 2µ (1-γ) e(n) u(n)

i.e.

w(n+1) - w(n) = γ (w(n) - w(n-1)) + 2µ (1-γ) e(n) u(n)

Drawback: the convergence rate may decrease.

Advantages: the momentum term keeps the algorithm active even in regions close to the minimum. For nonlinear criterion surfaces this helps in avoiding local minima (as in neural network learning by backpropagation).
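
A minimal sketch of the second order (momentum) recursion derived above (signal names and default values are assumptions):

```python
import numpy as np

def momentum_lms(u, d, M, mu=0.05, gamma=0.9):
    """Momentum LMS: w(n+1) = w(n) + gamma*(w(n)-w(n-1)) + 2*mu*(1-gamma)*e(n)*u(n)."""
    u, d = np.asarray(u, dtype=float), np.asarray(d, dtype=float)
    N = len(u)
    w = np.zeros(M)
    w_prev = np.zeros(M)
    e = np.zeros(N)
    for n in range(M - 1, N):
        u_n = u[n::-1][:M]
        e[n] = d[n] - w @ u_n
        w_next = w + gamma * (w - w_prev) + 2 * mu * (1 - gamma) * e[n] * u_n
        w_prev, w = w, w_next                 # shift the recursion by one step
    return w, e
```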

5. Nonlinear smoothing of LMS gradient estimates

If there is impulsive interference in either d(n) or u(n), the performance of the LMS algorithm will drastically degrade (sometimes even leading to instability). Smoothing the noisy gradient components using a nonlinear filter provides a potential solution.

The Median LMS Algorithm

Computing the median over a window of size N+1, for each component of the gradient vector, will smooth out the effect of impulsive noise. The adaptation equation can be implemented as

w_i(n+1) = w_i(n) + µ med(e(n)u(n-i), e(n-1)u(n-1-i), ..., e(n-N)u(n-N-i))

* Experimental evidence shows that the smoothing effect in an impulsive noise environment is very strong.
* If the environment is not impulsive, the performance of Median LMS is comparable with that of LMS, so the extra computational cost of Median LMS is not worthwhile.
* However, the convergence rate is slower than in LMS, and there are reports of instability occurring with Median LMS.
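
A small sketch of the median-smoothed update (the window length N_med and the function name are assumptions for illustration):

```python
import numpy as np
from collections import deque

def median_lms(u, d, M, mu=0.05, N_med=5):
    """Median LMS: each coefficient uses the median of recent gradient terms e(j)u(j-i)."""
    u, d = np.asarray(u, dtype=float), np.asarray(d, dtype=float)
    N = len(u)
    w = np.zeros(M)
    e = np.zeros(N)
    window = deque(maxlen=N_med + 1)          # window of size N_med + 1
    for n in range(M - 1, N):
        u_n = u[n::-1][:M]
        e[n] = d[n] - w @ u_n
        window.append(e[n] * u_n)             # gradient estimates for each coefficient
        w = w + mu * np.median(np.asarray(window), axis=0)   # componentwise median
    return w, e
```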

6. Double updating algorithm

As usual, we first compute the linear combination of the inputs, and then the error:

y(n) = w(n)^T u(n)    (1)
e(n) = d(n) - y(n)

The parameter adaptation is done as in Normalized LMS:

w(n+1) = w(n) + (µ / ‖u(n)‖²) e(n) u(n)

The new feature of the double updating algorithm is the output estimate

ŷ(n) = w(n+1)^T u(n)    (2)

Since at each time instant n we use two different computations of the filter output, (1) and (2), the method is called double updating.

One interesting situation is obtained when µ = 1. Then

ŷ(n) = w(n+1)^T u(n) = (w(n)^T + (1/‖u(n)‖²) e(n) u(n)^T) u(n)
     = w(n)^T u(n) + e(n) = w(n)^T u(n) + d(n) - w(n)^T u(n) = d(n)

Finally, the output of the filter is equal to the desired signal, ŷ(n) = d(n), and the new error is zero, ê(n) = 0. This situation is not acceptable (sometimes too good is worse than good). Why?
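
A minimal sketch of the double updating scheme; the default µ = 1 and the tiny regularization a are chosen here only to reproduce the degenerate case discussed above (function and variable names are assumptions):

```python
import numpy as np

def double_updating(u, d, M, mu=1.0, a=1e-12):
    """Double updating: NLMS parameter update, then re-filter u(n) with w(n+1)."""
    u, d = np.asarray(u, dtype=float), np.asarray(d, dtype=float)
    N = len(u)
    w = np.zeros(M)
    y_hat = np.zeros(N)
    for n in range(M - 1, N):
        u_n = u[n::-1][:M]
        e = d[n] - w @ u_n                    # first output computation, eq. (1)
        w = w + (mu / (a + u_n @ u_n)) * e * u_n
        y_hat[n] = w @ u_n                    # second output computation, eq. (2)
    return w, y_hat

# With mu = 1 (and a -> 0), y_hat[n] reproduces d[n] and the "new" error is zero,
# which is exactly the suspicious situation questioned above.
```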

7. LMS Volterra algorithm

We will generalize the LMS algorithm from a linear combination of the inputs (FIR filters),

y(n) = w(n)^T u(n) = \sum_{k=0}^{M-1} w_k(n) u(n-k)

to quadratic combinations of the inputs:

y(n) = \sum_{k=0}^{M-1} w^{[1]}_k(n) u(n-k)  [linear combiner]
     + \sum_{i=0}^{M-1} \sum_{j=i}^{M-1} w^{[2]}_{i,j}(n) u(n-i) u(n-j)  [quadratic combiner]    (3)

We will introduce the input vector with dimension M + M(M+1)/2:

ψ(n) = [ u(n)  u(n-1)  ...  u(n-M+1)  u²(n)  u(n)u(n-1)  ...  u²(n-M+1) ]^T

and the parameter vector

θ(n) = [ w^{[1]}_0(n)  w^{[1]}_1(n)  ...  w^{[1]}_{M-1}(n)  w^{[2]}_{0,0}(n)  w^{[2]}_{0,1}(n)  ...  w^{[2]}_{M-1,M-1}(n) ]^T

Now the output of the quadratic filter (3) can be written as

y(n) = θ(n)^T ψ(n)

and therefore the error

e(n) = d(n) - y(n) = d(n) - θ(n)^T ψ(n)

is a linear function of the filter parameters (i.e. the entries of θ(n)).

Minimization of the mean square error criterion

J(θ) = E[e(n)²] = E[(d(n) - θ(n)^T ψ(n))²]

will proceed as in the linear case, resulting in the LMS adaptation equation

θ(n+1) = θ(n) + µ ψ(n) e(n)

Some properties of LMS for Volterra filters are the following:

* Convergence of the algorithm in the mean sense is obtained when

  0 < µ < 2 / λ_max    (4)

  where λ_max is the maximum eigenvalue of the correlation matrix R = E[ψ(n) ψ(n)^T].
* If the distribution of the input is symmetric (as for the Gaussian distribution), the correlation matrix has the block structure

  R = [ R_2, 0 ; 0, R_4 ]

  corresponding to the partition into linear and quadratic coefficients

  θ(n) = [ w^{[1]}(n) ; w^{[2]}(n) ],   ψ(n) = [ u^{[1]}(n) ; u^{[2]}(n) ]

and the adaptation problem decouples into two problems, i.e. the Wiener optimal filters will be the solutions of

R_2 w^{[1]} = E[d(n) u^{[1]}(n)]
R_4 w^{[2]} = E[d(n) u^{[2]}(n)]

and similarly, the LMS solutions can be analyzed separately for the linear, w^{[1]}, and quadratic, w^{[2]}, coefficients, according to the eigenvalues of R_2 and R_4.
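
To make the construction of ψ(n) and the adaptation equation concrete, here is a small sketch of a second-order Volterra LMS filter (function and variable names are illustrative assumptions):

```python
import numpy as np

def volterra_regressor(u_n):
    """psi(n): linear terms followed by the quadratic terms u(n-i)u(n-j), j >= i."""
    M = len(u_n)
    quad = [u_n[i] * u_n[j] for i in range(M) for j in range(i, M)]
    return np.concatenate([u_n, np.asarray(quad)])

def lms_volterra(u, d, M, mu=0.01):
    """Second-order Volterra LMS: theta(n+1) = theta(n) + mu * psi(n) * e(n)."""
    u, d = np.asarray(u, dtype=float), np.asarray(d, dtype=float)
    N = len(u)
    theta = np.zeros(M + M * (M + 1) // 2)    # parameters [w^[1]; w^[2]]
    e = np.zeros(N)
    for n in range(M - 1, N):
        u_n = u[n::-1][:M]                    # [u(n), u(n-1), ..., u(n-M+1)]
        psi = volterra_regressor(u_n)
        e[n] = d[n] - theta @ psi
        theta = theta + mu * psi * e[n]
    return theta, e
```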