4F7 Adaptive Filters (and Spectrum Estimation)
Least Mean Square (LMS) Algorithm
Sumeetpal Singh
Engineering Department
Email: sss40@eng.cam.ac.uk

1 Outline

- The LMS algorithm
- Overview of LMS issues concerning stepsize bound, convergence, misadjustment
- Some simulation examples
- The normalised LMS (NLMS)
- Further topics

2 Least Mean Square (LMS)

Steepest Descent (SD) was

    h(n+1) = h(n) - (µ/2) ∇J(h(n)) = h(n) + µ E{u(n) e(n)}

Often ∇J(h(n)) = -2 E{u(n) e(n)} is unknown or too difficult to derive.

The remedy is to use the instantaneous approximation -2 u(n) e(n) for ∇J(h(n)).

Using this approximation we get the LMS algorithm

    e(n) = d(n) - h^T(n) u(n),
    h(n+1) = h(n) + µ e(n) u(n)

This is desirable because

- We do not need knowledge of R and p anymore
- If the statistics are changing over time, it adapts accordingly
- Complexity: 2M + 1 multiplications and 2M additions per iteration, not M^2 multiplications like SD

Undesirable because we have to choose µ when R is not known, and the convergence analysis is subtle.
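For concreteness, here is a minimal Python/NumPy sketch of the LMS recursion (this is not the course Matlab demo; the function name and arguments are illustrative). The per-iteration work is one length-M inner product for the output and one scaled vector addition for the update, i.e. O(M):

```python
import numpy as np

def lms(u, d, M, mu):
    """Run the LMS recursion e(n) = d(n) - h^T(n) u(n),
    h(n+1) = h(n) + mu * e(n) * u(n) over an input sequence.

    u  : 1-D array of input samples u(0), u(1), ...
    d  : 1-D array of desired samples d(n)
    M  : filter order (number of taps)
    mu : stepsize
    Returns the tap history H (one row per n) and the error signal e."""
    N = len(u)
    h = np.zeros(M)                      # filter taps h(n)
    e = np.zeros(N)                      # error e(n)
    H = np.zeros((N, M))                 # tap history
    for n in range(M - 1, N):
        u_n = u[n - M + 1:n + 1][::-1]   # regressor [u(n), ..., u(n-M+1)]
        e[n] = d[n] - h @ u_n            # filter output and error: M multiplications
        h = h + mu * e[n] * u_n          # LMS update: M + 1 multiplications
        H[n] = h
    return H, e
```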

3 Application: Noise Cancellation

Mic 1: Reference signal

    d(n) = s(n) + v(n)

where s(n) is the signal of interest and v(n) is noise; s(n) and v(n) are statistically independent.

- Aim: recover the signal of interest
- Method: use another mic, Mic 2, to record the noise only, u(n)
- Obviously u(n) ≠ v(n), but u(n) and v(n) are hopefully correlated
- Now filter the recorded noise u(n) to make it more like v(n)
- The recovered signal is e(n) = d(n) - h^T(n) u(n), and not y(n)

Run the Matlab demo on the webpage.

We are going to see an example with speech s(n) generated as a mean-0, variance-1 Gaussian random variable

- Mic 1's noise was 0.5 sin(nπ/2 + 0.5)
- Mic 2's noise was 10 sin(nπ/2)
- Mic 1's and Mic 2's noise are both sinusoids, but with different amplitudes and phase shifts
- You could increase the phase shift, but you will then need a larger value for M

Run the Matlab demo on the webpage.
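The Matlab demo itself is not reproduced in these notes; the following Python/NumPy sketch sets up the same kind of experiment as described above (the values of N, M and µ, and all variable names, are my own illustrative choices):

```python
import numpy as np

# Noise-cancellation sketch in the spirit of the demo described above
# (not the course Matlab demo).
np.random.seed(0)
N, M, mu = 5000, 8, 2e-4
n = np.arange(N)

s = np.random.randn(N)                  # "speech": mean-0, variance-1 Gaussian
v = 0.5 * np.sin(n * np.pi / 2 + 0.5)   # noise picked up by Mic 1
u = 10.0 * np.sin(n * np.pi / 2)        # Mic 2 records the noise source only
d = s + v                               # Mic 1 reference signal d(n)

h = np.zeros(M)
e = np.zeros(N)
for k in range(M - 1, N):
    u_k = u[k - M + 1:k + 1][::-1]      # regressor [u(k), ..., u(k-M+1)]
    e[k] = d[k] - h @ u_k               # recovered signal is e(n), not y(n)
    h = h + mu * e[k] * u_k             # LMS update

# Once the filter has converged, e(n) should be close to s(n)
print("residual power E{(e - s)^2}:", np.mean((e[N // 2:] - s[N // 2:]) ** 2))
```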

4 LMS convergence in mean

Write the reference signal model as

    d(n) = u^T(n) h_opt + ε(n)

where h_opt denotes the optimal vector (Wiener filter) that h(n) should converge to, and ε(n) = d(n) - u^T(n) h_opt.

For this reference signal model, the LMS becomes

    h(n+1) = h(n) + µ u(n) ( u^T(n) h_opt + ε(n) - u^T(n) h(n) )
           = h(n) + µ u(n) u^T(n) ( h_opt - h(n) ) + µ u(n) ε(n)

    h(n+1) - h_opt = ( I - µ u(n) u^T(n) ) ( h(n) - h_opt ) + µ u(n) ε(n)

This looks like the SD recursion

    h(n+1) - h_opt = (I - µR) ( h(n) - h_opt )

Introducing the expectation operator gives

    E{ h(n+1) - h_opt } = E{ ( I - µ u(n) u^T(n) ) ( h(n) - h_opt ) } + µ E{ u(n) ε(n) }    (second term = 0)
                        = ( I - µ E{ u(n) u^T(n) } ) E{ h(n) - h_opt }    (independence assumption)
                        = (I - µR) E{ h(n) - h_opt }

The independence assumption states that h(n) - h_opt is independent of u(n) u^T(n). Since h(n) is a function of u(0), u(1), ..., u(n-1) and all previous desired signals, this is only true if {u(n)} is an independent sequence of random vectors, but not so otherwise.

The assumption is better justified for a block-LMS type update scheme where the filter is updated at multiples of some block length L, i.e. when n = kL and not otherwise (see the sketch below).
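As an illustration of that block-update idea, here is a hedged Python/NumPy sketch (the notes only describe the scheme verbally, so the naming and the averaging of the accumulated gradient are my own choices):

```python
import numpy as np

def block_lms(u, d, M, mu, L):
    """Block-LMS-type update: the taps are held fixed within each block of
    length L and changed only at n = k*L, using the instantaneous gradients
    accumulated over the block (here averaged; this scaling is a choice)."""
    N = len(u)
    h = np.zeros(M)
    e = np.zeros(N)
    acc = np.zeros(M)                    # accumulated e(n) u(n) over the block
    for n in range(M - 1, N):
        u_n = u[n - M + 1:n + 1][::-1]
        e[n] = d[n] - h @ u_n
        acc += e[n] * u_n
        if (n + 1) % L == 0:             # update only at block boundaries n = kL
            h = h + mu * acc / L
            acc[:] = 0.0
    return h, e
```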

We are back to the SD scenario and so E{h(n)} → h_opt if

    0 < µ < 2/λ_max

Assuming h(n), being a function of u(0), u(1), ..., u(n-1), is independent of u(n) is not true. However, the behaviour predicted using this simplifying assumption agrees with experiments and computer simulations.

The point of the LMS was that we don't have access to R, so how do we compute λ_max? The solution is a time average: using the fact that for square matrices tr(AB) = tr(BA), and R = QΛQ^T,

    Σ_{k=1}^{M} λ_k = tr(Λ) = tr(ΛQ^T Q) = tr(QΛQ^T) = tr(R) = M E{u^2(n)}

and we see that λ_max < Σ_{k=1}^{M} λ_k = M E{u^2(n)}.

Note that we can estimate E{u^2(n)} by a simple sample average, and the new tighter bound on the stepsize is

    0 < µ < 2 / ( M E{u^2(n)} ) < 2/λ_max

With a fixed stepsize, {h(n)}_{n≥0} will never settle at h_opt, but rather

oscillate about h_opt. Even if h(n) = h_opt, then h(n+1) - h_opt = µ u(n) e(n), and because u(n) e(n) is random, h(n+1) will move away from h_opt.

Assume d(n) = u^T(n) h_opt + ε(n), where ε(n) is a zero-mean independent noise sequence. The best case for E{e^2(n)}, using e(n) = u^T(n) h_opt + ε(n) - u^T(n) h(n), is when h(n) = h_opt, so that E{e^2(n)} = E{ε^2(n)} (= J_min). However, since h(n) ≠ h_opt, we would expect E{e^2(n)} > J_min. We call

    ( E{e^2(n)} - J_min ) / J_min

the misadjustment.
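To make the stepsize bound and the misadjustment concrete, the following Python/NumPy sketch (an illustrative setup of my own, not from the notes) estimates E{u^2(n)} by a sample average, chooses µ well inside 2/(M E{u^2(n)}), and compares the steady-state E{e^2(n)} with J_min = E{ε^2(n)}:

```python
import numpy as np

np.random.seed(1)
N, M = 20000, 4
u = np.random.randn(N)                     # white input, so R = I
h_opt = np.array([0.6, -0.3, 0.2, 0.1])    # illustrative "true" Wiener filter
eps = 0.1 * np.random.randn(N)             # ε(n), so J_min = E{ε^2(n)} = 0.01

# Reference signal model d(n) = u^T(n) h_opt + ε(n)
d = np.zeros(N)
for n in range(M - 1, N):
    d[n] = h_opt @ u[n - M + 1:n + 1][::-1] + eps[n]

# Sample-average estimate of E{u^2(n)} and the resulting stepsize bound
Eu2_hat = np.mean(u ** 2)
mu_bound = 2.0 / (M * Eu2_hat)
mu = 0.05 * mu_bound                       # well inside the bound

h = np.zeros(M)
e = np.zeros(N)
for n in range(M - 1, N):
    u_n = u[n - M + 1:n + 1][::-1]
    e[n] = d[n] - h @ u_n
    h = h + mu * e[n] * u_n

J_min = np.mean(eps ** 2)
J_ss = np.mean(e[N // 2:] ** 2)            # steady-state estimate of E{e^2(n)}
print("stepsize bound 2/(M E{u^2}):", mu_bound)
print("misadjustment (J_ss - J_min)/J_min:", (J_ss - J_min) / J_min)
```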

5 LMS main points

- Simple to implement
- Works fine in many applications if the filter order and stepsize are chosen properly
- There is a trade-off effect with the stepsize choice: large µ yields better tracking ability in a non-stationary environment but has large misadjustment noise; small µ has poorer tracking ability but has smaller misadjustment noise

6 Adaptive stepsize: Normalised LMS (NLMS)

We showed that the LMS was stable provided

    µ < 2 / ( M E{u^2(n)} )

What if E{u^2(n)} varied, which would be true for a non-stationary input signal? The LMS should be able to adapt its stepsize automatically.

The instantaneous estimate of M E{u^2(n)} is u^T(n) u(n). Now replace the LMS stepsize with

    µ̄ / ( u^T(n) u(n) ) = µ̄ / ||u(n)||^2

where µ̄ is a constant that should be < 2, e.g., 0.25 < µ̄ < 0.75. We make µ̄ smaller because of the poor quality of the estimate for M E{u^2(n)} in the denominator.

This choice of stepsize gives the Normalised Least Mean Squares

(NLMS)

    e(n) = d(n) - u^T(n) h(n)
    h(n+1) = h(n) + ( µ / ||u(n)||^2 ) e(n) u(n)

where µ̄ has been relabelled µ. The NLMS is the LMS algorithm with a data-dependent stepsize.

There is another interpretation of the NLMS. It can be shown that h(n+1) is the solution to the following optimisation problem (Example Sheet):

    h(n+1) = arg min_h ||h(n) - h||   s.t.   d(n) - u^T(n) h = 0

In light of new data, the parameters of an adaptive system should only be perturbed in a minimal fashion.

Another motivation for the NLMS is to reduce the sensitivity of the filter update to large amplitude fluctuations in u(n). But small amplitudes will now adversely affect the NLMS. The solution

is

    h(n+1) = h(n) + ( µ / ( ||u(n)||^2 + ǫ ) ) e(n) u(n)

where ǫ is a small constant, e.g. 0.0001.

7 Comparing NLMS and LMS

Compare the stability of the LMS and NLMS for different values of the stepsize. You will see that the NLMS is stable for 0 < µ < 2. You will still need to tune µ to get the desired convergence behaviour (or fluctuations of h(n) once it has stabilized) though.

Run the NLMS example on the course website.
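A minimal Python/NumPy sketch of the regularised NLMS update above (the function name and the default value of ǫ are illustrative):

```python
import numpy as np

def nlms(u, d, M, mu, eps=1e-4):
    """Normalised LMS with regularisation:
    h(n+1) = h(n) + mu / (||u(n)||^2 + eps) * e(n) u(n),
    where 0 < mu < 2 and eps guards against small ||u(n)||."""
    N = len(u)
    h = np.zeros(M)
    e = np.zeros(N)
    for n in range(M - 1, N):
        u_n = u[n - M + 1:n + 1][::-1]          # [u(n), ..., u(n-M+1)]
        e[n] = d[n] - h @ u_n
        h = h + (mu / (u_n @ u_n + eps)) * e[n] * u_n
    return h, e
```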

8 Nonlinear LMS

The LMS algorithm uses a linear combination of u(n), u(n-1), ..., u(n-M+1) (i.e. u(n)) to predict d(n):

    y(n) = h^T(n) u(n)

Why not nonlinear, e.g., quadratic:

    y(n) = Σ_{i=0}^{M-1} h_{1,i}(n) u(n-i) + Σ_{i=0}^{M-1} Σ_{j=i}^{M-1} h_{2,i,j}(n) u(n-i) u(n-j)

This case can be written in the usual form

    y(n) = θ^T(n) Ψ(n)

where θ(n) contains the linear parameters and Ψ(n) is a function of u(n), and all we have done with the LMS can be applied.
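A short Python/NumPy sketch of this "linear in the parameters" idea: Ψ(n) stacks the linear terms u(n-i) and the products u(n-i)u(n-j) for i ≤ j, and the usual LMS update is applied to θ(n) (the function name and the ordering of the terms in Ψ(n) are my own choices):

```python
import numpy as np

def quadratic_lms(u, d, M, mu):
    """LMS with a quadratic predictor y(n) = theta^T(n) Psi(n)."""
    N = len(u)
    P = M + M * (M + 1) // 2            # number of parameters in theta
    theta = np.zeros(P)
    e = np.zeros(N)
    for n in range(M - 1, N):
        u_n = u[n - M + 1:n + 1][::-1]                        # [u(n), ..., u(n-M+1)]
        quad = [u_n[i] * u_n[j] for i in range(M) for j in range(i, M)]
        psi = np.concatenate([u_n, quad])                     # Psi(n)
        e[n] = d[n] - theta @ psi
        theta = theta + mu * e[n] * psi                       # the usual LMS update
    return theta, e
```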