Least Mean-Square Adaptive Filtering


  1. Least Mean-Square Adaptive Filtering. ELE 774 - Adaptive Signal Processing

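  2. Steepest Descent
     • The update rule for SD is $\mathbf{w}(n+1) = \mathbf{w}(n) - \tfrac{1}{2}\mu\,\nabla J(n)$, where $\nabla J(n) = -2\mathbf{p} + 2\mathbf{R}\mathbf{w}(n)$, or $\mathbf{w}(n+1) = \mathbf{w}(n) + \mu[\mathbf{p} - \mathbf{R}\mathbf{w}(n)]$.
     • SD is a deterministic algorithm, in the sense that p and R are assumed to be exactly known.
     • In practice we can only estimate these correlation functions.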
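The SD recursion above can be sketched in a few lines. This is a minimal illustration for the real-valued case, kept to the Python standard library; the function name `steepest_descent` and the values of `R`, `p` and `mu` are assumptions chosen for the example, not taken from the slides.

```python
def steepest_descent(R, p, mu, n_iter):
    """Iterate w(n+1) = w(n) + mu * (p - R w(n)) from w(0) = 0 (real-valued case)."""
    M = len(p)
    w = [0.0] * M
    for _ in range(n_iter):
        # Gradient direction p - R w(n), computed from the exactly known statistics.
        g = [p[i] - sum(R[i][k] * w[k] for k in range(M)) for i in range(M)]
        w = [w[i] + mu * g[i] for i in range(M)]
    return w


# Hypothetical second-order statistics; the Wiener solution of R w = p is [0.5, 0.0].
R = [[1.0, 0.5], [0.5, 1.0]]
p = [0.5, 0.25]
w_sd = steepest_descent(R, p, mu=0.5, n_iter=200)
print(w_sd)
```

Because R and p are known exactly, the iterates carry no noise and settle on the Wiener solution for any step size below 2/λmax (here λmax = 1.5).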

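  3. Basic Idea
     • The simplest estimate of the expectations is to remove the expectation terms and replace them with the instantaneous values, i.e. $\hat{\mathbf{R}}(n) = \mathbf{u}(n)\mathbf{u}^H(n)$ and $\hat{\mathbf{p}}(n) = \mathbf{u}(n)d^*(n)$.
     • Then, the gradient becomes $\hat{\nabla}J(n) = -2\mathbf{u}(n)d^*(n) + 2\mathbf{u}(n)\mathbf{u}^H(n)\hat{\mathbf{w}}(n)$.
     • Eventually, the new update rule is $\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \mu\,\mathbf{u}(n)[d^*(n) - \mathbf{u}^H(n)\hat{\mathbf{w}}(n)]$. No expectations, only instantaneous samples!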

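  4. Basic Idea
     • However, the term in the brackets is the (conjugated) error, i.e. $e(n) = d(n) - \hat{\mathbf{w}}^H(n)\mathbf{u}(n)$, then $\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \mu\,\mathbf{u}(n)e^*(n)$.
     • $\hat{\nabla}J(n) = -2\mathbf{u}(n)e^*(n)$ is the gradient of $|e(n)|^2$ instead of $E[|e(n)|^2]$ as in SD.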

  5. Basic Idea
     • Filter weights are updated using instantaneous values.

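  6. Update Equation for Method of Steepest Descent: $\mathbf{w}(n+1) = \mathbf{w}(n) + \mu[\mathbf{p} - \mathbf{R}\mathbf{w}(n)]$
     Update Equation for Least Mean-Square: $\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \mu\,\mathbf{u}(n)[d^*(n) - \mathbf{u}^H(n)\hat{\mathbf{w}}(n)]$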
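The LMS update can be tried on a small system-identification problem. The sketch below is an illustration under stated assumptions: a tap-delay-line input, circular complex Gaussian signals, and a hypothetical unknown system `w_o`. The helper names `lms` and `cgauss`, the step size and the noise level are all chosen for the example; only the update equation itself comes from the slides.

```python
import random


def lms(u, d, M, mu):
    """Complex LMS: w(n+1) = w(n) + mu * u(n) * conj(e(n)), e(n) = d(n) - w^H(n) u(n)."""
    w = [0j] * M
    errors = []
    for n in range(len(u)):
        # Tap-input vector [u(n), u(n-1), ..., u(n-M+1)], zero before the first sample.
        u_vec = [u[n - k] if n >= k else 0j for k in range(M)]
        y = sum(wk.conjugate() * uk for wk, uk in zip(w, u_vec))  # y(n) = w^H(n) u(n)
        e = d[n] - y
        w = [wk + mu * uk * e.conjugate() for wk, uk in zip(w, u_vec)]
        errors.append(e)
    return w, errors


def cgauss(rng, sigma):
    """Circular complex Gaussian sample with variance sigma**2 (assumed signal model)."""
    s = sigma / 2 ** 0.5
    return complex(rng.gauss(0.0, s), rng.gauss(0.0, s))


rng = random.Random(0)
w_o = [0.8 + 0.2j, -0.3 + 0.5j]          # hypothetical unknown system
N = 4000
u = [cgauss(rng, 1.0) for _ in range(N)]
d = []
for n in range(N):
    u_vec = [u[n - k] if n >= k else 0j for k in range(len(w_o))]
    clean = sum(wk.conjugate() * uk for wk, uk in zip(w_o, u_vec))
    d.append(clean + cgauss(rng, 0.01))   # desired response = w_o^H u(n) + small noise
w_hat, errors = lms(u, d, M=2, mu=0.05)
print(w_hat)
```

Unlike SD, the routine never sees R or p; it only uses the samples u(n) and d(n), so the final weights hover near `w_o` rather than settling exactly on it.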

  7. LMS Algorithm
     • The instantaneous estimates are unbiased, but since the expectations are omitted, they will have a high variance.
     • Therefore, the recursive computation of each tap weight in the LMS algorithm suffers from gradient noise.
     • In contrast to SD, which is a deterministic algorithm, LMS is a member of the family of stochastic gradient descent algorithms.
     • As n→∞, LMS settles at a higher MSE, $J(\infty)$, than SD, which reaches $J_{\min}$ (the Wiener solution); i.e., $J(n) \to J(\infty)$ as n→∞.
     • The difference $J_{ex}(\infty) = J(\infty) - J_{\min}$ is called the excess mean-square error.
     • The ratio $J_{ex}(\infty)/J_{\min}$ is called the misadjustment.
     • If $J(\infty)$ is a finite value, LMS is said to be stable in the mean-square sense.
     • LMS will perform a random motion around the Wiener solution.

  8. LMS Algorithm
     • Involves a feedback connection.
     • Although LMS might seem difficult to make work because of the randomness, the feedback acts as a low-pass filter (it performs averaging), so that the randomness can be filtered out.
     • The time constant of averaging is inversely proportional to μ.
     • Actually, if μ is chosen small enough, the adaptive process is made to progress slowly and the effects of the gradient noise on the tap weights are largely filtered out.
     • Computational complexity of LMS is very low → very attractive.
     • Only 2M+1 complex multiplications and 2M complex additions per iteration.

  9. LMS Algorithm

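  10. Canonical Model
      • The LMS algorithm for complex signals (with complex coefficients) can be represented in terms of four separate LMS algorithms for real signals, with cross-coupling between them.
      • Write the input, desired signal, tap gains, output and error in complex notation: $\mathbf{u}(n) = \mathbf{u}_I(n) + j\mathbf{u}_Q(n)$, $d(n) = d_I(n) + jd_Q(n)$, $\hat{\mathbf{w}}(n) = \hat{\mathbf{w}}_I(n) + j\hat{\mathbf{w}}_Q(n)$, $y(n) = y_I(n) + jy_Q(n)$, $e(n) = e_I(n) + je_Q(n)$.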

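  11. Canonical Model
      • Then the relations between these expressions are (subscripts I and Q denote real and imaginary parts):
        $y_I(n) = \hat{\mathbf{w}}_I^T(n)\mathbf{u}_I(n) + \hat{\mathbf{w}}_Q^T(n)\mathbf{u}_Q(n)$, $y_Q(n) = \hat{\mathbf{w}}_I^T(n)\mathbf{u}_Q(n) - \hat{\mathbf{w}}_Q^T(n)\mathbf{u}_I(n)$,
        $e_I(n) = d_I(n) - y_I(n)$, $e_Q(n) = d_Q(n) - y_Q(n)$,
        $\hat{\mathbf{w}}_I(n+1) = \hat{\mathbf{w}}_I(n) + \mu[e_I(n)\mathbf{u}_I(n) + e_Q(n)\mathbf{u}_Q(n)]$,
        $\hat{\mathbf{w}}_Q(n+1) = \hat{\mathbf{w}}_Q(n) + \mu[e_I(n)\mathbf{u}_Q(n) - e_Q(n)\mathbf{u}_I(n)]$.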
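A quick numerical check of the canonical model: one complex LMS step should equal the cross-coupled real recursions obtained by expanding $y(n) = \hat{\mathbf{w}}^H(n)\mathbf{u}(n)$ and $\mathbf{u}(n)e^*(n)$ into real and imaginary parts. The function names and the numerical values below are arbitrary choices for illustration.

```python
def complex_lms_step(w, u, d, mu):
    """One complex LMS update: w + mu * u * conj(e), with e = d - w^H u."""
    e = d - sum(wk.conjugate() * uk for wk, uk in zip(w, u))
    return [wk + mu * uk * e.conjugate() for wk, uk in zip(w, u)]


def canonical_lms_step(wI, wQ, uI, uQ, dI, dQ, mu):
    """The same update written as cross-coupled real LMS recursions."""
    dot = lambda a, b: sum(x * y for x, y in zip(a, b))
    yI = dot(wI, uI) + dot(wQ, uQ)      # in-phase (real) output
    yQ = dot(wI, uQ) - dot(wQ, uI)      # quadrature (imaginary) output
    eI, eQ = dI - yI, dQ - yQ
    wI_new = [w + mu * (eI * a + eQ * b) for w, a, b in zip(wI, uI, uQ)]
    wQ_new = [w + mu * (eI * b - eQ * a) for w, a, b in zip(wQ, uI, uQ)]
    return wI_new, wQ_new


# Arbitrary illustrative values.
w = [0.3 - 0.1j, -0.2 + 0.4j]
u = [1.0 + 2.0j, -0.5 + 0.25j]
d, mu = 0.7 - 0.3j, 0.1

w_complex = complex_lms_step(w, u, d, mu)
wI_new, wQ_new = canonical_lms_step(
    [x.real for x in w], [x.imag for x in w],
    [x.real for x in u], [x.imag for x in u],
    d.real, d.imag, mu,
)
w_canonical = [complex(a, b) for a, b in zip(wI_new, wQ_new)]
print(w_complex, w_canonical)
```

The four inner products in `yI` and `yQ` are the four real filters of the canonical model; the cross terms are the coupling between the in-phase and quadrature branches.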

  12. Canonical Model

  13. Canonical Model

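  14. Analysis of the LMS Algorithm
      • Although the filter is a linear combiner, the algorithm is highly non-linear and violates superposition and homogeneity.
      • Assume the initial condition $\hat{\mathbf{w}}(0) = \mathbf{0}$, then $\hat{\mathbf{w}}(n) = \mu\sum_{i=0}^{n-1} e^*(i)\mathbf{u}(i)$, so the output $y(n) = \hat{\mathbf{w}}^H(n)\mathbf{u}(n)$ depends non-linearly on the input.
      • Analysis will continue using the weight-error vector $\boldsymbol{\varepsilon}(n) = \mathbf{w}_o - \hat{\mathbf{w}}(n)$ and its autocorrelation $\mathbf{K}(n) = E[\boldsymbol{\varepsilon}(n)\boldsymbol{\varepsilon}^H(n)]$.
      • Here we use expectation; however, it is actually the ensemble average!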

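  15. Analysis of the LMS Algorithm
      • We have $\hat{\mathbf{w}}(n+1) = \hat{\mathbf{w}}(n) + \mu\,\mathbf{u}(n)[d^*(n) - \mathbf{u}^H(n)\hat{\mathbf{w}}(n)]$.
      • Let $e_o(n) = d(n) - \mathbf{w}_o^H\mathbf{u}(n)$ be the estimation error of the Wiener filter $\mathbf{w}_o$, and $\boldsymbol{\varepsilon}(n) = \mathbf{w}_o - \hat{\mathbf{w}}(n)$ the weight-error vector.
      • Then the update equation can be written as $\boldsymbol{\varepsilon}(n+1) = [\mathbf{I} - \mu\,\mathbf{u}(n)\mathbf{u}^H(n)]\boldsymbol{\varepsilon}(n) - \mu\,\mathbf{u}(n)e_o^*(n)$.
      • Analyse convergence in an average sense.
      • The algorithm is run many times → study the ensemble-average behaviour.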

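  16. Analysis of the LMS Algorithm
      • Here we use expectation; however, it is actually the ensemble average!
      • Using the expansion $\boldsymbol{\varepsilon}(n) = \boldsymbol{\varepsilon}_0(n) + \boldsymbol{\varepsilon}_1(n) + \boldsymbol{\varepsilon}_2(n) + \cdots$ of the weight-error vector,
      • It can be shown that, under the small step size assumption, $\boldsymbol{\varepsilon}_0(n+1) = (\mathbf{I} - \mu\mathbf{R})\boldsymbol{\varepsilon}_0(n) + \mathbf{f}_0(n)$ with $\mathbf{f}_0(n) = -\mu\,\mathbf{u}(n)e_o^*(n)$.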

  17. Small Step Size Analysis
      • Assumption I: the step size μ is small (how small?) → the LMS filter acts like a low-pass filter with a very low cut-off frequency.
      • Assumption II: the desired response is described by a linear multiple regression model that is matched exactly by the optimum Wiener filter, $d(n) = \mathbf{w}_o^H\mathbf{u}(n) + e_o(n)$, where $e_o(n)$ is the irreducible estimation error and $J_{\min} = E[|e_o(n)|^2]$.
      • Assumption III: the input and the desired response are jointly Gaussian.

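  18. Small Step Size Analysis
      • Applying the similarity transformation resulting from the eigendecomposition of R, i.e. $\mathbf{Q}^H\mathbf{R}\mathbf{Q} = \boldsymbol{\Lambda}$,
      • Then, we have $\mathbf{v}(n+1) = (\mathbf{I} - \mu\boldsymbol{\Lambda})\mathbf{v}(n) + \boldsymbol{\phi}(n)$, where $\mathbf{v}(n) = \mathbf{Q}^H\boldsymbol{\varepsilon}_0(n)$ is the transformed (zeroth-order) weight-error vector and $\boldsymbol{\phi}(n) = -\mu\,\mathbf{Q}^H\mathbf{u}(n)e_o^*(n)$. We do not have the term $\boldsymbol{\phi}(n)$ in Wiener filtering!
      • $E[\boldsymbol{\phi}(n)] = \mathbf{0}$ and $E[\boldsymbol{\phi}(n)\boldsymbol{\phi}^H(n)] = \mu^2 J_{\min}\boldsymbol{\Lambda}$: components of v(n) are uncorrelated!
      • HW: Prove these relations.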

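  19. Small Step Size Analysis
      • Components of v(n) are uncorrelated: $v_k(n+1) = (1 - \mu\lambda_k)v_k(n) + \phi_k(n)$, $k = 1, \dots, M$, where $\lambda_k$ is the k-th eigenvalue of R and $\phi_k(n)$ is the stochastic force.
      • This is a first-order difference equation (Brownian motion, thermodynamics).
      • Solution, iterating from n = 0: $v_k(n) = (1 - \mu\lambda_k)^n v_k(0) + \sum_{i=0}^{n-1}(1 - \mu\lambda_k)^{n-1-i}\phi_k(i)$, where the first term is the natural component of v(n) and the second term is the forced component of v(n).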

  20. Learning Curves
      • Two kinds of learning curves:
        • Mean-square error (MSE) learning curve, $J(n) = E[|e(n)|^2]$
        • Mean-square deviation (MSD) learning curve, $D(n) = E[\|\boldsymbol{\varepsilon}(n)\|^2]$, where $\boldsymbol{\varepsilon}(n)$ is the weight-error vector
      • Ensemble averaging → results of many (→∞) realizations are averaged.
      • What is the relation between MSE and MSD for μ small?
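Ensemble averaging can be imitated by running LMS many times on independent data and averaging the squared error at each iteration. The sketch below is a real-valued illustration under assumed conditions (white unit-variance input, a hypothetical two-tap system, 200 runs); the function names and all parameter values are choices made for the example.

```python
import random


def lms_squared_errors(rng, w_o, mu, n_iter, noise_std):
    """One real-valued LMS run identifying w_o; returns e(n)**2 for each iteration."""
    M = len(w_o)
    w = [0.0] * M
    u_vec = [0.0] * M
    sq_err = []
    for _ in range(n_iter):
        u_vec = [rng.gauss(0.0, 1.0)] + u_vec[:-1]       # shift in a new white sample
        d = sum(a * b for a, b in zip(w_o, u_vec)) + rng.gauss(0.0, noise_std)
        e = d - sum(a * b for a, b in zip(w, u_vec))
        w = [wk + mu * uk * e for wk, uk in zip(w, u_vec)]
        sq_err.append(e * e)
    return sq_err


def mse_learning_curve(n_runs, **kwargs):
    """Average e(n)**2 over independent runs to approximate J(n) = E[|e(n)|^2]."""
    rng = random.Random(1)
    total = None
    for _ in range(n_runs):
        run = lms_squared_errors(rng, **kwargs)
        total = run if total is None else [t + r for t, r in zip(total, run)]
    return [t / n_runs for t in total]


J = mse_learning_curve(200, w_o=[0.8, -0.4], mu=0.05, n_iter=600, noise_std=0.1)
J_tail = sum(J[-100:]) / 100        # rough estimate of the steady-state value J(inf)
print(J[0], J_tail)
```

With these settings the noise variance, 0.01, plays the role of the minimum MSE; the averaged curve starts far above it and flattens out slightly above it, the gap being the excess MSE. A single run (`n_runs=1`) gives a much noisier curve.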

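  21. Learning Curves
      • For μ small, under the assumptions of slide 17, $J(n) \approx J_{\min} + \sum_{k=1}^{M}\lambda_k E[|v_k(n)|^2]$, where $\lambda_k$ are the eigenvalues of R and $v_k(n)$ the components of v(n).
      • Excess MSE: $J_{ex}(n) = J(n) - J_{\min} \approx \sum_{k=1}^{M}\lambda_k E[|v_k(n)|^2]$.
      • LMS performs worse than SD: there is always an excess MSE.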

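  22. Learning Curves
      • The mean-square deviation D(n) is bounded from below and above by the excess MSE: $\lambda_{\min}D(n) \le J_{ex}(n) \le \lambda_{\max}D(n)$, or $J_{ex}(n)/\lambda_{\max} \le D(n) \le J_{ex}(n)/\lambda_{\min}$.
      • They have a similar response: both decay as n grows.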

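  23. Convergence
      • For μ small, $J(n) \approx J_{\min} + \mu J_{\min}\sum_{k=1}^{M}\frac{\lambda_k}{2 - \mu\lambda_k} + \sum_{k=1}^{M}\lambda_k\left(|v_k(0)|^2 - \frac{\mu J_{\min}}{2 - \mu\lambda_k}\right)(1 - \mu\lambda_k)^{2n}$.
      • Hence, for convergence, $|1 - \mu\lambda_k| < 1$ for all k, or $0 < \mu < 2/\lambda_{\max}$.
      • The ensemble-average learning curve of an LMS filter does not exhibit oscillations; rather, it decays exponentially to the constant value $J(\infty) = J_{\min} + \mu J_{\min}\sum_{k=1}^{M}\frac{\lambda_k}{2 - \mu\lambda_k}$, or, for small μ, $J_{ex}(\infty) \approx \frac{\mu J_{\min}}{2}\sum_{k=1}^{M}\lambda_k$.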

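  24. Misadjustment
      • Misadjustment, define $\mathcal{M} = J_{ex}(\infty)/J_{\min}$.
      • For small μ, from the previous slide, $J_{ex}(\infty) \approx \frac{\mu J_{\min}}{2}\sum_{k=1}^{M}\lambda_k$, or equivalently $\mathcal{M} \approx \frac{\mu}{2}\sum_{k=1}^{M}\lambda_k$; but $\sum_k\lambda_k = \mathrm{tr}(\mathbf{R}) = M\,E[|u(n)|^2]$ for a tap-delay-line input, then $\mathcal{M} \approx \frac{\mu}{2}M\,E[|u(n)|^2]$.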

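  25. Average Time Constant
      • From SD we know that $\tau_{mse,k} \approx \frac{1}{2\mu\lambda_k}$; but with the average eigenvalue $\lambda_{av} = \frac{1}{M}\sum_{k=1}^{M}\lambda_k$, then $\tau_{mse,av} \approx \frac{1}{2\mu\lambda_{av}}$.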

  26. Observations
      • Misadjustment is
        • directly proportional to the filter length M, for a fixed $\tau_{mse,av}$;
        • inversely proportional to the time constant $\tau_{mse,av}$: slower convergence results in lower misadjustment;
        • directly proportional to the step size μ: smaller step size results in lower misadjustment.
      • Time constant is
        • inversely proportional to the step size μ: smaller step size results in slower convergence.
      • Large μ requires the inclusion of the higher-order terms $\boldsymbol{\varepsilon}_k(n)$ (k ≥ 1) into the analysis:
        • difficult to analyse, the small step analysis is no longer valid;
        • the learning curve becomes more noisy.
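The trade-off in these observations can be put into numbers using the standard small-step-size approximations: misadjustment ≈ (μ/2) Σ λ_k and average time constant ≈ 1/(2 μ λ_av), whose product is M/4 whatever the step size. The eigenvalues and step sizes below are hypothetical, and the function names are chosen for this sketch.

```python
def misadjustment(mu, eigenvalues):
    """Small-step-size approximation: (mu / 2) * sum of the eigenvalues of R."""
    return 0.5 * mu * sum(eigenvalues)


def average_time_constant(mu, eigenvalues):
    """Approximate average MSE time constant 1 / (2 * mu * lambda_av), in iterations."""
    lam_av = sum(eigenvalues) / len(eigenvalues)
    return 1.0 / (2.0 * mu * lam_av)


eigenvalues = [1.5, 0.5]             # hypothetical eigenvalues of R (filter length M = 2)
for mu in (0.1, 0.01):
    m_adj = misadjustment(mu, eigenvalues)
    tau = average_time_constant(mu, eigenvalues)
    # The product stays at M / 4 regardless of mu: the trade-off described above.
    print(f"mu={mu}: misadjustment={m_adj:.4f}, tau_mse_av={tau:.1f}, product={m_adj * tau:.3f}")
```

Reducing μ by a factor of ten lowers the misadjustment tenfold and lengthens the average time constant tenfold, which is the "slower convergence, lower misadjustment" statement in numerical form.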

  27. LMS vs. SD
      • Main goal is to minimise the mean-square error (MSE).
      • Optimum solution is found by the Wiener-Hopf equations:
        • requires auto/cross-correlations;
        • achieves the minimum value of MSE, $J_{\min}$.
      • LMS and SD are iterative algorithms designed to find $\mathbf{w}_o$.
      • SD has direct access to auto/cross-correlations (exact measurements):
        • can approach the Wiener solution $\mathbf{w}_o$, can go down to $J_{\min}$.
      • LMS uses instantaneous estimates instead (noisy measurements):
        • fluctuates around $\mathbf{w}_o$ in a Brownian-motion manner, can go down to $J(\infty)$ at best.

  28. LMS vs. SD
      • Learning curves:
        • SD has a well-defined curve composed of decaying exponentials.
        • For LMS, the curve is composed of noisy decaying exponentials.

  29. Statistical Wave Theory
      • As the filter length increases, M→∞:
        • Propagation of electromagnetic disturbances along a transmission line towards infinity is similar to signals on an infinitely long LMS filter.
      • Finite-length LMS filter (transmission line):
        • Corrections have to be made at the edges to tackle reflections.
        • As the length increases, the reflection region decreases compared to the total filter.
      • Imposes a limit on the step size to avoid instability as M→∞: $0 < \mu < \frac{2}{M S_{\max}}$, where $S_{\max}$ is the maximum component of the PSD S(ω) of the tap inputs u(n).
      • If the upper bound is exceeded, instability is observed.
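To get a feel for the size of this bound, the sketch below evaluates 2/(M S_max) for an assumed input model: a first-order moving-average process u(n) = x(n) + a x(n-1) driven by white noise, whose PSD is σ²(1 + a² + 2a cos ω). The input model, the filter length M = 64 and the function names are assumptions made for the example.

```python
import math


def ma1_psd(omega, a, sigma2=1.0):
    """PSD of u(n) = x(n) + a*x(n-1), x(n) white with variance sigma2 (assumed model)."""
    return sigma2 * (1.0 + a * a + 2.0 * a * math.cos(omega))


def step_size_upper_bound(M, psd, n_grid=2048):
    """Evaluate 2 / (M * S_max), with S_max found on a uniform grid over [-pi, pi]."""
    s_max = max(psd(-math.pi + 2.0 * math.pi * i / n_grid) for i in range(n_grid + 1))
    return 2.0 / (M * s_max)


mu_max = step_size_upper_bound(M=64, psd=lambda w: ma1_psd(w, a=0.5))
print(mu_max)
```

For a = 0.5 the PSD peaks at ω = 0 with S_max = 2.25, so the bound shrinks in proportion to both the filter length and the strongest spectral component of the input.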

  30. H∞ Optimality of LMS
      • A single realisation of LMS is not optimum in the MSE sense; the ensemble average is.
      • The previous derivation is heuristic (replacing auto/cross-correlations with their instantaneous estimates).
      • In what sense is LMS optimum?
      • It can be shown that LMS minimises the maximum energy gain of the filter, under a constraint on the step size.
      • Minimising the maximum of something → minimax.
      • Optimisation of an H∞ criterion.

  31. H∞ Optimality of LMS
      • Provided that the step size parameter μ satisfies the limits on the previous slide, then
        • no matter how different the initial weight vector is from the unknown parameter vector $\mathbf{w}_o$ of the multiple regression model, and
        • irrespective of the value of the additive disturbance n(n),
        • the error energy produced at the output of the LMS filter will never exceed a certain level.

  32. Limits on the Step Size
