Least Mean-Square Adaptive Filtering (ELE 774 - Adaptive Signal Processing)
Steepest Descent • The update rule for SD is w(n+1) = w(n) + μ[p − R w(n)], where p = E[u(n)d*(n)] is the cross-correlation vector and R = E[u(n)u^H(n)] is the correlation matrix of the tap inputs, or equivalently w(n+1) = w(n) − (1/2)μ ∇J(n). • SD is a deterministic algorithm, in the sense that p and R are assumed to be exactly known. • In practice we can only estimate these quantities.
Basic Idea • The simplest estimate of the expectations is to remove the expectation operators and replace them with instantaneous values, i.e. R̂(n) = u(n)u^H(n) and p̂(n) = u(n)d*(n). • Then the gradient estimate becomes ∇̂J(n) = −2u(n)d*(n) + 2u(n)u^H(n)ŵ(n). • Eventually, the new update rule is ŵ(n+1) = ŵ(n) + μ u(n)[d*(n) − u^H(n)ŵ(n)]: no expectations, only instantaneous samples!
Basic Idea • However, the term in the brackets is (the conjugate of) the error, i.e. e(n) = d(n) − ŵ^H(n)u(n), so the update becomes ŵ(n+1) = ŵ(n) + μ u(n)e*(n). • ∇̂J(n) is the gradient of the instantaneous squared error |e(n)|², instead of the mean-square error E[|e(n)|²] as in SD.
Basic Idea • Filter weights are updated using instantaneous values.
Update equation for the method of steepest descent: w(n+1) = w(n) + μ E[u(n)e*(n)]. Update equation for least mean-square: ŵ(n+1) = ŵ(n) + μ u(n)e*(n).
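As an illustration of the update above, here is a minimal LMS sketch in Python/NumPy for a toy system-identification setup with real-valued signals; the filter length, step size, signal model, and noise level are illustrative assumptions, not values from the lecture. For real signals the conjugates drop out and the update reduces to ŵ(n+1) = ŵ(n) + μ u(n)e(n), which is what the loop implements.

```python
import numpy as np

rng = np.random.default_rng(0)

M = 4          # filter length (illustrative)
mu = 0.01      # step size (illustrative)
N = 5000       # number of iterations
w_o = rng.standard_normal(M)   # "unknown" system, plays the role of the Wiener solution

u = rng.standard_normal(N)                                   # white tap-input sequence
d = np.convolve(u, w_o)[:N] + 0.01 * rng.standard_normal(N)  # desired response plus noise

w_hat = np.zeros(M)            # initial condition w_hat(0) = 0
for n in range(M, N):
    u_n = u[n:n - M:-1]        # tap-input vector [u(n), u(n-1), ..., u(n-M+1)]
    e_n = d[n] - w_hat @ u_n   # instantaneous error e(n) = d(n) - w_hat^T u(n)
    w_hat = w_hat + mu * u_n * e_n   # LMS update: instantaneous samples, no expectations

print("w_o   =", np.round(w_o, 3))
print("w_hat =", np.round(w_hat, 3))   # close to w_o after convergence
```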
LMS Algorithm • The instantaneous estimates are unbiased, but since the expectations are omitted they have a high variance. • Therefore, the recursive computation of each tap weight in the LMS algorithm suffers from gradient noise. • In contrast to SD, which is a deterministic algorithm, LMS is a member of the family of stochastic gradient descent algorithms. • LMS has a higher MSE (J(∞)) compared to SD (Jmin, the Wiener solution) as n→∞, i.e. J(n) → J(∞) as n→∞. • The difference J(∞) − Jmin is called the excess mean-square error Jex(∞). • The ratio Jex(∞)/Jmin is called the misadjustment. • Provided J(∞) is a finite value, LMS is said to be stable in the mean-square sense. • LMS performs a random motion around the Wiener solution.
LMS Algorithm • Involves a feedback connection. • Although LMS might seem difficult to work with due to the randomness, the feedback acts as a low-pass filter, or performs averaging, so that the randomness can be filtered out. • The time constant of this averaging is inversely proportional to μ. • If μ is chosen small enough, the adaptive process progresses slowly and the effects of the gradient noise on the tap weights are largely filtered out. • Computational complexity of LMS is very low → very attractive: only 2M+1 complex multiplications and 2M complex additions per iteration.
Canonical Model • The LMS algorithm for complex signals / with complex coefficients can be represented in terms of four separate LMS algorithms for real signals, with cross-coupling between them. • Write the input, desired signal, tap weights, output, and error in complex notation: u(n) = u_I(n) + j u_Q(n), d(n) = d_I(n) + j d_Q(n), ŵ(n) = ŵ_I(n) + j ŵ_Q(n), y(n) = y_I(n) + j y_Q(n), e(n) = e_I(n) + j e_Q(n).
Canonical Model • Then the relations between these expressions are: y_I(n) = ŵ_I^T(n)u_I(n) + ŵ_Q^T(n)u_Q(n), y_Q(n) = ŵ_I^T(n)u_Q(n) − ŵ_Q^T(n)u_I(n); e_I(n) = d_I(n) − y_I(n), e_Q(n) = d_Q(n) − y_Q(n); ŵ_I(n+1) = ŵ_I(n) + μ[u_I(n)e_I(n) + u_Q(n)e_Q(n)], ŵ_Q(n+1) = ŵ_Q(n) + μ[u_Q(n)e_I(n) − u_I(n)e_Q(n)] (a numerical check of these relations is sketched below).
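As a sanity check on the canonical model, the sketch below runs the complex LMS recursion and the four cross-coupled real LMS recursions side by side on the same data and verifies that they produce the same tap weights. The data model (independent complex Gaussian tap-input vectors and desired samples), filter length, and step size are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
M, mu, N = 3, 0.05, 200

# each row of u is a tap-input vector (illustrative i.i.d. data, not a delay line)
u = (rng.standard_normal((N, M)) + 1j * rng.standard_normal((N, M))) / np.sqrt(2)
d = (rng.standard_normal(N) + 1j * rng.standard_normal(N)) / np.sqrt(2)

# --- complex LMS:  e = d - w^H u,  w <- w + mu * u * conj(e) ---
w = np.zeros(M, dtype=complex)
for n in range(N):
    e = d[n] - np.vdot(w, u[n])          # np.vdot conjugates its first argument: w^H u
    w = w + mu * u[n] * np.conj(e)

# --- canonical model: four real LMS updates with cross-coupling ---
uI, uQ = u.real, u.imag
dI, dQ = d.real, d.imag
wI = np.zeros(M)                         # real (in-phase) part of the tap-weight vector
wQ = np.zeros(M)                         # imaginary (quadrature) part of the tap-weight vector
for n in range(N):
    yI = wI @ uI[n] + wQ @ uQ[n]         # in-phase output
    yQ = wI @ uQ[n] - wQ @ uI[n]         # quadrature output
    eI, eQ = dI[n] - yI, dQ[n] - yQ
    wI = wI + mu * (uI[n] * eI + uQ[n] * eQ)
    wQ = wQ + mu * (uQ[n] * eI - uI[n] * eQ)

print(np.allclose(w, wI + 1j * wQ))      # True: the two implementations coincide
```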
Analysis of the LMS Algorithm • Although the filter is a linear combiner, the algorithm is highly non-linear and violates superposition and homogeneity. • Assume the initial condition ŵ(0) = 0. • The analysis will continue using the weight-error vector ε(n) = ŵ(n) − w_o and its correlation matrix K(n) = E[ε(n)ε^H(n)]. • Note: here we write expectation, but in practice it is the ensemble average.
Analysis of the LMS Algorithm • We have ŵ(n+1) = ŵ(n) + μ u(n)e*(n), with e(n) = d(n) − ŵ^H(n)u(n). • Let ε(n) = ŵ(n) − w_o and let e_o(n) = d(n) − w_o^H u(n) be the estimation error of the optimum (Wiener) filter. • Then the update equation can be written as ε(n+1) = [I − μ u(n)u^H(n)] ε(n) + μ u(n)e_o*(n). • Convergence is analysed in an average sense: • the algorithm is run many times → study the ensemble-average behaviour of the runs.
Analysis of the LMS Algorithm • Using the recursion for ε(n), the small-step-size assumption, and ensemble averaging (the expectation here is really an ensemble average), it can be shown that E[ε(n+1)] ≈ (I − μR) E[ε(n)], so the weight-error vector converges to zero in the mean for sufficiently small μ.
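To make the "ensemble average" idea concrete, the sketch below runs many independent realizations of the same LMS filter and compares the final weight-error vector ε(n) = ŵ(n) − w_o of a single run with its ensemble average: the average is much closer to zero than any individual run. The scenario (white input, filter length, step size, number of runs) is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
M, mu, N, runs = 4, 0.02, 2000, 200
w_o = rng.standard_normal(M)                 # "true" weight vector of the regression model

eps_final = np.zeros((runs, M))              # weight-error vector at the last iteration, per run
for r in range(runs):
    u = rng.standard_normal(N)
    d = np.convolve(u, w_o)[:N] + 0.05 * rng.standard_normal(N)
    w_hat = np.zeros(M)
    for n in range(M, N):
        u_n = u[n:n - M:-1]
        e_n = d[n] - w_hat @ u_n
        w_hat += mu * u_n * e_n
    eps_final[r] = w_hat - w_o               # eps(n) = w_hat(n) - w_o

print("norm of eps, single run      :", np.linalg.norm(eps_final[0]))
print("norm of ensemble-average eps :", np.linalg.norm(eps_final.mean(axis=0)))  # much smaller
```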
Small Step Size Analysis • Assumption I: the step size μ is small (how small?) → the LMS filter acts like a low-pass filter with a very low cut-off frequency. • Assumption II: the desired response is described by a linear multiple regression model that is matched exactly by the optimum Wiener filter, d(n) = w_o^H u(n) + e_o(n), where e_o(n) is the irreducible estimation error, white with zero mean and variance Jmin. • Assumption III: the input and the desired response are jointly Gaussian.
Small Step Size Analysis • Applying the similarity transformation resulting from the eigendecomposition of R, i.e. R = QΛQ^H and v(n) = Q^H ε(n), • we obtain v(n+1) = (I − μΛ) v(n) + φ(n), where φ(n) is a stochastic force term driven by μ u(n)e_o*(n). We do not have this term in Wiener filtering! • The components of v(n) are uncorrelated. • HW: Prove these relations.
Small Step Size Analysis • Since the components of v(n) are uncorrelated, each one obeys v_k(n+1) = (1 − μλ_k) v_k(n) + φ_k(n), k = 1, ..., M: • a first-order difference equation driven by a stochastic force (cf. Brownian motion in thermodynamics). • Solution, iterating from n = 0: v_k(n) = (1 − μλ_k)^n v_k(0) + Σ_{i=0}^{n−1} (1 − μλ_k)^{n−1−i} φ_k(i), • where the first term is the natural component of v(n) and the second term is the forced component.
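The scalar mode recursion can be simulated directly; the sketch below uses an illustrative eigenvalue λ_k, step size, and a white Gaussian stochastic force φ_k(n), none of which come from the lecture. The natural component dies out while the forced component keeps v_k(n) wandering around zero, the Brownian-motion-like behaviour mentioned above.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, lam, N = 0.05, 1.0, 2000          # step size and eigenvalue lambda_k (illustrative)
phi = 0.01 * rng.standard_normal(N)   # stochastic force phi_k(n), white for illustration

v = np.zeros(N)
v[0] = 1.0                            # nonzero initial condition v_k(0)
for n in range(N - 1):
    v[n + 1] = (1 - mu * lam) * v[n] + phi[n]   # first-order difference equation

# (1 - mu*lam)^n v_k(0) decays to zero; the forced component keeps v_k(n)
# fluctuating around zero like a particle in Brownian motion
print("start:", np.round(v[:5], 4), " end:", np.round(v[-5:], 4))
```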
Learning Curves • Two kinds of learning curves: • the mean-square error (MSE) learning curve, J(n) = E[|e(n)|²], • and the mean-square deviation (MSD) learning curve, D(n) = E[||ε(n)||²]. • Ensemble averaging → the results of many (→∞) independent realizations are averaged. • What is the relation between MSE and MSD for small μ?
Learning Curves • For small μ, and under the assumptions of the small-step-size analysis, J(n) = Jmin + Σ_k λ_k E[|v_k(n)|²]. • The second term is the excess MSE, Jex(n) = Σ_k λ_k E[|v_k(n)|²] ≥ 0. • LMS therefore performs worse than SD: there is always an excess MSE (use the solution for v_k(n) from the previous slide to evaluate it).
Learning Curves • The mean-square deviation D(n) is lower- and upper-bounded by the excess MSE: Jex(n)/λ_max ≤ D(n) ≤ Jex(n)/λ_min. • They therefore have a similar response: both decay as n grows.
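An MSE learning curve can be estimated exactly as described: run many independent realizations and average the squared error at each iteration. In the sketch below the irreducible error variance plays the role of Jmin, so the gap between the steady-state level of the averaged curve and Jmin is the excess MSE; all signal parameters are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(4)
M, mu, N, runs = 4, 0.05, 1000, 500
sigma_eo2 = 0.01                                  # variance of e_o(n); equals Jmin here
w_o = rng.standard_normal(M)

sq_err = np.zeros((runs, N))
for r in range(runs):
    u = rng.standard_normal(N)
    d = np.convolve(u, w_o)[:N] + np.sqrt(sigma_eo2) * rng.standard_normal(N)
    w_hat = np.zeros(M)
    for n in range(M, N):
        u_n = u[n:n - M:-1]
        e_n = d[n] - w_hat @ u_n
        sq_err[r, n] = e_n ** 2                   # instantaneous squared error of this run
        w_hat += mu * u_n * e_n

J = sq_err.mean(axis=0)                           # ensemble-averaged MSE learning curve J(n)
print("J(inf) ~", J[-100:].mean(), " vs  Jmin =", sigma_eo2)   # J(inf) > Jmin: excess MSE
```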
Convergence • For small μ, the natural component of v_k(n) decays as (1 − μλ_k)^n. • Hence, for convergence we need |1 − μλ_k| < 1 for all k, i.e. 0 < μ < 2/λ_max. • The ensemble-average learning curve of an LMS filter does not exhibit oscillations; rather, it decays exponentially to the constant value J(∞) = Jmin + Jex(∞) (equivalently, Jex(n) decays exponentially to Jex(∞)).
Misadjustment • Define the misadjustment as M = Jex(∞)/Jmin. • For small μ, from the previous slide, Jex(∞) ≈ (μ/2) Jmin Σ_k λ_k = (μ/2) Jmin tr(R), • but tr(R) = M r(0) = M E[|u(n)|²] (M taps, each with power r(0)), • so the misadjustment is ≈ (μ/2) M E[|u(n)|²].
Average Time Constant • From SD we know that the MSE time constant of the k-th mode is τ_k,mse ≈ 1/(2μλ_k), • so the average time constant is τ_mse,av ≈ 1/(2μλ_av), with λ_av = (1/M) Σ_k λ_k, • and the misadjustment can then be written as M/(4 τ_mse,av): filter length divided by four times the average time constant.
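The small-step-size expressions for the misadjustment and the average time constant can be evaluated numerically; the sketch below uses an illustrative AR(1)-type tap-input correlation matrix (not from the lecture) and confirms that (μ/2)·tr(R) and M/(4·τ_mse,av) give the same misadjustment.

```python
import numpy as np

# illustrative tap-input correlation matrix R (AR(1)-type, M = 4 taps)
rho, M_len = 0.5, 4
R = rho ** np.abs(np.subtract.outer(np.arange(M_len), np.arange(M_len)))
lam = np.linalg.eigvalsh(R)                       # eigenvalues lambda_k of R

mu = 0.01                                         # step size; must satisfy 0 < mu < 2 / lam.max()
misadjustment = 0.5 * mu * lam.sum()              # misadjustment ~ (mu/2) tr(R) for small mu
lam_av = lam.mean()
tau_mse_av = 1.0 / (2.0 * mu * lam_av)            # average MSE time constant
print(misadjustment, M_len / (4.0 * tau_mse_av))  # the two expressions agree
```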
Observations • The misadjustment is • directly proportional to the filter length M, for a fixed τ_mse,av; • inversely proportional to the time constant τ_mse,av: • slower convergence results in lower misadjustment; • directly proportional to the step size μ: • a smaller step size results in lower misadjustment. • The time constant τ_mse,av is • inversely proportional to the step size μ: • a smaller step size results in slower convergence. • A large μ requires the inclusion of the higher-order correction terms (k ≥ 1) in the analysis: • difficult to analyse, the small-step-size analysis is no longer valid, • and the learning curve becomes more noisy.
LMS vs. SD • The main goal is to minimise the mean-square error (MSE). • The optimum solution is found from the Wiener-Hopf equations, • which require the auto-/cross-correlations, • and achieves the minimum value of the MSE, Jmin. • LMS and SD are iterative algorithms designed to find w_o. • SD has direct access to the auto-/cross-correlations (exact measurements): • it can approach the Wiener solution w_o and go down to Jmin. • LMS uses instantaneous estimates instead (noisy measurements): • it fluctuates around w_o in a Brownian-motion manner, and its MSE goes down at best to J(∞) > Jmin.
LMS vs. SD • Learning curves: • SD has a well-defined curve composed of decaying exponentials; • for LMS, the curve is composed of noisy decaying exponentials.
Statistical Wave Theory • As the filter length increases, M→∞: • the propagation of electromagnetic disturbances along a transmission line towards infinity is similar to the propagation of signals along an infinitely long LMS filter. • For a finite-length LMS filter (transmission line): • corrections have to be made at the edges to handle reflections, • and as the length increases, the reflection region shrinks relative to the total filter. • This imposes a limit on the step size to avoid instability as M→∞: 0 < μ < 2/(M S_max), where S_max is the maximum value of the power spectral density S(ω) of the tap inputs u(n). • If this upper bound is exceeded, instability is observed.
H∞ Optimality of LMS • A single realisation of LMS is not optimum in the MSE sense; • the ensemble average is. • The previous derivation is heuristic • (replacing the auto-/cross-correlations with their instantaneous estimates). • In what sense is LMS optimum? • It can be shown that LMS minimises the maximum energy gain of the filter, under a constraint on the step size. • Minimising the maximum of something → minimax. • This is the optimisation of an H∞ criterion.
H∞ Optimality of LMS • Provided that the step-size parameter satisfies the limits on the following slide, then: • no matter how different the initial weight vector is from the unknown parameter vector w_o of the multiple regression model, and • irrespective of the value of the additive disturbance, • the error energy produced at the output of the LMS filter will never exceed a certain level.
Limits on the Step Size