Adaptive Filtering Algorithms
Two major uses of adaptive filtering algorithms are:
1. Adaptive Noise Canceling (Adaptive Interference Canceling). A reference signal is matched with a primary signal. The reference signal contains purely noise and the primary signal is the measured physiological variable with additive, common mode noise.
2. Template Matching - a template is matched with an input signal or image. Weights are used to adjust prespecified parameters of the template until it best matches the input signal or image based on an error criterion.
The above two uses are similar in that during adaptive noise canceling, the reference signal is actually a template. The reference signal will best match the primary signal when the template (reference or noise signal) matches the additive noise contained in the primary signal. An example of noise cancellation is shown in the following figure. There is 60 Hz (line frequency interference) present on the primary signal so that it appears corrupted (center panel). The reference signal (left panel) shows the 60 Hz noise. The primary signal after noise removal (right panel) shows the true biological signal uncorrupted by noise; in this case three cycles of a blood pressure pulse. In this example there was a 1:1 correspondence of the 60 Hz on the primary and reference signals; therefore a simple subtraction completely removed the noise, which is called common mode. Likewise, if there were a 1:2 correspondence between the amplitudes of the noise on the primary and reference signals, subtracting from the primary 0.5 x the reference signal would complete cancel the noise. But what if the amplitude of the noise on the primary and reference signals varied independently (i.e., time variant function)? A constant scaling factor could not be used to adequately cancel the noise from the primary. In this case, adaptive filtering is necessary (i.e., the scaling coefficient to adjust the amplitude of the reference signal so that it best cancels the noise on the primary varies over time according to an error criterion). An error criterion is a measure used to compare the primary signal to the reference signal. For the purpose of adaptive noise cancellation, they will be most alike when the scaling factor or gain used to adjust the amplitude of the reference is such that the amplitude of the common mode noise on the reference and the common mode noise on the primary are the same (such as occurs in the figure). At this point, the scaling factor is said to have converged to its optimal value. Another name for the scaling factor or gain is simply the weight w. At convergence, the optimal weighting is achieved, symbolized as w*.


Adaptive noise cancellation (filtering) can be implemented in hardware or software. Implemented efficiently in hardware, it will process the incoming data rapidly, but it is often more difficult to change the algorithm used for cancellation in mid-stream (often requiring change in wire connections). For hardware implementation, the data can be filtered while still in analog form or after digitization. Implemented in software, the data is actually ported into a commercial digital computer first, and it is then processed via software algorithm. Software algorithms are slower to run; however, their advantage is that it is often easier to change the algorithm (just re-code the program). Following is a derivation for an adaptive noise cancellation algorithm.
Algorithm:
We seek to adjust the gain g of a reference signal x for best cancellation of the noise n from primary signal d, where the elements of the vectors are the time samples and the gain weight is a scalar. The primary signal is also termed the desired signal because it is desired to remove noise from it by best cancellation with a reference signal. The resulting signal after noise cancellation, or removal, from the primary is called the error signal
e. We can write the relationship at any discrete time k = 1 to n as:
e
k = dk - gk * xk
If the noise amplitude on the primary versus the reference signal is constant for all time at a ratio of 1:1, there would be no need for adaptive cancellation, gk would be time invariant (g, without the time index symbol) and equal to 1, and the above equation would be reduced to:
e
k = dk - xk
However, as we have said above, often the gain is time varying; therefore let us consider how to treat the problem. We mentioned that e k is an error function. The mean squared error function (MSE, often symbolized as x) is written as:
x
k = E[ek2]
where E is the expectation operator (take the mean over all time). This function is parabolic and concave up, with the minimum occurring, of course, at the minimum mean squared error. For **any** weight w that is used to adjust the shape of x, whether it is the gain or some other parameter, the optimal weighting w* will be the weight at which the mean squared error is at a global minimum value. In the
example figure at left, the minimal mean squared error occurs when the weight is 0.5, therefore w* = 0.5 in this case. The figure at left shows the average relationship of the squared error to the weight for all time; the parabola is also called a performance surface, error bowl, or simply, bowl. If the optimal weight w* is time varying, it means that the performance surface will shift, over short time intervals, to the left or right or both.
How might the weighting be adjusted automatically to compensate for time variance? The method of steepest descent can be used to update the weight w based on the equation:
wk+1 = wk - m Ñ k
where k is the time index, m is a convergence coefficient, and Ñk is the gradient of the mean squared error at discrete time k (the slope, or tangent, at any particular point on the parabolic curve):
Ñ
k = ¶ E[ek2] / ¶ wk = ¶ xk / ¶ wkwhere
¶ is the partial derivative. By inspection one can note that the weight will increase from time k to time k+1 if the weight at time k is less than 0.5, and it will decrease if the weight is greater than 0.5. Hence, the weight adjusts itself with each iteration toward the bottom of the weight bowl. The step size of the jump (change in weight value from k to k+1) depends on the position along the parabola - it will be greater when the actual weight is far from the optimal weight. The gradient is based on the average MSE over all time. For simplicity, the following estimate of the gradient can be used:Ñ
k » Dek2 / Dwk = [(ek+)2 - (ek-)2] / Dwkwhere a difference equation has replaced the partial derivative. Note that instead of computing the gradient over all time, it is now computed over 1 sample point! This is an equation of finite differences, i.e., a minute (incremental) change in the weight
Dw is used to vary the error to determine the gradient. e k+ is the error at w + Dw, e k- is the error at w - Dw. If the weight is a scaling factor or gain, then:e
k = dk - w * xke
k+ = dk - (w + Dw) * xke
k- = dk - (w - Dw) * xk
The weighted reference signal at discrete time k is called the desired signal yk, where:
yk = f (wk, xk)
Hence:
e
k = dk - yke
k+ = dk - yk+e
k- = dk - yk-
where the + and - superscripts on y denote whether the incremental weight change is positive or negative. If we substitute the equations for ek+ and ek- into the gradient estimate equation, we obtain the following:
(ek+)2 - (ek-)2 = (dk - yk+)2 - (dk - yk-)2
= [2 dk - 2 dk yk+ + (yk+)2] - [2 dk - 2 dk yk- + (yk-)2]
= [(yk+)2 - (yk-)2] - 2 dk (yk+ - yk-)
= (yk+ + yk-) (yk+ - yk-) - 2 dk (yk+ - yk-)
= 2 yk (yk+ - yk-) - 2 dk (yk+ - yk-)
= -2 (dk - yk ) (yk+ - yk-)
= -2 ek (yk+ - yk-)
supposing that yk+ + yk- » 2 yk. Therefore:
Ñ
k » [-2 e k (yk+ - yk-)] / 2Dw
where
Dw is a constant if the finite difference in weights (+Dw and - Dw) are held constant, so that Dw - (- Dw) = 2Dw. The above equation states that the gradient estimate is proportional to the product of the error e and the derivative of the estimated signal y with respect to any weight parameter, wk , at discrete time k. Note that although we make Dw constant, the weight parameter wk itself is time varying (that is the whole point of adapting it). If wk is the signal gain, then the above equations reduce to the Widrow-Hoff Least Mean Squares (LMS) algorithm:Ñ
k » {-2 e k [(g+Dg) xk - (g-Dg) xk]} / 2Dg= [-2
ek (2Dg) xk] / 2Dg= -2
ek xkwk+1 = wk - 2m ek xk (LMS algorithm)
This algorithm is used ubiquitously in biomedical engineering for adaptive noise cancellation. It is simply the last equation above that is implemented by computer algorithm. Each sample k of the primary signal dk and the reference signal xk are fed into the equation to determine wk+1:
Input dk, xk
e
k = dk - wk * xkwk+1 = wk - 2m ek xk
That is all the code that is needed! Based on theoretical considerations, the scalar convergence coefficient m is such that 0 < m < 1. When m is closer to 1, the weight converges more rapidly but there is more "jitter" (it oscillates back and forth upon convergence to the optimal weight). When m is closer to 0, the weight converges less rapidly but there is also less "jitter".
Template Matching
A variant of the above can be used for template matching. Let xk , k = 1, n, be a fixed segment of the reference signal to be used as a template for matching with dk , k = 1, n, the primary signal. The gradient estimate can then be computed as a time average over the segment length used for matching:
Ñ
k » (1/n) S [-2 ek (yk+ - yk-)] / 2Dw for k = 1, n
The weight is updated iteratively (index i) using the same time segmented signals:
wi+1 = wi + 2
m S [ek (yk+ - yk-)] for k = 1, n
where the product is summed for all points k = 1 to n in the segment (constants 1/n and 1/2
Dw are included in m). The weight being adjusted to best match the template with the input can be the gain, but also other weights as well such as the time duration, phase lag, and average level can be used. In the color figure below, it is shown how the reference signal or template (green) might adapt to the shape of the primary signal (red), after 1, 5, 20, and 500 iterations. Four parameters were weighted: the gain (scale along the y axis), time duration (scale along the x axis), average level (shift along the y axis), and time delay or phase lag (shift along the x axis). The optimal weights for these parameters (the weights upon convergence) are a measure of the difference between the two signals prior to weighting and can be useful to quantify physiologic changes.
One difficulty with this approach is that there may be multiple "local minima" in the performance surface. It means that instead of a single parabola describing the relationship between squared error and weight, multiple parabola can occur. This can occur for example for phase lag and time duration weights if there are multiple deflections in the signals. As the template is shifted or scaled along the x axis, its deflections will align with different deflections of the primary, producing local minima. Therefore, the initial overlap of the signals prior to adaptive filtering must such that the initialized weights are within the concavity of the global minimum of the performance surface.

An example of template matching with multiple minima in the performance surface is given in the figures above. In the figure at left the input (primary) signal is traced in black and the template (reference) is traced in red. The template was actually extracted as the segment from sample number 400-760 of the primary (361 sample points which is the interval of approximately 1 heartbeat in the figure at left). The weight to be adjusted is the time shift (phase lag). The mean sum of squares of the difference between input and template was computed beginning with the template at the position shown, and then computing again each time the template was shifted one sample point to the right. The code for doing this is given below. The resulting performance surface is shown in the right-hand figure. The global minimum of the sum of squares error curve occurs when the template is approximately centered at sample number 590, which is when its major peak is aligned with the major peak at the center of the primary trace at left. Local minima in the performance surface occur at about sample points 390 and 820 in the curve at right. To converge to the global minimum based on the equation of steepest descent, the phase weight should be initialized so that the template is within the concavity of the global minimum (between about 460 and 750).
Program: sum of squares error
real a(1200), b(361), sum(1200)
open (7, file = "primary.txt")
open (8, file = "template.txt")
open (9, file = "mserror.txt")
do 1 i = 1, 1200
read (7,*) a(i)
if (i.le.361) read(8,*) b(i)
1 continue
do 2 i = 1, 839
do 2 j = 1, 361
sum(i) = sum(i) + (a(i+j) - b(j))**2
2 continue
sum = sum/361
do 3 i = 1, 839
write (9,*) i + 180, sum(i)
3 continue
stop
end
References
1 - Ciaccio EJ, Dunn SM, Akay M et al. Characterization of Spontaneous Changes in Electrogram Morphology. Ciaccio et al. IEEE Computers in Cardiology 1994:701-704.
2 - Widrow B and McCool JM. A comparison of adaptive algorithms based on the method of steepest descent and random search. IEEE Trans Ant Prop 1976; AP-24:615-636
3 - Verhuecks NAM, van der Elzen HC, Snijders FAM et al. Distal echo cancellation for base band data transmission. IEEE Trans Acoust Speech Sig Proc 1979; ASSP-27:768-781.