2.04 Denoising and Signal-to-Noise Ratio Enhancement: Derivatives V.-M. Taavitsainen, Helsinki Metropolia University of Applied Sciences, Vantaa, Finland © 2009 Elsevier B.V. All rights reserved.
Contents
2.04.1 Derivatives
2.04.1.1 Differentiation Based on Local (Discontinuous) Approximation
2.04.1.2 Differentiation Based on Local Fitting
2.04.1.3 Differentiation Based on Global Fitting
2.04.1.4 Differentiation Based on Global Parametric Fitting
References

Symbols
A, B, ...              matrices
e(i)                   ith leave-one-out residual
ε_i                    ith measurement error
f(x)                   function of x, for example, a spectrum
f'(x) or df/dx         derivative of f
f''(x) or d^2f/dx^2    second derivative of f
f^(n)(x) or d^nf/dx^n  nth derivative of f
λ                      smoothness controlling parameter
n                      number of data points
x                      argument, for example, wavelength
x, y, ...              vectors
(x_i, y_i)             data points
y_i                    ith measured response (signal)
ỹ_i                    ith true value of the response (signal)
ŷ_i                    ith estimated value of y_i
2.04.1 Derivatives

Differential and integral calculus are the cornerstones of modeling problems in the natural sciences. However, in chemometrics, differentiation and integration are typically used for pretreatment of chemometric signals, for example spectra. The main goals are to enhance peak resolution or to remove baselines or trends in spectra. The theory of numerical differentiation is presented in most textbooks on numerical methods. For example, Hoffman1 is a good general book about numerical methods in engineering and science, and Foo-Tim et al.2 give chemometric applications of numerical methods. In the following, we shall limit the discussion to continuous, and usually at least twice differentiable, functions defined on a closed interval [a, b]. Let us recall the definition of the derivative of a function: the derivative of f(x) at x is defined as the limit of the difference quotient

f'(x) = lim_{h→0} [f(x + h) - f(x)]/h   (1)
The limit is commonly denoted by f'(x) or df/dx. Strictly speaking, the definition given above does not hold at the end points a and b, but such technicalities do not cause any trouble in the following. It is well known from elementary calculus that ordinary functions have closed-form formulas for their derivatives. Consequently, in differentiating a known function, Equation (1) is seldom needed. The geometrical interpretation of the derivative of a function is the slope of the tangential straight line at a given point. Thus the derivative reflects the rate of change of f(x) at point x. Figure 1 shows a hypothetical peak and tangential straight lines at x = 0.4 and at x = 0.6.
Figure 1 A hypothetical peak and tangential straight lines at x = 0.4 and at x = 0.6.
This interpretation explains the most common chemometric application of differentiation: if a spectrum contains a linear trend, then the first derivative of the trend is a constant and the second derivative is 0. Therefore, a linear baseline in a spectrum can be removed simply by differentiating the spectrum twice. Unfortunately, differentiating data is not as easy as one might think based on Equation (1). The second derivative, the derivative of f'(x), is denoted by f''(x) or d^2f/dx^2. Higher order derivatives, denoted by f^(n)(x) or d^nf/dx^n, are defined similarly. The geometrical interpretation of the second derivative of a function is that it is related to the curvature of the function: a high positive value of the second derivative means strong upward curvature. Curvature gives us another important chemometric application of differentiation: the human eye is not very good at locating changes of curvature in a given curve. By explicitly plotting the curvature, that is, the second derivative, we may greatly enhance spectral resolution, as seen in Figure 2. Figure 2 shows the first and second derivatives of the peak in Figure 1, revealing that the peak is actually a sum of two nearby peaks.
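The resolution-enhancing effect of the second derivative is easy to reproduce numerically. The sketch below (with hypothetical peak positions and widths, not taken from Figure 1, and assuming NumPy is available) builds a sum of two overlapping Gaussians and differentiates it twice with a central second difference; the curvature is strongly negative near each hidden peak center and positive in the valley between them:

```python
import numpy as np

def second_diff(y, h):
    """Central second difference; two points shorter than the input."""
    return (y[2:] - 2 * y[1:-1] + y[:-2]) / h**2

# A 'single' peak that is really a sum of two overlapping Gaussians
x = np.linspace(0.0, 1.0, 501)
h = x[1] - x[0]
y = (np.exp(-(x - 0.4)**2 / (2 * 0.05**2))
     + np.exp(-(x - 0.6)**2 / (2 * 0.05**2)))

d2 = second_diff(y, h)
xc = x[1:-1]                                # abscissas matching d2
print(d2[np.abs(xc - 0.4).argmin()] < 0)    # negative curvature at a hidden center
print(d2[np.abs(xc - 0.5).argmin()] > 0)    # positive curvature in the valley
```

The sign pattern of d2, two negative lobes separated by a positive region, is exactly what reveals the two components in Figure 2.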
Figure 2 The first and second derivatives of the peak in Figure 1. For clarity, the second derivative is divided by 10.
2.04.1.1 Differentiation Based on Local (Discontinuous) Approximation
In chemometrics, and in data analysis in general, the problem is that the functions are given in tabular or graphical form and, consequently, we cannot evaluate the limit in Equation (1). Another problem in data analysis is the fact that the values of the functions are measurements, always contaminated with noise. We shall begin with the unrealistic case of differentiation of noiseless data. Throughout this section, we shall assume that the data consist of points (x_1, y_1), (x_2, y_2), ..., (x_n, y_n), where y_i = f(x_i) and the distance between consecutive points is a constant h = x_{i+1} - x_i, that is, the data are equidistant. It is also assumed that h is small enough for the difference quotient to give a reasonable approximation of the limit in Equation (1). For sparse data, that is, large h, or for nonequidistant abscissas, model fitting is needed. For spectra containing only a few peaks, parametric functional models may be applied, but for more complicated spectra local models, for example splines, work better (see Chapter 2.05). In most cases of equidistant data, h is just a scaling factor, which does not affect the results and can therefore be set equal to 1. Thus, for small enough h, the difference quotient gives a reasonable approximation to the derivative at x_i, that is,

f'(x_i) ≈ (y_{i+1} - y_i)/(x_{i+1} - x_i)   (2)
Equation (2) is known as the forward difference approximation of the first derivative. Naturally, one could use backward differences f'(x_i) ≈ (y_i - y_{i-1})/(x_i - x_{i-1}) as well. Note that the sequence of derivatives contains one point less than the original sequence. The right-hand side of Equation (2) is the formula for the slope of the straight line going through the points (x_i, y_i) and (x_{i+1}, y_{i+1}). Thus, what we actually do when we use forward or backward differences is to assume that the underlying function can be locally approximated by a straight line, either in the interval [x_i, x_{i+1}] corresponding to forward differences or in the interval [x_{i-1}, x_i] corresponding to backward differences. It is easy to see that neither of these is a good approximation, especially for functions with high curvature. Figure 3 shows a spectrum and a zoom-out of one of the peaks. It is evident that both forward and backward differences would give a poor estimate of the slope at x = 7145 (h = 5).
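As a minimal sketch of Equation (2) (hypothetical data, assuming NumPy), forward differences are a one-line vectorized operation. The example also makes the local straight-line assumption visible: for a curved function the estimate is systematically biased.

```python
import numpy as np

def forward_differences(x, y):
    """Forward-difference derivative estimates, Equation (2);
    one value per interval, so one point fewer than y."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    return (y[1:] - y[:-1]) / (x[1:] - x[:-1])

# For y = x**2 the exact derivative is 2x, but the forward difference
# at x_i gives exactly 2*x_i + h, because
# ((x+h)**2 - x**2)/h = 2x + h: the straight-line assumption
# overshoots wherever the curvature is positive.
x = np.linspace(0.0, 1.0, 11)      # h = 0.1
d = forward_differences(x, x**2)
```

The bias term h vanishes only as the grid is refined, which is why higher-order local approximations are preferred below.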
Figure 3 A sample spectrum (left) and a zoom-out (right) of the peak at ca. 7150.
It should be noted that the x-axis is the sequence number of the absorbances in the original spectrum in all spectra of this section. The example spectrum has been averaged by replacing every five consecutive absorbances with their mean value. A much better approximation is obtained using a second-order approximation, in which we assume that the underlying function can locally be approximated by a parabola f_i(x) = a + bx + cx^2 going through the points x_{i-1}, x_i, and x_{i+1}. Further, setting the origin at x_i, that is, x_{i-1} = -h, x_i = 0, and x_{i+1} = h, the following system of linear equations is obtained:

y_{i-1} = a - bh + ch^2
y_i     = a
y_{i+1} = a + bh + ch^2   (3)

The solution of Equation (3) for the coefficients is a = y_i, b = (y_{i+1} - y_{i-1})/2h, and c = (y_{i+1} - 2y_i + y_{i-1})/2h^2. Since the derivative is independent of the change of origin and f_i'(x) = b + 2cx, we get by setting x = 0

f'(x_i) ≈ f_i'(x_i) = b = (y_{i+1} - y_{i-1})/(2h)   (4)

Similarly, we get a second-order approximation for the second derivative at x_i:

f''(x_i) ≈ 2c = (y_{i+1} - 2y_i + y_{i-1})/h^2   (5)
If the noise in the data is negligible, then Equations (4) and (5) give good estimates of the first and second derivatives. However, since all measured values are contaminated with noise, that is, with measurement errors, a more intelligent method has to be followed. One possible approach is to first filter (smooth) the data to get rid of the noise. If this can be done successfully, the differentiation can be carried out using Equations (4) and (5) after filtering. A large number of different smoothing strategies exist. Later in this section we present an efficient tool especially suitable for smoothing before differentiation, called a global method, as it uses the whole data set simultaneously. Another alternative, also a global method in this sense, is to use splines (see Chapter 2.05). But before dealing with global approaches, let us look at better local approaches than simple differences.

2.04.1.2 Differentiation Based on Local Fitting
One approach to numerical differentiation is to apply the principle of local approximation with polynomials in such a way that more data points are used than there are parameters in the approximating polynomial. In other words, we use local least-squares fitting. This is the idea behind the well-known Savitzky–Golay filtering (see Chapter 2.03). For more details, see, for example, the original paper by Savitzky and Golay,3 or Madden,4 or Luo et al.5 For example, we could fit second-order polynomials (parabolas) to five consecutive points around x_i: x_i - 2h, x_i - h, x_i, x_i + h, x_i + 2h. This gives the following overdetermined set of equations for the parabola

y = a + b(x - x_i) + c(x - x_i)^2   (6)

Note that with this parameterization, b is the first derivative and 2c is the second derivative at x_i:

y_{i-2} = a - 2bh + 4ch^2
y_{i-1} = a - bh + ch^2
y_i     = a
y_{i+1} = a + bh + ch^2
y_{i+2} = a + 2bh + 4ch^2   (7)
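As a sketch (assuming NumPy), the overdetermined system (7) can be solved numerically by ordinary least squares; the fitted b and 2c are then the derivative estimates at the center point, which is exactly what the closed-form formulas (8a) and (8b) compute in one step:

```python
import numpy as np

def quadratic_derivatives(y5, h):
    """Least-squares fit of a + b*t + c*t**2 to five consecutive
    points at t = -2h, -h, 0, h, 2h (the system (7)); returns the
    derivative estimates (b, 2c) at the center point."""
    t = h * np.arange(-2, 3)
    X = np.column_stack([np.ones(5), t, t**2])
    a, b, c = np.linalg.lstsq(X, np.asarray(y5, dtype=float), rcond=None)[0]
    return b, 2.0 * c

# Five samples of y = x**3 around x = 1; the least-squares fit
# reproduces the closed-form five-point estimates exactly
h = 0.1
y5 = (1.0 + h * np.arange(-2, 3))**3
d1, d2 = quadratic_derivatives(y5, h)
```

Because the abscissas are symmetric about the center, the design columns are orthogonal and the least-squares solution reduces to fixed weighted sums of the y values, which is why the same result can be written as a convolution.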
Solving Equations (7) for the parameters a, b, and c in the least-squares sense, and evaluating the first and second derivatives at x_i, we obtain

f'(x_i) ≈ (2y_{i+2} + y_{i+1} - y_{i-1} - 2y_{i-2})/(10h)   (8a)

f''(x_i) ≈ (2y_{i+2} - y_{i+1} - 2y_i - y_{i-1} + 2y_{i-2})/(7h^2)   (8b)

Of course, other combinations could be used as well, for example, cubic polynomials fitted to seven consecutive points, etc. Symbolic math programs, for example wxMaxima,6 facilitate the derivation of the solution considerably: Equations (7) are simply solved symbolically and the second and third coefficients are then simplified symbolically. The advantage of local smoothing is that the calculations do not require much memory if they are performed in a loop. Equations (8a) and (8b) can also easily be formulated as matrix multiplications, but in that case using sparse matrices is advisable. This matrix product is actually a convolution, and the first derivative can also be calculated by convolving the original spectrum with the weight sequence [-2 -1 0 1 2]/(10h) of Equation (8a). For details see, for example, Madden.4 Figure 4 shows the first derivative of the spectrum in Figure 3, differentiated using Equation (8a) and compared to derivatives obtained using central differences (Equation (4)). The zoom-out shows that the method smoothes the derivatives quite well. Figure 5 shows the second derivative of the spectrum in Figure 3, differentiated using Equation (8b) and compared to derivatives obtained using central differences (Equation (5)). According to the zoom-out, it is not quite obvious whether the degree of smoothing is sufficient or not.

2.04.1.3 Differentiation Based on Global Fitting
The idea behind global fitting is that the function values y_i are measurements of an unknown function having desired smoothness properties. The most common criterion of smoothness is that the absolute values of the second derivatives of this function are restricted, that is, that the norm ∫_a^b [f''(x)]^2 dx is small enough. The model is simple (global model 1):

f(x_i) = ∫_a^{x_i} f'(x) dx   (9a)
Figure 4 The first derivative of the spectrum in Figure 3, differentiated using Equation (8a) (red line) and compared to derivatives obtained using central differences (Equation (4), blue line). The zoom-out (right) shows the first derivative of the same spectrum around wave number 4000.
Figure 5 The second derivative of the spectrum in Figure 3, differentiated using Equation (8b) (red line) and compared to derivatives obtained using central differences (Equation (5), blue line). The zoom-out (right) shows the second derivative of the same spectrum around wave number 7300.
that is, each value of the underlying function is the integral of its derivative. In practice, this equation has to be discretized. An alternative, having more sparsity, is to discretize the even simpler model (global model 2)

f(x_i) = ỹ_i   (9b)

where y_i = ỹ_i + ε_i, with ε_i being the measurement error at x_i and ỹ_i the true value of the function at x_i. Equation (9b) merely states that the model should reproduce the true underlying values, which may look a little strange at first. In this formulation, the solution gives the smoothed measured values, not the derivatives. The derivatives can then be obtained using, for example, central differences. In practice, there is not much difference between these two alternatives, but the latter is considerably faster owing to the extreme sparsity of the coefficient matrix. In matrix notation, the discretized system of equations can be put into the form Ax = y, where A is the integration operator in discrete form, x is the vector of unknown derivatives, and y is the vector of measured values of the function to be differentiated. The smoothness restriction can also be discretized, giving ∫_a^b [f''(x)]^2 dx ≈ ||Bx||^2, where B is the derivative operator in discrete form. Examples of A and B for a five-element vector x are

A = [1 0 0 0 0
     1 1 0 0 0
     1 1 1 0 0
     1 1 1 1 0
     1 1 1 1 1],   B = [-1  1  0  0  0
                         0 -1  1  0  0
                         0  0 -1  1  0
                         0  0  0 -1  1]

If the discretization step Δx has any significance, A must be multiplied and B divided by Δx. Usually this is not necessary, because only a value proportional to the derivative is needed. Now the restricted least-squares solution to the problem can be found by minimizing

||y - Ax||^2 + λ||Bx||^2   (10)
This kind of least-squares estimation is also called generalized ridge regression or Tikhonov regularization. With B = I it is called ordinary ridge regression. The parameter λ controls the degree of smoothness of the solution, the derivatives. It can be chosen, for example, by visual inspection or by cross-validation. The method can easily be changed to control the smoothness of derivatives of any order. It can also be generalized for two-dimensional spectra, for example, given by hyphenated instruments. It is rather straightforward to show that minimizing expression (10) with respect to the unknown derivatives leads to solving the system of equations

(A^T A + λB^T B)x = A^T y   (11)

Figure 6 The first derivative of the spectrum in Figure 3, calculated by both global models using Equations (9a) (red line) and (9b) (blue line) and compared to derivatives obtained using central differences (Equation (4)). The zoom-out (right) shows the first derivative of the same spectrum around wave number 4000.
Figure 6 shows the derivative of the spectrum in Figure 3 calculated by both global models (Equations (9a) and (9b)). We can see that the derivatives are practically the same, but estimating model 2 was 10 times faster than model 1 in this example. Comparison with Figure 4 shows a much smoother behavior, which, of course, depends on the choice of the value of the parameter λ. It is up to the user to decide the appropriate degree of smoothing. Figure 7 shows the second derivatives obtained using global model 2. Comparison with Figure 5 again shows a much smoother behavior. But again, choosing the degree of smoothness is a decision of the user, and it depends strongly on the nature of the application.
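A small sketch of global model 1 along these lines (assuming NumPy; dense matrices are used for clarity although sparse ones would be preferable, and the data and the value of λ are hypothetical, chosen by eye for this synthetic example):

```python
import numpy as np

def tikhonov_derivative(y, lam, h=1.0):
    """Estimate the derivative vector x of noisy data y by solving
    Equation (11), (A'A + lam*B'B) x = A'y, where A is the discrete
    integration operator and B the discrete derivative operator."""
    y = np.asarray(y, dtype=float)
    n = y.size
    A = h * np.tril(np.ones((n, n)))                    # running sums
    B = (np.eye(n - 1, n, k=1) - np.eye(n - 1, n)) / h  # first differences
    return np.linalg.solve(A.T @ A + lam * B.T @ B, A.T @ y)

# Noisy samples of y = x**2 on [0, 1]; the true derivative is 2x
rng = np.random.default_rng(0)
xs = np.linspace(0.0, 1.0, 101)
h = xs[1] - xs[0]
y = xs**2 + rng.normal(0.0, 0.01, xs.size)
dydx = tikhonov_derivative(y, lam=1e-5, h=h)   # smoothed derivative estimate
```

With lam = 0 the solution reduces to exact (backward) differences of the data; increasing lam trades fidelity for smoothness, and, as noted above, the choice is ultimately made visually or by cross-validation.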
Figure 7 The second derivative of the spectrum in Figure 3, calculated by global model 2 (Equation (9b)). The zoom-out (right) shows the second derivative of the same spectrum around wave number 7300.
One possible approach to choosing the best value for λ is cross-validation. The easiest form of cross-validation is the leave-one-out strategy, for which well-known formulas, derived already by Gauss, can be used. First, the so-called hat matrix is calculated:

H = A(A^T A + λB^T B)^{-1} A^T   (12a)

Using the diagonal elements H_ii of H, the cross-validated residuals e(i) are calculated by

e(i) = (y_i - ŷ_i)/(1 - H_ii)   (12b)

where ŷ_i is the fitted value at x_i, that is, the estimate of ỹ_i. Finally, the cross-validated standard error is

s_CV = sqrt( Σ_{i=1}^n e(i)^2 / n )   (12c)
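A sketch of the leave-one-out computation (assuming NumPy, with the discrete operators A and B defined in the text rebuilt locally so the snippet is self-contained; λ must be positive here so that the denominators 1 - H_ii stay away from zero):

```python
import numpy as np

def s_cv(y, lam, h=1.0):
    """Leave-one-out cross-validation error, Equations (12a)-(12c),
    for the regularized derivative estimate of Equation (11)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    A = h * np.tril(np.ones((n, n)))                    # integration operator
    B = (np.eye(n - 1, n, k=1) - np.eye(n - 1, n)) / h  # derivative operator
    H = A @ np.linalg.inv(A.T @ A + lam * B.T @ B) @ A.T   # hat matrix (12a)
    e = (y - H @ y) / (1.0 - np.diag(H))                   # residuals (12b)
    return np.sqrt(np.mean(e**2))                          # s_CV (12c)

# Hypothetical use: evaluate s_CV on a grid and keep the minimizing lambda
rng = np.random.default_rng(0)
xs = np.linspace(0.0, 1.0, 81)
y = xs**2 + rng.normal(0.0, 0.01, xs.size)
grid = [10.0**k for k in range(-9, 0)]
best = min(grid, key=lambda lam: s_cv(y, lam, h=xs[1] - xs[0]))
```

The grid search sketched here is the simplest way to use Equation (12c); the key computational point is that the hat matrix lets all n leave-one-out fits be obtained from a single full fit.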
2.04.1.4 Differentiation Based on Global Parametric Fitting
The methods described above are nonparametric in the sense that the underlying functions are not given in an explicit parametric form. Parametric functions could be used for both global and local fitting. However, parametric global fitting is seldom possible for spectroscopic data, unless the spectra contain only a few peaks. If the theoretical form of the peaks is known, fitting this functional form to the data gives very smooth derivatives, because in this case the differentiation can be performed analytically. The parameter estimation must be carried out using iterative minimization algorithms, and it can be tedious to find good enough initial guesses for the parameters. Figure 8 shows a parametric global fit to the two highest peaks of the spectrum in Figure 3. The model is a bimodal function of the form f(x) = a e^{b|x - x_1|^g} + c e^{d|x - x_2|^h} + e + fx, where x_1, x_2, a, b, c, d, e, f, g, and h are global parameters to be fitted to the spectrum. Its derivative would be a rather complicated formula, though quite straightforward to calculate. Naturally, the applicability of this kind of parametric modeling is quite limited. In most cases, a local parametric approximation is a better choice. The most common local approximations are the different spline functions. With noisy data, only the smoothing spline alternatives are acceptable. Differentiation based on smoothing splines is explained in Chapter 2.05.
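A sketch of such a parametric fit (assuming NumPy and SciPy are available; the synthetic data, the initial guesses, and the simplification of fixing the exponents g = h = 2 are all hypothetical, so this illustrates the approach rather than reproducing the fit of Figure 8):

```python
import numpy as np
from scipy.optimize import curve_fit

def bimodal(x, a, b, x1, c, d, x2, e, f):
    """Two peaks on a linear baseline: a simplified version of the
    bimodal model in the text with the exponents g and h fixed at 2."""
    return (a * np.exp(b * np.abs(x - x1)**2)
            + c * np.exp(d * np.abs(x - x2)**2) + e + f * x)

# Synthetic two-peak 'spectrum' (hypothetical, not the data of Figure 8)
x = np.linspace(0.0, 10.0, 200)
true_params = (2.0, -1.5, 4.0, 1.0, -2.0, 6.5, 0.3, 0.02)
rng = np.random.default_rng(1)
y = bimodal(x, *true_params) + rng.normal(0.0, 0.02, x.size)

# Iterative least squares needs decent initial guesses, as the text warns
p0 = (1.5, -1.0, 3.8, 0.8, -1.5, 6.8, 0.0, 0.0)
popt, _ = curve_fit(bimodal, x, y, p0=p0)
# The derivative of the fitted model is now available in closed form
```

Once popt is known, the derivative is an exact analytic expression in the fitted parameters, which is the whole appeal of the parametric route: no numerical differentiation of noisy data is needed at all.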
Figure 8 A parametric global fit (solid line) to the two highest peaks (dots) of the spectrum in Figure 3.
References
1. Hoffman, J. D. Numerical Methods for Engineers and Scientists, 2nd ed.; Marcel Dekker: New York, 2001.
2. Foo-Tim, C.; Yi-Zeng, L.; Junbin, G.; Xue-Guang, S. Chemometrics: From Basics to Wavelet Transform; Wiley: New York, 2004.
3. Savitzky, A.; Golay, M. J. E. Smoothing and Differentiation of Data by Simplified Least Squares Procedures. Anal. Chem. 1964, 36, 1627–1639.
4. Madden, H. H. Comments on the Savitzky–Golay Convolution Method for Least-Squares-Fit Smoothing and Differentiation of Digital Data. Anal. Chem. 1978, 50, 1383–1386.
5. Luo, J. W.; Ying, K.; He, P.; Bai, J. Properties of Savitzky–Golay Digital Differentiators. Digital Signal Processing 2005, 15, 122–136.
6. wxMaxima. http://wxmaxima.sourceforge.net
Biographical Sketch
Veli-Matti Taavitsainen, born on 16 August 1953, graduated from the University of Helsinki in 1978 with a major in mathematics. During 1978–81 he worked as a statistician at the Finnish Geodetic Institute, and also took part in measurement expeditions, for example, measuring postglacial land uplift and continental plate movements. From 1981 to 1995 he worked in a chemical company, Kemira, where he assisted chemists and chemical engineers in statistical analyses, design of experiments, and mathematical modeling. His interest in chemometrics was stimulated by a lecture presented by Svante Wold. His studies focus on fields such as nonlinear PLS, constrained regression, and combining soft and hard models. His licentiate of philosophy thesis (at the University of Oulu) deals with (hard) reactor modeling, and his Ph.D. thesis (at Lappeenranta University of Technology) focuses on combining soft and hard models. In 1995 he moved to EVTEK University of Applied Sciences, presently part of the Helsinki Metropolia University of Applied Sciences. He has also worked as a consultant for several Finnish chemical companies. Veli-Matti Taavitsainen is an active member of the Finnish Chemometric Society and its present chairman. His hobbies include badminton, orienteering, trekking, gardening, saunas, and playing the flute and saxophone. His family includes his wife, Hilkka, and two sons, Mikko and Juho.