8 Peak detection " 'You may seek it with thimbles—and seek it with care— You may hunt it with forks and hope; You may threaten its life with a railway-share; You may charm it with smiles and soap—'" (Lewis Carroll, The Hunting of the Snark)
The detection of peaks in a chromatogram is crucial for both qualitative and quantitative analyses, for the amount of information increases as more peaks are detected. Peak overlap and baseline noise, however, make the detection of peaks rather cumbersome. The location and intensity of the peaks are best estimated when the exact peak shape model and the peak width are known a priori. Without such knowledge, a false peak may be detected when baseline noise is taken for a minor peak, or a peak may be lost when overlap is not recognized. Most routinely used detection methods do not employ any assumption about peak shapes or baseline noise. Most often the derivatives of the signal are analyzed and a peak is detected when a threshold is exceeded. All the information these peak detection methods use is that a peak is "a signal that goes up and comes down."
8.1 Matched filters
Matched filtering is a very sophisticated means for peak detection in a noisy baseline. When white noise is superimposed on the signal, matched filters provide the optimum peak detection. Matched filtering consists of calculating the convolution of the instrumental signal with the time-reversed standardized peak shape model. When a peak of the proper shape is encountered, the filter responds with resonance; therefore the maxima of the filter output correspond to the expected peak locations. Matched filters are used very infrequently in analytical chemistry. They have been used for peak detection in spectra [1] and chromatograms [2]. Some more recent applications for noise filtering have also been published [3-6].
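The basic operation described above, convolving the signal with the time-reversed peak model and reading the peak location from the maximum of the output, can be sketched as follows. This is only an illustration under stated assumptions: the Gaussian peak shape, the noise level, the sampling interval and all function and variable names (`gaussian`, `matched_filter`, `true_center`, and so on) are hypothetical and do not come from the cited works.

```python
import numpy as np

def gaussian(t, center, sigma):
    """Unit-area Gaussian peak (assumed peak shape model)."""
    return np.exp(-0.5 * ((t - center) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

def matched_filter(signal, template):
    """Convolve the signal with the time-reversed template.

    The output has the same length as the signal; for an odd-length
    template its samples stay aligned with the time axis of the signal.
    """
    return np.convolve(signal, template[::-1], mode="same")

rng = np.random.default_rng(0)
dt = 0.1                                   # sampling interval (assumed)
t = np.arange(0.0, 100.0, dt)
sigma = 2.0                                # known peak width (assumed)
true_center = 40.0
signal = gaussian(t, true_center, sigma) + rng.normal(0.0, 0.02, t.size)

# Peak model sampled on a symmetric window of +/- 5 sigma (odd length).
half = int(round(5.0 * sigma / dt))
t_model = np.arange(-half, half + 1) * dt
template = gaussian(t_model, 0.0, sigma)

output = matched_filter(signal, template)
estimated_center = t[np.argmax(output)]    # maximum of the filter output
```

For a symmetric model the time reversal has no effect, but it is kept so that the sketch also applies to tailing peak shapes.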
Brouwer and Jansen [1] have established the use of matched filters for peak detection in spectra by using the derivatives of the signal and the peak model. As a starting point, let us assume that we have a peak shape model $f(t)$ that we try to fit by a least-squares method to a signal $x(t)$ containing a single peak. Instead of the measured signal and the fitted model, however, their first derivatives are compared. The calculation of the first derivatives makes the method insensitive to linear baseline drift. The minimum of the following expression is sought:
$$\int_{-\infty}^{\infty} \left[x'(t) - \alpha f'(t+\tau)\right]^2 dt \tag{8.1}$$
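The above expression is minimized by varying the peak height $\alpha$ and the peak location $\tau$. Differentiation with respect to the parameters $\tau$ and $\alpha$ yields
$$\beta(\tau) = \int_{-\infty}^{\infty} x'(t)\, f''(t+\tau)\, dt = -\int_{-\infty}^{\infty} x(t)\, f'''(t+\tau)\, dt \tag{8.2}$$
$$\alpha(\tau) = \frac{\displaystyle\int_{-\infty}^{\infty} x'(t)\, f'(t+\tau)\, dt}{\displaystyle\int_{-\infty}^{\infty} \left[f'(t)\right]^2 dt} = -\frac{\displaystyle\int_{-\infty}^{\infty} x(t)\, f''(t+\tau)\, dt}{\displaystyle\int_{-\infty}^{\infty} \left[f'(t)\right]^2 dt} \tag{8.3}$$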
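The step from Equation 8.1 to Equations 8.2 and 8.3 can be spelled out as follows. This is a sketch under two assumptions: the symbol $S(\alpha,\tau)$ for the integral of Equation 8.1 is introduced here only for convenience, and the signal, the model and their derivatives are taken to vanish as $t \to \pm\infty$, so that all boundary terms drop out.

```latex
\begin{align*}
S(\alpha,\tau) &= \int_{-\infty}^{\infty} \left[x'(t) - \alpha f'(t+\tau)\right]^2 dt \\[4pt]
\frac{\partial S}{\partial \tau}
  &= -2\alpha \int_{-\infty}^{\infty} x'(t)\, f''(t+\tau)\, dt
     + 2\alpha^2 \underbrace{\int_{-\infty}^{\infty} f'(t+\tau)\, f''(t+\tau)\, dt}_{=\,\frac{1}{2}\left[f'^2\right]_{-\infty}^{\infty}\,=\,0}
   = -2\alpha\, \beta(\tau) \\[4pt]
\frac{\partial S}{\partial \alpha}
  &= -2 \int_{-\infty}^{\infty} x'(t)\, f'(t+\tau)\, dt
     + 2\alpha \int_{-\infty}^{\infty} \left[f'(t)\right]^2 dt = 0
  \;\Longrightarrow\;
  \alpha(\tau) = \frac{\int_{-\infty}^{\infty} x'(t)\, f'(t+\tau)\, dt}{\int_{-\infty}^{\infty} \left[f'(t)\right]^2 dt} \\[4pt]
\int_{-\infty}^{\infty} x'(t)\, f^{(n)}(t+\tau)\, dt
  &= \Big[x(t)\, f^{(n)}(t+\tau)\Big]_{-\infty}^{\infty}
     - \int_{-\infty}^{\infty} x(t)\, f^{(n+1)}(t+\tau)\, dt
   = -\int_{-\infty}^{\infty} x(t)\, f^{(n+1)}(t+\tau)\, dt
\end{align*}
```

The last line, integration by parts with $n = 2$ and $n = 1$, moves the differentiation from the noisy signal to the noise-free model, which is why only $x(t)$ itself has to be measured.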
The minimum, i.e., the best values for the parameters $\tau$ and $\alpha$, is found when $\beta(\tau) = 0$. The peak is located at that value of $\tau$, and the peak intensity is $\alpha(\tau)$ as calculated by Equation 8.3. The cross-correlation functions of $x(t)$ and $f(t)$ represent the calculation of pseudo-derivatives of the signal. If we assume that both the measured and the model peaks are Gaussians with
$$f(t) = \frac{1}{\sqrt{2\pi}\,\sigma}\, e^{-t^2/2\sigma^2} \tag{8.4}$$
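then the integrals in Equations 8.2 and 8.3 result in
$$\int_{-\infty}^{\infty} f(t)\, f'''(t+\tau)\, dt = \frac{d^3}{d\tau^3}\left[\frac{1}{\sqrt{2\pi}\,\sqrt{2}\,\sigma}\, e^{-\tau^2/4\sigma^2}\right] \tag{8.5}$$
$$\int_{-\infty}^{\infty} f(t)\, f''(t+\tau)\, dt = \frac{d^2}{d\tau^2}\left[\frac{1}{\sqrt{2\pi}\,\sqrt{2}\,\sigma}\, e^{-\tau^2/4\sigma^2}\right] \tag{8.6}$$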
The factor $\sqrt{2}$ on the right-hand side of both equations expresses the amount of broadening due to the correlational differentiation. The differentiation via cross-correlation provides a smoothing of noisy signals. The calculations in Equations 8.2 and 8.3 combine differentiation with the cross-correlation of the signal and the derivatives of the peak shape model. In Figure 8.1, the use of the matched filter is demonstrated. When the sign of $\beta(\tau)$ changes from negative to positive, a peak location is assigned to that point, and the intensity of that peak is determined as the value of $\alpha(\tau)$ at that location. Both the location and the intensity of the stand-alone peak in Figure 8.1 are correctly determined by the matched filter. A false peak is detected by $\beta(\tau)$ in the middle of the signal, but $\alpha(\tau)$ is negative at that point; accordingly, that false detection is rejected. The filter clearly recognizes the peak overlap and determines the location and the intensity of both peaks. The arrows in the figure help to judge the error of the peak height estimation.

FIGURE 8.1 Peak detection by matched filter. The peak locations are determined where $\beta(\tau) = 0$. The estimated peak intensities are $\alpha(\tau)$. The heights of the arrows are equal to the intensities estimated by $\alpha(\tau)$.
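A numerical sketch of this derivative-based matched filter is given below. It is not the implementation of Brouwer and Jansen, and it makes several assumptions:

- The model is a unit-area Gaussian of known width, so $\alpha$ estimates the factor multiplying that model.
- The model is shifted to a trial location $s$ on the time axis of the signal (that is, $\tau = -s$ in Equation 8.1).
- The integrals are replaced by sums over the sampled signal.
- The threshold `alpha_min` and all function names are hypothetical.

```python
import numpy as np

def gaussian_model(t, sigma):
    """Unit-area Gaussian and its first three analytical derivatives."""
    f = np.exp(-0.5 * (t / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)
    f1 = -(t / sigma**2) * f
    f2 = (t**2 / sigma**4 - 1.0 / sigma**2) * f
    f3 = (3.0 * t / sigma**4 - t**3 / sigma**6) * f
    return f, f1, f2, f3

def derivative_matched_filter(x, dt, sigma, alpha_min):
    """Return beta, alpha and the indices where beta crosses zero upward
    while alpha exceeds alpha_min (accepted peak locations)."""
    half = int(round(6.0 * sigma / dt))
    lag = np.arange(-half, half + 1) * dt          # odd-length model window
    _, f1, f2, f3 = gaussian_model(lag, sigma)
    # Cross-correlations of the signal with the model derivatives; the
    # minus signs come from moving the derivative from x onto the model.
    beta = -dt * np.correlate(x, f3, mode="same")
    alpha = -dt * np.correlate(x, f2, mode="same") / (dt * np.sum(f1**2))
    crossing = (beta[:-1] < 0.0) & (beta[1:] >= 0.0) & (alpha[:-1] > alpha_min)
    return beta, alpha, np.flatnonzero(crossing)

rng = np.random.default_rng(3)
dt = 0.1
t = np.arange(0.0, 100.0, dt)
sigma = 2.0
x = (1.0 * gaussian_model(t - 30.0, sigma)[0]
     + 0.6 * gaussian_model(t - 60.0, sigma)[0]
     + rng.normal(0.0, 0.01, t.size))

beta, alpha, idx = derivative_matched_filter(x, dt, sigma, alpha_min=0.2)
locations, intensities = t[idx], alpha[idx]
```

Zero crossings of `beta` caused by baseline noise are discarded because `alpha` stays close to zero there, which mirrors the rejection of the false detection in the text.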
8.2 Peak detection by derivatives
Differentiation of an analytical signal enhances the changes within the signal, so derivatives help to detect peak locations. Routine integrators use the first derivative of the chromatogram for peak detection, as the first derivative gives the slope of the signal at each point. The first derivative is calculated at each point of the signal, and the starting point of the peak is detected when the first derivative has reached a predefined threshold. The peak maximum (apex), and also the valley between partially resolved peaks, is then found where the first derivative is zero, and the end of the peak is found where the magnitude of the first derivative drops below the threshold and the signal returns to baseline. To avoid the detection of a false peak due to abrupt changes caused by baseline noise, a minimum peak width is usually also predefined, and the detected peak is accepted only if the difference between the end and the start of the peak exceeds this minimum width. Alternatively, a minimum peak area or a minimum peak height can be set as parameters to confirm that a detected signal is a peak.

Some integrators apply a two-step peak finding algorithm [7,8]. First, a coarse estimate of the peak start is obtained when the first derivative exceeds the threshold for two or more consecutive points. Then the second derivative of the signal is analyzed backward, and the starting point of the peak is assigned to the point at which the second derivative falls below a threshold. The location of apices and valleys can also be fine-tuned by the two-step method.

FIGURE 8.2 Detection of the start and the end of peaks. (Top) A chromatogram with nonlinear baseline; (Middle) the first derivative of the chromatogram; (Bottom) the second derivative of the chromatogram. The thresholds for the derivatives and the integration limits are shown as dotted lines. A coarse estimate of the integration limits is given by comparing the first derivative of the chromatogram to the threshold. Those points are then refined using the second derivative.
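The single-step, first-derivative procedure can be sketched as below. This is a simplified illustration, not the algorithm of any particular integrator: the second-derivative refinement is omitted, the signal is assumed to be smooth apart from one artificial spike, and the names and parameter values (`slope_threshold`, `min_width`) are hypothetical.

```python
import numpy as np

def detect_peaks_first_derivative(t, x, slope_threshold, min_width):
    """Return (start, apex, end) times of peaks found from the first derivative."""
    dx = np.gradient(x, t)
    n = len(x)
    peaks = []
    i = 0
    while i < n:
        if dx[i] <= slope_threshold:
            i += 1
            continue
        start = i                        # slope first exceeds the threshold
        j = i
        while j < n - 1 and dx[j] > 0.0:
            j += 1                       # climb until the slope crosses zero
        apex = j
        while j < n - 1 and dx[j] > -slope_threshold:
            j += 1                       # move onto the steep trailing edge
        while j < n - 1 and dx[j] <= -slope_threshold:
            j += 1                       # follow it until the slope flattens
        end = j
        if t[end] - t[start] >= min_width:
            peaks.append((t[start], t[apex], t[end]))   # wide enough: accept
        i = end + 1
    return peaks

t = np.arange(0.0, 100.0, 0.1)
x = np.exp(-0.5 * ((t - 50.0) / 3.0) ** 2)    # one genuine peak
x[200] += 0.5                                  # one-point spike imitating a noise burst
peaks = detect_peaks_first_derivative(t, x, slope_threshold=0.01, min_width=2.0)
```

The spike exceeds the slope threshold but spans only a few sampling intervals, so the minimum-width criterion rejects it while the genuine peak is kept.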
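The simplest method to calculate the first derivative of a chromatogram or any other digitized signal is to calculate the difference
$$x'_i = \frac{x_{i+1} - x_i}{\Delta t} \quad \text{or} \quad x'_i = \frac{x_i - x_{i-1}}{\Delta t} \tag{8.7}$$
The simplest formula for the second derivative is
$$x''_i = \frac{x_{i+1} - 2x_i + x_{i-1}}{\Delta t^2} \tag{8.8}$$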
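A short numerical check illustrates both the simple difference formulas and the price paid for them. The sketch assumes the forward difference for the first derivative and the usual three-point difference for the second derivative; the function names and the test signals are hypothetical. For white noise of standard deviation $s$, these formulas are expected to raise the noise level to about $\sqrt{2}\,s/\Delta t$ and $\sqrt{6}\,s/\Delta t^2$, respectively.

```python
import numpy as np

def first_difference(x, dt):
    """Forward difference (x[i+1] - x[i]) / dt."""
    return (x[1:] - x[:-1]) / dt

def second_difference(x, dt):
    """Three-point difference (x[i+1] - 2 x[i] + x[i-1]) / dt**2."""
    return (x[2:] - 2.0 * x[1:-1] + x[:-2]) / dt**2

dt = 0.1
t = np.arange(0.0, 10.0, dt)
parabola = t**2
d2 = second_difference(parabola, dt)       # exact second derivative is 2

# Noise amplification: ratio of output to input standard deviation.
rng = np.random.default_rng(1)
noise = rng.normal(0.0, 0.01, 100_000)
noise_gain_1 = first_difference(noise, dt).std() / noise.std()
noise_gain_2 = second_difference(noise, dt).std() / noise.std()
expected_gain_1 = np.sqrt(2.0) / dt        # about 14 for dt = 0.1
expected_gain_2 = np.sqrt(6.0) / dt**2     # about 245 for dt = 0.1
```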
Unfortunately, the baseline noise is strongly amplified during the numerical calculation of the derivatives. Therefore, other methods are required that combine differentiation with smoothing. As discussed in Chapter 7, the Savitzky-Golay polynomial filter can easily be used for the calculation of derivatives of any desired order. Alternatively, the derivative theorem of the Fourier transform (see Equation 2.108 on page 32) can be exploited: multiplication of the Fourier-transformed signal by the appropriate polynomial, $(i\omega)^n$ for the $n$th derivative, followed by the inverse Fourier transformation, yields the derivative of the signal. It must be remembered, however, that the differentiation must be accompanied by Fourier-domain smoothing to cut off the high-frequency domain of the spectrum; otherwise the amplified high-frequency part that is due to the presence of noise will destroy the signal.

A Gaussian peak and its derivatives are plotted in Figure 8.3. At the location of the peak maximum, the odd-order derivatives are zero and the even-order ones have an extremum; the second derivative has a minimum, whereas the fourth derivative has a maximum. Therefore, if the smoothed derivatives of the signal are calculated, the peak locations can be identified in a noisy signal. The fourth derivative peak is narrower than the second derivative peak, so the fourth derivative is less influenced by adjacent overlapping peaks, but at the same time it is much more sensitive to the presence of noise. For this reason, the following rule is more safely followed when derivatives are used for peak finding:
• Calculate the smoothed first and second derivatives of the chromatogram, either by a Savitzky-Golay polynomial or in the Fourier domain.
• At the location of the peak maximum, the first smoothed derivative changes its sign from positive to negative, and the second smoothed derivative of the signal has a minimum.
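The Fourier-domain route can be sketched as follows. The sketch multiplies the spectrum by $(i\omega)^n$ and by a low-pass window before transforming back, then reports the local minima of the smoothed second derivative as peak locations. It departs from the procedure described in this chapter in one respect that should be kept in mind: a simple Gaussian low-pass window with a hand-picked `cutoff` is used here as a stand-in, not the modified Tukey window or Lanczos' cut-off rule. All names and numerical values are hypothetical.

```python
import numpy as np

def fourier_derivative(x, dt, order, cutoff):
    """Smoothed derivative of x via the Fourier derivative theorem."""
    n = x.size
    omega = 2.0 * np.pi * np.fft.rfftfreq(n, d=dt)    # angular frequencies
    window = np.exp(-0.5 * (omega / cutoff) ** 2)      # Gaussian low-pass (assumed)
    spectrum = np.fft.rfft(x) * (1j * omega) ** order * window
    return np.fft.irfft(spectrum, n=n)

def second_derivative_minima(t, d2, depth):
    """Times of local minima of d2 that lie below -depth."""
    inner = d2[1:-1]
    is_min = (inner < d2[:-2]) & (inner <= d2[2:]) & (inner < -depth)
    return t[1:-1][is_min]

rng = np.random.default_rng(2)
dt = 0.1
t = np.arange(0.0, 100.0, dt)
clean = (np.exp(-0.5 * ((t - 40.0) / 2.0) ** 2)
         + 0.7 * np.exp(-0.5 * ((t - 48.0) / 2.0) ** 2))   # two overlapping peaks
noisy = clean + rng.normal(0.0, 0.01, t.size)

d2 = fourier_derivative(noisy, dt, order=2, cutoff=1.0)
found = second_derivative_minima(t, d2, depth=0.05)
```

The window broadens the peaks slightly, so `cutoff` trades noise suppression against resolution; the `depth` threshold keeps shallow minima caused by residual noise from being reported.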
The resolving power of the second derivative is illustrated in Figure 8.4. The noisy chromatogram of two overlapping peaks was analyzed. The smooth second derivative was calculated by means of frequency-domain differentiation. The chromatogram was Fourier-transformed, and the cut-off frequency was determined by Lanczos' method [9] given by Equation 7.40 on page 173. Then the complex Fourier-transformed signal was multiplied by $(i\omega)^2 = -\omega^2$ as determined by Equation 2.108 on page 32, and the modified Tukey smoothing window (see Equation 7.39 on page 170) was used to smooth the Fourier spectrum to zero. Finally, the smoothed second derivative was obtained by an inverse Fourier transform. The smoothness of the second derivative, despite the noise level of the original chromatogram, is remarkable.

FIGURE 8.3 Gaussian peak and its first four derivatives.

FIGURE 8.4 Detection of peak maxima by the smoothed second derivative of the signal.

Grushka et al. have examined the use of second derivatives for peak finding [10,11]. When two peaks overlap, as shown in Figure 8.4, the second derivative has five extrema: three maxima and two minima. Grushka et al. suggest that the integration limit between the two peaks should be the second maximum. They found that as long as five extrema exist in the second derivative, this method is insensitive to the separation between the peaks, since a linear calibration curve is obtained when the relative mixture composition is changed. The method fails when peak overlap is so severe that the number of extrema reduces to three. If the signal-to-noise ratio is good, the method based on the second derivatives works well. However, in the case of noisy signals a peak height ratio higher than 0.3 is needed for the recognition of peak overlap.

Excoffier and Guiochon [12] have proposed a slight modification for the calculation of the derivatives. From Figure 8.2 it is obvious that the flat and broad peaks at long retention times might escape detection or be detected with significant area loss, especially at low signal-to-noise ratios, since the derivatives of such peaks do not always pass the threshold. They introduced a pseudo derivative that corrects for peak broadening. The pseudo derivative of the signal is calculated as
$$x'_p(t) = x(t) - x(t - w_{1/2}) \tag{8.9}$$
where $w_{1/2}$ is the peak half width. In most cases, there is a linear relationship between peak width and retention time; therefore the peak half width can be estimated as
$$w_{1/2} = w_{1/2,0} + a t \tag{8.10}$$
where $w_{1/2,0}$ is the half width of an unretained peak and $a$ is a coefficient. By this approach, the derivative calculation involves a wider time span for late peaks, which enhances their derivatives.
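The effect of the width-adapted time span can be illustrated numerically. The sketch below assumes that the pseudo derivative is the plain difference between the signal and its value one half width earlier, with the half width growing linearly with time. The coefficients `w0` and `a`, the two test peaks and the function name are hypothetical. It compares how strongly an early narrow peak and a late broad peak of equal area respond to the ordinary derivative and to the pseudo derivative.

```python
import numpy as np

def pseudo_derivative(t, x, dt, w0, a):
    """x(t) - x(t - w), with the lag w = w0 + a*t rounded to whole samples."""
    lag = np.rint((w0 + a * t) / dt).astype(int)
    source = np.arange(t.size) - lag           # index of the earlier point
    valid = source >= 0
    xp = np.zeros_like(x)
    xp[valid] = x[valid] - x[source[valid]]
    return xp

def unit_area_gaussian(t, center, sigma):
    return np.exp(-0.5 * ((t - center) / sigma) ** 2) / (np.sqrt(2.0 * np.pi) * sigma)

dt = 0.05
t = np.arange(0.0, 120.0, dt)
# Equal-area peaks whose width grows in proportion to retention time.
x = unit_area_gaussian(t, 20.0, 1.0) + unit_area_gaussian(t, 80.0, 4.0)

plain = np.gradient(x, dt)
pseudo = pseudo_derivative(t, x, dt, w0=0.0, a=0.1)

early, late = t < 50.0, t >= 50.0
plain_ratio = plain[early].max() / plain[late].max()      # early vs late response
pseudo_ratio = pseudo[early].max() / pseudo[late].max()
```

With these settings the ordinary derivative of the late peak is roughly sixteen times weaker than that of the early peak, whereas the pseudo derivative is only about four times weaker, which is simply the ratio of the peak heights.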
Bibliography
[1] BROUWER, G.; JANSEN, J. A. J., Deconvolution Method for Identification of Peaks in Digitized Spectra, Anal. Chem. 1973, 45, 2239-2247.
[2] VAN RIJSWICK, M. H., Adaptive Program for High Precision Off-line Processing of Chromatograms, Chromatographia 1974, 7, 491-501.
[3] VAN DEN HEUVEL, E. J.; VAN MALSSEN, K. F.; SMIT, H. C., Optimal Estimation of Intensity of Noisy Peaks by Matched Filtering with Application to Chromatography: Part 1. General Introduction and Theoretical Evaluations, Anal. Chim. Acta 1990, 235, 343-353.
[4] VAN DEN HEUVEL, E. J.; VAN MALSSEN, K. F.; SMIT, H. C., Optimal Estimation of Intensity of Noisy Peaks by Matched Filtering with Application to Chromatography: Part 2. Program Description and Simulation Experiments, Anal. Chim. Acta 1990, 235, 355-365.
[5] VAN DEN BOGAERT, B.; BOELENS, H. F. M.; SMIT, H. C., Evaluation and Correction of Signal Model Errors in a Matched Filter for the Quantification of Chromatographic Data, Anal. Chim. Acta 1993, 274, 71-85.
[6] VAN DEN BOGAERT, B.; BOELENS, H. F. M.; SMIT, H. C., Quantification of Chromatographic Data Using a Matched Filter: Robustness Towards Noise Model Errors, Anal. Chim. Acta 1993, 274, 87-97.
[7] DYSON, N., Chromatographic Integration Methods, Royal Society of Chemistry: Cambridge, 1990.
[8] Millipore Corporation, Milford, MA, "Maxima 820 Chromatography Workstation Reference Manual", 1991.
[9] LANCZOS, C., Applied Analysis, Dover Publications, Inc.: New York, 1988.
[10] GRUSHKA, E.; ATAMNA, I., The Use of Derivatives for Establishing Integration Limits of Chromatographic Peaks, Chromatographia 1987, 24, 226-232.
[11] GRUSHKA, E.; ISRAELI, D., Characterization of Overlapped Chromatographic Peaks by Their Second Derivative: the Limit of the Method, Anal. Chem. 1990, 62, 717-721.
[12] EXCOFFIER, J. L.; GUIOCHON, G., Automatic Peak Detection in Chromatography, Chromatographia 1982, 15, 543-545.