Chemometrics and Intelligent Laboratory Systems 189 (2019) 130–137
Diffusion enhancement model and its application in peak detection

Jun Li b, Yuanlu Li a,b,*, Weijing Zhao b, Min Jiang b

a Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science & Technology, Nanjing, 210044, China
b School of Automation, Nanjing University of Information Science & Technology, Nanjing, 210044, China
ARTICLE INFO

Keywords: Smoothing; Nonlinear diffusion; Fractional diffusion; Diffusion enhancement; Peak detection

ABSTRACT

Detecting low-amplitude peaks and overlapped peaks contaminated by noise is a challenge for peak detection algorithms. Among existing peak detection algorithms, the continuous wavelet transform (CWT)-based algorithm performs best. When the Mexican Hat wavelet is selected as the mother wavelet, the CWT of a signal is essentially equivalent to smoothing the 2nd derivative of the signal with a Gaussian function. A natural idea is therefore to combine a peak enhancement step with peak-preserving diffusion in the peak detection process to improve detection performance. In the proposed algorithm, the Gaussian smoothing of the CWT-based algorithm is replaced with peak-preserving diffusion filtering. To assess the proposed algorithm, a simulated spectrum with low-amplitude peaks and overlapped peaks was generated and used to test the enhancement performance. Then 100 groups of simulated proteomics data sets from [1] were used to assess the proposed algorithm. In these data sets the true peaks are known in each spectrum, so the false discovery rate (FDR) is easy to compute. Five typical peak detection programs were chosen for comparison with the proposed algorithm, with FDR and sensitivity as the performance measures. The results show that the proposed algorithm improves the performance of peak detection.
* Corresponding author. School of Automation, Nanjing University of Information Science & Technology, Nanjing, 210044, China.
E-mail address: [email protected] (Y. Li).
https://doi.org/10.1016/j.chemolab.2019.04.012
Received 20 November 2018; Received in revised form 12 April 2019; Accepted 23 April 2019; Available online 24 April 2019
0169-7439/© 2019 Elsevier B.V. All rights reserved.

1. Introduction

Peak detection is one of the important steps in spectrometry analysis [1–3]. It has been applied in many areas, including disease diagnosis, water quality analysis, and chemical mixture identification [4–6]. The performance of peak detection directly affects the subsequent analysis. Therefore, a number of peak detection programs have been developed [7,8], for example PROcess [9], LMS [10], LIMPIC [11], Cromwell [12] and CWT [13]. A comparison of public peak detection algorithms for MALDI mass spectrometry data analysis is given in Ref. [1].

The framework of peak detection usually includes the following steps: baseline correction, smoothing and peak picking [1]. The similarities and differences among existing peak detection algorithms lie in the methods used in each step. Methods for baseline correction include monotone minimum [14], linear interpolation [15] and wavelet transforms [16]. Peak picking methods include intensity threshold [17], first derivative [18], second derivative [19] and local maximum [20].

The smoothing step is the most important one in peak detection, and many smoothing methods exist. The simplest is moving average filtering, which replaces the center point with the average of adjacent points, but it seriously distorts the peaks of the signal [21]. The Savitzky-Golay smoothing method is widely used for spectra smoothing [22,23]. It fits a polynomial to the data points in a sliding window and takes the value of the fit at the center point as the smoothing result. Compared with moving average filtering, the Savitzky-Golay method protects peaks better [24]. The Gaussian smoothing method is another improvement on moving average filtering [25]; it takes the Gaussian function as the smoothing function. Another kind of smoothing method discards the high-frequency components in the frequency domain and then reconstructs the signal [26]. The wavelet method is the most used frequency-domain smoothing method [12,27–29].

Smoothing is irreversible: if a real peak is removed during the smoothing step, it can never be recovered in the subsequent analysis. Therefore, a major challenge in peak detection is peak-preserving smoothing. It was reported that nonlinear diffusion filtering is capable of peak-preserving smoothing [30]. Diffusion filtering can be traced back to the 1980s, when Witkin and his colleagues found that the solution of the homogeneous linear diffusion equation is equivalent to the convolution of the initial signal with a Gaussian function at each scale [31]. Because homogeneous linear diffusion filtering may blur the edges of a signal and destroy its peaks, Perona and Malik proposed the nonlinear diffusion model in 1990 [32]. This model not only removes noise better but also protects signal characteristics such as edges and peaks. Recently, some improved diffusion models have been proposed [33–35], for example the time-fractional diffusion model [33], the time fractional super-diffusion model [34] and the spatial-fractional order diffusion model [35].

Detecting low-amplitude peaks and overlapped peaks contaminated by noise is a challenge for peak detection algorithms. Among the existing peak detection algorithms, the continuous wavelet transform (CWT)-based algorithm is superior to the others [1]. The reason is that the CWT can enhance the low-abundance peaks and improve the resolution of the overlapped peaks. In the CWT-based algorithm, the Mexican Hat wavelet, which is proportional to the 2nd derivative of the Gaussian function, is usually selected as the mother wavelet. Thus, the CWT of a signal is essentially equivalent to smoothing the 2nd derivative of the signal with the Gaussian function. The essential reason that the CWT provides the best performance lies in using the 2nd derivative to enhance the signal.

Therefore, a new framework for peak detection is proposed, as shown in Fig. 1. Compared with the existing framework, the new framework adds a signal enhancement step between the baseline correction step and the smoothing step. For the smoothing step, the Gaussian smoothing used in the CWT-based peak detection algorithm is replaced with the time-space diffusion filtering, which is a peak-preserving smoothing algorithm.

Fig. 1. A framework for the peak detection. [Flowchart: Signal → Baseline correction → Signal enhancement → Smoothing → Peak picking → Peak list.]

In this paper, the time fractional super-diffusion model is used as the smoothing algorithm of the enhanced signal. Cubic spline interpolation is selected as the baseline correction method, and the local maximum and amplitude methods are selected as the peak picking method. To assess the proposed algorithm, a simulated signal with low-abundance peaks and overlapped peaks was generated and used to test its performance. Then four simulated proteomics data sets from Ref. [1] were used to assess the proposed algorithm. In these data sets the true peaks are known in each spectrum, so the false discovery rate (FDR) is easy to compute. Five typical peak detection algorithms were chosen for comparison with the proposed algorithm, with FDR and sensitivity as the performance measures.
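The equivalence noted in the introduction, between the Mexican Hat CWT at one scale and Gaussian smoothing of the (negative) 2nd derivative, can be checked numerically. The following is a minimal sketch under assumed values (a synthetic two-peak signal, scale σ = 8 samples, unit sample spacing); the function names are hypothetical, and it is a sanity check rather than the CWT implementation of Ref. [13].

```python
import numpy as np

def gaussian_kernel(sigma, half_width):
    """Unnormalised Gaussian exp(-t^2 / (2 sigma^2)) sampled at integer t."""
    t = np.arange(-half_width, half_width + 1, dtype=float)
    return np.exp(-t**2 / (2.0 * sigma**2))

def mexican_hat_kernel(sigma, half_width):
    """Mexican Hat wavelet (1 - t^2/sigma^2) exp(-t^2 / (2 sigma^2)),
    i.e. -sigma^2 times the 2nd derivative of the Gaussian above."""
    t = np.arange(-half_width, half_width + 1, dtype=float)
    return (1.0 - t**2 / sigma**2) * np.exp(-t**2 / (2.0 * sigma**2))

def second_difference(f):
    """Central second difference with unit spacing; end points are left at zero."""
    d2 = np.zeros_like(f)
    d2[1:-1] = f[2:] - 2.0 * f[1:-1] + f[:-2]
    return d2

# Assumed test signal: two overlapped Gaussian peaks on a zero baseline.
x = np.arange(400, dtype=float)
signal = (np.exp(-(x - 180.0)**2 / (2 * 10.0**2))
          + 0.6 * np.exp(-(x - 215.0)**2 / (2 * 10.0**2)))

sigma, half_width = 8.0, 40

# Route 1: convolve the signal with the Mexican Hat wavelet (one CWT scale).
cwt_row = np.convolve(signal, mexican_hat_kernel(sigma, half_width), mode="same")

# Route 2: Gaussian smoothing of the negative 2nd derivative, scaled by sigma^2.
smoothed_d2 = -sigma**2 * np.convolve(second_difference(signal),
                                      gaussian_kernel(sigma, half_width),
                                      mode="same")

rel_err = np.max(np.abs(cwt_row - smoothed_d2)) / np.max(np.abs(cwt_row))
print(f"relative difference between the two routes: {rel_err:.4f}")
```

The two routes differ only by the discretisation error of the second difference and by kernel truncation, which is why the relative difference stays small for a scale of several samples.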
2. Diffusion enhancement model

The peak-enhanced signal is obtained as the weighted sum of the original spectrum and the negative of its second derivative; it is given by Ref. [36]:

$$f_{en}(x) = f(x) - c\, f^{(n)}(x) \tag{1}$$

where $f$ is the initial signal, $f^{(n)}$ is the nth-order derivative, $f_{en}$ is the enhanced signal, usually $n = 2$, and $c$ is the enhancement coefficient. The diffusion enhancement model is described by

$$
\begin{cases}
\dfrac{\partial^{\alpha} u(x,t)}{\partial t^{\alpha}} = g[u(x,t)]\,\dfrac{\partial^{2} u(x,t)}{\partial x^{2}}, & 1<\alpha<2,\\[4pt]
\dfrac{\partial u(x,0)}{\partial t} = 0,\quad u(x,0) = f(x) - c\, f^{(2)}(x), & 0<x<L,\\[4pt]
u(0,t) = u(L,t) = 0, & 0<t<T,
\end{cases} \tag{2}
$$

where $u(x,0)$ is the initial enhanced signal, $u(x,t)$ is the smoothed signal at the instant of time $t$, $g[u(x,t)]$ is the diffusion function, which is used to control the diffusion process, $\partial^{\alpha} u(x,t)/\partial t^{\alpha}$ is the Caputo fractional derivative of $u(x,t)$, and $\alpha$ is the order of the time-fractional derivative. This means that the solution of the diffusion equation is a smoothed signal when the signal enhanced by the second derivative is taken as the initial value of the diffusion equation. In practice, data are discrete. Therefore, the finite difference scheme of Eq. (2) is given as follows:

$$
\begin{aligned}
U^{1} &= A^{0} U^{0},\\
U^{k} &= A^{k-1} U^{k-1} - \sum_{j=2}^{k} \left(b_{j}^{(\alpha)} - b_{j-1}^{(\alpha)}\right)\left(U^{k-j+1} - U^{k-j}\right), \quad k \ge 2,
\end{aligned} \tag{3}
$$

where $k$ is the number of iterations, $b_{j}^{(\alpha)} = j^{2-\alpha} - (j-1)^{2-\alpha}$, and

$$
A^{k-1} =
\begin{bmatrix}
1-\beta_{1}^{k-1} & \beta_{1}^{k-1} & & & \\
\beta_{2}^{k-1} & 1-2\beta_{2}^{k-1} & \beta_{2}^{k-1} & & \\
 & \ddots & \ddots & \ddots & \\
 & & \beta_{M-2}^{k-1} & 1-2\beta_{M-2}^{k-1} & \beta_{M-2}^{k-1} \\
 & & & \beta_{M-1}^{k-1} & 1-\beta_{M-1}^{k-1}
\end{bmatrix},
\qquad
U^{k} =
\begin{bmatrix}
u_{1}^{k} \\ u_{2}^{k} \\ \vdots \\ u_{M-1}^{k}
\end{bmatrix}. \tag{4}
$$

In matrix $A^{k-1}$, $\beta_{i}^{k-1} = g_{i}^{k-1}\,\omega_{\alpha,\tau}$, where $\omega_{\alpha,\tau} = \tau^{\alpha}\,\Gamma(3-\alpha)$ and $\tau$ is the time step size, which should usually satisfy $\tau \le 0.1$ for algorithm stability; $g_{i}^{k}$ denotes the value of the diffusion function of the kth iteration at the ith position of the sequence, $i = 1, 2, \cdots, M-1$; and $M$ is the length of the sequence.

Fig. 2. Noisy simulated signal. [Plot: intensity vs. x; curves: real signal, noisy signal.]

Fig. 3. The real signal and diffusion strength. [Plot: intensity vs. x; curves: real signal, diffusion strength.]

Fig. 4. Performance evaluation of automatic λ method. [Plot: intensity vs. x, with two enlarged insets; curves: real signal, automatic λ, λ = 2, λ = 0.5.]

Fig. 5. Original signal, noisy signals, enhanced signal and its smoothed signals. [Panels: (a) Simulation signals; (b) Original signal and smoothed signals; intensity vs. x.] Curves: (a) the original signal; (b) the noisy signal corresponding to the original signal (20 dB); (c) the enhanced signal of the noisy signal; (d) the true peaks, marked with asterisks; (e) the smoothed signal of the noisy signal, where one can see that an overlapped peak cannot be detected because of the smoothing operation; (f) the smoothed signal of the enhanced signal, where the peaks become more obvious.

Fig. 6. Baseline correction. [Panels: initial signal; corrected signal; intensity (a.u.) vs. m/z.]

Fig. 7. Enhanced signal and its smoothed result. [Panels: (a) Corrected signal; (b) Enhanced signal; (c) Smoothed signal; intensity (a.u.) vs. m/z.]

Fig. 8. The partial enlargement of Fig. 7. [Curves: enhanced signal, corrected signal, smoothed signal; intensity (a.u.) vs. m/z.]

The pseudo code for the diffusion enhancement model is given as follows:

Algorithm 1. Diffusion enhancement algorithm
Input:
  Initial signal: $f(x)$ (in practice, a column vector)
  The enhancement coefficient: $c$
  The order of the time-fractional derivative: $\alpha$
  Number of iterations: $N$
  Diffusion threshold: $\lambda$
  Time step size: $\tau$
Output:
  Smoothed signal: $U^{N}$, the result after $N$ diffusion iterations
1: Construct the enhanced signal $U^{0}$ (a column vector);
2: Calculate the Caputo derivative coefficient $\omega_{\alpha,\tau}$;
3: Calculate the diffusion function of the first iteration $g^{0} = \exp\left[-\left(U^{0}/\lambda\right)^{2}\right]$;
4: Construct the first diffusion smoothing matrix $A^{0}$;
5: Calculate the first diffusion result $U^{1} = A^{0} U^{0}$;
6: for $k = 2, \cdots, N$ do
7:   Calculate the diffusion function $g^{k-1} = \exp\left[-\left(U^{k-1}/\lambda\right)^{2}\right]$;
8:   Construct the smoothing matrix $A^{k-1}$;
9:   for $j = 2, \cdots, k$ do
10:    Accumulate $sumU = \sum_{j=2}^{k} \left(b_{j}^{(\alpha)} - b_{j-1}^{(\alpha)}\right)\left(U^{k-j+1} - U^{k-j}\right)$;
11:  end for
12:  Calculate the k-th diffusion result: $U^{k} = A^{k-1} U^{k-1} - sumU$;
13: end for

When the matrix exceeds the upper limit of storage, one can use
convolution to calculate the product of matrix and vector.

3. Data and evaluation criteria

3.1. Data

Hundreds of proteomics data sets are available from the supplements of [1]. This data set has 100 groups of data, and each group has 100 spectra. Here, we use them to compare the results of the proposed algorithm with those of other processing algorithms.

3.2. Evaluation criteria

The false discovery rate (FDR) and sensitivity are used to measure the performance of the algorithms. The FDR is defined as the ratio of the number of falsely identified peaks to the total number of peaks found by the algorithm. Sensitivity is defined as the number of correctly identified peaks divided by the total number of true peaks [1]. It is difficult for different algorithms to produce exactly the same FDR; therefore, we divide the FDR into small segments, such as [0, 0.1]. When the FDRs of different algorithms fall in the same segment, e.g. [0, 0.1], they are considered to have the same FDR, and their sensitivities are compared: the higher the sensitivity, the better the algorithm performance.

Fig. 9. Detected peaks. [Plot: intensity (a.u.) vs. m/z.]

Fig. 10. Comparison of smoothing results for different λ. [Curves: (a) corrected signal; (b) λ = 500; (c) λ = 1000; (d) automatic λ; (e) λ = 2000; intensity (a.u.) vs. m/z.]

4. Results and discussion

4.1. Peak-preserving diffusion

The difference between the nonlinear diffusion model and the linear diffusion model lies in the diffusion function. If the diffusion function is constant, the model is the linear diffusion model, which is equivalent to the common Gaussian smoothing. The nonlinear diffusion model uses the smoothed spectrum to design the diffusion function. For example, the diffusion function can be designed as

$$g[u(x,t_{k})] = \exp\left[-\left(\frac{u(x,t_{k-1}) - \min\left(u(x,t_{k-1})\right)}{\lambda}\right)^{2}\right] \tag{21}$$

where $\lambda$ is the threshold controlling the diffusion strength. One can see that $g[u(x,t_{k})]$ is between 0 and 1. For a fixed $\lambda$, the larger the value of $u(x,t_{k-1})$, the smaller $g[u(x,t_{k})]$. Thus, the diffusion strength is weak at the peak positions and the peaks are only slightly smoothed, so the nonlinear diffusion model is capable of peak-preserving smoothing. In fact, when the parameter $\lambda$ is big enough, the nonlinear diffusion turns into linear diffusion, that is, Gaussian smoothing. In practice, the parameter $\lambda$ is given according to experience.

Fig. 2 is a noisy simulated signal. Fig. 3 shows the diffusion function for the signal shown in Fig. 2. One can easily see that the diffusion is weaker at the peaks than at other positions. Because different signals have different characteristics, it is difficult to choose $\lambda$ by experience. Thus, this paper provides an automatic method for choosing its value: $\lambda$ can be determined from the diffusion strength on the highest peak. If one sets the diffusion strength $g$ to 0.05 on the highest peak, then $\lambda(t_{k})$ can be computed by

$$\lambda(t_{k}) = \frac{\max\left(u(x,t_{k-1})\right) - \min\left(u(x,t_{k-1})\right)}{\sqrt{-\ln(0.05)}} \tag{22}$$

Of course, one can select other diffusion strength values between 0 and 1, such as 0.01, 0.1 or 0.5. To illustrate the effect of the automatic $\lambda$ method, its smoothing result is compared with the results for $\lambda = 0.5$ and $\lambda = 2$, as shown in Fig. 4. It can be seen from Fig. 4 that when $\lambda$ is 0.5, the peak-preserving
smoothing performance is the best, but when λ is adjusted to 2, the performance is much worse. Although the performance of the automatic λ method is not the best, it is not too bad either. Therefore, for unfamiliar signals, the automatic λ method is a better choice. Compared with the CWT-based algorithm, the proposed algorithm replaces the Gaussian smoothing with the time fractional diffusion smoothing, which is a peak-preserving smoothing algorithm.

Table 1
Sensitivity of different models.

Data set   FDR         TFDEM    PROcess  LMS      LIMPIC   Cromwell  CWT
Dataset1   [0,0.1]     0.7493   0.2418   0.0219   0.0000   0.0000    0.5276
Dataset1   [0.1,0.2]   0.8659   0.3023   0.0882   0.0000   0.0000    0.6554
Dataset1   [0.2,0.3]   0.9047   0.3563   0.0620   0.7697   0.2318    0.9240
Dataset1   [0.3,0.4]   0.9423   0.5103   0.0658   0.6899   0.3641    0.9698
Dataset1   [0.4,0.5]   0.9558   0.5593   0.0880   0.7408   0.5458    0.9558
Dataset2   [0,0.1]     0.6601   0.2242   0.0189   0.4967   0.0000    0.4949
Dataset2   [0.1,0.2]   0.8880   0.5079   0.0703   0.7449   0.0000    0.8853
Dataset2   [0.2,0.3]   0.9260   0.5003   0.0564   0.7391   0.2163    0.9340
Dataset2   [0.3,0.4]   0.9612   0.5240   0.0587   0.6679   0.3851    0.9481
Dataset2   [0.4,0.5]   0.9742   0.5273   0.0973   0.6476   0.5269    0.9922
Dataset3   [0,0.1]     0.6204   0.2042   0.0167   0.4911   0.0000    0.4277
Dataset3   [0.1,0.2]   0.8272   0.2370   0.0709   0.6163   0.0000    0.5149
Dataset3   [0.2,0.3]   0.8762   0.2680   0.0552   0.6619   0.2504    0.8069
Dataset3   [0.3,0.4]   0.9225   0.4106   0.0484   0.6270   0.5227    0.8986
Dataset3   [0.4,0.5]   0.9521   0.4343   0.0857   0.7215   0.5473    0.9500
Dataset4   [0,0.1]     0.6916   0.1999   0.0151   0.0000   0.0000    0.4220
Dataset4   [0.1,0.2]   0.8609   0.2371   0.0822   0.0000   0.0000    0.6299
Dataset4   [0.2,0.3]   0.9291   0.3039   0.0529   0.0000   0.2808    0.6899
Dataset4   [0.3,0.4]   0.9543   0.2951   0.0561   0.5976   0.2224    0.9549
Dataset4   [0.4,0.5]   0.9684   0.3330   0.0784   0.6561   0.5637    0.9592

Fig. 11. Sensitivity of different algorithms. [Panels: (a) Dataset1; (b) Dataset2; (c) Dataset3; (d) Dataset4; sensitivity vs. FDR; curves: TFDEM, PROcess, LMS, LIMPIC, Cromwell, CWT.]
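The pieces discussed in Section 4.1 can be put together in a short sketch: the enhancement of Eq. (1), the diffusion function of Eq. (21), the automatic threshold of Eq. (22) and the iteration of Eqs. (3) and (4). This is an illustrative reading of Algorithm 1, not the authors' code. The assumptions are: unit sample spacing, the min-subtracted diffusion function of Eq. (21), end rows of the matrix with diagonal 1 − β, and hypothetical function names and demo parameters.

```python
import math
import numpy as np

def enhance(f, c):
    """Eq. (1) with n = 2: f - c * f'', using a central second difference."""
    d2 = np.zeros_like(f)
    d2[1:-1] = f[2:] - 2.0 * f[1:-1] + f[:-2]
    return f - c * d2

def auto_lambda(u, strength=0.05):
    """Eq. (22): threshold for which g equals `strength` at the highest point."""
    return (u.max() - u.min()) / math.sqrt(-math.log(strength))

def diffusion_function(u, lam):
    """Eq. (21): g = exp(-((u - min u) / lam)^2), which lies in (0, 1]."""
    return np.exp(-((u - u.min()) / lam) ** 2)

def apply_A(u, beta):
    """Product A u for the tridiagonal matrix of Eq. (4), without storing A."""
    out = (1.0 - 2.0 * beta) * u
    out[1:] += beta[1:] * u[:-1]      # sub-diagonal terms
    out[:-1] += beta[:-1] * u[1:]     # super-diagonal terms
    out[0] += beta[0] * u[0]          # first row: diagonal 1 - beta (assumed)
    out[-1] += beta[-1] * u[-1]       # last row: diagonal 1 - beta (assumed)
    return out

def tfdem(f, c, alpha, n_iter, tau, lam=None):
    """Sketch of Algorithm 1; lam=None switches to the automatic threshold."""
    omega = tau ** alpha * math.gamma(3.0 - alpha)
    j = np.arange(n_iter + 1, dtype=float)
    b = np.zeros(n_iter + 1)
    b[1:] = j[1:] ** (2.0 - alpha) - (j[1:] - 1.0) ** (2.0 - alpha)
    history = [enhance(np.asarray(f, dtype=float), c)]
    for k in range(1, n_iter + 1):
        prev = history[k - 1]
        lam_k = auto_lambda(prev) if lam is None else lam
        new = apply_A(prev, omega * diffusion_function(prev, lam_k))
        for jj in range(2, k + 1):    # memory term of Eq. (3)
            new -= (b[jj] - b[jj - 1]) * (history[k - jj + 1] - history[k - jj])
        history.append(new)
    return history[-1]

# Demo on an assumed synthetic signal: two overlapped peaks plus weak noise.
rng = np.random.default_rng(0)
x = np.arange(500, dtype=float)
clean = np.exp(-(x - 200.0) ** 2 / 200.0) + 0.5 * np.exp(-(x - 230.0) ** 2 / 200.0)
noisy = clean + 0.01 * rng.standard_normal(x.size)
smoothed = tfdem(noisy, c=5.0, alpha=1.15, n_iter=50, tau=0.25)
print("highest point of the smoothed signal at x =", int(np.argmax(smoothed)))
```

Because `apply_A` works on the three diagonals directly, the matrix of Eq. (4) is never stored, which is one way to handle long spectra. Passing a fixed `lam` instead of `None` reproduces the fixed-threshold case compared in Fig. 4.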
4.2. The performance of enhancement

In this paper, an enhancement step was inserted before the smoothing step, as shown in Fig. 1. To assess the proposed algorithm, a simulated signal with low-amplitude and overlapped peaks was generated and used to test its performance. The result is shown in Fig. 5, from which one can see that the enhancement step helps to improve peak resolution.

4.3. Peak detection process

This data set has 100 groups of data and each group has 100 spectra. The true peaks are known in each spectrum. These peak lists are used as samples in our experiment. Because different signals have different optimal parameters for peak detection, we first fix the baseline correction and smoothing parameters, then adjust the peak picking parameters and record the sensitivity of the proposed algorithm in different FDR segments.

The peak detection follows the framework in Fig. 1. Figs. 6–9 show the process of the peak detection. Cubic spline interpolation was used for baseline correction. Fig. 6 presents the baseline correction result for a simulated mass spectrum. The enhanced signal (Fig. 7b) is obtained by Eq. (2) after baseline correction. The value of the enhancement coefficient c can be adjusted according to the requirements on FDR and sensitivity. Here, c was set to 100. Then the enhanced signal is smoothed by the time fractional diffusion model (TFDEM), where α is 1.15, τ is 0.25 and λ is 1000. The smoothing result after the 100th iteration is shown in Fig. 7c. Fig. 8 is a partial enlargement of Fig. 7b and Fig. 7c.

The last step is peak picking. In this paper, we combine the amplitude method with the local maximum method to pick peaks. First, we set the amplitude threshold to 400, which was obtained through trial and error. Second, the detection width of the local maximum is determined by the m/z value. Usually, 0.05%–1% of the m/z value is used as the detection width: the larger the m/z value, the larger the detection width. The result of peak picking is shown in Fig. 9, where the circles denote the detected peaks.

4.4. The effect of parameters on the results

In the previous experiment, λ was set to 1000. To describe the effect of λ on the result, a piece of a mass spectrum is selected as a sample, and λ = 500, λ = 1000, λ = 2000 and the automatic λ are used to smooth the signal. The result is shown in Fig. 10, where the red vertical lines show the true peak positions. Curve (b) still retains much noise, which may cause a higher FDR. Curve (e) has been excessively smoothed: weak peaks such as those at m/z = 3700 and m/z = 4650 have been destroyed, which leads to a decrease in sensitivity. The performance of (c) is the best; thus, the selection of λ has a great effect on the peak detection result. In addition, (d) uses the automatic λ method, whose performance is slightly worse than that of (c) but better than that of (b) and (e). As can be seen, if one has no experience in selecting λ, the automatic λ method is a good choice.

4.5. Comparison of detection results

We keep the same baseline correction and smoothing parameters. In addition, we fix the amplitude threshold in the peak picking step and only adjust the detection width to achieve different FDR segments, recording the sensitivity in each FDR segment. This paper selects 4 groups of simulation data; each group includes 100 spectra. The sensitivity of 100 groups of data in the same FDR segment is averaged. The results of the proposed algorithm and those of the existing programs are shown in Table 1. Taking the median value of each FDR segment as the abscissa, the sensitivities in Table 1 are plotted in Fig. 11. From Table 1 and Fig. 11, one can see that when the FDR is less than 0.2, the sensitivity of the proposed algorithm is clearly higher than that of the traditional algorithms; when the FDR is more than 0.2, the sensitivity of the proposed model is as good as that of the CWT-based algorithm.

To further explain the results in Table 1, we randomly selected one result and enlarged it. The result is shown in Fig. 12, where the red vertical lines show the true peak positions. As can be seen from Fig. 12, except for the proposed algorithm and the CWT-based algorithm, the algorithms have difficulty distinguishing overlapping peaks or destroy low-abundance peaks, such as the peaks at m/z = 6600, m/z = 6680 and m/z = 7050; these problems decrease the sensitivity of peak detection. Like the proposed algorithm, the CWT-based algorithm can distinguish overlapped peaks and enhance low-abundance peaks; however, its noise level is higher than that of the proposed algorithm, which results in a higher FDR.

The excellent performance is due to signal enhancement and peak-preserving smoothing. To verify this, the enhancement step was added to PROcess, LMS, LIMPIC and Cromwell. The results are presented in Table 2. Compared with Table 1, one can clearly see that the sensitivities with the enhancement step are higher than those without it. However, the results are not as good as those of the proposed algorithm, which shows that peak-preserving smoothing is also important for improving the performance of peak detection.

Fig. 12. Comparison of the proposed algorithm and other algorithms. [Curves: corrected signal, TFDM, CWT, PROcess, LMS, LIMPIC, Cromwell; intensity (a.u.) vs. m/z.]

5. Conclusion

A new framework for peak detection was proposed through analysis of the CWT-based peak detection algorithm. The signal enhancement
step can help to improve the detection performance, which has been verified by adding the enhancement step to the PROcess, LMS, LIMPIC and Cromwell programs. A peak detection algorithm was proposed by combining signal enhancement and peak-preserving smoothing. Compared with the existing peak detection algorithms, the proposed algorithm has better detection performance at low FDR.

Table 2
Comparison of the sensitivity with enhancement step and without enhancement step.

Data set   FDR         PROcess  LMS      LIMPIC   Cromwell
Dataset1   [0,0.1]     0.3462   0.1700   0.2662   0.1032
Dataset1   [0.1,0.2]   0.4442   0.3183   0.3856   0.2109
Dataset1   [0.2,0.3]   0.5017   0.3977   0.7116   0.3258
Dataset1   [0.3,0.4]   0.5438   0.4912   0.8279   0.5638
Dataset1   [0.4,0.5]   0.6465   0.5487   0.8381   0.6659
Dataset2   [0,0.1]     0.2502   0.1305   0.5050   0.1188
Dataset2   [0.1,0.2]   0.5105   0.1900   0.7458   0.2251
Dataset2   [0.2,0.3]   0.5836   0.2945   0.7764   0.3215
Dataset2   [0.3,0.4]   0.6125   0.3983   0.7889   0.3981
Dataset2   [0.4,0.5]   0.7180   0.4695   0.8512   0.6578
Dataset3   [0,0.1]     0.2142   0.0922   0.5039   0.1032
Dataset3   [0.1,0.2]   0.2486   0.1368   0.6684   0.2109
Dataset3   [0.2,0.3]   0.3132   0.1764   0.7093   0.3258
Dataset3   [0.3,0.4]   0.5167   0.2615   0.7458   0.5638
Dataset3   [0.4,0.5]   0.6421   0.4494   0.7969   0.6659
Dataset4   [0,0.1]     0.2355   0.1252   0.1049   0.1188
Dataset4   [0.1,0.2]   0.2505   0.2027   0.1939   0.2251
Dataset4   [0.2,0.3]   0.3222   0.2767   0.3631   0.3215
Dataset4   [0.3,0.4]   0.4646   0.3298   0.6285   0.3981
Dataset4   [0.4,0.5]   0.5570   0.5100   0.7996   0.6578

Acknowledgements

The work was partly supported by the National Natural Science Foundation of China (Grant: 61671010), the Natural Science Foundation of Jiangsu Province of China (Grant: BK20161513), and Qing Lan Project of Jiangsu Province.

References

[1] Y. Chao, Z. He, W. Yu, Comparison of public peak detection algorithms for MALDI mass spectrometry data analysis, BMC Bioinf. 10 (2009) 4.
[2] G. Vivó-Truyols, J.R. Torres-Lapasió, A.M.V. Nederkassel, Y.V. Heyden, D.L. Massart, Automatic program for peak detection and deconvolution of multi-overlapped chromatographic signals: Part II: peak model and deconvolution algorithms, J. Chromatogr. A 1096 (2005) 146–155.
[3] N. Dyson, Chromatographic Integration Methods, second ed., Royal Society of Chemistry, Cambridge, 1998.
[4] B.L. Hood, T.D. Veenstra, T.P. Conrads, Mass spectrometry-based proteomics, Int. Congr. 1266 (2004) 375–380.
[5] S. Bottini, N. Hamouda-Tekaya, B. Tanasa, L.E. Zaragosi, V. Grandjean, E. Repetto, M. Trabucchi, From benchmarking HITS-CLIP peak detection programs to a new method for identification of miRNA-binding sites from Ago2-CLIP data, Nucleic Acids Res. 45 (2017) e71.
[6] Y.J. Yu, Q.L. Xia, S. Wang, B. Wang, F.W. Xie, X.B. Zhang, Y.M. Ma, H.L. Wu, Chemometric strategy for automatic chromatographic peak detection and background drift correction in chromatographic data, J. Chromatogr. A 1359 (2014) 262–270.
[7] Z. Jianqiu, G. Elias, H. Travis, H. William, H. Yufei, Review of peak detection algorithms in liquid-chromatography-mass spectrometry, Curr. Genom. 10 (2009) 388–401.
[8] S. Roy, New algorithms for processing and peak detection in liquid chromatography/mass spectrometry data, Rapid Commun. Mass Spectrom. Rcm 16 (2010) 462–467.
[9] X. Li, R. Gentleman, X. Lu, Q. Shi, J.D. Iglehart, L. Harris, A. Miron, SELDI-TOF Mass Spectrometry Protein Data, Springer, New York, 2005.
[10] Y. Yasui, M. Pepe, M.L. Thompson, B.L. Adam, G.L. Wright, Y. Qu, J.D. Potter, M. Winget, M. Thornquist, Z. Feng, A data-analytic strategy for protein biomarker discovery: profiling of high-dimensional proteomic data for cancer detection, Biostatistics 4 (2003) 449–463.
[11] D. Mantini, F. Petrucci, D. Pieragostino, P.D. Boccio, M.D. Nicola, C.D. Ilio, G. Federici, P. Sacchetta, S. Comani, A. Urbani, LIMPIC: a computational method for the separation of protein MALDI-TOF-MS signals from noise, BMC Bioinf. 8 (2007) 1–17.
[12] K.R. Coombes, S. Tsavachidis, J.S. Morris, K.A. Baggerly, M.C. Hung, H.M. Kuerer, Improved peak detection and quantification of mass spectrometry data acquired from surface-enhanced laser desorption and ionization by denoising spectra with the undecimated discrete wavelet transform, Proteomics 5 (2010) 4107–4117.
[13] P. Du, W.A. Kibbe, S.M. Lin, Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching, Bioinformatics 22 (2006) 2059–2065.
[14] Z. Xu, X. Sun, P.D.B. Harrington, Baseline correction method using an orthogonal basis for gas chromatography/mass spectrometry data, Anal. Chem. 83 (2011) 7464–7471.
[15] C.A. Wehe, A.C. Niehoff, G.M. Thyssen, M. Sperling, U. Karst, Rapid cell mode switching and dual laser ablation inductively coupled plasma mass spectrometry for elemental bioimaging, Rapid Commun. Mass Spectrom. Rcm 28 (2015) 2627–2635.
[16] L.G. Johnsen, S. Thomas, H. Ulf, B. Rasmus, An automated method for baseline correction, peak finding and peak grouping in chromatographic data, Analyst 138 (2013) 3502–3511.
[17] L.L.P.V. Stee, U.A.T. Brinkman, Peak detection methods for GC × GC: an overview, Trac. Trends Anal. Chem. 83 (2016) 1–13.
[18] K.H. Jarman, D.S. Daly, K.K. Anderson, K.L. Wahl, A new approach to automated peak detection, Chemometr. Intell. Lab. Syst. 69 (2003) 61–76.
[19] Y. Tu, X. Yang, S. Zhang, Y. Zhu, Determination of theanine and gamma-aminobutyric acid in tea by high performance liquid chromatography with precolumn derivatization, Chin. J. Chromatogr. 30 (2012) 184–189.
[20] Y. Zheng, R. Fan, C. Qiu, Z. Liu, D. Tian, An improved algorithm for peak detection in mass spectra based on continuous wavelet transform, Int. J. Mass Spectrom. 409 (2016) 53–58.
[21] K.S. Joseph, J. Anguizola, A.J. Jackson, D.S. Hage, Chromatographic analysis of acetohexamide binding to glycated human serum albumin, J. Chromatogr. B 878 (2010) 2775–2781.
[22] A. Savitzky, M.J.E. Golay, Smoothing and differentiation of data by simplified least squares procedures, Anal. Chem. 36 (1964) 1627–1639.
[23] Y. Liu, B. Dang, Y. Li, H. Lin, H. Ma, Applications of Savitzky-Golay filter for seismic random noise reduction, Acta Geophys. 64 (2016) 101–124.
[24] H.H. Madden, Comments on the Savitzky-Golay convolution method for least-squares-fit smoothing and differentiation of digital data, Anal. Chem. 50 (1978) 1383–1386.
[25] R.J. Paruch, B.J. Garrison, Z. Postawa, Partnering analytic models and dynamic secondary ion mass spectrometry simulations to interpret depth profiles due to kiloelectronvolt cluster bombardment, Anal. Chem. 84 (2012) 3010–3016.
[26] J. Zhu, H. Wang, Adaptive beamforming for correlated signal and interference: a frequency domain smoothing approach, Acoust. Speech Signal Process. 38 (1990) 193–195.
[27] M. Lang, H. Guo, J.E. Odegard, C.S. Burrus, R.O. Wells, Noise reduction using an undecimated discrete wavelet transform, IEEE Signal Process. Lett. 3 (2002) 10–12.
[28] T. Rejtar, H.S. Chen, V. Andreev, E. Moskovets, B.L. Karger, Increased identification of peptides by enhanced data processing of high-resolution MALDI TOF/TOF mass spectra prior to database searching, Anal. Chem. 76 (2004) 6017–6028.
[29] M. Lang, H. Guo, J.E. Odegard, C.S. Burrus, R.O. Wells, Noise reduction using an undecimated discrete wavelet transform, IEEE Signal Process. Lett. 3 (2002) 10–12.
[30] Y. Li, Y. Ding, T. Li, Nonlinear diffusion filtering for peak-preserving smoothing of a spectrum signal, Chemometr. Intell. Lab. Syst. 156 (2016) 157–165.
[31] A. Witkin, Scale-space filtering: a new approach to multi-scale description, in: Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP, 2003, pp. 150–153.
[32] P. Perona, J. Malik, Scale-space and edge detection using anisotropic diffusion, IEEE Trans. Pattern Anal. Mach. Intell. 12 (2002) 629–639.
[33] Y. Li, F. Liu, I.W. Turner, T. Li, Time-fractional diffusion equation for signal smoothing, Appl. Math. Comput. 326 (2018) 108–116.
[34] Y. Li, M. Jiang, F. Liu, Time fractional super-diffusion model and its application in peak-preserving smoothing, Chemometr. Intell. Lab. Syst. 175 (2018) 13–19.
[35] Y. Li, M. Jiang, Spatial-fractional order diffusion filtering, J. Math. Chem. 56 (2018) 257–267.
[36] Y. Li, C. Pan, Y. Xue, X. Meng, Y. Ding, A novel signal enhancement method for overlapped peaks with noise immunity, Spectrosc. Lett. 49 (2016) 285–293.