Diffusion enhancement model and its application in peak detection

Chemometrics and Intelligent Laboratory Systems 189 (2019) 130–137
Journal homepage: www.elsevier.com/locate/chemometrics

Jun Li (b), Yuanlu Li (a, b, *), Weijing Zhao (b), Min Jiang (b)

(a) Jiangsu Collaborative Innovation Center of Atmospheric Environment and Equipment Technology (CICAEET), Nanjing University of Information Science & Technology, Nanjing, 210044, China
(b) School of Automation, Nanjing University of Information Science & Technology, Nanjing, 210044, China

ARTICLE INFO

Keywords: Smoothing, Nonlinear diffusion, Fractional diffusion, Diffusion enhancement, Peak detection

ABSTRACT

Peak detection algorithms find it hard to detect low-amplitude peaks and overlapped peaks that are contaminated by noise. Among existing peak detection algorithms, the continuous wavelet transform (CWT)-based algorithm performs best. When the Mexican hat wavelet is the mother wavelet, the CWT of a signal is essentially equivalent to smoothing the second derivative of the signal with a Gaussian function. A natural idea is therefore to build a peak enhancement step and peak-preserving diffusion into the peak detection process to improve its performance. In the proposed algorithm, peak-preserving diffusion filtering replaces the Gaussian smoothing of the CWT-based algorithm. To assess the proposed algorithm, we generated a simulated spectrum with low-amplitude and overlapped peaks and used it to test the enhancement performance. We then evaluated the algorithm on the 100 groups of simulated proteomics data sets from Ref. [1]. The true peaks of every spectrum in these data sets are known, so the false discovery rate (FDR) can be computed directly. Five typical peak detection programs were chosen for comparison, and FDR and sensitivity were used to compare their performance. The results show that the proposed algorithm can improve peak detection performance.

1. Introduction

Peak detection is one of the important steps in spectrometry analysis [1–3]. It has been applied in many fields, including disease diagnosis, water quality analysis and chemical mixture identification [4–6]. The performance of peak detection directly affects the subsequent analysis, so many peak detection programs have been developed [7,8]. Examples include PROcess [9], LMS [10], LIMPIC [11], Cromwell [12] and CWT [13]. Ref. [1] compares public peak detection algorithms for MALDI mass spectrometry data analysis. A peak detection framework usually has three steps: baseline correction, smoothing and peak picking [1]. Existing peak detection algorithms are similar or different according to the methods they use in each of these steps. Baseline correction methods include monotone minimum [14], linear interpolation [15] and wavelet transforms [16]. Peak picking methods include intensity threshold [17], first derivative [18], second derivative [19] and local maximum [20]. The smoothing step is the most important step in peak detection, and many smoothing methods exist.

The simplest is moving average filtering, which replaces each point with the average of its neighbours but seriously distorts the peaks of a signal [21]. The Savitzky-Golay method is widely used for spectrum smoothing [22,23]. It fits a polynomial to the data points in a sliding window and takes the fitted value at the centre point as the smoothing result. Savitzky-Golay smoothing preserves peaks better than moving average filtering [24]. Gaussian smoothing is another improvement on moving average filtering [25]; it uses a Gaussian function as the smoothing kernel. A different kind of smoothing discards the high-frequency components in the frequency domain and then reconstructs the signal [26]. The wavelet method is the most widely used smoothing method of this kind [12,27–29].

Smoothing is irreversible: if a real peak is removed during smoothing, it can never be recovered in the subsequent analysis. Peak-preserving smoothing is therefore a major challenge in peak detection. Nonlinear diffusion filtering has been reported to be capable of peak-preserving smoothing [30]. Diffusion filtering can be traced back to the 1980s. Witkin and his colleagues found that the solution of the homogeneous linear diffusion equation is equivalent to convolving the initial signal with a Gaussian function at each scale [31]. Homogeneous linear diffusion filtering may blur the edges of a signal and destroy its peaks, so Perona and Malik proposed the nonlinear diffusion model in 1990 [32]. This model removes noise better and also protects signal features such as edges and peaks. Several improved diffusion models have been proposed recently [33–35]: the time-fractional diffusion model [33], the time-fractional super-diffusion model [34] and the spatial-fractional order diffusion model [35].

Low-amplitude peaks and overlapped peaks contaminated by noise remain difficult to detect. Among existing peak detection algorithms, the continuous wavelet transform (CWT)-based algorithm is superior to the others [1], because the CWT can enhance low-abundance peaks and improve the resolution of overlapped peaks. The CWT-based algorithm usually takes the Mexican hat wavelet as the mother wavelet, and this wavelet is proportional to the second derivative of the Gaussian function. The CWT of a signal is therefore essentially equivalent, up to sign and scale, to smoothing the second derivative of the signal with a Gaussian function. The essential reason the CWT performs best is that it uses the second derivative to enhance the signal.

We therefore propose a new framework for peak detection, shown in Fig. 1. Compared with the existing framework, it adds a signal enhancement step between baseline correction and smoothing. In the smoothing step, time-space diffusion filtering, which is a peak-preserving smoothing algorithm, replaces the Gaussian smoothing of the CWT-based algorithm. In this paper, the time-fractional super-diffusion model smooths the enhanced signal. Cubic spline interpolation is the baseline correction method, and the local maximum and amplitude methods together form the peak picking method.

To assess the proposed algorithm, we generated a simulated signal with low-abundance and overlapped peaks and used it to test the algorithm's performance. We then evaluated the algorithm on four simulated proteomics data sets from Ref. [1]. The true peaks of every spectrum in these data sets are known, so the false discovery rate (FDR) can be computed directly. Five typical peak detection algorithms were chosen for comparison, and FDR and sensitivity were used to compare their performance.

[Fig. 1. A framework for the peak detection: Signal → Baseline correction → Signal enhancement → Smoothing → Peak picking → Peak list.]

* Corresponding author. School of Automation, Nanjing University of Information Science & Technology, Nanjing, 210044, China. E-mail address: [email protected] (Y. Li). https://doi.org/10.1016/j.chemolab.2019.04.012. Received 20 November 2018; received in revised form 12 April 2019; accepted 23 April 2019; available online 24 April 2019. 0169-7439/© 2019 Elsevier B.V. All rights reserved.

2. Diffusion enhancement model

The peak-enhanced signal is the weighted sum of the original spectrum and the negative of its second derivative [36]:

$$ f_{en}(x) = f(x) - c\,f^{(n)}(x) \tag{1} $$

where f is the initial signal, $f^{(n)}$ is its nth-order derivative (usually n = 2), $f_{en}$ is the enhanced signal and c is the enhancement coefficient. The diffusion enhancement model is

$$
\begin{cases}
\dfrac{\partial^{\alpha} u(x,t)}{\partial t^{\alpha}} = g[u(x,t)]\,\dfrac{\partial^{2} u(x,t)}{\partial x^{2}}, & 0 < x < L,\; 1 < \alpha < 2,\\[6pt]
\dfrac{\partial u(x,0)}{\partial t} = 0,\quad u(x,0) = f(x) - c\,f^{(2)}(x), & \\[6pt]
u(0,t) = u(L,t) = 0, & 0 < t < T,
\end{cases}
\tag{2}
$$

where u(x, 0) is the initial enhanced signal and u(x, t) is the smoothed signal at time t. The diffusion function g[u(x, t)] controls the diffusion process. $\partial^{\alpha} u(x,t)/\partial t^{\alpha}$ is the Caputo fractional derivative of u(x, t), and α is the order of the time-fractional derivative. In other words, if the signal enhanced by its second derivative is the initial value of the diffusion equation, the solution of the equation is a smoothed version of the enhanced signal.

[Fig. 2. Noisy simulated signal (real signal and noisy signal; intensity vs. x, 0–500).]

[Fig. 3. The real signal and diffusion strength (intensity vs. x, 0–500).]

Data are discrete in practice, so Eq. (2) is solved with the following finite difference scheme:

$$
\begin{aligned}
U^{1} &= A^{0} U^{0},\\
U^{k} &= A^{k-1} U^{k-1} - \sum_{j=2}^{k} \left(b_j^{(\alpha)} - b_{j-1}^{(\alpha)}\right)\left(U^{k-j+1} - U^{k-j}\right), \quad k \ge 2,
\end{aligned}
\tag{3}
$$

where k is the iteration number and $b_j^{(\alpha)} = j^{2-\alpha} - (j-1)^{2-\alpha}$.
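Eq. (3) starts from $U^{0}$. This vector is the enhanced signal of Eq. (1), which is also the initial condition of Eq. (2). The NumPy sketch below shows why this starting point matters for overlapped peaks. It is an illustration, not the authors' code. Two Gaussian peaks separated by less than 2σ merge into a single maximum, and $f - c f''$ separates them again. The peak positions, σ = 6, the grid and c = 100 are illustrative assumptions. The second derivative is approximated by applying `np.gradient` twice.

```python
import numpy as np


def enhance(f, dx, c):
    """Eq. (1) with n = 2: f_en = f - c * f'' (f'' via repeated central differences)."""
    d2 = np.gradient(np.gradient(f, dx), dx)
    return f - c * d2


def count_peaks(y, rel_height=0.5):
    """Count interior local maxima that are higher than rel_height * max(y)."""
    inner = y[1:-1]
    is_max = (inner > y[:-2]) & (inner > y[2:]) & (inner > rel_height * y.max())
    return int(is_max.sum())


x = np.linspace(0.0, 100.0, 1001)
dx = x[1] - x[0]
sigma = 6.0  # separation 10 < 2 * sigma, so the sum of the two peaks is unimodal
f = np.exp(-0.5 * ((x - 45.0) / sigma) ** 2) + np.exp(-0.5 * ((x - 55.0) / sigma) ** 2)

f_en = enhance(f, dx, c=100.0)
print(count_peaks(f), count_peaks(f_en))  # 1 2
```

The same arrays also show the cost of enhancement. The term $-c f''$ is negative on the flanks of each peak, so $f_{en}$ dips below the baseline there. The second derivative also amplifies high-frequency noise, which is why the enhanced signal is smoothed by the diffusion model rather than used directly.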

The smoothing matrix $A^{k-1}$ and the solution vector $U^{k}$ in Eq. (3) are

$$
A^{k-1} =
\begin{bmatrix}
1-\beta_1^{k-1} & \beta_1^{k-1} & & & \\
\beta_2^{k-1} & 1-2\beta_2^{k-1} & \beta_2^{k-1} & & \\
 & \ddots & \ddots & \ddots & \\
 & & \beta_{M-2}^{k-1} & 1-2\beta_{M-2}^{k-1} & \beta_{M-2}^{k-1} \\
 & & & \beta_{M-1}^{k-1} & 1-\beta_{M-1}^{k-1}
\end{bmatrix},
\qquad
U^{k} =
\begin{bmatrix}
u_1^{k} \\ u_2^{k} \\ \vdots \\ u_{M-1}^{k}
\end{bmatrix},
\tag{4}
$$

where $\beta_i^{k-1} = g_i^{k-1}\,\omega_{\alpha,\tau}$ and $\omega_{\alpha,\tau} = \tau^{\alpha}\,\Gamma(3-\alpha)$. Here τ is the time step size, which should usually be kept small (on the order of 0.1) for algorithm stability. $g_i^{k}$ is the value of the diffusion function at the kth iteration and the ith position of the sequence, with i = 1, 2, …, M − 1, where M is the length of the sequence.

[Fig. 4. Performance evaluation of automatic λ method (real signal and smoothing results for λ = 0.5, λ = 2 and automatic λ; intensity vs. x).]

[Fig. 5. Original signal, noisy signals, enhanced signal and its smoothed signals (left panel: simulation signals; right panel: original signal and smoothed signals). (a) The original signal; (b) the noisy signal corresponding to the original signal (20 dB); (c) the enhanced signal of the noisy signal; (d) the true peaks, marked with asterisks; (e) the smoothed signal of the noisy signal, in which one overlapped peak cannot be detected because of the smoothing operation; (f) the smoothed signal of the enhanced signal, in which the peaks become more obvious.]

[Fig. 6. Baseline correction: initial signal and corrected signal; intensity (a.u.) vs. m/z, 500–5000.]

[Fig. 7. Enhanced signal and its smoothed result: (a) corrected signal, (b) enhanced signal, (c) smoothed signal; intensity (a.u.) vs. m/z, 500–5000.]

[Fig. 8. The partial enlargement of Fig. 7 (enhanced, corrected and smoothed signals; m/z 2050–2450).]

The pseudo code of the diffusion enhancement model is as follows.

Algorithm 1. Diffusion enhancement algorithm
Input: initial signal f(x) (in practice, a column vector); enhancement coefficient c; order of the time-fractional derivative α; number of iterations N; diffusion threshold λ; time step size τ.
Output: smoothed signal $U^{N}$, the result after N diffusion iterations.
1: Construct the enhanced signal $U^{0}$ (a column vector);
2: Calculate the Caputo derivative coefficient $\omega_{\alpha,\tau}$;
3: Calculate the diffusion function of the first iteration $g^{0} = \exp[-(U^{0}/\lambda)^{2}]$;
4: Construct the first diffusion smoothing matrix $A^{0}$;
5: Calculate the first diffusion result $U^{1} = A^{0}U^{0}$;
6: for k = 2, …, N do
7:     Calculate the diffusion function $g^{k-1} = \exp[-(U^{k-1}/\lambda)^{2}]$;
8:     Construct the smoothing matrix $A^{k-1}$;
9:     Set sumU = 0; for j = 2, …, k do
10:        sumU = sumU + $(b_j^{(\alpha)} - b_{j-1}^{(\alpha)})(U^{k-j+1} - U^{k-j})$;
11:    end for
12:    Calculate the kth diffusion result $U^{k} = A^{k-1}U^{k-1} - \mathrm{sumU}$;
13: end for

When the matrix exceeds the upper limit of storage, one can use
convolution to calculate the product of matrix and vector.

3. Data and evaluation criteria

3.1. Data

Hundreds of proteomics data sets are available from the supplementary material of Ref. [1]. The collection has 100 groups of data, and each group has 100 spectra. Here, we use them to compare the results of the proposed algorithm with those of other processing algorithms.

Fig. 2 is a noisy simulated signal, and Fig. 3 shows the diffusion function for that signal. The diffusion is clearly weaker at the peaks than at other positions. Different signals have different characteristics, so choosing λ from experience is difficult. This paper therefore provides an automatic way to set λ from the diffusion strength at the highest peak. If the diffusion strength g is set to 0.05 at the highest peak, then setting Eq. (21) equal to 0.05 at the maximum of $u(x,t_{k-1})$ and solving for λ gives

$$ \lambda(t_k) = \frac{\max\big(u(x,t_{k-1})\big) - \min\big(u(x,t_{k-1})\big)}{\sqrt{-\ln(0.05)}} \tag{22} $$
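Algorithm 1 and the automatic λ of Eq. (22) can be combined into a short NumPy sketch. It is a reconstruction under stated assumptions, not the authors' implementation:

- The second derivative in Eq. (1) uses unit sample spacing.
- $A^{k-1}$ of Eq. (4) is applied as a three-point stencil instead of being stored. The stencil keeps the 1 − β boundary rows as printed.
- The diffusion function is the min-shifted form of Eq. (21). With this form, Eq. (22) gives exactly g = 0.05 at the highest point. The pseudo code's $\exp[-(U/\lambda)^2]$ coincides with it when min U = 0.

All function names are hypothetical.

```python
import numpy as np
from math import gamma


def second_derivative(f):
    """Three-point second difference, unit spacing assumed; edge values copied."""
    d2 = np.zeros_like(f)
    d2[1:-1] = f[2:] - 2.0 * f[1:-1] + f[:-2]
    d2[0], d2[-1] = d2[1], d2[-2]
    return d2


def auto_lambda(u, strength=0.05):
    """Eq. (22): lambda such that g equals `strength` at the highest point of u."""
    span = u.max() - u.min()
    return span / np.sqrt(-np.log(strength)) if span > 0 else np.inf


def diffusion_enhancement(f, c, alpha, n_iter, tau, lam=None):
    """Sketch of Algorithm 1 (n_iter >= 1). lam=None recomputes lambda with Eq. (22)."""
    f = np.asarray(f, dtype=float)
    u0 = f - c * second_derivative(f)                     # step 1: Eq. (1), n = 2
    omega = tau ** alpha * gamma(3.0 - alpha)             # step 2: omega_{alpha,tau}
    j = np.arange(1, n_iter + 1, dtype=float)
    b = j ** (2.0 - alpha) - (j - 1.0) ** (2.0 - alpha)   # b[j-1] holds b_j^(alpha)

    def apply_A(u):
        """Product A^{k-1} u of Eq. (4) without forming the matrix."""
        lam_k = auto_lambda(u) if lam is None else lam
        beta = omega * np.exp(-((u - u.min()) / lam_k) ** 2)  # beta = g * omega, g from Eq. (21)
        lap = np.empty_like(u)
        lap[1:-1] = u[2:] - 2.0 * u[1:-1] + u[:-2]   # interior rows: 1 - 2*beta on the diagonal
        lap[0] = u[1] - u[0]                          # first row: 1 - beta
        lap[-1] = u[-2] - u[-1]                       # last row: 1 - beta
        return u + beta * lap

    U = [u0, apply_A(u0)]                             # U^0 and U^1 = A^0 U^0 (steps 3-5)
    for k in range(2, n_iter + 1):
        memory = np.zeros_like(u0)                    # steps 9-11: memory sum of Eq. (3)
        for jj in range(2, k + 1):
            memory += (b[jj - 1] - b[jj - 2]) * (U[k - jj + 1] - U[k - jj])
        U.append(apply_A(U[k - 1]) - memory)          # step 12
    return U[-1]
```

With the settings reported in Section 4.3, a call would be `diffusion_enhancement(corrected, c=100, alpha=1.15, n_iter=100, tau=0.25, lam=1000)`, where `corrected` stands for a baseline-corrected spectrum. Because the sketch assumes unit spacing, a suitable c depends on how densely the spectrum is sampled. The memory sum in Eq. (3) reaches back to $U^{0}$, so every iterate is kept. The cost therefore grows quadratically with N and the storage grows linearly.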

Other diffusion strength values between 0 and 1, such as 0.01, 0.1 or 0.5, can of course also be chosen. To illustrate the effect of the automatic λ method, Fig. 4 compares its smoothing result with the results for λ = 0.5 and λ = 2. Fig. 4 shows that when λ is 0.5, the peak-preserving

3.2. Evaluation criteria

The false discovery rate (FDR) and sensitivity measure the performance of the algorithms. The FDR is the number of falsely identified peaks divided by the total number of peaks an algorithm finds. Sensitivity is the number of correctly identified peaks divided by the total number of true peaks [1]. Different algorithms rarely produce exactly the same FDR, so the FDR range is divided into small segments such as [0, 0.1]. Algorithms whose FDRs fall in the same segment are considered to have the same FDR, and their sensitivities are then compared: the higher the sensitivity, the better the algorithm performs.
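The FDR, the sensitivity and the FDR segments described above can be sketched as follows. The text does not say how a detected peak is judged correct. The m/z tolerance `tol`, the greedy one-to-one matching, the half-open segments [a, b) and the averaging within a segment are therefore assumptions of this illustration. The function names are hypothetical, and at least one true peak is assumed.

```python
import numpy as np


def count_correct(detected, true, tol):
    """Greedy one-to-one matching: a detected peak is correct if an
    unmatched true peak lies within `tol` of it."""
    true = np.asarray(true, dtype=float)
    used = np.zeros(true.size, dtype=bool)
    n_correct = 0
    for p in np.sort(np.asarray(detected, dtype=float)):
        dist = np.where(used, np.inf, np.abs(true - p))
        i = int(np.argmin(dist))
        if dist[i] <= tol:
            used[i] = True
            n_correct += 1
    return n_correct


def fdr_and_sensitivity(detected, true, tol):
    """FDR = false peaks / found peaks; sensitivity = correct peaks / true peaks."""
    n_correct = count_correct(detected, true, tol)
    n_found = len(detected)
    fdr = (n_found - n_correct) / n_found if n_found else 0.0
    return fdr, n_correct / len(true)


def mean_sensitivity_by_segment(results, edges=(0.0, 0.1, 0.2, 0.3, 0.4, 0.5)):
    """Average the sensitivities of (fdr, sensitivity) pairs in each segment [a, b)."""
    table = {}
    for a, b in zip(edges[:-1], edges[1:]):
        sens = [s for fdr, s in results if a <= fdr < b]
        table[(a, b)] = float(np.mean(sens)) if sens else None
    return table


true_peaks = [1000.0, 2000.0, 3000.0, 4000.0]
detected = [1000.5, 1999.0, 2500.0, 4001.0]
print(fdr_and_sensitivity(detected, true_peaks, tol=2.0))  # (0.25, 0.75)
```

In the usage line, three of the four detections match a true peak within 2 m/z units. One detection out of four is therefore false (FDR = 0.25), and three of the four true peaks are found (sensitivity = 0.75).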

4. Results and discussion

4.1. Peak-preserving diffusion

The nonlinear and linear diffusion models differ only in the diffusion function. A constant diffusion function gives the linear diffusion model, which is equivalent to ordinary Gaussian smoothing. The nonlinear diffusion model builds the diffusion function from the smoothed spectrum. For example, the diffusion function can be

$$ g[u(x,t_k)] = \exp\left\{-\left[\frac{u(x,t_{k-1}) - \min\big(u(x,t_{k-1})\big)}{\lambda}\right]^{2}\right\} \tag{21} $$

where λ is a threshold that controls the diffusion strength. The value of g[u(x, t_k)] lies between 0 and 1. For a fixed λ, the larger $u(x,t_{k-1})$ is, the smaller g[u(x, t_k)] becomes. Diffusion is therefore weak at peak positions, and the peaks are only slightly smoothed. This is why the nonlinear diffusion model can smooth while preserving peaks. When λ is large enough, nonlinear diffusion turns into linear diffusion, that is, Gaussian smoothing. In practice, λ is chosen empirically.
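Eq. (21) can be checked numerically in a few lines. This sketch evaluates g on a single synthetic peak; the values are illustrative only, and `diffusion_function` is a hypothetical helper. With a small λ, diffusion nearly stops at the peak top while the baseline diffuses fully. With a very large λ, g is close to 1 everywhere, which is the linear (Gaussian) limit described above.

```python
import numpy as np


def diffusion_function(u, lam):
    """Eq. (21): g = exp(-((u - min u) / lam)^2), which lies in (0, 1]."""
    return np.exp(-((u - u.min()) / lam) ** 2)


x = np.arange(500)
u = np.exp(-0.5 * ((x - 250) / 10.0) ** 2)  # one unit-height peak on a flat baseline

g = diffusion_function(u, lam=0.5)
print(g[250], g[0])  # about exp(-4) ~ 0.0183 on the peak top, 1.0 on the baseline
print(np.allclose(diffusion_function(u, lam=1e3), 1.0))  # large lambda: g ~ 1 (linear limit)
```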

[Fig. 10. Comparison of smoothing results for different λ: (a) corrected signal, (b) λ = 500, (c) λ = 1000, (d) automatic λ, (e) λ = 2000; intensity (a.u.) vs. m/z, 3600–4800.]

[Fig. 9. Detected peaks (intensity (a.u.) vs. m/z, 500–5000).]

Table 1
Sensitivity of different models.

Dataset    FDR        TFDEM   PROcess  LMS     LIMPIC  Cromwell  CWT
Dataset1   [0,0.1]    0.7493  0.2418   0.0219  0.0000  0.0000    0.5276
Dataset1   [0.1,0.2]  0.8659  0.3023   0.0882  0.0000  0.0000    0.6554
Dataset1   [0.2,0.3]  0.9047  0.3563   0.0620  0.7697  0.2318    0.9240
Dataset1   [0.3,0.4]  0.9423  0.5103   0.0658  0.6899  0.3641    0.9698
Dataset1   [0.4,0.5]  0.9558  0.5593   0.0880  0.7408  0.5458    0.9558
Dataset2   [0,0.1]    0.6601  0.2242   0.0189  0.4967  0.0000    0.4949
Dataset2   [0.1,0.2]  0.8880  0.5079   0.0703  0.7449  0.0000    0.8853
Dataset2   [0.2,0.3]  0.9260  0.5003   0.0564  0.7391  0.2163    0.9340
Dataset2   [0.3,0.4]  0.9612  0.5240   0.0587  0.6679  0.3851    0.9481
Dataset2   [0.4,0.5]  0.9742  0.5273   0.0973  0.6476  0.5269    0.9922
Dataset3   [0,0.1]    0.6204  0.2042   0.0167  0.4911  0.0000    0.4277
Dataset3   [0.1,0.2]  0.8272  0.2370   0.0709  0.6163  0.0000    0.5149
Dataset3   [0.2,0.3]  0.8762  0.2680   0.0552  0.6619  0.2504    0.8069
Dataset3   [0.3,0.4]  0.9225  0.4106   0.0484  0.6270  0.5227    0.8986
Dataset3   [0.4,0.5]  0.9521  0.4343   0.0857  0.7215  0.5473    0.9500
Dataset4   [0,0.1]    0.6916  0.1999   0.0151  0.0000  0.0000    0.4220
Dataset4   [0.1,0.2]  0.8609  0.2371   0.0822  0.0000  0.0000    0.6299
Dataset4   [0.2,0.3]  0.9291  0.3039   0.0529  0.0000  0.2808    0.6899
Dataset4   [0.3,0.4]  0.9543  0.2951   0.0561  0.5976  0.2224    0.9549
Dataset4   [0.4,0.5]  0.9684  0.3330   0.0784  0.6561  0.5637    0.9592

[Fig. 11. Sensitivity of different algorithms (TFDEM, PROcess, LMS, LIMPIC, Cromwell, CWT) versus FDR: (a) Dataset1, (b) Dataset2, (c) Dataset3, (d) Dataset4.]

smoothing performance is the best, but when λ is changed to 2, the performance becomes much worse. The automatic λ method does not give the best performance, but it does not perform badly either. For unfamiliar signals, the automatic λ method is therefore the better choice. Compared with the CWT-based algorithm, the proposed algorithm replaces Gaussian smoothing with time-fractional diffusion smoothing, which is a peak-preserving smoothing algorithm.

adjust the detection width to reach different FDR segments, and we record the sensitivity in each FDR segment. This paper uses 4 groups of simulated data, each containing 100 spectra. Within each group, the sensitivities of the 100 spectra that fall in the same FDR segment are averaged. Table 1 shows the results of the proposed algorithm and the existing programs. Fig. 11 plots the sensitivities in Table 1 against the midpoint of each FDR segment. Table 1 and Fig. 11 show that when the FDR is below 0.2, the proposed algorithm has higher sensitivity than the traditional algorithms. When the FDR is above 0.2, its sensitivity is comparable to that of the CWT-based algorithm.

To explore the results in Table 1 further, we randomly selected one result and enlarged it in Fig. 12, where red vertical lines mark the true peak positions. Apart from the proposed algorithm and the CWT-based algorithm, the other algorithms have difficulty separating overlapped peaks or destroy low-abundance peaks, for example the peaks at m/z = 6600, m/z = 6680 and m/z = 7050. These problems lower the sensitivity of peak detection. Like the proposed algorithm, the CWT-based algorithm can separate overlapped peaks and enhance low-abundance peaks. However, its noise level is higher than that of the proposed algorithm, which leads to a higher FDR.

The good performance of the proposed algorithm comes from signal enhancement and peak-preserving smoothing. To confirm this, we added the enhancement step to PROcess, LMS, LIMPIC and Cromwell and report the results in Table 2. Compared with Table 1, the sensitivities with the enhancement step are clearly higher than those without it. They are still lower than those of the proposed algorithm, which shows that peak-preserving smoothing also matters for peak detection performance.

4.2. The performance of enhancement

In this paper, an enhancement step is inserted before the smoothing step, as shown in Fig. 1. To test this step, we generated a simulated signal with low-amplitude and overlapped peaks and applied the proposed algorithm to it. Fig. 5 shows the result: the enhancement step helps to improve peak resolution.

4.3. Peak detection process

The data set has 100 groups of data, and each group has 100 spectra. The true peaks of every spectrum are known, and these peak lists are used as references in our experiment. Different signals have different optimal peak detection parameters. We therefore first fix the baseline correction and smoothing parameters, then adjust the peak picking parameters and record the sensitivity of the proposed algorithm in different FDR segments.

The peak detection follows the framework in Fig. 1, and Figs. 6–9 show the process. Baseline correction uses cubic spline interpolation; Fig. 6 shows the baseline correction result for a simulated mass spectrum. After baseline correction, Eq. (1) gives the enhanced signal (Fig. 7b). The enhancement coefficient c can be adjusted to meet the FDR and sensitivity requirements; here, c = 100. The time-fractional diffusion model (TFDEM) then smooths the enhanced signal with α = 1.15, τ = 0.25 and λ = 1000. Fig. 7c shows the smoothing result after 100 iterations, and Fig. 8 enlarges part of Figs. 7b and 7c.

The last step is peak picking, which combines the amplitude method with the local maximum method. First, the amplitude threshold is set to 400, a value found by trial and error. Second, the detection width of the local maximum depends on the m/z value: usually 0.05%–1% of the m/z value is used, so the detection width grows with m/z. Fig. 9 shows the result of peak picking, with circles marking the detected peaks.
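The peak picking step can be sketched as an amplitude threshold combined with a local-maximum test whose window grows with m/z. Several choices here are assumptions of this sketch: that the detection width is a full rather than a half window, how ties are broken, and the default `width_frac` of 0.2% (inside the quoted 0.05%–1% range). `pick_peaks` is a hypothetical name, and the m/z axis must be sorted in ascending order.

```python
import numpy as np


def pick_peaks(mz, intensity, amp_threshold=400.0, width_frac=0.002):
    """Keep points above `amp_threshold` that are the maximum within a window
    of full width width_frac * m/z centred on their own m/z."""
    mz = np.asarray(mz, dtype=float)
    y = np.asarray(intensity, dtype=float)
    peaks = []
    for i in np.flatnonzero(y > amp_threshold):
        half = 0.5 * width_frac * mz[i]
        lo = np.searchsorted(mz, mz[i] - half, side="left")
        hi = np.searchsorted(mz, mz[i] + half, side="right")
        if y[i] == y[lo:hi].max():  # local maximum inside the m/z-dependent window
            peaks.append(mz[i])
    return np.array(peaks)


mz = np.linspace(1000.0, 5000.0, 4001)  # 1 m/z unit spacing
y = (1000.0 * np.exp(-0.5 * ((mz - 2000.0) / 3.0) ** 2)
     + 800.0 * np.exp(-0.5 * ((mz - 3000.0) / 3.0) ** 2)
     + 300.0 * np.exp(-0.5 * ((mz - 4000.0) / 3.0) ** 2))
print(pick_peaks(mz, y))  # [2000. 3000.]
```

Only the two peaks above the 400 threshold are reported. The peak at m/z 4000 is a local maximum but falls below the amplitude threshold. This is how the amplitude step gives up some sensitivity to lower the FDR.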

4.4. The effect of parameters on the results

In the previous experiment, λ was 1000. To show the effect of λ on the result, we selected a piece of mass spectrum as a sample and smoothed it with λ = 500, λ = 1000, λ = 2000 and the automatic λ. Fig. 10 shows the results, with red vertical lines marking the true peak positions. Panel (b) still contains much noise, which may raise the FDR. Panel (e) is over-smoothed: weak peaks, such as those at m/z = 3700 and m/z = 4650, have been destroyed, which lowers the sensitivity. Panel (c) gives the best result, which shows that the choice of λ strongly affects the peak detection result. Panel (d) uses the automatic λ method and is slightly worse than (c) but better than (b) and (e). When there is no experience in choosing λ, the automatic λ method is therefore a good choice.

4.5. Comparison of detection results

We keep the same baseline correction and smoothing parameters. In addition, we fix the amplitude threshold in the peak picking step and only

[Fig. 12. Comparison of the proposed algorithm and other algorithms: corrected signal and peaks detected by TFDM, CWT, PROcess, LMS, LIMPIC and Cromwell; intensity (a.u.) vs. m/z, 6500–7200; red vertical lines mark the true peak positions.]

Table 2
Comparison of the sensitivity with and without the enhancement step (values below are with the enhancement step added; results without it are in Table 1).

Dataset    FDR        PROcess  LMS     LIMPIC  Cromwell
Dataset1   [0,0.1]    0.3462   0.1700  0.2662  0.1032
Dataset1   [0.1,0.2]  0.4442   0.3183  0.3856  0.2109
Dataset1   [0.2,0.3]  0.5017   0.3977  0.7116  0.3258
Dataset1   [0.3,0.4]  0.5438   0.4912  0.8279  0.5638
Dataset1   [0.4,0.5]  0.6465   0.5487  0.8381  0.6659
Dataset2   [0,0.1]    0.2502   0.1305  0.5050  0.1188
Dataset2   [0.1,0.2]  0.5105   0.1900  0.7458  0.2251
Dataset2   [0.2,0.3]  0.5836   0.2945  0.7764  0.3215
Dataset2   [0.3,0.4]  0.6125   0.3983  0.7889  0.3981
Dataset2   [0.4,0.5]  0.7180   0.4695  0.8512  0.6578
Dataset3   [0,0.1]    0.2142   0.0922  0.5039  0.1032
Dataset3   [0.1,0.2]  0.2486   0.1368  0.6684  0.2109
Dataset3   [0.2,0.3]  0.3132   0.1764  0.7093  0.3258
Dataset3   [0.3,0.4]  0.5167   0.2615  0.7458  0.5638
Dataset3   [0.4,0.5]  0.6421   0.4494  0.7969  0.6659
Dataset4   [0,0.1]    0.2355   0.1252  0.1049  0.1188
Dataset4   [0.1,0.2]  0.2505   0.2027  0.1939  0.2251
Dataset4   [0.2,0.3]  0.3222   0.2767  0.3631  0.3215
Dataset4   [0.3,0.4]  0.4646   0.3298  0.6285  0.3981
Dataset4   [0.4,0.5]  0.5570   0.5100  0.7996  0.6578

5. Conclusion

We proposed a new framework for peak detection based on an analysis of the CWT-based peak detection algorithm. The signal enhancement step can improve detection performance, as confirmed by adding it to the PROcess, LMS, LIMPIC and Cromwell programs. We then proposed a peak detection algorithm that combines signal enhancement with peak-preserving smoothing. Compared with existing peak detection algorithms, the proposed algorithm detects peaks better at low FDR.

Acknowledgements

The work was partly supported by the National Natural Science Foundation of China (Grant: 61671010), the Natural Science Foundation of Jiangsu Province of China (Grant: BK20161513), and the Qing Lan Project of Jiangsu Province.

References

[1] Y. Chao, Z. He, W. Yu, Comparison of public peak detection algorithms for MALDI mass spectrometry data analysis, BMC Bioinf. 10 (2009) 4.
[2] G. Vivó-Truyols, J.R. Torres-Lapasió, A.M.V. Nederkassel, Y.V. Heyden, D.L. Massart, Automatic program for peak detection and deconvolution of multi-overlapped chromatographic signals: Part II: peak model and deconvolution algorithms, J. Chromatogr. A 1096 (2005) 146–155.
[3] N. Dyson, Chromatographic Integration Methods, second ed., Royal Society of Chemistry, Cambridge, 1998.
[4] B.L. Hood, T.D. Veenstra, T.P. Conrads, Mass spectrometry-based proteomics, Int. Congr. 1266 (2004) 375–380.
[5] S. Bottini, N. Hamouda-Tekaya, B. Tanasa, L.E. Zaragosi, V. Grandjean, E. Repetto, M. Trabucchi, From benchmarking HITS-CLIP peak detection programs to a new method for identification of miRNA-binding sites from Ago2-CLIP data, Nucleic Acids Res. 45 (2017) e71.
[6] Y.J. Yu, Q.L. Xia, S. Wang, B. Wang, F.W. Xie, X.B. Zhang, Y.M. Ma, H.L. Wu, Chemometric strategy for automatic chromatographic peak detection and background drift correction in chromatographic data, J. Chromatogr. A 1359 (2014) 262–270.
[7] Z. Jianqiu, G. Elias, H. Travis, H. William, H. Yufei, Review of peak detection algorithms in liquid-chromatography-mass spectrometry, Curr. Genom. 10 (2009) 388–401.
[8] S. Roy, New algorithms for processing and peak detection in liquid chromatography/mass spectrometry data, Rapid Commun. Mass Spectrom. 16 (2010) 462–467.
[9] X. Li, R. Gentleman, X. Lu, Q. Shi, J.D. Iglehart, L. Harris, A. Miron, SELDI-TOF Mass Spectrometry Protein Data, Springer, New York, 2005.
[10] Y. Yasui, M. Pepe, M.L. Thompson, B.L. Adam, G.L. Wright, Y. Qu, J.D. Potter, M. Winget, M. Thornquist, Z. Feng, A data-analytic strategy for protein biomarker discovery: profiling of high-dimensional proteomic data for cancer detection, Biostatistics 4 (2003) 449–463.
[11] D. Mantini, F. Petrucci, D. Pieragostino, P.D. Boccio, M.D. Nicola, C.D. Ilio, G. Federici, P. Sacchetta, S. Comani, A. Urbani, LIMPIC: a computational method for the separation of protein MALDI-TOF-MS signals from noise, BMC Bioinf. 8 (2007) 1–17.
[12] K.R. Coombes, S. Tsavachidis, J.S. Morris, K.A. Baggerly, M.C. Hung, H.M. Kuerer, Improved peak detection and quantification of mass spectrometry data acquired from surface-enhanced laser desorption and ionization by denoising spectra with the undecimated discrete wavelet transform, Proteomics 5 (2010) 4107–4117.
[13] P. Du, W.A. Kibbe, S.M. Lin, Improved peak detection in mass spectrum by incorporating continuous wavelet transform-based pattern matching, Bioinformatics 22 (2006) 2059–2065.
[14] Z. Xu, X. Sun, P.D.B. Harrington, Baseline correction method using an orthogonal basis for gas chromatography/mass spectrometry data, Anal. Chem. 83 (2011) 7464–7471.
[15] C.A. Wehe, A.C. Niehoff, G.M. Thyssen, M. Sperling, U. Karst, Rapid cell mode switching and dual laser ablation inductively coupled plasma mass spectrometry for elemental bioimaging, Rapid Commun. Mass Spectrom. 28 (2015) 2627–2635.
[16] L.G. Johnsen, S. Thomas, H. Ulf, B. Rasmus, An automated method for baseline correction, peak finding and peak grouping in chromatographic data, Analyst 138 (2013) 3502–3511.
[17] L.L.P.V. Stee, U.A.T. Brinkman, Peak detection methods for GC×GC: an overview, Trac. Trends Anal. Chem. 83 (2016) 1–13.
[18] K.H. Jarman, D.S. Daly, K.K. Anderson, K.L. Wahl, A new approach to automated peak detection, Chemometr. Intell. Lab. Syst. 69 (2003) 61–76.
[19] Y. Tu, X. Yang, S. Zhang, Y. Zhu, Determination of theanine and gamma-aminobutyric acid in tea by high performance liquid chromatography with precolumn derivatization, Chin. J. Chromatogr. 30 (2012) 184–189.
[20] Y. Zheng, R. Fan, C. Qiu, Z. Liu, D. Tian, An improved algorithm for peak detection in mass spectra based on continuous wavelet transform, Int. J. Mass Spectrom. 409 (2016) 53–58.
[21] K.S. Joseph, J. Anguizola, A.J. Jackson, D.S. Hage, Chromatographic analysis of acetohexamide binding to glycated human serum albumin, J. Chromatogr. B 878 (2010) 2775–2781.
[22] A. Savitzky, M.J.E. Golay, Smoothing and differentiation of data by simplified least squares procedures, Anal. Chem. 36 (1964) 1627–1639.
[23] Y. Liu, B. Dang, Y. Li, H. Lin, H. Ma, Applications of Savitzky-Golay filter for seismic random noise reduction, Acta Geophys. 64 (2016) 101–124.
[24] H.H. Madden, Comments on the Savitzky-Golay convolution method for least-squares-fit smoothing and differentiation of digital data, Anal. Chem. 50 (1978) 1383–1386.
[25] R.J. Paruch, B.J. Garrison, Z. Postawa, Partnering analytic models and dynamic secondary ion mass spectrometry simulations to interpret depth profiles due to kiloelectronvolt cluster bombardment, Anal. Chem. 84 (2012) 3010–3016.
[26] J. Zhu, H. Wang, Adaptive beamforming for correlated signal and interference: a frequency domain smoothing approach, Acoust. Speech Signal Process. 38 (1990) 193–195.
[27] M. Lang, H. Guo, J.E. Odegard, C.S. Burrus, R.O. Wells, Noise reduction using an undecimated discrete wavelet transform, IEEE Signal Process. Lett. 3 (2002) 10–12.
[28] T. Rejtar, H.S. Chen, V. Andreev, E. Moskovets, B.L. Karger, Increased identification of peptides by enhanced data processing of high-resolution MALDI TOF/TOF mass spectra prior to database searching, Anal. Chem. 76 (2004) 6017–6028.
[29] M. Lang, H. Guo, J.E. Odegard, C.S. Burrus, R.O. Wells, Noise reduction using an undecimated discrete wavelet transform, IEEE Signal Process. Lett. 3 (2002) 10–12.
[30] Y. Li, Y. Ding, T. Li, Nonlinear diffusion filtering for peak-preserving smoothing of a spectrum signal, Chemometr. Intell. Lab. Syst. 156 (2016) 157–165.
[31] A. Witkin, Scale-space filtering: a new approach to multi-scale description, in: Acoustics, Speech, and Signal Processing, IEEE International Conference on ICASSP, 2003, pp. 150–153.
[32] P. Perona, J. Malik, Scale-space and edge detection using anisotropic diffusion, IEEE Trans. Pattern Anal. Mach. Intell. 12 (2002) 629–639.
[33] Y. Li, F. Liu, I.W. Turner, T. Li, Time-fractional diffusion equation for signal smoothing, Appl. Math. Comput. 326 (2018) 108–116.
[34] Y. Li, M. Jiang, F. Liu, Time fractional super-diffusion model and its application in peak-preserving smoothing, Chemometr. Intell. Lab. Syst. 175 (2018) 13–19.
[35] Y. Li, M. Jiang, Spatial-fractional order diffusion filtering, J. Math. Chem. 56 (2018) 257–267.
[36] Y. Li, C. Pan, Y. Xue, X. Meng, Y. Ding, A novel signal enhancement method for overlapped peaks with noise immunity, Spectrosc. Lett. 49 (2016) 285–293.