Chemometrics and Intelligent Laboratory Systems 185 (2019) 65–72
CMSENN: Computational Modification Sites with Ensemble Neural Network

Wenzheng Bao a, Bin Yang b,*, Dan Li a, Zhengwei Li c, Yong Zhou c, Rong Bao a

a School of Information and Electrical Engineering, Xuzhou University of Technology, Xuzhou, 221018, China
b School of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, China
c School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
ARTICLE INFO

Keywords: Post-translational modification; Amino acid property; Neural network

ABSTRACT
With the rapid development of high-throughput technology, vast amounts of protein molecular data have been generated, which are crucial to advancing our understanding of biological organisms. An increasing number of approaches for identifying protein post-translational modification sites have been designed to detect such sites in protein sequences. Nevertheless, these methods are typically suitable for only one type of modification site; their performance deteriorates rapidly when applied to the prediction of other modification types. In this paper, by ensembling different types of neural network algorithms, a novel method named CMSENN (Computational Modification Sites with Ensemble Neural Network, http://121.250.173.184/) is proposed to detect protein modifications. The algorithm mainly consists of several steps: First, the candidate peptide sequences are translated into feature vectors. Second, the three employed types of amino acid residue properties are normalized. Finally, various combinations of features and classification models are compared with several current typical algorithms. The results demonstrate that the proposed model performs well in terms of sensitivity, specificity, F1 score and Matthews correlation coefficient (MCC) in identifying modification sites with the selected feature and algorithm combination.
1. Introduction

The human genome contains more than 30,000 genes. Unfortunately, the biological roles of merely 40% of them have been reported [1,2]. Protein functions play key roles in many biological processes [3,4]. The traditional methods of protein function identification are regarded as expensive and time-consuming [5–10], and such methods can hardly be applied to the general function identification problem. At the protein level, protein function may be driven by post-translational modification [11]. Post-translational modification (PTM) can be regarded as one type of complex biological modification and plays a fundamental role in several key biological regulation processes. It has been pointed out that there are more than 500 types of modification at the protein level, and these modification types play various roles in many regulation processes. Among them, phosphorylation can be treated as one of the most common [12,13]. The discovery of phosphorylation dates back decades, and over years of effort such
modification has been reported in metabolic regulation and cancer research [14–23]. On the other hand, several state-of-the-art resources and methods should be listed in this work. Swiss-Prot [24,25] includes a great deal of modification information and is one of the most well-known protein databases in the world [26]. Phospho.ELM [27], which was built around ELM (Eukaryotic Linear Motif) annotations, can be treated as a novel resource in this field [28,29]. O-GLYCBASE [30,31], a glycoprotein database, contains the majority of O-linked glycosylation sites across several species [32,33]. In this work, the collected PTM data come from external biological data sources. We have therefore designed a computational algorithm to comprehensively identify phosphorylation, glycosylation and sulfation sites [34–37]. Protein structural and functional properties, such as the membrane-buried preference parameters, the conformational parameter of β-turn and the average flexibility indices, are provided for researchers investigating PTM mechanisms. Several types of neural network algorithms and models have been ensembled as the post-translational modification site identification tool.
* Corresponding author.
E-mail address: [email protected] (B. Yang).
https://doi.org/10.1016/j.chemolab.2018.12.009
Received 18 October 2018; Received in revised form 11 December 2018; Accepted 21 December 2018; Available online 3 January 2019
0169-7439/© 2019 Elsevier B.V. All rights reserved.
2. Methods and materials
Table 1 Comparison with other methods.
2.1. Data

As is well known, different post-translational modifications may have their own features [38–43]. Therefore, we select several modification types, including glycosylation and phosphorylation sites, in this work. The employed modification site data are extracted from three well-known resources: Swiss-Prot, Phospho.ELM and O-GLYCBASE [43–46]. Meanwhile, we have utilized CD-HIT to remove repeated and highly similar proteins.
Method | Sn(%) | Sp(%) | FNR(%) | F1 | MCC
PUL-PUP | 82.24 | 91.57 | 17.76 | 0.8626 | 0.7413
PSoL | 67.50 | 73.60 | 32.50 | 0.6962 | 0.4118
SVM_balance | 76.71 | 63.65 | 33.29 | 0.7201 | 0.4071
Naïve Bayesian | 82.78 | 86.40 | 17.22 | 0.8431 | 0.6923
Neural Network | 83.93 | 89.57 | 16.07 | 0.8637 | 0.7362
Proposed Method | 89.23 | 97.50 | 10.76 | 0.9308 | 0.8703
2.2. Algorithm and model

Neural networks are among the most widely used algorithms for machine learning and classification [47–50]. Over several decades, many types of neural network algorithms have been proposed [51–54]. In this work, we employ several types of neural networks and related models for the modification site identification problem. The overall workflow is shown in Fig. 1, which illustrates the classification models used: the neural network, the extreme learning machine and the support vector machine. First, we introduce the neural network employed in this work. We use the Particle Swarm Optimization (PSO) algorithm to optimize the parameters of the neural network [53,55,56]. The detailed steps are given in Algorithm 1.

Algorithm 1 Original PSO:

Step 1 Initialize a population array of particles with random positions and velocities in the D-dimensional search space.
Step 2 Loop
Step 3 For each particle, evaluate the desired optimization fitness function in D variables.
Step 4 Compare each particle's fitness with its pbest_i. If the current value is better than pbest_i, set pbest_i to the current value and p_i to the current location x_i in D-dimensional space.
Step 5 Identify the particle in the neighborhood with the best success so far and assign its index to the variable g.
Step 6 Update the velocity and position of the particle according to Eq. (1):

v_i ← v_i + U(0, φ1) ⊗ (p_i − x_i) + U(0, φ2) ⊗ (p_g − x_i)
x_i ← x_i + v_i    (1)

Step 7 If a criterion is met (usually a sufficiently good fitness or a maximum number of iterations), exit the loop.
Step 8 End loop

Notes: U(0, φ_i) represents a vector of random numbers uniformly distributed in [0, φ_i], regenerated at each iteration for each particle; ⊗ denotes component-wise multiplication. In the original version of PSO, each component of v_i is kept within the range [−Vmax, +Vmax]. To make the method adaptive, we utilize the adaptive particle swarm optimization (APSO) [57–61], described by Eq. (2):

v_id(t+1) = w·v_id(t) + c1·rand()·(p_id(t) − x_id(t)) + c2·rand()·(p_gd(t) − x_id(t))
x_id(t+1) = x_id(t) + v_id(t+1),  1 ≤ i ≤ n, 1 ≤ d ≤ D    (2)
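The updates in Eqs. (1) and (2) can be sketched as follows. This is an illustrative NumPy implementation of PSO with a linearly decreasing inertia weight; the swarm size, acceleration coefficients, inertia schedule and test function are assumptions, not values from the paper:

```python
import numpy as np

def apso_minimize(fitness, dim, n_particles=30, iters=200,
                  w=0.9, w_min=0.4, c1=2.0, c2=2.0, v_max=1.0, seed=0):
    """PSO with a linearly decreasing inertia weight w, as in Eq. (2)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, (n_particles, dim))   # positions
    v = np.zeros((n_particles, dim))                 # velocities
    pbest = x.copy()                                 # personal best positions p_i
    pbest_f = np.array([fitness(p) for p in x])      # personal best fitness values
    g = pbest[np.argmin(pbest_f)].copy()             # global best position p_g

    for t in range(iters):
        wt = w - (w - w_min) * t / iters             # inertia weight shrinks with t
        r1 = rng.random((n_particles, dim))          # U(0, 1) per component/particle
        r2 = rng.random((n_particles, dim))
        v = wt * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        v = np.clip(v, -v_max, v_max)                # keep v within [-Vmax, +Vmax]
        x = x + v
        f = np.array([fitness(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, pbest_f.min()

# Usage: minimize the sphere function, whose optimum is at the origin.
best_x, best_f = apso_minimize(lambda p: float(np.sum(p * p)), dim=3)
```

In the paper the fitness function would score a candidate set of neural-network weights; the sphere function here only demonstrates the update rule.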
where w is the inertia weight; the algorithm gradually reduces w as the generation number increases. Secondly, we employ a fast learning algorithm known as the extreme learning machine (ELM), developed for single-hidden-layer feedforward networks (SLFNs) with additive hidden nodes and radial basis function (RBF) kernels [62–65]. ELM [66,67] has been successfully applied to many real-world applications [68–70] and has been shown to achieve good generalization performance at extremely high learning speed [71–75]. The output of an SLFN with N hidden nodes (additive or RBF nodes) can be represented by Eq. (3) [76]:

f_N(x) = Σ_{i=1}^{N} β_i G(x; c_i, a_i),  x ∈ R^n, c_i ∈ R^n    (3)
where c_i and a_i are the learning parameters of the hidden nodes, β_i is the weight connecting the ith hidden node to the output node, and G(x; c_i, a_i) is the output of the ith hidden node with respect to the input x. For additive hidden nodes with a sigmoid or threshold activation function g(x): R → R, G(x; c_i, a_i) is given by Eq. (4).
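The ELM training procedure behind Eqs. (3) and (4) fixes the hidden-node parameters (c_i, a_i) at random and solves only for the output weights β. A minimal sketch, assuming the standard least-squares (pseudoinverse) solution from the ELM literature and sigmoid additive nodes:

```python
import numpy as np

def elm_train(X, T, n_hidden=50, seed=0):
    """Train a single-hidden-layer ELM: random (c_i, a_i), least-squares beta."""
    rng = np.random.default_rng(seed)
    C = rng.normal(size=(X.shape[1], n_hidden))  # input weights c_i, random and fixed
    a = rng.normal(size=n_hidden)                # biases a_i, random and fixed
    H = 1.0 / (1.0 + np.exp(-(X @ C + a)))       # G(x; c_i, a_i) = g(c_i . x + a_i), Eq. (4)
    beta = np.linalg.pinv(H) @ T                 # output weights via Moore-Penrose pseudoinverse
    return C, a, beta

def elm_predict(X, C, a, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ C + a)))
    return H @ beta                              # f_N(x) = sum_i beta_i G(x; c_i, a_i), Eq. (3)

# Usage: fit a toy two-class problem with +1/-1 targets.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
T = np.sign(X[:, 0] + X[:, 1])
C, a, beta = elm_train(X, T, n_hidden=40)
acc = np.mean(np.sign(elm_predict(X, C, a, beta)) == T)
```

Because only β is learned, training reduces to one linear solve, which is the source of ELM's speed advantage mentioned above.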
Fig. 1. The outline of this work.
Table 2 Comparison with other features.

Features | Sn(%) | Sp(%) | FNR(%) | F1 | MCC
Binary Encoding | 43.36 | 75.80 | 56.64 | 0.5175 | 0.2026
AA Composition | 64.14 | 52.79 | 35.86 | 0.6070 | 0.1704
Grouping AA Composition | 41.78 | 76.04 | 58.22 | 0.5042 | 0.1897
Physicochemical Properties | 55.53 | 63.93 | 44.47 | 0.5796 | 0.1953
KNN Features | 64.94 | 55.85 | 35.06 | 0.6212 | 0.2088
Secondary Tendency Structure | 59.96 | 57.40 | 40.04 | 0.5920 | 0.1737
PSSM | 51.20 | 69.39 | 48.80 | 0.5632 | 0.2094
Binary Coding | 64.04 | 78.60 | 35.96 | 0.6907 | 0.4310
AA Pair Composition | 76.24 | 75.32 | 23.76 | 0.7589 | 0.5156
AA Appearance | 82.71 | 78.55 | 17.29 | 0.8102 | 0.6131
Proposed Method | 89.23 | 97.50 | 10.77 | 0.9308 | 0.8703
Table 3 The performance of the proposed model.
f(x) = sign( Σ_{i=1}^{L} α_i z_i(x) )    (6)
Model | Sens(%) | Spec(%) | FNR(%) | F1 | MCC
ELM+A | 91.33 | 80.05 | 8.67 | 0.8645 | 0.7184
APSO+NN+A | 74.12 | 69.75 | 25.88 | 0.7254 | 0.4391
LSVM+A | 88.71 | 79.32 | 11.29 | 0.8473 | 0.6833
PSVM+A | 86.78 | 68.78 | 13.22 | 0.7961 | 0.5648
ELM+B | 94.32 | 96.69 | 5.68 | 0.9545 | 0.9104
APSO+NN+B | 75.97 | 62.64 | 24.03 | 0.7122 | 0.3896
LSVM+B | 74.97 | 51.85 | 25.03 | 0.6720 | 0.2757
PSVM+B | 99.80 | 36.17 | 0.20 | 0.7571 | 0.4663
ELM+C | 92.07 | 88.57 | 7.93 | 0.9049 | 0.8069
APSO+NN+C | 68.23 | 50.33 | 31.77 | 0.6263 | 0.1886
LSVM+C | 87.87 | 57.72 | 12.13 | 0.7636 | 0.4782
PSVM+C | 86.49 | 60.90 | 13.51 | 0.7668 | 0.4902
Final Model | 89.23 | 97.50 | 10.76 | 0.9308 | 0.8703
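The "Final Model" row in Table 3 combines the individual classifiers into an ensemble. As an illustration only (the paper does not spell out its exact combination rule in this section), a minimal majority-vote sketch in NumPy:

```python
import numpy as np

def majority_vote(predictions):
    """Combine {-1, +1} predictions from several classifiers by unweighted voting.
    A plain majority vote is an assumption here, not the paper's stated rule."""
    votes = np.sum(predictions, axis=0)   # sum of votes over classifiers
    return np.where(votes >= 0, 1, -1)    # ties broken toward the positive class

# Usage: three classifiers voting on four candidate modification sites.
preds = np.array([[ 1, -1,  1,  1],
                  [ 1,  1, -1,  1],
                  [-1, -1,  1, -1]])
combined = majority_vote(preds)
```

Weighted voting (e.g., weighting each model by its validation MCC) would be a natural refinement of this scheme.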
Meanwhile, several subtypes of SVM have been proposed in this field. In this work, we utilize two subtypes, LSVM and PSVM, as comparisons for the proposed algorithm.

2.3. Feature selection

There are a large number of amino acid and protein features in this field, covering physical, chemical, statistical and other related properties [77–79]. Among these, the membrane-buried preference parameters, the conformational parameter of β-turn and the average flexibility indices are employed as the classification features in this work.
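The employed residue properties are normalized before classification (as stated in the abstract). A minimal min-max normalization and window-encoding sketch; the property values in the dictionary below are placeholders for illustration, not the actual AAindex entries:

```python
# Min-max normalize one amino-acid property scale, then encode a peptide window.
# The numeric values are illustrative placeholders, not a real AAindex record.
prop = {'A': 0.62, 'R': -2.53, 'N': -0.78, 'D': -0.90, 'C': 0.29,
        'Q': -0.85, 'E': -0.74, 'G': 0.48, 'H': -0.40, 'I': 1.38,
        'L': 1.06, 'K': -1.50, 'M': 0.64, 'F': 1.19, 'P': 0.12,
        'S': -0.18, 'T': -0.05, 'W': 0.81, 'Y': 0.26, 'V': 1.08}

lo, hi = min(prop.values()), max(prop.values())
norm = {aa: (v - lo) / (hi - lo) for aa, v in prop.items()}  # rescaled into [0, 1]

def encode(peptide):
    """Map a peptide window around a candidate site to a normalized feature vector."""
    return [norm[aa] for aa in peptide]

vec = encode("AKSPT")
```

Concatenating the vectors from the three property scales yields the feature vector fed to the classifiers.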
Notes: Feature A is the membrane-buried preference parameters. Feature B is the conformational parameter of β-turn. Feature C is the average flexibility indices.
G(x; c_i, a_i) = g(c_i·x + a_i),  a_i ∈ R    (4)
3. Results

To test the performance of the proposed model, the Root Mean Square (RMS) error [80–82] has been utilized as the evaluation function of the output results [83–85]. The overall accuracy (OA) is the average over the sub-data sets [86–89]. Moreover, several performance measures have been used to evaluate prediction accuracy: Sensitivity (Sens), Specificity (Spec), the F1 score (the harmonic mean of precision and sensitivity) and the Matthews correlation coefficient (MCC). These measures are defined in Eqs. (7)–(11).
where c_i is the weight vector connecting the input layer to the ith hidden node, a_i is the bias of the ith hidden node, and c_i·x denotes the inner product of the vectors c_i and x in R^n. For RBF hidden nodes with a Gaussian or triangular activation function g(x): R → R, G(x; c_i, a_i) is given by Eq. (5):

G(x; c_i, a_i) = g(a_i ‖x − c_i‖),  a_i ∈ R+    (5)
where c_i and a_i are the center and impact factor of the ith RBF node, and R+ denotes the set of all positive real values. Last but not least, we employ the support vector machine (SVM) as a classification model. This model was proposed by Cortes and Vapnik and has been widely utilized in classification and regression problems in machine learning. In the modification site feature space, a linear decision function is constructed as in Eq. (6).
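A decision function of the form in Eq. (6) can be evaluated, for illustration, as a kernel expansion over support vectors. The support vectors, coefficients and RBF kernel below are assumptions for the sketch, not the paper's fitted model:

```python
import numpy as np

def svm_decision(x, support_vecs, alpha_y, b, gamma=1.0):
    """Evaluate f(x) = sign(sum_i alpha_i * y_i * K(s_i, x) + b) with an RBF kernel.
    alpha_y holds the products alpha_i * y_i; all values are illustrative."""
    k = np.exp(-gamma * np.sum((support_vecs - x) ** 2, axis=1))  # K(s_i, x) per support vector
    return int(np.sign(alpha_y @ k + b))

# Usage with two hand-picked support vectors of opposite class.
S = np.array([[0.0, 0.0], [2.0, 2.0]])
label = svm_decision(np.array([0.1, 0.1]), S, np.array([1.0, -1.0]), b=0.0)
```

A query point near the positive support vector is labeled +1, and one near the negative support vector is labeled −1.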
Sens = TP / (TP + FN)    (7)

Spec = TN / (TN + FP)    (8)

FNR = FN / P = FN / (FN + TP)    (9)
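The five measures in Eqs. (7)–(11) can be computed directly from confusion-matrix counts; a small sketch with illustrative counts (the numbers are not results from the paper):

```python
import math

def confusion_metrics(tp, tn, fp, fn):
    """Sens, Spec, FNR, F1 and MCC from confusion-matrix counts, Eqs. (7)-(11)."""
    sens = tp / (tp + fn)                     # Eq. (7)
    spec = tn / (tn + fp)                     # Eq. (8)
    fnr = fn / (fn + tp)                      # Eq. (9); note FNR = 1 - Sens
    f1 = 2 * tp / (fp + fn + 2 * tp)          # Eq. (10)
    mcc = (tp * tn - fp * fn) / math.sqrt(    # Eq. (11)
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return sens, spec, fnr, f1, mcc

# Usage on an illustrative confusion matrix.
sens, spec, fnr, f1, mcc = confusion_metrics(tp=90, tn=80, fp=20, fn=10)
```

MCC is the most conservative of the five, since it penalizes errors in both classes even when the data are imbalanced.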
Fig. 2. The ROC curves of the membrane-buried preference parameters.
Fig. 3. The ROC curves of the conformational parameter of β-turn.
F1 = 2TP / (FP + FN + 2TP)    (10)

MCC = (TP·TN − FP·FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))    (11)

The performance of this algorithm is compared with other approaches in Tables 1 and 2; the evaluation indicators for this two-class classification demonstrate the various contributions to the identification of such modification sites in the employed data. From Table 3, it is obvious that Sens ranges from 68.23% to 99.80% and Spec ranges from 50.33% to 97.50%. At the same time, the FNR ranges from 0.20% to 31.77%. The F1 evaluation parameter
Fig. 4. The ROC curves of the average flexibility indices.
proposed in this research. The following three figures show the Receiver Operating Characteristic (ROC) curves of the features: the membrane-buried preference parameters, the conformational parameter of β-turn and the average flexibility indices. The average flexibility indices appear to play the most significant role among these three features; the ROC curves are shown in Figs. 2–4. On the other hand, the combination of different features is also an interesting issue in this type of modification site classification. In this work, the employed classification features, namely the membrane-buried preference parameters, the conformational parameter of β-turn and the average flexibility indices, are amino acid residue properties from the AAindex database. The three selected features therefore admit several combinations, detailed in Table 4. The adaptive PSO+NN, ELM, LSVM and PSVM have each been employed as independent classification models, and the ROC curves of these combination types are shown in Figs. 5–8. The ROC curves of the feature combinations demonstrate that for the first combination, comprising the membrane-buried preference parameters and the conformational parameter of β-turn, only the PSVM and ELM algorithms achieve reasonable classification results for identifying such modification sites; the other two approaches can hardly perform well on this combination. The second combination, comprising the conformational parameter of β-turn and the average flexibility indices, achieves relatively good classification results only with the extreme learning machine; unfortunately, the other three models could hardly obtain good results with it. The next combination contains the membrane-buried preference parameters and the average flexibility indices. In this combination, the PSVM and LSVM obtain relatively good results, slightly lower than the ELM. The last combination, including all the employed features, obtains good results with the ELM. Nevertheless, the adaptive PSO+NN can hardly obtain
Table 4 The combination types of features.

Combination | Feature Index
I | A, B
II | A, C
III | B, C
IV | A, B, C

Notes: Feature A is the membrane-buried preference parameters, feature B the conformational parameter of β-turn, and feature C the average flexibility indices.
can range from 0.6263 to 0.9545. The last evaluation index (MCC) for the selected features ranges from 0.1886 to 0.9104. On the other hand, it is easy to see that the extreme learning machine classifies the three types of selected properties well in this research. At the same time, it is interesting that the three types of properties play various roles in identifying the post-translational modification sites. The LSVM algorithm also performs well on the classification task, and the importance ranking of the three properties follows the same pattern as for the ELM algorithm. However, the PSVM model does not follow this feature-importance ranking: for it, the conformational parameter of β-turn plays the most significant role in the classification. The APSO+NN follows the same importance ranking as the PSVM model.

4. Discussions and conclusions

4.1. Discussions

In order to probe the importance of the selected three types of amino acid residue properties, combinations of different algorithms have been utilized in this work. In the following step, ensemble classification models built from different algorithms have been
Fig. 5. The ROC curves of the 1st combination.
Fig. 6. The ROC curves of the 2nd combination.
Modification Sites with Ensemble Neural Network (CMSENN), is proposed to detect protein modifications. The algorithm mainly consists of several steps: First, the candidate peptide sequences are translated into feature vectors. Second, the three employed types of amino acid residue properties are normalized. Finally, various combinations of features and classification models are compared with several current typical algorithms. The results demonstrate that the
accurate results. Therefore, in future work, several additional features should be utilized in our identification model.

4.2. Conclusions

In this paper, by ensembling different types of neural network algorithms, a novel method named Computational
Fig. 7. The ROC curves of the 3rd combination.
Fig. 8. The ROC curves of the 4th combination.
proposed model performs well in terms of sensitivity, specificity, F1 score and MCC in identifying modification sites with the selected feature and algorithm combination. With this method, we find that the combined features and the ensemble neural network play key roles in this classification task. In future work, we hope to discover more key features and more effective classification models in this field.
[9] T.R. Hughes, et al., Functional discovery via a compendium of expression profiles, Cell 102 (1) (2000) 109–126. [10] M. Pellegrini, E.M. Marcotte, M.J. Thompson, D. Eisenberg, T.O. Yeates, Assigning protein functions by comparative genome analysis: protein phylogenetic profiles, Proc. Natl. Acad. Sci. U.S.A. 96 (8) (1999) 4285–4288. [11] D. Eisenberg, E.M. Marcotte, I. Xenarios, T.O. Yeates, Protein function in the postgenomic era, Nature 405 (6788) (2000) 823–826. [12] C. Von Mering, et al., Comparative assessment of large-scale data sets of protein–protein interactions, Nature 417 (6887) (2002) 399–403. [13] C.T. Walsh, S. Garneautsodikova, G.J. Gatto, Protein posttranslational modifications: the chemistry of proteome diversifications, Angew. Chem. 44 (45) (2005) 7342–7372. [14] S.E. Mayer, E.G. Krebs, Studies on the phosphorylation and activation of skeletal muscle phosphorylase and phosphorylase kinase in vivo, J. Biol. Chem. 245 (12) (1970) 3153–3160. [15] H.E. Varmus, H. Hirai, D.O. Morgan, J.M. Kaplan, J.M. Bishop, Function, location, and regulation of the src protein-tyrosine kinase, Princess Takamatsu Symp. 20 (1989) 63. [16] B.M. Sefton, T. Hunter, K. Beemon, W. Eckhart, Evidence that the phosphorylation of tyrosine is essential for cellular transformation by Rous sarcoma virus, Cell 20 (3) (1980) 807–816. [17] R.B. Pearson, B.E. Kemp, Protein kinase phosphorylation site sequences and consensus specificity motifs: tabulations, Methods Enzymol. 200 (1991) 62–81. [18] F. Diella, et al., Phospho.ELM: a database of experimentally verified phosphorylation sites in eukaryotic proteins, BMC Bioinf. 5 (1) (2004), 79-79. [19] D. Francesca, C.M. Gould, C. Claudia, V. Allegra, T.J. Gibson, E.L.M. Phospho, A database of phosphorylation sites–update 2008, Nucleic Acids Res. 36 (Database issue) (2008) 240–244. [20] Y. Xue, et al., GPS 2.1: enhanced prediction of kinase-specific phosphorylation sites with an algorithm of motif length selection, Protein Eng. Des. 
Sel. 24 (3) (2011) 255–260. [21] H. Steen, J.A. Jebanathirajah, J. Rush, N. Morrice, M.W. Kirschner, Phosphorylation analysis by mass spectrometry myths, facts, and the consequences for qualitative and quantitative measurements, Mol. Cell. Proteomics 5 (1) (2006) 172–181. [22] N. Farriolmathis, et al., Annotation of post-translational modifications in the SwissProt knowledge base, Proteomics 4 (6) (2004) 1537–1550. [23] W. Bao, Y. Chen, D. Wang, Prediction of protein structure classes with flexible neural tree, Bio Med. Mater. Eng. 24 (6) (2014) 3797–3806. [24] R. Apweiler, H. Hermjakob, N. Sharon, On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database, Biochim. Biophys. Acta 1473 (1) (1999) 4–8. [25] B. Boeckmann, et al., The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003, Nucleic Acids Res. 31 (1) (2003) 365–370. [26] A.M. Bairoch, R. Apweiler, The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000, Nucleic Acids Res. 28 (1) (2000) 45–48. [27] F. Diella, et al., Understanding eukaryotic linear motifs and their role in cell signaling and regulation, Front. Biosci. 13 (13) (2008) 6580.
Acknowledgment This work was supported by the grants of the National Science Foundation of China, Nos. 61702445, 61873270, 61520106006, 31571364, 61532008, 61572364, 61411140249, 61402334, 61472282, 61472280, 61472173, 61572447, and 61373098, China Postdoctoral Science Foundation Grant, Nos. 2014M561513, and partly supported by the National High-Tech R&D Program (863) (2014AA021502 & 2015AA020101), and the grant from the Ph.D. Programs Foundation of Ministry of Education of China (No. 20120072110040). References [1] E.S. Lander, L. Linton, B. Birren, International human genome sequencing consortium, Nature 431 (2001) 931–945. [2] J.C. Venter, et al., The sequence of the human genome, Science 291 (5507) (2001) 1304–1351. [3] A.M. Lesk, L.L. Conte, T. Hubbard, Assessment of novel fold targets in CASP4: predictions of three-dimensional structures, secondary structures, and interresidue contacts, Proteins 45 (2001) 98–118. [4] B. Wang, P. Chen, D. Huang, J. Li, T. Lok, M.R. Lyu, Predicting protein interaction sites from residue spatial sequence profile and evolution rate, FEBS (Fed. Eur. Biochem. Soc.) Lett. 580 (2) (2006) 380–384. [5] P.O. Brown, D. Botstein, Exploring the new world of the genome with DNA microarrays, Nat. Genet. 21 (1999) 33–37. [6] D. Huang, C. Zheng, Independent component analysis-based penalized discriminant method for tumor classification using gene expression data, Bioinformatics 22 (15) (2006) 1855–1862. [7] E.M. Marcotte, M. Pellegrini, H. Ng, D.W. Rice, T.O. Yeates, D. Eisenberg, Detecting protein function and protein-protein interactions from genome sequences, Science 285 (5428) (1999) 751–753. [8] E.M. Marcotte, M. Pellegrini, M.J. Thompson, T.O. Yeates, D. Eisenberg, A combined algorithm for genome-wide prediction of protein function, Nature 402 (6757) (1999) 83–86.
[61] D.W. Boeringer, D.H. Werner, Particle swarm optimization versus genetic algorithms for phased array synthesis, IEEE Trans. Antenn. Propag. 52 (3) (2004) 771–779. [62] J. Salerno, Using the particle swarm optimization technique to train a recurrent neural model. International Conference on Tools with Artificial Intelligence, 1997, pp. 45–49. [63] C. Zhang, H. Shao, Y. Li, Particle swarm optimisation for evolving artificial neural network, in: Systems Man and Cybernetics, vol. 4, 2000, pp. 2487–2490. [64] Y. Zhang, S. Wang, P. Phillips, G. Ji, Binary PSO with mutation operator for feature selection using decision tree applied to spam detection, Knowl. Base Syst. 64 (1) (2014) 22–31. [65] M. Sharafi, T.Y. Elmekkawy, Multi-objective optimal design of hybrid renewable energy systems using PSO-simulation based approach, Renew. Energy 68 (2014) 67–79. [66] H.H. Inbarani, A.T. Azar, G. Jothi, Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis, Comput. Methods Progr. Biomed. 113 (1) (2014) 175–185. [67] L. Zhang, Y. Tang, C. Hua, X. Guan, A new particle swarm optimization algorithm with adaptive inertia weight based on Bayesian techniques, Appl. Soft Comput. 28 (2015) 138–149. [68] F. Valdez, P. Melin, O. Castillo, Modular Neural Networks architecture optimization with a new nature inspired method using a fuzzy combination of Particle Swarm Optimization and Genetic Algorithms, Inf. Sci. 270 (2014) 143–153. [69] I. Fister, X. Yang, J. Brest, "A Brief Review of Nature-Inspired Algorithms for Optimization," arXiv: Neural and Evolutionary Computing, 2013. [70] M. Mahi, O.K. Baykan, H. Kodaz, A new hybrid method based on particle swarm optimization, ant colony optimization and 3-Opt algorithms for traveling salesman problem, Appl. Soft Comput. 30 (2015) 484–490. [71] T. Khatib, A. Mohamed, K. Sopian, A review of photovoltaic systems size optimization techniques, Renew.
Sustain. Energy Rev. 22 (22) (2013) 454–465. [72] C. Sbarufatti, M. Corbetta, M. Giglio, F. Cadini, Adaptive prognosis of lithium-ion batteries based on the combination of particle filters and radial basis function neural networks, J. Power Sources 344 (2017) 128–140. [73] A.R.H. Heryudono, T.A. Driscoll, Radial basis function interpolation on irregular domain through conformal transplantation, J. Sci. Comput. 44 (3) (2010) 286–300. [74] G. Huang, Q. Zhu, C.K. Siew, Real-time learning capability of neural networks, IEEE Trans. Neural Network. 17 (4) (2006) 863–878. [75] J. Park, I.W. Sandberg, Universal approximation using radial-basis-function networks, Neural Comput. 3 (2) (2014) 246–257. [76] H. Lu, C. Tsai, M. Chang, Radial basis function neural network with sliding mode control for robotic manipulators, in: Systems, Man and Cybernetics, 2010, pp. 1209–1215. [77] G. Huang, Q. Zhu, C.K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70 (1) (2006) 489–501. [78] N. Liang, P. Saratchandran, G. Huang, N. Sundararajan, CLASSIFICATION OF MENTAL TASKS FROM EEG SIGNALS USING EXTREME LEARNING MACHINE, Int. J. Neural Syst. 16 (1) (2006) 29–38. [79] S.D. Handoko, K.C. Keong, O.Y. Soon, G.L. Zhang, V. Brusic, Extreme learning machine for predicting HLA-Peptide binding, Int. Symp. Neural Network. 3973 (2006) 716–721. [80] J. Xu, W. Wang, J.C.H. Goh, G. Lee, Internal model approach for gait modeling and classification, in: International Conference of the Ieee Engineering in Medicine and Biology Society, vol. 7, 2005, pp. 7688–7691. [81] C.T. Yeu, M. Lim, G. Huang, A. Agarwal, Y. Ong, A new machine learning paradigm for terrain reconstruction, Geosci. Rem. Sens. Lett. IEEE 3 (3) (2006) 382–386. [82] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (3) (1995) 273–297. [83] C.J.C. Burges, A tutorial on support vector machines for pattern recognition, Data Min. Knowl. Discov. 2 (2) (1998) 121–167. [84] C. Hsu, C. 
Lin, A comparison of methods for multiclass support vector machines, IEEE Trans. Neural Network. 13 (2) (2002) 415–425. [85] Q. Liu, Q. He, Z. Shi, Extreme support vector machine classifier, in: Knowledge Discovery and Data Mining, 2008, pp. 222–233. [86] B. Frenay, M. Verleysen, Using SVMs with randomised feature spaces: an extreme learning approach, in: The European Symposium on Artificial Neural Networks, 2010. [87] Y. Tang, H.H. Zhang, Multiclass proximal support vector machines, J. Comput. Graph Stat. 15 (2) (2006) 339–355. [88] J.A.K. Suykens, J. Vandewalle, Training multilayer perceptron classifiers based on a modified support vector method, IEEE Trans. Neural Network. 10 (4) (1999) 907–911. [89] S. Haykin, Neural networks: a comprehensive foundation, in: Neural Networks, A Comprehensive Foundation, 1994, pp. 71–80.
[28] H. Dinkel, et al., The eukaryotic linear motif resource ELM: 10 years and counting, Nucleic Acids Res. 42 (2014) 259–266. [29] H. Dinkel, et al., ELM 2016—data update and new functionality of the eukaryotic linear motif resource, Nucleic Acids Res. 44 (2016) 294–300. [30] R. Gupta, H. Birch, K. Rapacki, S. Brunak, J.E. Hansen, O-GLYCBASE version 2.0: a revised database of O-glycosylated proteins, Nucleic Acids Res. 27 (1) (1997) 370–372. [31] R. Gupta, H. Birch, K. Rapacki, S. Brunak, J.E. Hansen, O-GLYCBASE version 4.0: a revised database of O-glycosylated proteins, Nucleic Acids Res. 27 (1) (1999) 370–372. [32] H. J E, L. O, N. J O, H. J E, B. S, O-GLYCBASE: a revised database of O-glycosylated proteins, Nucleic Acids Res. 24 (1) (1996) 248–252. [33] J.E. Hansen, O. Lund, J. Nilsson, K. Rapacki, S. Brunak, O-GLYCBASE Version 3.0: a revised database of O-glycosylated proteins, Nucleic Acids Res. 26 (1) (1998) 387–389. [34] W. Bao, Z. Huang, C.A. Yuan, D.S. Huang, Pupylation sites prediction with ensemble classification model, Int. J. Data Min. Bioinf. 18 (2) (2017) 91–104. [35] W. Bao, D. Wang, Y. Chen, Classification of protein structure classes on flexible neutral tree, IEEE ACM Trans. Comput. Biol. Bioinf 14 (5) (2017) 1122–1133. [36] W. Bao, Z. Jiang, D. Huang, Novel human microbe-disease association prediction using network consistency projection, BMC Bioinf. 18 (16) (2017) 543. [37] W. Bao, et al., Mutli-features prediction of protein translational modification sites, IEEE ACM Trans. Comput. Biol. Bioinf 15 (5) (2018) 1453–1460. [38] J.S. Garavelli, The RESID Database of Protein Modifications as a resource and annotation tool, Proteomics 4 (6) (2010) 1527–1533. [39] M.P. Luisa, et al., The PSI-MOD community standard for representation of protein modification data, Nat. Biotechnol. 26 (8) (2008) 864–866. [40] J.S. Garavelli, The RESID database of protein modifications: 2003 developments, Nucleic Acids Res. 31 (1) (2003) 499–501. [41] G. 
J S, The RESID Database of protein structure modifications, Nucleic Acids Res. 27 (1) (2001) 198–199. [42] M.A. Harris, et al., The Gene Ontology (GO) database and informatics resource, Nucleic Acids Res. 32 (2004). [43] M. Ashburner, et al., Gene ontology: tool for the unification of biology. The Gene Ontology Consortium, Nat. Genet. 25 (1) (2000) 25–29. [44] S. Maere, K. Heymans, M. Kuiper, BiNGO : a cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks, Bioinformatics 21 (16) (2005) 3448–3449. [45] D. Sylva, Creating the gene ontology resource: design and implementation, Genome Res. (2001). [46] C.H. Wu, et al., The protein information resource, Nucleic Acids Res. 31 (1) (2003) 345–347. [47] C.H. Wu, et al., The Protein Information Resource: an integrated public resource of functional annotation of proteins, Nucleic Acids Res. 30 (1) (2002) 35–37. [48] D. Huang, J. Du, A constructive hybrid structure optimization methodology for radial basis probabilistic neural networks, IEEE Trans. Neural Network. 19 (12) (2008) 2099–2115. [49] A. Van Ooyen, B. Nienhuis, Improving the convergence of the back-propagation algorithm, Neural Network. 5 (3) (1992) 465–471. [50] W. Tong, R. Jin, Semi-supervised learning by mixed label propagation, in: National Conference on Artificial Intelligence, 2007, pp. 651–656. [51] R.A. Jacobs, Increased rates of convergence through learning rate adaptation, Neural Network. 1 (4) (1987) 295–307. [52] M.K. Weir, A method for self-determination of adaptive learning rates in back propagation, Neural Network. 4 (3) (1991) 371–379. [53] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. 6 (2) (2002) 182–197. [54] G.M. Morris, et al., Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function, J. Comput. Chem. 19 (14) (1998) 1639–1662. [55] G. Jones, P. Willett, R.C. Glen, A.R. 
Leach, R. Taylor, Development and validation of a genetic algorithm for flexible docking, J. Mol. Biol. 267 (3) (1997) 727–748. [56] J. Kennedy, R.C. Eberhart, Particle swarm optimization, in: International Symposium on Neural Networks, vol. 4, 1995, pp. 1942–1948. [57] J. Kennedy, Small worlds and mega-minds: effects of neighborhood topology on particle swarm performance, in: Congress on Evolutionary Computation, vol. 3, 1999, pp. 1931–1938. [58] H. Yamazaki, L.R. Haury, A new Lagrangian model to study animal aggregation, Ecol. Model. 69 (1993) 99–111. [59] M.S. Arumugam, M.V.C. Rao, A. Chandramohan, A new and improved version of particle swarm optimization algorithm with global–local best parameters, Knowl. Inf. Syst. 16 (3) (2008) 331–357. [60] Eberhart, Y. Shi, Particle swarm optimization: developments, applications and resources, in: Congress on Evolutionary Computation, vol. 1, 2001, pp. 81–86, 1.