Chemometrics and Intelligent Laboratory Systems 185 (2019) 65–72
CMSENN: Computational Modification Sites with Ensemble Neural Network

Wenzheng Bao a, Bin Yang b,*, Dan Li a, Zhengwei Li c, Yong Zhou c, Rong Bao a

a School of Information and Electrical Engineering, Xuzhou University of Technology, Xuzhou, 221018, China
b School of Information Science and Engineering, Zaozhuang University, Zaozhuang, 277100, China
c School of Computer Science and Technology, China University of Mining and Technology, Xuzhou, 221116, China
ARTICLE INFO

Keywords: Post-translational modification; Amino acid property; Neural network

ABSTRACT
With the rapid development of high-throughput technology, vast amounts of protein molecular data have been generated, which are crucial to advancing our understanding of biological organisms. An increasing number of approaches for identifying protein post-translational modification sites have been designed to detect such sites in protein sequences. Nevertheless, these methods are typically suitable for only one type of modification site; their performance deteriorates rapidly when applied to the prediction of other modification types. In this paper, by ensembling different types of neural network algorithms, a novel method named CMSENN (Computational Modification Sites with Ensemble Neural Network, http://121.250.173.184/) is proposed to detect protein modifications. The algorithm mainly consists of several steps: First, the candidate peptide sequences are translated into feature vectors. Second, the three employed types of amino acid residue properties are normalized. Finally, various combinations of features and classification models are compared with several current typical algorithms. The results demonstrate that the proposed model performs well in terms of sensitivity, specificity, F1 score and Matthews correlation coefficient (MCC) in identifying modification sites with the selected feature and algorithm combination.
1. Introduction

The human genome contains more than 30,000 genes. Unfortunately, the biological roles of merely 40% of them have been reported [1,2]. Protein functions play key roles in many biological processes [3,4]. The traditional methods of protein function identification are regarded as expensive and time-consuming [5–10], and such methods can hardly be applied to the general function identification problem. At the protein level, protein function may be driven by post-translational modification [11]. Post-translational modification (PTM) can be regarded as one type of complex biological modification and plays a fundamental role in several key biological regulation processes. It has been pointed out that there are more than 500 types of modification at the protein level, and these modification types play various roles in many regulation processes. Among them, phosphorylation can be treated as one of the most common [12,13]. The discovery of phosphorylation dates back decades, and over years of effort such
modification has been reported in metabolic regulation and cancer research [14–23]. On the other hand, several state-of-the-art resources and methods should be listed in this work. Swiss-Prot [24,25] includes a great deal of modification information and is one of the most well-known protein databases in the world [26]. Phospho.ELM [27], which was built around ELM (Eukaryotic Linear Motif) annotations, can be treated as a novel resource in this field [28,29]. O-GLYCBASE [30,31], a glycoprotein database, contains the majority of O-linked glycosylation sites across several species [32,33]. In this work, the collected PTM data come from external biological data sources. We have therefore designed a computational algorithm to comprehensively identify phosphorylation, glycosylation and sulfation sites [34–37]. Protein structural and functional properties, such as the membrane-buried preference parameters, the conformational parameter of β-turn and the average flexibility indices, are provided for researchers investigating PTM mechanisms. Several types of neural network algorithms and models have been ensembled as the post-translational modification site identification tool.
* Corresponding author.
E-mail address: [email protected] (B. Yang).
https://doi.org/10.1016/j.chemolab.2018.12.009
Received 18 October 2018; Received in revised form 11 December 2018; Accepted 21 December 2018; Available online 3 January 2019
0169-7439/© 2019 Elsevier B.V. All rights reserved.
2. Methods and materials
Table 1 Comparison with other methods.
2.1. Data

As is well known, different post-translational modifications may have their own features [38–43]. Therefore, we select several modification types, including glycosylation and phosphorylation sites, in this work. The employed modification site data are extracted from three well-known resources: Swiss-Prot, Phospho.ELM and O-GLYCBASE [43–46]. Meanwhile, we have utilized CD-HIT to remove repeated and highly similar proteins.
Method | Sn(%) | Sp(%) | FNR(%) | F1 | MCC
PUL-PUP | 82.24 | 91.57 | 17.76 | 0.8626 | 0.7413
PSoL | 67.50 | 73.60 | 32.50 | 0.6962 | 0.4118
SVM_balance | 76.71 | 63.65 | 33.29 | 0.7201 | 0.4071
Naïve Bayesian | 82.78 | 86.40 | 17.22 | 0.8431 | 0.6923
Neural Network | 83.93 | 89.57 | 16.07 | 0.8637 | 0.7362
Proposed Method | 89.23 | 97.50 | 10.76 | 0.9308 | 0.8703
2.2. Algorithm and model

Neural networks are among the most widely used algorithms for machine learning and classification [47–50]. Over several decades, many types of neural network algorithms have been proposed [51–54]. In this work, we employ several types of neural networks and related models for the modification site identification problem. The overall workflow is shown in Fig. 1, which illustrates the classification models used: the neural network, the extreme learning machine and the support vector machine. First, we introduce the neural network employed in this work. We use the Particle Swarm Optimization (PSO) algorithm to optimize the parameters of the neural network [53,55,56]. The detailed steps are given in Algorithm 1.

Algorithm 1 Original PSO:

Step 1 Initialize a population array of particles with random positions and velocities in the D-dimensional search space.
Step 2 Loop
Step 3 For each particle, evaluate the desired optimization fitness function in D variables.
Step 4 Compare each particle's fitness with its pbest_i. If the current value is better than pbest_i, set pbest_i to the current value and p_i to the current location x_i in D-dimensional space.
Step 5 Identify the particle in the neighborhood with the best success so far and assign its index to the variable g.
Step 6 Update the velocity and position of the particle according to Eq. (1):

v_i ← v_i + U(0, φ1) ⊗ (p_i − x_i) + U(0, φ2) ⊗ (p_g − x_i)
x_i ← x_i + v_i    (1)

Step 7 If a criterion is met (usually a sufficiently good fitness or a maximum number of iterations), exit the loop.
Step 8 End loop

Notes: U(0, φ_i) represents a vector of random numbers uniformly distributed in [0, φ_i], regenerated at each iteration for each particle; ⊗ denotes component-wise multiplication. In the original version of PSO, each component of v_i is kept within the range [−Vmax, +Vmax]. To make the method adaptive, we utilize the adaptive particle swarm optimization (APSO) [57–61], described by Eq. (2):

v_id(t+1) = w·v_id(t) + c1·rand()·(p_id(t) − x_id(t)) + c2·rand()·(p_gd(t) − x_id(t))
x_id(t+1) = x_id(t) + v_id(t+1),  1 ≤ i ≤ n, 1 ≤ d ≤ D    (2)
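The updates in Eqs. (1) and (2) can be sketched as follows. This is an illustrative NumPy implementation of PSO with a linearly decreasing inertia weight; the swarm size, acceleration coefficients, inertia schedule and test function are assumptions, not values from the paper:

```python
import numpy as np

def apso_minimize(fitness, dim, n_particles=30, iters=200,
                  w=0.9, w_min=0.4, c1=2.0, c2=2.0, v_max=1.0, seed=0):
    """PSO with a linearly decreasing inertia weight w, as in Eq. (2)."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, (n_particles, dim))   # positions
    v = np.zeros((n_particles, dim))                 # velocities
    pbest = x.copy()                                 # personal best positions p_i
    pbest_f = np.array([fitness(p) for p in x])      # personal best fitness values
    g = pbest[np.argmin(pbest_f)].copy()             # global best position p_g

    for t in range(iters):
        wt = w - (w - w_min) * t / iters             # inertia weight shrinks with t
        r1 = rng.random((n_particles, dim))          # U(0, 1) per component/particle
        r2 = rng.random((n_particles, dim))
        v = wt * v + c1 * r1 * (pbest - x) + c2 * r2 * (g - x)
        v = np.clip(v, -v_max, v_max)                # keep v within [-Vmax, +Vmax]
        x = x + v
        f = np.array([fitness(p) for p in x])
        improved = f < pbest_f
        pbest[improved], pbest_f[improved] = x[improved], f[improved]
        g = pbest[np.argmin(pbest_f)].copy()
    return g, pbest_f.min()

# Usage: minimize the sphere function, whose optimum is at the origin.
best_x, best_f = apso_minimize(lambda p: float(np.sum(p * p)), dim=3)
```

In the paper the fitness function would score a candidate set of neural-network weights; the sphere function here only demonstrates the update rule.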
where w is the inertia weight; the algorithm gradually reduces w as the generation number increases. Secondly, we employ a fast learning algorithm known as the extreme learning machine (ELM), developed for single-hidden-layer feedforward networks (SLFNs) with additive hidden nodes and radial basis function (RBF) kernels [62–65]. ELM [66,67] has been successfully applied to many real-world applications [68–70] and has been shown to achieve good generalization performance at extremely high learning speed [71–75]. The output of an SLFN with N hidden nodes (additive or RBF nodes) can be represented by Eq. (3) [76]:

f_N(x) = Σ_{i=1}^{N} β_i G(x; c_i, a_i),  x ∈ R^n, c_i ∈ R^n    (3)
where c_i and a_i are the learning parameters of the hidden nodes, β_i is the weight connecting the ith hidden node to the output node, and G(x; c_i, a_i) is the output of the ith hidden node with respect to the input x. For additive hidden nodes with a sigmoid or threshold activation function g(x): R → R, G(x; c_i, a_i) is given by Eq. (4).
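The ELM training procedure behind Eqs. (3) and (4) fixes the hidden-node parameters (c_i, a_i) at random and solves only for the output weights β. A minimal sketch, assuming the standard least-squares (pseudoinverse) solution from the ELM literature and sigmoid additive nodes:

```python
import numpy as np

def elm_train(X, T, n_hidden=50, seed=0):
    """Train a single-hidden-layer ELM: random (c_i, a_i), least-squares beta."""
    rng = np.random.default_rng(seed)
    C = rng.normal(size=(X.shape[1], n_hidden))  # input weights c_i, random and fixed
    a = rng.normal(size=n_hidden)                # biases a_i, random and fixed
    H = 1.0 / (1.0 + np.exp(-(X @ C + a)))       # G(x; c_i, a_i) = g(c_i . x + a_i), Eq. (4)
    beta = np.linalg.pinv(H) @ T                 # output weights via Moore-Penrose pseudoinverse
    return C, a, beta

def elm_predict(X, C, a, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ C + a)))
    return H @ beta                              # f_N(x) = sum_i beta_i G(x; c_i, a_i), Eq. (3)

# Usage: fit a toy two-class problem with +1/-1 targets.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
T = np.sign(X[:, 0] + X[:, 1])
C, a, beta = elm_train(X, T, n_hidden=40)
acc = np.mean(np.sign(elm_predict(X, C, a, beta)) == T)
```

Because only β is learned, training reduces to one linear solve, which is the source of ELM's speed advantage mentioned above.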
Fig. 1. The outline of this work.
Table 2 Comparison with other features.

Features | Sn(%) | Sp(%) | FNR(%) | F1 | MCC
Binary Encoding | 43.36 | 75.80 | 56.64 | 0.5175 | 0.2026
AA Composition | 64.14 | 52.79 | 35.86 | 0.6070 | 0.1704
Grouping AA Composition | 41.78 | 76.04 | 58.22 | 0.5042 | 0.1897
Physicochemical Properties | 55.53 | 63.93 | 44.47 | 0.5796 | 0.1953
KNN Features | 64.94 | 55.85 | 35.06 | 0.6212 | 0.2088
Secondary Tendency Structure | 59.96 | 57.40 | 40.04 | 0.5920 | 0.1737
PSSM | 51.20 | 69.39 | 48.80 | 0.5632 | 0.2094
Binary Coding | 64.04 | 78.60 | 35.96 | 0.6907 | 0.4310
AA Pair Composition | 76.24 | 75.32 | 23.76 | 0.7589 | 0.5156
AA Appearance | 82.71 | 78.55 | 17.29 | 0.8102 | 0.6131
Proposed Method | 89.23 | 97.50 | 10.77 | 0.9308 | 0.8703
Table 3 The performance of the proposed model.
f(x) = sign( Σ_{i=1}^{L} α_i z_i(x) )    (6)
Model | Sens(%) | Spec(%) | FNR(%) | F1 | MCC
ELM+A | 91.33 | 80.05 | 8.67 | 0.8645 | 0.7184
APSO+NN+A | 74.12 | 69.75 | 25.88 | 0.7254 | 0.4391
LSVM+A | 88.71 | 79.32 | 11.29 | 0.8473 | 0.6833
PSVM+A | 86.78 | 68.78 | 13.22 | 0.7961 | 0.5648
ELM+B | 94.32 | 96.69 | 5.68 | 0.9545 | 0.9104
APSO+NN+B | 75.97 | 62.64 | 24.03 | 0.7122 | 0.3896
LSVM+B | 74.97 | 51.85 | 25.03 | 0.6720 | 0.2757
PSVM+B | 99.80 | 36.17 | 0.20 | 0.7571 | 0.4663
ELM+C | 92.07 | 88.57 | 7.93 | 0.9049 | 0.8069
APSO+NN+C | 68.23 | 50.33 | 31.77 | 0.6263 | 0.1886
LSVM+C | 87.87 | 57.72 | 12.13 | 0.7636 | 0.4782
PSVM+C | 86.49 | 60.90 | 13.51 | 0.7668 | 0.4902
Final Model | 89.23 | 97.50 | 10.76 | 0.9308 | 0.8703
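The "Final Model" row in Table 3 combines the individual classifiers into an ensemble. As an illustration only (the paper does not spell out its exact combination rule in this section), a minimal majority-vote sketch in NumPy:

```python
import numpy as np

def majority_vote(predictions):
    """Combine {-1, +1} predictions from several classifiers by unweighted voting.
    A plain majority vote is an assumption here, not the paper's stated rule."""
    votes = np.sum(predictions, axis=0)   # sum of votes over classifiers
    return np.where(votes >= 0, 1, -1)    # ties broken toward the positive class

# Usage: three classifiers voting on four candidate modification sites.
preds = np.array([[ 1, -1,  1,  1],
                  [ 1,  1, -1,  1],
                  [-1, -1,  1, -1]])
combined = majority_vote(preds)
```

Weighted voting (e.g., weighting each model by its validation MCC) would be a natural refinement of this scheme.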
Meanwhile, several subtypes of SVM have been proposed in this field. In this work, we utilize two subtypes, LSVM and PSVM, as comparisons for the proposed algorithm.

2.3. Feature selection

There are a large number of amino acid and protein features in this field, covering physical, chemical, statistical and other related properties [77–79]. Among these, the membrane-buried preference parameters, the conformational parameter of β-turn and the average flexibility indices are employed as the classification features in this work.
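The employed residue properties are normalized before classification (as stated in the abstract). A minimal min-max normalization and window-encoding sketch; the property values in the dictionary below are placeholders for illustration, not the actual AAindex entries:

```python
# Min-max normalize one amino-acid property scale, then encode a peptide window.
# The numeric values are illustrative placeholders, not a real AAindex record.
prop = {'A': 0.62, 'R': -2.53, 'N': -0.78, 'D': -0.90, 'C': 0.29,
        'Q': -0.85, 'E': -0.74, 'G': 0.48, 'H': -0.40, 'I': 1.38,
        'L': 1.06, 'K': -1.50, 'M': 0.64, 'F': 1.19, 'P': 0.12,
        'S': -0.18, 'T': -0.05, 'W': 0.81, 'Y': 0.26, 'V': 1.08}

lo, hi = min(prop.values()), max(prop.values())
norm = {aa: (v - lo) / (hi - lo) for aa, v in prop.items()}  # rescaled into [0, 1]

def encode(peptide):
    """Map a peptide window around a candidate site to a normalized feature vector."""
    return [norm[aa] for aa in peptide]

vec = encode("AKSPT")
```

Concatenating the vectors from the three property scales yields the feature vector fed to the classifiers.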
Notes: Feature A is the membrane-buried preference parameters. Feature B is the conformational parameter of β-turn. Feature C is the average flexibility indices.
G(x; c_i, a_i) = g(c_i·x + a_i),  a_i ∈ R    (4)
3. Results

To test the performance of the proposed model, the Root Mean Square (RMS) error [80–82] has been utilized as the evaluation function of the output results [83–85]. The overall accuracy (OA) is the average over the sub-data sets [86–89]. Moreover, several performance measures have been used to evaluate prediction accuracy: Sensitivity (Sens), Specificity (Spec), the F1 score (the harmonic mean of precision and sensitivity) and the Matthews correlation coefficient (MCC). These measures are defined in Eqs. (7)–(11).
where c_i is the weight vector connecting the input layer to the ith hidden node, a_i is the bias of the ith hidden node, and c_i·x denotes the inner product of the vectors c_i and x in R^n. For RBF hidden nodes with a Gaussian or triangular activation function g(x): R → R, G(x; c_i, a_i) is given by Eq. (5):

G(x; c_i, a_i) = g(a_i ‖x − c_i‖),  a_i ∈ R+    (5)
where c_i and a_i are the center and impact factor of the ith RBF node, and R+ denotes the set of all positive real values. Last but not least, we employ the support vector machine (SVM) as a classification model. This model was proposed by Cortes and Vapnik and has been widely utilized in classification and regression problems in machine learning. In the modification site feature space, a linear decision function is constructed as in Eq. (6).
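A decision function of the form in Eq. (6) can be evaluated, for illustration, as a kernel expansion over support vectors. The support vectors, coefficients and RBF kernel below are assumptions for the sketch, not the paper's fitted model:

```python
import numpy as np

def svm_decision(x, support_vecs, alpha_y, b, gamma=1.0):
    """Evaluate f(x) = sign(sum_i alpha_i * y_i * K(s_i, x) + b) with an RBF kernel.
    alpha_y holds the products alpha_i * y_i; all values are illustrative."""
    k = np.exp(-gamma * np.sum((support_vecs - x) ** 2, axis=1))  # K(s_i, x) per support vector
    return int(np.sign(alpha_y @ k + b))

# Usage with two hand-picked support vectors of opposite class.
S = np.array([[0.0, 0.0], [2.0, 2.0]])
label = svm_decision(np.array([0.1, 0.1]), S, np.array([1.0, -1.0]), b=0.0)
```

A query point near the positive support vector is labeled +1, and one near the negative support vector is labeled −1.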
Sens = TP / (TP + FN)    (7)

Spec = TN / (TN + FP)    (8)

FNR = FN / P = FN / (FN + TP)    (9)
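The five measures in Eqs. (7)–(11) can be computed directly from confusion-matrix counts; a small sketch with illustrative counts (the numbers are not results from the paper):

```python
import math

def confusion_metrics(tp, tn, fp, fn):
    """Sens, Spec, FNR, F1 and MCC from confusion-matrix counts, Eqs. (7)-(11)."""
    sens = tp / (tp + fn)                     # Eq. (7)
    spec = tn / (tn + fp)                     # Eq. (8)
    fnr = fn / (fn + tp)                      # Eq. (9); note FNR = 1 - Sens
    f1 = 2 * tp / (fp + fn + 2 * tp)          # Eq. (10)
    mcc = (tp * tn - fp * fn) / math.sqrt(    # Eq. (11)
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return sens, spec, fnr, f1, mcc

# Usage on an illustrative confusion matrix.
sens, spec, fnr, f1, mcc = confusion_metrics(tp=90, tn=80, fp=20, fn=10)
```

MCC is the most conservative of the five, since it penalizes errors in both classes even when the data are imbalanced.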
Fig. 2. The ROC curves of the membrane-buried preference parameters.
Fig. 3. The ROC curves of the conformational parameter of β-turn.
F1 = 2TP / (FP + FN + 2TP)    (10)

MCC = (TP·TN − FP·FN) / √((TP + FP)(TP + FN)(TN + FP)(TN + FN))    (11)

The performance of this algorithm is compared with other approaches in Tables 1 and 2; the evaluation indicators for this two-class classification demonstrate the various contributions to the identification of such modification sites in the employed data. From Table 3, it is obvious that Sens ranges from 68.23% to 99.80% and Spec ranges from 50.33% to 97.50%. At the same time, the FNR ranges from 0.20% to 31.77%. The F1 evaluation parameter
Fig. 4. The ROC curves of the average flexibility indices.
proposed in this research. The following three figures show the Receiver Operating Characteristic (ROC) curves of the features: the membrane-buried preference parameters, the conformational parameter of β-turn and the average flexibility indices. The average flexibility indices appear to play the most significant role among these three features; the ROC curves are shown in Figs. 2–4. On the other hand, the combination of different features is also an interesting issue in this type of modification site classification. In this work, the employed classification features, namely the membrane-buried preference parameters, the conformational parameter of β-turn and the average flexibility indices, are amino acid residue properties from the AAindex database. The three selected features therefore admit several combinations, detailed in Table 4. The adaptive PSO+NN, ELM, LSVM and PSVM have each been employed as independent classification models, and the ROC curves of these combination types are shown in Figs. 5–8. The ROC curves of the feature combinations demonstrate that for the first combination, comprising the membrane-buried preference parameters and the conformational parameter of β-turn, only the PSVM and ELM algorithms achieve reasonable classification results for identifying such modification sites; the other two approaches can hardly perform well on this combination. The second combination, comprising the conformational parameter of β-turn and the average flexibility indices, achieves relatively good classification results only with the extreme learning machine; unfortunately, the other three models could hardly obtain good results with it. The next combination contains the membrane-buried preference parameters and the average flexibility indices. In this combination, the PSVM and LSVM obtain relatively good results, slightly lower than the ELM. The last combination, including all the employed features, obtains good results with the ELM. Nevertheless, the adaptive PSO+NN can hardly obtain
Table 4 The combination types of features.

Combination | Feature Index
I | A, B
II | A, C
III | B, C
IV | A, B, C

Notes: Feature A is the membrane-buried preference parameters, feature B the conformational parameter of β-turn, and feature C the average flexibility indices.
can range from 0.6263 to 0.9545. The last evaluation index (MCC) for the selected features ranges from 0.1886 to 0.9104. On the other hand, it is easy to see that the extreme learning machine classifies the three types of selected properties well in this research. At the same time, it is interesting that the three types of properties play various roles in identifying the post-translational modification sites. The LSVM algorithm also performs well on the classification task, and the importance ranking of the three properties follows the same pattern as for the ELM algorithm. However, the PSVM model does not follow this feature-importance ranking: for it, the conformational parameter of β-turn plays the most significant role in the classification. The APSO+NN follows the same importance ranking as the PSVM model.

4. Discussions and conclusions

4.1. Discussions

In order to probe the importance of the selected three types of amino acid residue properties, combinations of different algorithms have been utilized in this work. In the following step, ensemble classification models built from different algorithms have been
Fig. 5. The ROC curves of the 1st combination.
Fig. 6. The ROC curves of the 2nd combination.
Modification Sites with Ensemble Neural Network (CMSENN), is proposed to detect protein modifications. The algorithm mainly consists of several steps: First, the candidate peptide sequences are translated into feature vectors. Second, the three employed types of amino acid residue properties are normalized. Finally, various combinations of features and classification models are compared with several current typical algorithms. The results demonstrate that the
accurate results. Therefore, in future work, several additional features should be utilized in our identification model.

4.2. Conclusions

In this paper, by ensembling different types of neural network algorithms, a novel method named Computational
Fig. 7. The ROC curves of the 3rd combination.
Fig. 8. The ROC curves of the 4th combination.
proposed model performs well in terms of sensitivity, specificity, F1 score and MCC in identifying modification sites with the selected feature and algorithm combination. With this method, we find that the combined features and the ensemble neural network play key roles in this classification task. In future work, we hope to discover more key features and more effective classification models in this field.
[9] T.R. Hughes, et al., Functional discovery via a compendium of expression profiles, Cell 102 (1) (2000) 109–126. [10] M. Pellegrini, E.M. Marcotte, M.J. Thompson, D. Eisenberg, T.O. Yeates, Assigning protein functions by comparative genome analysis: protein phylogenetic profiles, Proc. Natl. Acad. Sci. U.S.A. 96 (8) (1999) 4285–4288. [11] D. Eisenberg, E.M. Marcotte, I. Xenarios, T.O. Yeates, Protein function in the postgenomic era, Nature 405 (6788) (2000) 823–826. [12] C. Von Mering, et al., Comparative assessment of large-scale data sets of protein–protein interactions, Nature 417 (6887) (2002) 399–403. [13] C.T. Walsh, S. Garneautsodikova, G.J. Gatto, Protein posttranslational modifications: the chemistry of proteome diversifications, Angew. Chem. 44 (45) (2005) 7342–7372. [14] S.E. Mayer, E.G. Krebs, Studies on the phosphorylation and activation of skeletal muscle phosphorylase and phosphorylase kinase in vivo, J. Biol. Chem. 245 (12) (1970) 3153–3160. [15] H.E. Varmus, H. Hirai, D.O. Morgan, J.M. Kaplan, J.M. Bishop, Function, location, and regulation of the src protein-tyrosine kinase, Princess Takamatsu Symp. 20 (1989) 63. [16] B.M. Sefton, T. Hunter, K. Beemon, W. Eckhart, Evidence that the phosphorylation of tyrosine is essential for cellular transformation by Rous sarcoma virus, Cell 20 (3) (1980) 807–816. [17] R.B. Pearson, B.E. Kemp, Protein kinase phosphorylation site sequences and consensus specificity motifs: tabulations, Methods Enzymol. 200 (1991) 62–81. [18] F. Diella, et al., Phospho.ELM: a database of experimentally verified phosphorylation sites in eukaryotic proteins, BMC Bioinf. 5 (1) (2004), 79-79. [19] D. Francesca, C.M. Gould, C. Claudia, V. Allegra, T.J. Gibson, E.L.M. Phospho, A database of phosphorylation sites–update 2008, Nucleic Acids Res. 36 (Database issue) (2008) 240–244. [20] Y. Xue, et al., GPS 2.1: enhanced prediction of kinase-specific phosphorylation sites with an algorithm of motif length selection, Protein Eng. Des. 
Sel. 24 (3) (2011) 255–260. [21] H. Steen, J.A. Jebanathirajah, J. Rush, N. Morrice, M.W. Kirschner, Phosphorylation analysis by mass spectrometry myths, facts, and the consequences for qualitative and quantitative measurements, Mol. Cell. Proteomics 5 (1) (2006) 172–181. [22] N. Farriolmathis, et al., Annotation of post-translational modifications in the SwissProt knowledge base, Proteomics 4 (6) (2004) 1537–1550. [23] W. Bao, Y. Chen, D. Wang, Prediction of protein structure classes with flexible neural tree, Bio Med. Mater. Eng. 24 (6) (2014) 3797–3806. [24] R. Apweiler, H. Hermjakob, N. Sharon, On the frequency of protein glycosylation, as deduced from analysis of the SWISS-PROT database, Biochim. Biophys. Acta 1473 (1) (1999) 4–8. [25] B. Boeckmann, et al., The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003, Nucleic Acids Res. 31 (1) (2003) 365–370. [26] A.M. Bairoch, R. Apweiler, The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000, Nucleic Acids Res. 28 (1) (2000) 45–48. [27] F. Diella, et al., Understanding eukaryotic linear motifs and their role in cell signaling and regulation, Front. Biosci. 13 (13) (2008) 6580.
Acknowledgment This work was supported by the grants of the National Science Foundation of China, Nos. 61702445, 61873270, 61520106006, 31571364, 61532008, 61572364, 61411140249, 61402334, 61472282, 61472280, 61472173, 61572447, and 61373098, China Postdoctoral Science Foundation Grant, Nos. 2014M561513, and partly supported by the National High-Tech R&D Program (863) (2014AA021502 & 2015AA020101), and the grant from the Ph.D. Programs Foundation of Ministry of Education of China (No. 20120072110040). References [1] E.S. Lander, L. Linton, B. Birren, International human genome sequencing consortium, Nature 431 (2001) 931–945. [2] J.C. Venter, et al., The sequence of the human genome, Science 291 (5507) (2001) 1304–1351. [3] A.M. Lesk, L.L. Conte, T. Hubbard, Assessment of novel fold targets in CASP4: predictions of three-dimensional structures, secondary structures, and interresidue contacts, Proteins 45 (2001) 98–118. [4] B. Wang, P. Chen, D. Huang, J. Li, T. Lok, M.R. Lyu, Predicting protein interaction sites from residue spatial sequence profile and evolution rate, FEBS (Fed. Eur. Biochem. Soc.) Lett. 580 (2) (2006) 380–384. [5] P.O. Brown, D. Botstein, Exploring the new world of the genome with DNA microarrays, Nat. Genet. 21 (1999) 33–37. [6] D. Huang, C. Zheng, Independent component analysis-based penalized discriminant method for tumor classification using gene expression data, Bioinformatics 22 (15) (2006) 1855–1862. [7] E.M. Marcotte, M. Pellegrini, H. Ng, D.W. Rice, T.O. Yeates, D. Eisenberg, Detecting protein function and protein-protein interactions from genome sequences, Science 285 (5428) (1999) 751–753. [8] E.M. Marcotte, M. Pellegrini, M.J. Thompson, T.O. Yeates, D. Eisenberg, A combined algorithm for genome-wide prediction of protein function, Nature 402 (6757) (1999) 83–86.
[61] D.W. Boeringer, D.H. Werner, Particle swarm optimization versus genetic algorithms for phased array synthesis, IEEE Trans. Antenn. Propag. 52 (3) (2004) 771–779. [62] J. Salerno, Using the particle swarm optimization technique to train a recurrent neural model. International Conference on Tools with Artificial Intelligence, 1997, pp. 45–49. [63] C. Zhang, H. Shao, Y. Li, Particle swarm optimisation for evolving artificial neural network, in: Systems Man and Cybernetics, vol. 4, 2000, pp. 2487–2490. [64] Y. Zhang, S. Wang, P. Phillips, G. Ji, Binary PSO with mutation operator for feature selection using decision tree applied to spam detection, Knowl. Base Syst. 64 (1) (2014) 22–31. [65] M. Sharafi, T.Y. Elmekkawy, Multi-objective optimal design of hybrid renewable energy systems using PSO-simulation based approach, Renew. Energy 68 (2014) 67–79. [66] H.H. Inbarani, A.T. Azar, G. Jothi, Supervised hybrid feature selection based on PSO and rough sets for medical diagnosis, Comput. Methods Progr. Biomed. 113 (1) (2014) 175–185. [67] L. Zhang, Y. Tang, C. Hua, X. Guan, A new particle swarm optimization algorithm with adaptive inertia weight based on Bayesian techniques, Appl. Soft Comput. 28 (2015) 138–149. [68] F. Valdez, P. Melin, O. Castillo, Modular Neural Networks architecture optimization with a new nature inspired method using a fuzzy combination of Particle Swarm Optimization and Genetic Algorithms, Inf. Sci. 270 (2014) 143–153. [69] I. Fister, X. Yang, J. Brest, "A Brief Review of Nature-Inspired Algorithms for Optimization," arXiv: Neural and Evolutionary Computing, 2013. [70] M. Mahi, O.K. Baykan, H. Kodaz, A new hybrid method based on particle swarm optimization, ant colony optimization and 3-Opt algorithms for traveling salesman problem, Appl. Soft Comput. 30 (2015) 484–490. [71] T. Khatib, A. Mohamed, K. Sopian, A review of photovoltaic systems size optimization techniques, Renew.
Sustain. Energy Rev. 22 (22) (2013) 454–465. [72] C. Sbarufatti, M. Corbetta, M. Giglio, F. Cadini, Adaptive prognosis of lithium-ion batteries based on the combination of particle filters and radial basis function neural networks, J. Power Sources 344 (2017) 128–140. [73] A.R.H. Heryudono, T.A. Driscoll, Radial basis function interpolation on irregular domain through conformal transplantation, J. Sci. Comput. 44 (3) (2010) 286–300. [74] G. Huang, Q. Zhu, C.K. Siew, Real-time learning capability of neural networks, IEEE Trans. Neural Network. 17 (4) (2006) 863–878. [75] J. Park, I.W. Sandberg, Universal approximation using radial-basis-function networks, Neural Comput. 3 (2) (2014) 246–257. [76] H. Lu, C. Tsai, M. Chang, Radial basis function neural network with sliding mode control for robotic manipulators, in: Systems, Man and Cybernetics, 2010, pp. 1209–1215. [77] G. Huang, Q. Zhu, C.K. Siew, Extreme learning machine: theory and applications, Neurocomputing 70 (1) (2006) 489–501. [78] N. Liang, P. Saratchandran, G. Huang, N. Sundararajan, CLASSIFICATION OF MENTAL TASKS FROM EEG SIGNALS USING EXTREME LEARNING MACHINE, Int. J. Neural Syst. 16 (1) (2006) 29–38. [79] S.D. Handoko, K.C. Keong, O.Y. Soon, G.L. Zhang, V. Brusic, Extreme learning machine for predicting HLA-Peptide binding, Int. Symp. Neural Network. 3973 (2006) 716–721. [80] J. Xu, W. Wang, J.C.H. Goh, G. Lee, Internal model approach for gait modeling and classification, in: International Conference of the Ieee Engineering in Medicine and Biology Society, vol. 7, 2005, pp. 7688–7691. [81] C.T. Yeu, M. Lim, G. Huang, A. Agarwal, Y. Ong, A new machine learning paradigm for terrain reconstruction, Geosci. Rem. Sens. Lett. IEEE 3 (3) (2006) 382–386. [82] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (3) (1995) 273–297. [83] C.J.C. Burges, A tutorial on support vector machines for pattern recognition, Data Min. Knowl. Discov. 2 (2) (1998) 121–167. [84] C. Hsu, C. 
Lin, A comparison of methods for multiclass support vector machines, IEEE Trans. Neural Network. 13 (2) (2002) 415–425. [85] Q. Liu, Q. He, Z. Shi, Extreme support vector machine classifier, in: Knowledge Discovery and Data Mining, 2008, pp. 222–233. [86] B. Frenay, M. Verleysen, Using SVMs with randomised feature spaces: an extreme learning approach, in: The European Symposium on Artificial Neural Networks, 2010. [87] Y. Tang, H.H. Zhang, Multiclass proximal support vector machines, J. Comput. Graph Stat. 15 (2) (2006) 339–355. [88] J.A.K. Suykens, J. Vandewalle, Training multilayer perceptron classifiers based on a modified support vector method, IEEE Trans. Neural Network. 10 (4) (1999) 907–911. [89] S. Haykin, Neural networks: a comprehensive foundation, in: Neural Networks, A Comprehensive Foundation, 1994, pp. 71–80.
[28] H. Dinkel, et al., The eukaryotic linear motif resource ELM: 10 years and counting, Nucleic Acids Res. 42 (2014) 259–266. [29] H. Dinkel, et al., ELM 2016—data update and new functionality of the eukaryotic linear motif resource, Nucleic Acids Res. 44 (2016) 294–300. [30] R. Gupta, H. Birch, K. Rapacki, S. Brunak, J.E. Hansen, O-GLYCBASE version 2.0: a revised database of O-glycosylated proteins, Nucleic Acids Res. 27 (1) (1997) 370–372. [31] R. Gupta, H. Birch, K. Rapacki, S. Brunak, J.E. Hansen, O-GLYCBASE version 4.0: a revised database of O-glycosylated proteins, Nucleic Acids Res. 27 (1) (1999) 370–372. [32] H. J E, L. O, N. J O, H. J E, B. S, O-GLYCBASE: a revised database of O-glycosylated proteins, Nucleic Acids Res. 24 (1) (1996) 248–252. [33] J.E. Hansen, O. Lund, J. Nilsson, K. Rapacki, S. Brunak, O-GLYCBASE Version 3.0: a revised database of O-glycosylated proteins, Nucleic Acids Res. 26 (1) (1998) 387–389. [34] W. Bao, Z. Huang, C.A. Yuan, D.S. Huang, Pupylation sites prediction with ensemble classification model, Int. J. Data Min. Bioinf. 18 (2) (2017) 91–104. [35] W. Bao, D. Wang, Y. Chen, Classification of protein structure classes on flexible neutral tree, IEEE ACM Trans. Comput. Biol. Bioinf 14 (5) (2017) 1122–1133. [36] W. Bao, Z. Jiang, D. Huang, Novel human microbe-disease association prediction using network consistency projection, BMC Bioinf. 18 (16) (2017) 543. [37] W. Bao, et al., Mutli-features prediction of protein translational modification sites, IEEE ACM Trans. Comput. Biol. Bioinf 15 (5) (2018) 1453–1460. [38] J.S. Garavelli, The RESID Database of Protein Modifications as a resource and annotation tool, Proteomics 4 (6) (2010) 1527–1533. [39] M.P. Luisa, et al., The PSI-MOD community standard for representation of protein modification data, Nat. Biotechnol. 26 (8) (2008) 864–866. [40] J.S. Garavelli, The RESID database of protein modifications: 2003 developments, Nucleic Acids Res. 31 (1) (2003) 499–501. [41] G. 
J S, The RESID Database of protein structure modifications, Nucleic Acids Res. 27 (1) (2001) 198–199. [42] M.A. Harris, et al., The Gene Ontology (GO) database and informatics resource, Nucleic Acids Res. 32 (2004). [43] M. Ashburner, et al., Gene ontology: tool for the unification of biology. The Gene Ontology Consortium, Nat. Genet. 25 (1) (2000) 25–29. [44] S. Maere, K. Heymans, M. Kuiper, BiNGO : a cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks, Bioinformatics 21 (16) (2005) 3448–3449. [45] D. Sylva, Creating the gene ontology resource: design and implementation, Genome Res. (2001). [46] C.H. Wu, et al., The protein information resource, Nucleic Acids Res. 31 (1) (2003) 345–347. [47] C.H. Wu, et al., The Protein Information Resource: an integrated public resource of functional annotation of proteins, Nucleic Acids Res. 30 (1) (2002) 35–37. [48] D. Huang, J. Du, A constructive hybrid structure optimization methodology for radial basis probabilistic neural networks, IEEE Trans. Neural Network. 19 (12) (2008) 2099–2115. [49] A. Van Ooyen, B. Nienhuis, Improving the convergence of the back-propagation algorithm, Neural Network. 5 (3) (1992) 465–471. [50] W. Tong, R. Jin, Semi-supervised learning by mixed label propagation, in: National Conference on Artificial Intelligence, 2007, pp. 651–656. [51] R.A. Jacobs, Increased rates of convergence through learning rate adaptation, Neural Network. 1 (4) (1987) 295–307. [52] M.K. Weir, A method for self-determination of adaptive learning rates in back propagation, Neural Network. 4 (3) (1991) 371–379. [53] K. Deb, A. Pratap, S. Agarwal, T. Meyarivan, A fast and elitist multiobjective genetic algorithm: NSGA-II, IEEE Trans. Evol. Comput. 6 (2) (2002) 182–197. [54] G.M. Morris, et al., Automated docking using a Lamarckian genetic algorithm and an empirical binding free energy function, J. Comput. Chem. 19 (14) (1998) 1639–1662. [55] G. Jones, P. Willett, R.C. Glen, A.R. 
Leach, R. Taylor, Development and validation of a genetic algorithm for flexible docking, J. Mol. Biol. 267 (3) (1997) 727–748. [56] J. Kennedy, R.C. Eberhart, Particle swarm optimization, in: International Symposium on Neural Networks, vol. 4, 1995, pp. 1942–1948. [57] J. Kennedy, Small worlds and mega-minds: effects of neighborhood topology on particle swarm performance, in: Congress on Evolutionary Computation, vol. 3, 1999, pp. 1931–1938. [58] H. Yamazaki, L.R. Haury, A new Lagrangian model to study animal aggregation, Ecol. Model. 69 (1993) 99–111. [59] M.S. Arumugam, M.V.C. Rao, A. Chandramohan, A new and improved version of particle swarm optimization algorithm with global–local best parameters, Knowl. Inf. Syst. 16 (3) (2008) 331–357. [60] Eberhart, Y. Shi, Particle swarm optimization: developments, applications and resources, in: Congress on Evolutionary Computation, vol. 1, 2001, pp. 81–86, 1.