Analytical Biochemistry 357 (2006) 116–121
www.elsevier.com/locate/yabio
Predicting protein structural class with pseudo-amino acid composition and support vector machine fusion network

Chao Chen, Xibin Zhou, Yuanxin Tian, Xiaoyong Zou *, Peixiang Cai
School of Chemistry and Chemical Engineering, Sun Yat-Sen University, Guangzhou 510275, PR China

Received 18 May 2006; available online 7 August 2006
Abstract

Because a priori knowledge of a protein structural class can provide useful information about its overall structure, the determination of protein structural class is a meaningful topic in protein science. However, with the rapid increase in newly found protein sequences entering databanks, it is both time-consuming and expensive to determine structural class based solely on experimental techniques. Therefore, it is vitally important to develop a computational method for predicting the protein structural class quickly and accurately. To meet this challenge, this article presents a dual-layer support vector machine (SVM) fusion network featuring a different pseudo-amino acid composition (PseAA). The PseAA here contains information related to the sequence order of a protein and to the distribution of the hydrophobic amino acids along its chain. As a showcase, the rigorous jackknife cross-validation test was performed on the two benchmark data sets constructed by Zhou. A significant enhancement in success rates was observed, indicating that the current approach may serve as a powerful complementary tool to other existing methods in this area.
© 2006 Elsevier Inc. All rights reserved.

Keywords: Support vector machine; Fusion; Amino acid composition; Pair-coupled amino acid composition; Pseudo-amino acid composition; Protein structural class
* Corresponding author. Fax: +86 20 84112245. E-mail address: [email protected] (X. Zou).
1 Abbreviations used: AA, amino acid composition; pair-coupled AA, pair-coupled amino acid composition; PseAA, pseudo-amino acid composition; SVM, support vector machine; RBF, radial basis function.
doi:10.1016/j.ab.2006.07.022

According to the definition by Levitt and Chothia [1], proteins can be classified into the following four structural classes: (i) all-α, (ii) all-β, (iii) α/β, and (iv) α+β. A series of previous studies showed that the structural class of a protein correlates strongly with its amino acid composition (AA).1 Indeed, most classifiers were constructed to predict protein structural classes based on their AAs [2–13] (for a systematic description of this area, see the comprehensive reviews by Chou [14–16]). In representing a protein sample with its AA alone, however, many important features associated with the sequence order are completely missed, undoubtedly reducing the success rate of prediction. In view of this, various descriptors were proposed to improve the predictive accuracy, including the pair-coupled amino acid composition (pair-coupled AA) [17], the polypeptide composition [18,19], the pseudo-amino acid composition (PseAA) [20,21], and other compositions [22,23]. Another important advance in this area is the introduction of the functional domain composition by Chou and Cai, which can significantly enhance the success rates in predicting protein structural class [24] and many other protein attributes [25–27]. Besides more accurate protein sample representations, the predictive quality can also be improved by methods such as boosting and bagging [28,29], which focus on, and give more weight to, the hard samples that could not be classified correctly by the previous weak classifiers. Alternatively, it is widely accepted that combining multiple classifiers can provide advantages over traditional monolithic approaches [30]. Considering that the various classifiers may make different, and perhaps complementary, errors, the aim is to design a composite system that outperforms
any individual classifier by pooling together the decisions of all classifiers. In view of the above facts, this article presents a framework with a dual-layer support vector machine (SVM) fusion network, following the main idea of Refs. [31–35]. In its first layer, three SVM classifiers are trained on different protein features. Their computational results are then combined and input into the second layer, where another SVM classifier performs the fusion and makes the final decisions adaptively. It is demonstrated on two different working data sets that the success rates are improved significantly.

Materials and methods
Protein features

A protein sample can be represented by its AA, pair-coupled AA, or PseAA. Because the first two have explicit definitions, only the PseAA is described below. Since its introduction [20], the PseAA has been used widely and successfully to improve the prediction quality in diverse applications of bioinformatics [36–41]. The sequence order effect along a protein chain can be approximately reflected by a set of sequence order correlation factors defined as follows [20]:

$$\theta_m = \frac{1}{L-m}\sum_{i=1}^{L-m}\Theta(R_i, R_{i+m}), \qquad m = 1, 2, \ldots, \lambda \;\; (\lambda < L). \tag{1}$$

In Eq. (1), L denotes the length of the protein, and θm is called the mth rank of coupling factor that harbors the mth sequence order correlation. It is worth noting that for various studies, the correlation function Θ(Ri, Rj) may take different appropriate forms. In this study, Θ(Ri, Rj) is formulated as

$$\Theta(R_i, R_j) = H(R_i)\,H(R_j), \tag{2}$$

where H(Ri) and H(Rj) are the hydrophobicity values of the amino acids Ri and Rj, respectively, taken from Ref. [42]. Eq. (2) is part of the amphiphilic PseAA formulated by Chou for predicting enzyme subfamily classes (see Eq. (3) in Ref. [40]) and membrane protein types (see Eq. (3) in Ref. [43]). Note that before being substituted into Eq. (2), the hydrophobicity values were all subjected to a standard conversion:

$$H(R_i) = \frac{H_0(R_i) - \sum_{k=1}^{20} H_0(R_k)/20}{\sqrt{\sum_{u=1}^{20}\left[H_0(R_u) - \sum_{k=1}^{20} H_0(R_k)/20\right]^2 \big/\, 20}}. \tag{3}$$

In Eq. (3), Ri (i = 1, 2, …, 20) denote the 20 native amino acids and H0(Ri) is the original hydrophobicity value of amino acid Ri.

In general, the larger the number λ of correlation factors, the more sequence order effects are incorporated. However, λ must be smaller than the number of amino acid residues in the shortest protein chain of the data set concerned. On the other hand, because of the information loss during jackknifing, the overall success rate of the jackknife test does not always increase monotonically with λ [20]. Because the jackknife test is accepted as the most objective method for cross-validation in statistics [9,12,14], the optimal value of λ should be the one that yields the best overall jackknife-tested rate. Note also that λ may have different optimal values for different training data sets. For the current study, the optimal value is λ = 11; that is, the dimension of the PseAA considered here is 20 + 11 = 31. Given a protein X, its PseAA is defined in this 31-D space as

$$\mathbf{X} = \left[x_1, x_2, \ldots, x_{31}\right]^{\mathrm{T}}, \tag{4}$$

where

$$x_u = \begin{cases} \dfrac{f_u}{\sum_{i=1}^{20} f_i + \omega \sum_{j=1}^{11} \theta_j}, & 1 \le u \le 20, \\[2ex] \dfrac{\omega\,\theta_{u-20}}{\sum_{i=1}^{20} f_i + \omega \sum_{j=1}^{11} \theta_j}, & 20 + 1 \le u \le 20 + 11. \end{cases} \tag{5}$$
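As a concrete illustration, the PseAA construction of Eqs. (1)–(5) can be sketched in code. This is a minimal sketch, not the authors' implementation: the Kyte–Doolittle hydropathy values [42] are hard-coded, and λ = 11 and ω = 0.1 follow the choices stated in the text.

```python
from math import sqrt

KD = {  # Kyte-Doolittle hydropathy values, Ref. [42]
    'A': 1.8, 'R': -4.5, 'N': -3.5, 'D': -3.5, 'C': 2.5,
    'Q': -3.5, 'E': -3.5, 'G': -0.4, 'H': -3.2, 'I': 4.5,
    'L': 3.8, 'K': -3.9, 'M': 1.9, 'F': 2.8, 'P': -1.6,
    'S': -0.8, 'T': -0.7, 'W': -0.9, 'Y': -1.3, 'V': 4.2,
}

# Eq. (3): standardize the raw hydrophobicity values over the 20 amino acids
_mean = sum(KD.values()) / 20.0
_sd = sqrt(sum((v - _mean) ** 2 for v in KD.values()) / 20.0)
H = {aa: (v - _mean) / _sd for aa, v in KD.items()}

def pseaa(seq, lam=11, omega=0.1):
    """Return the (20 + lam)-dimensional PseAA vector of Eqs. (4)-(5)."""
    L = len(seq)
    assert L > lam, "sequence must be longer than lambda"
    # First 20 components: normalized occurrence frequencies f_1..f_20
    f = [seq.count(aa) / L for aa in sorted(KD)]
    # Eqs. (1)-(2): sequence-order correlation factors theta_1..theta_lam
    theta = [sum(H[seq[i]] * H[seq[i + m]] for i in range(L - m)) / (L - m)
             for m in range(1, lam + 1)]
    denom = sum(f) + omega * sum(theta)
    return [fi / denom for fi in f] + [omega * t / denom for t in theta]
```

Because the frequencies sum to 1, the 31 components of the resulting vector always sum to 1 as well, which is a convenient sanity check.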
In Eq. (5), fi (i = 1, 2, …, 20), the same as in the conventional AA, are the normalized occurrence frequencies of the 20 native amino acids in the protein X, and θj (j = 1, 2, …, 11) are the j-tier sequence order correlation factors computed according to Eqs. (1)–(3). Of the 31 components, the first 20 reflect the effect of the AA, whereas components 20 + 1 through 20 + 11 reflect the effect of sequence order. The parameter ω is a weight factor that adjusts the 31 components to similar scales; here it was set to 0.1.

SVM classifiers

The SVM learning system, first proposed by Cortes and Vapnik [44], is based on statistical learning theory. Compared with other machine learning systems, the SVM has many attractive features, including the absence of local minima, its speed and scalability, and its ability to condense the information contained in the training set. During the past decade, SVMs have performed well in predicting protein secondary structure [32,45], protein subcellular localization [46], membrane protein types [47], and so on. In this research, the publicly available LIBSVM software is used [48]. With AA, pair-coupled AA, and PseAA as inputs to LIBSVM for training, three classifiers are constructed: SVM1, SVM2, and SVM3. Their computational results are the protein structural classes, which can be (i) all-α, (ii) all-β, (iii) α/β, or (iv) α+β. For this four-class problem, we use the "one versus others" method to transform it into two-class problems. Therefore, as far as one kind of composition descriptor is concerned (e.g., AA), there are actually four SVM classifiers (rather than just one): one for recognizing the all-α class, one for the all-β class, and the other two for the remaining two classes. Despite this, for simplicity, all four classifiers together are denoted as SVM1. Considering that the relationship between the composition descriptors and the structural classes is not simply linear, we select the radial basis function (RBF) as the kernel:

$$K(\vec{u}, \vec{v}) = \exp\left(-\gamma \lVert \vec{u} - \vec{v} \rVert^2\right). \tag{6}$$
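Eq. (6) itself is a one-liner in code. The sketch below is illustrative only (`rbf_kernel` is a hypothetical helper name, and the default γ = 100 mirrors the value selected by the grid search described in the text):

```python
from math import exp

def rbf_kernel(u, v, gamma=100.0):
    """K(u, v) = exp(-gamma * ||u - v||^2) for equal-length vectors, Eq. (6)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return exp(-gamma * sq_dist)
```

The kernel value is 1 when the two vectors coincide and decays toward 0 as they move apart, with γ controlling how quickly.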
Subsequently, the parameter set (γ, C) needs to be optimized. We perform grid searches for the maximal jackknife-tested overall rates, with γ ranging over 0.001, 0.01, 0.1, 1, 10, and 100 and C ranging over 0.1, 1, 10, 100, and 1000. Of all the possible combinations, the set (100, 100) is found to have the best average performance. Although particular sets could be used for different classifiers, for simplicity we choose the set (100, 100); the total departure from the optimal values is negligible.

Fusion

From the first-layer SVM classifiers there are 3 (actually, 4 × 3 = 12) computational results. One could employ methods such as vote counting or decision templates to make the final decision; furthermore, one could assign various weights to the 12 results according to the reliabilities of the different classifiers. Nevertheless, in this work we construct an SVM fusion system to make the decisions. Suppose a protein data set contains N samples; it can be represented as an N × 20 matrix with AA, an N × 210 matrix with pair-coupled AA, or an N × 31 matrix with PseAA. These matrices are input into the first-layer SVM1, SVM2, and SVM3, respectively. The outputs of the first-layer SVMs are three N × 4 matrices representing the probabilities that each protein sample belongs to each structural class. The three matrices are then combined to form a new N × 12 matrix that is used as the input of the second-layer SVM. Different from the first-layer SVMs, the relationship between the inputs and the outputs may be linear here. Accordingly, we select the linear kernel rather than the RBF kernel for fusion, in which case only one parameter (the regularization parameter C) needs to be optimized. We search over 0.001, 0.01, 0.1, 1, 10, and 100 and find that 0.01 is the optimal value for fusing three of the structural classes, that is, all except the α+β class; for α+β, the optimal value of C is 1.
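The matrix bookkeeping described above can be sketched as follows. This is only an illustration of how the second-layer input is assembled, not the authors' code: training of the linear-kernel second-layer SVM itself (done with LIBSVM in the article) is omitted, and `predict_single` is a hypothetical helper showing how one N × 4 probability matrix would be read.

```python
def fuse_inputs(p_aa, p_pair, p_pseaa):
    """Concatenate three N x 4 probability matrices (one per first-layer
    SVM) row-wise into the N x 12 matrix fed to the second-layer SVM."""
    assert len(p_aa) == len(p_pair) == len(p_pseaa)
    return [row_a + row_b + row_c
            for row_a, row_b, row_c in zip(p_aa, p_pair, p_pseaa)]

CLASSES = ("all-alpha", "all-beta", "alpha/beta", "alpha+beta")

def predict_single(prob_matrix):
    """Per-sample argmax decision for one first-layer classifier."""
    return [CLASSES[max(range(4), key=row.__getitem__)] for row in prob_matrix]
```

In the article the N × 12 matrix is not decided by a simple argmax but passed to the second-layer SVM, which learns how to weight the 12 entries adaptively.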
The whole procedure is shown in Fig. 1.
Fig. 1. Flowchart of the SVM fusion network.
Results and discussion

For the fusion network proposed here, we used the two data sets constructed by Zhou [9] to test its prediction quality. One consists of 277 domains: 70 all-α, 61 all-β, 81 α/β, and 65 α+β domains. The other consists of 498 domains: 107 all-α, 126 all-β, 136 α/β, and 129 α+β domains. In general, a prediction method is evaluated by the resubstitution test, the independent data set test, or the jackknife test. Of these three, the jackknife test is accepted as the most rigorous and objective [9,12,14]. In the jackknife test, each protein in the data set is singled out in turn as an independent test sample, and all of the rule parameters are calculated without using that protein. The success rates by the jackknife cross-validation test are listed in Table 1. As this table shows, the success rates are improved significantly. Especially for the most difficult case, α+β, the success rates for the two data sets are increased by 27.7% and 13.2% in comparison with the best of the three individual classifiers, and even by 9.2% and 12.4% in comparison with the values achieved by the oracle. Noting that the inputs of the fusion SVM classifier are 12-D vectors combining the computational results for all four structural classes, it is not surprising that the success rates by fusion can exceed the theoretical limits set by the oracle. The overall rates are also increased, by 7.2% and 3.6%. As illustrated in Table 2, we further carried out a comparison with some prior works. From this table, it can be seen that our method is superior or comparable to other
Table 1
Results of the jackknife test

Data set  Classifier  Success rate (%)
                      All-α   All-β   α/β    α+β    Overall
Z277      SVM1        82.9    90.2    93.8   52.3   80.5
          SVM2        82.9    85.3    93.8   44.6   77.6
          SVM3        87.1    88.5    91.4   52.3   80.5
          Oracle^a    87.1    91.8    93.8   70.8   83.8
          Fusion      85.7    90.2    93.8   80.0   87.7
Z498      SVM1        99.1    96.0    80.9   76.7   87.6
          SVM2        98.1    94.4    78.7   71.3   84.9
          SVM3        98.1    96.0    80.9   78.3   87.8
          Oracle^a    99.1    96.8    81.6   79.1   88.6
          Fusion      99.1    96.0    80.9   91.5   91.4

^a The oracle works as follows: assign the correct structural class label to protein sample X if at least one individual classifier produces the correct structural class label of X, so that the theoretical maximal success rate achievable by the fusion technique can be estimated [49].
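The oracle rule of footnote a translates directly into code: a sample counts as correct whenever at least one individual classifier labels it correctly. The sketch below is illustrative (`oracle_accuracy` is a hypothetical helper name, not from the article):

```python
def oracle_accuracy(predictions, truth):
    """Oracle success rate (%): a sample is a hit if at least one
    classifier's prediction matches the true label [49].

    predictions: list of per-classifier label lists, each of length N
    truth: list of N true labels
    """
    hits = sum(any(p[i] == t for p in predictions)
               for i, t in enumerate(truth))
    return 100.0 * hits / len(truth)
```

This makes clear why the oracle rate is an upper bound for any selection-based combiner, while a fusion classifier that re-weights all 12 probability outputs can, as Table 1 shows for α+β, exceed it.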
Table 2
Comparison with other algorithms by the jackknife test

Data set  Algorithm             Success rate (%)
                                All-α   All-β   α/β    α+β    Overall
Z277      Component coupled^a   84.3    82.0    81.5   67.7   79.1
          Neural network^b      68.6    85.2    86.4   56.9   74.7
          SVM^c                 74.3    82.0    87.7   72.3   79.4
          LogitBoost^d          81.4    88.5    92.6   72.3   84.1
          Rough Sets^e          77.1    77.0    93.8   66.2   79.4
          Our method            85.7    90.2    93.8   80.0   87.7
Z498      Component coupled^a   93.5    88.9    90.4   84.5   89.2
          Neural network^b      86.0    96.0    88.2   86.0   89.2
          SVM^c                 88.8    95.2    96.3   91.5   93.2
          LogitBoost^d          92.5    96.0    97.1   93.0   94.8
          Rough Sets^e          87.9    91.3    97.1   86.0   90.8
          Our method            99.1    96.0    80.9   91.5   91.4

^a Results are from Ref. [9].
^b Results are from Ref. [10].
^c Results are from Ref. [11].
^d Results are from Ref. [28].
^e Results are from Ref. [50].
existing methods. Accordingly, from both the rationality of the testing procedure and the success rates of the test results, the current SVM fusion network can significantly improve the prediction quality of protein structural class.

Conclusions

Rather than constructing a more complicated classifier, we believe that fusing relatively simple individual classifiers, which may play complementary roles to each other, is a good idea. In this article, a novel SVM fusion network for predicting the protein structural class has been presented. In its first layer, three classifiers are constructed with various statistics as inputs. The computational values are then combined and input into the second layer to be fused, where the final decisions are made. The results on two different working data sets show that the fusion network can significantly improve the predictive accuracy, even compared with the best individual classifier. Moreover, it can be anticipated that the current fusion network may also help to improve the success rates for many other protein attributes such as subcellular localization, membrane protein types, and enzyme family and subfamily classes.

Acknowledgments

The authors thank the anonymous reviewers whose constructive comments were very helpful in strengthening the presentation of this article. Financial support from the National Natural Science Foundation of China (20475068 and 20575082), the Natural Science Foundation of Guangdong Province (031577), and the Scientific Technology Project of Guangdong Province (2005B30101003) is acknowledged.
References

[1] M. Levitt, C. Chothia, Structural patterns in globular proteins, Nature 261 (1976) 552–558.
[2] K.C. Chou, C.T. Zhang, A correlation-coefficient method to predicting protein-structural classes from amino-acid compositions, Eur. J. Biochem. 207 (1992) 429–433.
[3] G.F. Zhou, X.H. Xu, C.T. Zhang, A weighting method for predicting protein structural class from amino-acid composition, Eur. J. Biochem. 210 (1992) 747–749.
[4] C.T. Zhang, K.C. Chou, An optimization approach to predicting protein structural class from amino-acid composition, Protein Sci. 1 (1992) 401–408.
[5] K.C. Chou, C.T. Zhang, Predicting protein-folding types by distance functions that make allowances for amino-acid interactions, J. Biol. Chem. 269 (1994) 22014–22020.
[6] C.T. Zhang, K.C. Chou, G.M. Maggiora, Predicting protein structural classes from amino-acid composition: application of fuzzy clustering, Protein Eng. 8 (1995) 425–435.
[7] K.C. Chou, A novel approach to predicting protein structural classes in a (20–1)-D amino-acid composition space, Proteins: Struct. Funct. Genet. 21 (1995) 319–344.
[8] I. Bahar, A.R. Atilgan, R.L. Jernigan, B. Erman, Understanding the recognition of protein structural classes by amino acid composition, Proteins 29 (1997) 172–185.
[9] G.P. Zhou, An intriguing controversy over protein structural class prediction, J. Protein Chem. 17 (1998) 729–738.
[10] Y.D. Cai, G.P. Zhou, Prediction of protein structural classes by neural network, Biochimie 82 (2000) 783–785.
[11] Y.D. Cai, X.J. Liu, X.B. Xu, G.P. Zhou, Support vector machines for predicting protein structural class, BMC Bioinform. 2 (2001) 1–5.
[12] G.P. Zhou, N. Assa-Munt, Some insights into protein structural class prediction, Proteins: Struct. Funct. Genet. 44 (2001) 57–59.
[13] H.B. Shen, J. Yang, X.J. Liu, K.C. Chou, Using supervised fuzzy clustering to predict protein structural classes, Biochem. Biophys. Res. Commun. 334 (2005) 577–581.
[14] K.C. Chou, C.T. Zhang, Prediction of protein structural classes, Crit. Rev. Biochem. Mol. Biol. 30 (1995) 275–349.
[15] K.C. Chou, Prediction of protein structural classes and subcellular locations, Curr. Protein Peptide Sci. 1 (2000) 171–208.
[16] K.C. Chou, Progress in protein structural class prediction and its impact to bioinformatics and proteomics, Curr. Protein Peptide Sci. 6 (2005) 423–436.
[17] K.C. Chou, Using pair-coupled amino acid composition to predict protein secondary structure content, J. Protein Chem. 18 (1999) 473–480.
[18] R.Y. Luo, Z.P. Feng, J.K. Liu, Prediction of protein structural class by amino acid and polypeptide composition, Eur. J. Biochem. 269 (2002) 4219–4225.
[19] X.D. Sun, R.B. Huang, Prediction of protein structural classes using support vector machines, Amino Acids 30 (2006) 469–475.
[20] K.C. Chou, Prediction of protein cellular attributes using pseudo-amino acid composition, Proteins: Struct. Funct. Genet. 43 (2001) 246–255.
[21] X. Xiao, S.H. Shao, Z.D. Huang, K.C. Chou, Using pseudo amino acid composition to predict protein structural classes: approached with complexity measure factor, J. Comput. Chem. 27 (2006) 478–482.
[22] Q.S. Du, D.Q. Wei, K.C. Chou, Correlations of amino acids in proteins, Peptides 24 (2003) 1863–1869.
[23] Q.S. Du, Z.Q. Jiang, W.Z. He, D.P. Li, K.C. Chou, Amino acid principal component analysis (AAPCA) and its applications in protein structural class prediction, J. Biomol. Struct. Dyn. 23 (2006) 635–640.
[24] K.C. Chou, Y.D. Cai, Predicting protein structural class by functional domain composition, Biochem. Biophys. Res. Commun. 321 (2004) 1007–1009.
[25] Y.D. Cai, K.C. Chou, Nearest neighbor algorithm for predicting protein subcellular location by combining functional domain composition and pseudo-amino acid composition, Biochem. Biophys. Res. Commun. 305 (2003) 407–411.
[26] Y.D. Cai, K.C. Chou, Using functional domain composition to predict enzyme family classes, J. Proteome Res. 4 (2005) 109–111.
[27] Y.D. Cai, K.C. Chou, Predicting enzyme subclass by functional domain composition and pseudo amino acid composition, J. Proteome Res. 4 (2005) 967–971.
[28] K.Y. Feng, Y.D. Cai, K.C. Chou, Boosting classifier for predicting protein domain structural class, Biochem. Biophys. Res. Commun. 334 (2005) 213–217.
[29] Y.D. Cai, K.Y. Feng, W.C. Lu, K.C. Chou, Using LogitBoost classifier to predict protein structural classes, J. Theor. Biol. 238 (2006) 172–176.
[30] L. Nanni, Fusion of classifiers for protein fold recognition, Neurocomputing 68 (2005) 315–321.
[31] C. Yan, D. Dobbs, V. Honavar, A two-stage classifier for identification of protein–protein interface residues, Bioinformatics 20 (Suppl. 1) (2004) i371–i378.
[32] J. Guo, H. Chen, Z.R. Sun, Y.L. Lin, A novel method for protein secondary structure prediction using dual-layer SVM and profiles, Proteins: Struct. Funct. Bioinform. 54 (2004) 738–743.
[33] M.N. Nguyen, J.C. Rajapakse, Prediction of protein relative solvent accessibility with a two-stage SVM approach, Proteins: Struct. Funct. Bioinform. 59 (2005) 30–37.
[34] M.N. Nguyen, J.C. Rajapakse, Two-stage multi-class support vector machines to protein secondary structure prediction, Pacific Symp. Biocomp. (2005) 346–357.
[35] M.N. Nguyen, J.C. Rajapakse, Two-stage support vector regression approach for predicting accessible surface areas of amino acids, Proteins: Struct. Funct. Bioinform. 63 (2006) 542–550.
[36] K.C. Chou, Y.D. Cai, Predicting protein quaternary structure by pseudo amino acid composition, Proteins: Struct. Funct. Genet. 53 (2003) 282–289.
[37] K.C. Chou, Y.D. Cai, Predicting subcellular localization of proteins by hybridizing functional domain composition and pseudo-amino acid composition, J. Cell. Biochem. 91 (2004) 1197–1203.
[38] H.B. Shen, K.C. Chou, Using optimized evidence-theoretic K-nearest neighbor classifier and pseudo-amino acid composition to predict membrane protein types, Biochem. Biophys. Res. Commun. 334 (2005) 288–292.
[39] H.B. Shen, K.C. Chou, Predicting protein subnuclear location with optimized evidence-theoretic K-nearest classifier and pseudo amino acid composition, Biochem. Biophys. Res. Commun. 337 (2005) 752–756.
[40] K.C. Chou, Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes, Bioinformatics 21 (2005) 10–19.
[41] S.W. Zhang, Q. Pan, H.C. Zhang, Z.C. Shao, J.Y. Shi, Prediction of protein homo-oligomer types by pseudo amino acid composition: approached with an improved feature extraction and naive Bayes feature fusion, Amino Acids 30 (2006) 461–468.
[42] J. Kyte, R.F. Doolittle, A simple method for displaying the hydropathic character of a protein, J. Mol. Biol. 157 (1982) 105–132.
[43] K.C. Chou, Y.D. Cai, Prediction of membrane protein types by incorporating amphipathic effects, J. Chem. Inform. Model. 45 (2005) 407–413.
[44] C. Cortes, V. Vapnik, Support-vector networks, Mach. Learn. 20 (1995) 273–297.
[45] M. Kumar, M. Bhasin, N.K. Natt, G.P.S. Raghava, BhairPred: prediction of β-hairpins in a protein from multiple alignment information using ANN and SVM techniques, Nucleic Acids Res. 33 (2005) W154–W159.
[46] K.C. Chou, Y.D. Cai, Using functional domain composition and support vector machines for prediction of protein subcellular location, J. Biol. Chem. 277 (2002) 45765–45769.
[47] Y.D. Cai, G.P. Zhou, K.C. Chou, Support vector machines for predicting membrane protein types by using functional domain composition, Biophys. J. 84 (2003) 3257–3263.
[48] C.C. Chang, C.J. Lin, LIBSVM: A Library for Support Vector Machines [software], 2001, www.csie.ntu.edu.tw/~cjlin/libsvm.
[49] L.I. Kuncheva, Switching between selection and fusion in combining classifiers: an experiment, IEEE Trans. Syst. Man Cybern. B Cybern. 32 (2002) 146–156.
[50] Y.F. Cao, S. Liu, L.D. Zhang, J. Qin, J. Wang, K.X. Tang, Prediction of protein structural class with Rough Sets, BMC Bioinform. 7 (2006), doi:10.1186/1471-2105-7-20.