Journal of Genetics and Genomics (Formerly Acta Genetica Sinica) July 2007, 34(7): 649-660
Research Article
Evidence for Positive Darwinian Selection of Vip Gene in Bacillus thuringiensis Jinyu Wu1, *, Fangqing Zhao2, *, Jie Bai1, Gang Deng1, Song Qin2, Qiyu Bao1,① 1. Institute of Biomedical Informatics / Zhejiang Provincial Key Laboratory of Medical Genetics, Wenzhou Medical College, Wenzhou 325000, China; 2. Institute of Oceanology, Chinese Academy of Sciences, Qingdao 266071, China
Abstract: Vegetative insecticidal proteins (VIPs), produced during the vegetative stage of their growth in Bacillus thuringiensis, are a group of insecticidal proteins and represent the second generation of insecticidal trans-genes that will complement the novel δ-endotoxins in future. Fewer structural and functional relationships of Vip proteins are known in comparison with those of δ-endotoxins. In this study, both the maximum-likelihood methods and the maximum parsimony based sliding window analysis were used to evaluate the molecular evolution of Vip proteins. As a result, strong evidence was found that Vip proteins are subject to the high rates of positive selection, and 16 sites are identified to be under positive selection using the Bayes Empirical Bayesian method. Interestingly, all these positively selected sites are located from site-705 to site-809 in the C-terminus of the Vip proteins. Most of these sites are exposed and clustered in the loop regions when mapped onto its computational predicted secondary tertiary and a part of the tertiary structure. It has been postulated that the high divergence in the C-terminal of Vip proteins may not result from the lack of functional constraints, but rather from the rapid mutation to adapt their targeted insects, driven by positive selection. The potential positive selection pressures may be an attempt to adapt for the “arm race” between Vip proteins and the targeted insects, or to enlarge their target’s host range. Sites identified to be under positive selection may be related to the insect host range, which may shed a light on the investigation of the Vip proteins’ structural and functional relationships. Keywords: Bacillus thuringiensis; positive selection; sliding window; maximum likelihood; Vip proteins
Bacillus thuringiensis, a naturally occurring gram-positive bacterium, can produce several proteins that have insecticidal properties affecting a selective range of insect orders [1]. One of the main representatives of insecticidal proteins is δ-endotoxin or Cry protein. During the past decades, more than ten B. thuringiensis strains have been identified and over 150 Cry toxins have been cloned and tested. After insects ingest the Cry proteins, they are solubilized in the alkaline environment of the insect gut, releasing
their constituent Cry proteins as protoxins. The midgut converts the protoxin into a biologically active toxin by proteolytic enzymes. This activated toxin then binds to specific receptors on the surface of the midgut epithelial cells and enters the cell membranes, forming pores and channels in the gut cell membrane, followed by destruction of the epithelial cells [2]. Five Cry protein’s tertiary structures have been determined through X-ray crystallography, including Cry1Aa, Cry2Aa, Cry3Aa, Cry3Bb, and Cry4Ba. They all
Received: 2006−12−31; Accepted: 2007−02−24 This work was supported by the National Natural Science Foundation of China (No. 30571009). * These authors contributed equally to this work. ① Corresponding author. E-mail:
[email protected]; Tel: +86-577-8889 2799 www.jgenetgenomics.org
650
suggest that the active toxins are globular molecules with three conserved domains [3]. Receptor binding is the major determinant of host specificity by the different Cry proteins. In view of such properties, B. thuringiensis is developed as a type of microbial insecticide and has already been a useful alternative or supplement to synthetic chemical pesticide application in commercial agriculture, forest management, and mosquito control. It is also a key source of genes for pest-resistant transgenic plants [2]. However, several cases of resistance have been reported for B. thuringiensis insecticides in field populations, as well as in laboratory selection experiments [4], which lead to a great loss of agriculture crop. In addition to the Cry toxic proteins, some insecticidal proteins expressed by the bacterium during the vegetative stage have been identified. These proteins are called vegetative insecticidal proteins (VIPs) and show no homology to the δ-endotoxin family. Since the first Vip protein reported in 1996 [5], more and more researchers and groups have focused on this protein. Vip proteins have potent, broad-spectrum activity against insects, represent the second generation of insecticidal trans-genes, and offer a promise of extending the usefulness of B. thuringiensis toxins, to delay the onset of resistance in insects [6]. Currently, all the Vip-related sequences that have been reported are classified into three subfamilies: Vip1, Vip2, and Vip3. A new Vip nomenclature has been formulated to accommodate the growing list of new proteins according to their amino acid sequence identities [7] (available at http://www.lifesci.sussex.ac.uk/home/ Neil_Crickmore/Bt/). The Vip1 and Vip2, which are mainly produced by B. cereus, are the binary toxins with coleopteran specificity, whereas, Vip3 proteins are toxins with lepidopteran specificity, which are derived from B. thuringiensis [8]. In comparison with the extensive literatures describing the function and the specificity determination regions of δ-endotoxins, the structural, functional, and evolutionary analyses of Vip proteins are rare. Posi-
Journal of Genetics and Genomics
遗传学报
Vol.34 No.7 2007
tive selection is a phenomenon favoring the retention of mutations that are beneficial to an individual or population, and is thought to be an ephemeral case frequently resulting in the occurring of a protein with a novel function [9]. The nonsynonymous to synonymous substitution rate ratio (ω =dN /dS) provides a sensitive measure of selective pressures at the protein level with ω values < 1, = 1, and > 1, indicating purifying selection, neutral evolution, and positive selection, respectively. The specific goals of this study are to identify whether the Vip protein is subject to positive selection in the process of evolution by using a combination of a maximum likelihood approach and a maximum parsimony based sliding window approach. This study shows that the Vip protein encounters high rates of positive selection and the positively selected sites are mostly exposed and clustered in the loop regions when mapped onto the computational predicted structure. It has been postulated that the potential positive selection pressures driving the diversification of Vip protein, maybe in an attempt to adapt for the ‘arm race’ between Vip protein and the targeted insects, or to enlarge their target range, therefore resulted in the functional divergence among different Vip proteins. The positive selected sites identified here may be related to the insect host range, and provide the necessary foundation for further experimental investigations of Vip proteins’ structural and functional analyses.
1
Materials and Methods
1.1. Sequence alignment and phylogeny reconstruction The analysis of the Vip3 proteins other than Vip1 and Vip2 proteins is performed here, because these proteins are mainly produced by B. cereus, whereas, Vip3 proteins are produced by B. thuringiensis. However, the main reason that only Vip3 proteins have been analyzed is because only a few Vip1 and Vip2 sequences are available, which is not enough to validate www.jgenetgenomics.org
Jinyu Wu et al.: Evidence for Positive Darwinian Selection of Vip Gene in Bacillus thuringiensis
the results of the following maximum likelihood analysis. It is suggested that limited sequences can decrease the power of the LRT analysis [10]. All the 16 Vip3 sequences available were retrieved from GenBank database, with the accession numbers AAQ12340, CAI43275, CAI43276, CAI43277, CAI96522, AAR36859, AAW62286, AAW65132, AAU89707, AAV70653, AAX49395, AAY41427, AAY41428, AAC37036, AAC37037, and AAO32350. Among them, 10 Vip3 sequences (i.e., CAI96522, AAR36859, AAW62286, AAW65132, AAU89707, AAX49395, AAY41427, AAC37036, AAC37037, and AAO32350) shared high identities between each other. To account for possible differences in selective pressure analysis, a large data set containing all the above 16 Vip3 sequences and a small data set containing seven sequences (i.e., AAU89707, AAY41428, AAQ12340, CAI43275, CAI43276, CAI43277, and AAV70653), were both used to perform the maximum likelihood analysis. The seven sequences selected in the small data were based on the principle that only one randomly selected sequence from these 10 sequences was held. The sequences were initially aligned by application of the ClustalX [11], with default settings at the amino acid level. Then the corresponding nucleotide sequence alignment was generated according to the protein sequence alignment. The whole level of saturation was determined using the index of substitution saturation [12], implemented in the DAMBE software package [13], as saturation could lead to the underestimation of dS and an inflation of the dN / dS ratio. This test was based on the notion of entropy in information theory and yielded a critical value, permitting the saturation degree of a given set of aligned sequences to be assessed. A maximum likelihood phylogeny was reconstructed with PAUP* 4.10b [14] constrained with the Modeltest parameters [15]. The reliability of the tree was evaluated by the bootstrap method, with 1,000 replications. The model selected for both data sets was the GTR www.jgenetgenomics.org
651
with a gamma distribution. The phylogeny was also reconstructed by MP and NJ methods available in the Phylip program [16], but in both cases they shared a substantively similar topology with the maximum likelihood trees (data not shown). 1.2
Testing for positive selection and identification of corresponding sites To analyze the possibility of positive selection
acting on the Vip protein and to infer amino acid sites under positive selection on the two data sets, initially the maximum-likelihood method of Nielsen and Yang was applied
[17]
, implemented in the codeml program
from the PAML package
[18]
. Several site-specific
models that allow for various dN / dS ratios among sites, were used to detect positive selection
[19, 20]
. In
the simplest model (M0 or one-ratio model), it is assumed that the ω ratio is an average over all the sites. The ‘nearly neutral’ model (M1a) allows for conserved sites when 0 < ω < 1 and completely neutral sites when ω = 1. The “positive selection” model (M2a) adds a third class to M1a in which ω can take values > 1. Model M3 (discrete) has three classes with proportions p0, p1, and p2 and ω0, ω1, and ω2 values estimated from the data. Model M7 (β) assumes a beta distribution over the interval (0, 1), which provides a null hypothesis for testing positive selection, and does not allow for sites with ω > 1. Model M8 (β and ω) adds an extra class of sites to M7, so that a proportion p0 of sites come from the β distribution B (p, q) and the remaining sites (p1 = 1−p0) have a ω that can be greater than 1. Likelihood ratio tests (LRTs) have been conducted to compare M1a with M2a, M0 with M3, and M7 with M8. The level of significance is calculated as per the following steps [21]: (1) twice the difference of the likelihood scores (2ΔlnL) estimated by each model and the corresponding null distribution; and (2) approximated by χ2 distribution with the degrees freedom calculated from the difference in the number of estimated parameters between models.
652
Only models M2a, M3, and M8 can detect sites to be
Journal of Genetics and Genomics
遗传学报
Vol.34 No.7 2007
under positive selection. For the nonnested models,
of this program was that it could find the saturation of synonymous sites.
the Akaike’s Information Criterion (AIC) [22] has been
1.3
used. The AIC serves as an asymptotically unbiased estimator of a variant of Kullback’s directed divergence, which is under the assumption that the true model is correctly specified or overfitted. The AIC takes into account not only the goodness of fit, but also the variance of the parameter estimates. The model with the lowest AIC is expected to be the closest model to the true model, among the candidate models. As AIC is on a relative scale, it is useful to present also the AIC differences (or ΔAIC). Here, the AIC is defined as AIC = −2L + 2K, where L is the Log-likelihood score of the model, given the data, and K is the number of estimated parameters included in the model. ΔAIC is calculated as: ΔAIC = AIC – minAIC, where minAIC is the AIC value of the best model. Next, to identify which sites potentially belong to the positively selected class identified by the M2a and M8 models, the posterior probabilities (p) were calculated from the parameter estimates of these models, using the Bayes empirical Bayesian approach [20]. For model M3, the Naive Empirical Bayes approach was used to identify the sites under positive selection because no Bayes empirical Bayesian approach was implemented. Sites with a high probability of coming from the class with ω > 1 were likely to be under positive selection. Evidence of positive selection in the large data set was also estimated using the maximum parsimony based sliding window approach implemented in the SWAPSC program [23]. To identify the positively selected sites, this method initially determined the optimum window size and slides along the aligned sequences. Subsequently, the positively selected sites were inferred where ω was significantly greater than 1. The SWAPSC program was also employed to ensure that the sites identified by PAML to be under positive selection were not saturated, because another ability
Localizing the selected sites in the Vip structure
In the absence of whole tertiary structures of Vip proteins, the sites identified to be under positive selection were initially mapped with the posterior probabilities > 90% (P > 90%) under M8, onto the secondary structure of AAU89707 (as a working model), to establish the relevance of these findings with the Vip proteins. The secondary structure of AAU89707 was predicted by the Jnet server [24]. The Conseq server [25] was used to identify whether residues were exposed or buried in the alignment, with AAU89707 as the reference. The Phyre server (http://www.sbg.bio.ic.ac.uk/~phyre/) was used to predict the tertiary structure of AAU89707, which was based on the protein homology/analogy recognition. Notably, the reason why AAU89707 was used as a working model was that the sequence of AAU89707 had the smallest E-value among all the 16 Vip proteins when make Blast search against the candidate structures in the PDB. Swiss-PDBviewer [26] and Grasp2 [27] were used for all structural manipulations.
2 2.1
Results Sequence alignment and phylogeny reconstruction
Sequence alignment of Vip proteins by means of the program ClustalX showed that the N-terminus of Vip proteins were more conservative than their C-terminus. Besides amino acid variation, an insertion of 20 amino acids (from 462 to 481), with unknown functions, was found in the protein sequence AAV70653 (Fig. 1). Before carrying out any tests of positive selection in the Vip proteins, the substitution saturation was initially examined, as saturation could lead to the underestimation of dS and an inflation of the dN /dS ratio. The results showed that the index www.jgenetgenomics.org
Jinyu Wu et al.: Evidence for Positive Darwinian Selection of Vip Gene in Bacillus thuringiensis
653
Fig. 1 Amino acid sequence alignment of the seven Vip proteins The first seven lines are the sequences of the Vip protein with the accession number as the title. The eighth line “Jnet” shows the predicted results of the secondary structure by using the working model of AAU89707 as input (‘H’, ‘E’, and ‘-’ represent α-helix, β-sheet, and loop, respectively). The ninth line ‘Conseq’ shows the predicted exposed status of the site (i.e., b, buried, versus e, exposed). Asterisks indicate the positively selected sites with P > 90% under the best approximation model M8 (see the Results for detail). www.jgenetgenomics.org
654
Journal of Genetics and Genomics
score (ISS) was significantly lower than the critical score (ISS.C) for the large data set (ISS = 0.188, ISS.C = 0.814; P < 0.001). In the small data, the observed ISS value was also significantly lower than ISS.C (ISS = 0.265, ISS.C = 0.825; P < 0.001). Therefore, bothdata sets did not indicate any sign of substitution saturation, and therefore, the following analysis of positive selection would be significant. Table 1
2.2
遗传学报
Vol.34 No.7 2007
Positive selection drives the diversification of Vip proteins
The likelihood model analysis of the selection pressures acting on both data sets provides strong evidence for positive selection (Table 1). For the large data set, the one-ratio model (M0) gives a log likelihood of −8,290.97, with the estimate ω = 0.181. The log likelihood under “nearly neutral” model (M1a) is
Positively selected sites, log-likelihood scores, and parameter estimates for the Vip3 proteins from B. thuringiensis ΔAIC
Positively Selected Sites (P > 90%)
−8,290.97
538.26
None
p 0 = 0.807, p1 = 0.193 ω0 = 0.0664, ω1 =1.000
−8,059.18
76.68
Not allowed
M2a
p 0 = 0.797, p1 = 0.168, p2 = 0.035 ω0 = 0.0753, ω1 =1.000, ω2 = 8.372
−8,024.53
(M1a vs. M2a) 69.30 (P << 0.001)
11.38
711, 715, 723, 726, 750, 770, 771, 787, 807, 809
M3
p 0 = 0.666, p 1 = 0.274 p2 = 0.060, ω0 = 0.0449 ω1 = 0.450, ω2 = 4.962
−8,019.31
(M0 vs, M3) 543.32 (P << 0.001)
2.94
568, 705, 710, 711, 713, 715, 723, 726, 748, 750, 769, 770, 771, 778, 783, 786, 787, 798, 801, 802, 804, 807, 809
M7
p = 0.256, q = 0.856
−8,076.21
110.74
Not allowed
M8
p 0 = 0.940 p = 0.435, q =2.200, p1 = 0.060, ωs = 4.892
−8,018.84
MinAIC with value 16,045.68
705, 710, 711, 715, 723, 726, 748, 750, 770, 771, 783, 787, 801, 804, 807, 809
M0
ω = 0.180
−8,105.56
567.6
None
M1a
p 0 = 0.807, p1 = 0.193 ω0 = 0.0627, ω1 =1.000
−7,862.89
84.26
Not allowed
M2a
p 0 = 0.796, p1 = 0.167, p2 = 0.037 ω0 = 0.0721, ω1 =1.000, ω2 = 11.610
2.78
711, 715, 723, 726, 750, 770, 771, 787,783, 807, 809
M3
p 0 = 0.733, p 1 = 0.220 p2 = 0.047, ω0 = 0.0549 ω1 = 0.621, ω2 = 8.263
−7,817.82
0.12
705, 710, 711, 713, 715, 723, 726, 748, 750, 770, 771, 783, 786, 787, 798, 801, 804, 807, 809
M7
p = 0.243, q = 0.816
−7,883.04
124.56
Not allowed
M8
p 0 = 0.944 p = 0.396, q = 1.924, p1 = 0.056, ωs = 6.641
−7,818.76
MinAIC with value 15,645.52
705, 710, 711, 715, 723, 726, 748, 750, 770, 771, 783, 787, 798, 801, 804, 807, 809
Model
Parameter estimate (s)
LnL
M0
ω = 0.181
M1a
2ΔLnL
Large data set
(M7 vs. M8) 114.74 (P << 0.001)
Small data set
−7,820.17
(M1a vs. M2a) 85.44 (P << 0.001) (M0 vs. M3) 575.48 (P << 0.001)
(M7 vs. M8) 128.56 (P << 0.001)
Maximum likelihood estimates showing the presence of positive selection are in boldface. The parameters p and q describe the shape of the beta distribution of ω, and p1, p2, and p3 are the proportions of codons belonging to each category. Sites inferred to be under positive selection with probabilities > 0.99 are in boldface. www.jgenetgenomics.org
Jinyu Wu et al.: Evidence for Positive Darwinian Selection of Vip Gene in Bacillus thuringiensis
655
−8,059.18, with parameter estimates ω0 = 0.0664. The ‘positive selection’ model (M2a) suggests that about 3.5% of sites are under positive selection with ω2 = 8.372. Because model M2a is an extension of M1a (neutral), these two models can be compared using an LRT. The test statistic is 2ΔLnL = 69.30, with P << 0.001 and df (degrees of freedom) = 2, so model M2a provides a significantly better fit for the data than M1a. Model M3 also fits the data significantly better than M0 with P << 0.001 and df = 3. The ‘beta and ω’ model (M8) suggest that about 6.0% of sites are under positive selection with ωs = 4.892. It is seen that M8 fits the data significantly better than M7 (2ΔLnL = 114.74, df = 2, P << 0.001). These tests provide significant evidence that the sites analyzed are under positive selection (Table 1). To compare and estimate how well different models approximate the data, AIC statistics are calculated from likelihood scores, because likelihood scores will always be better for more complex models[28]. Of all the candidate models above, M8 is the best suited for the data and M3 ranks the second. Sites subjected to be positive selection may be involved in important functions, therefore, the Bayes theorem was used to calculate the posterior probabilities of ω classes for each site in the models M2a, M3, and M8. Sites with high probabilities of coming from the class with ω > 1 were likely to be under positive selection[20]. Ten codon sites were inferred to be under positive selection, with P > 90% using the Bayes Empirical Bayes method under M2a (Table 1, Fig. 2A). These sites and another six sites (705, 710, 748, 783, 801, and 804) were also identified with P > 90% under M8. Using the Naive Empirical Bayes approach under M3, 23 sites with P > 90% were identified. Besides the 16 sites above, seven more sites were located (568, 713, 769, 778, 786, 798, and 802). The likelihood model analysis of the small data set also obtained unequivocal statistical evidence for positive selection that occurred in the Vip proteins. Similarly, all the three LRTs (M1a vs. M2a, M0 vs. www.jgenetgenomics.org
Fig. 2 Venn diagram showing the relationship between different models and/or different methods for predicting positive selected sites A: the positive selected sites identified between different models (M2a, M3, and M8) in the large data set. B: the positive selected sites identified between different models (M2a, M3, and M8) in the small data set. C: the positive selected sites identified between methods of maximum likelihood approach (M8) and the maximum parsimony based sliding window in the large data set.
M3, and M7 vs. M8) were highly significant (Table 1). Model M2a identified about 3.7% of all sites in a class that was under the influence of positive selection (ω2 = 11.610). Model M8 indicated that about 5.6% of sites had a ω value of 6.641, a value indicative of strong positive selection. Model M8 was the best approximation to the data according to the AIC. No significant differences were found in the corresponding sites of both data sets between M2a and M8, using the same method of Bayes Empirical Bayes, which was identified under positive selection with P > 90%. The only minor discrepancy was that both M2 and M8 identified one extra site in the small data set (783 for M2a, 798 for M8). When comparing the positively
656
Journal of Genetics and Genomics
selected sites (with P > 90%) under M3, in both data sets, it was found that four sites (568, 769, 778, and 802) were missed in the small data set. Using the method of sliding window implemented in SWAPSC program [23], 12 sites (565, 694, 705, 710, 711, 743, 750, 783, 784, 787, 789, and 804) in the large data set were inferred to be under positive selection. Seven of them (705, 710, 711, 750, 783, 787, and 804) were consistent with those identified by the maximum likelihood method of M8 with P > 90% (Fig. 2C). Virtually, the other five sites (565, 694, 743, 784, and 789) had already been inferred by the maximum likelihood method, but they had P < 90% and thus were not considered significant. Moreover, nine sites (715, 723, 726, 748, 770, 771, 801, 807, and 809) inferred by the ML method of M8, were not identified to be under positive selection by the sliding window method. Among all these nine sites, eight sites had P > 99% under M8, except for site 748. It is seen that the maximum parsimony based sliding window approach is more conservative than the maximum likelihood method [29], as the maximum likelihood method identified the sites where dS is equal to 0, as having undergone a positive selection even if the dN ratio is greater than 0, whereas, the sliding window method does not consider such sites. Thereby, in general it can be considered that both methods have obtained nearly the same results. In addition, no saturated positively selected sites have been inferred. 2.3
Location of positively selected sites in the structures
The predicted results of the secondary structure of Vip protein (AAU89707), through the Jnet server (Fig. 1), show that the N-terminus may mainly consist of helices and its C-terminus may be mainly composed of sheets and coils. Sites inferred to be positive selection under the best model M8 both in the large data sets and the small data set have been mapped onto the secondary structure of AAU89707. It is clear
遗传学报
Vol.34 No.7 2007
that all these sites are located from site-705 to site-809 in the Vip protein’s C-terminus (Fig. 1). In all the sites identified to be under positive selection, the majority of them are in the loop region, except for sites 723, 807, and 809, according to the secondary structure. Furthermore, most of these sites are exposed, except for sites 783 and 807 according to the Conseq server. Unfortunately, no whole tertiary structure of the Vip3 protein has been obtained by the method of homology modeling (http://swissmodel. expasy.org//SWISS-MODEL.html). Using the C-terminus 200 amino acids of the Vip protein as input for the Phyre server, a part of the Vip tertiary structure has been obtained. Among them, AAU89707 has the smallest E-value when compared to candidate structures in PDB and is used as a working model to map a part of the positively selected sites, as all these sites, identified to be under positive selection, are located in the C-terminal. The partial tertiary structure obtained, shows that it consists of several loops and β-sheets, and ten sites (748, 750, 770, 771, 783, 787, 801, 804, 807, and 809) are located on it (Fig. 3). Interestingly, most of them are located on different loops and exposed. Moreover, the exceptional sites (787, 807, and 809) are adjacent to the loops.
3
Discussion
Although insecticidal proteins of B. thuringiensis have been widely used in the agricultural fields, forest management, and mosquito control, for decades, there are several agronomically important insects that are less sensitive to their action [2]. A more serious problem associated with the use of the toxins, is the management of resistance development in the target insect pests. Several strategies are being developed to enhance the efficacy of the toxins, such as development of hybrid toxins and search for novel toxin sequences. On the other hand, analysis of the mechanism of toxin/receptor interactions and insect resistance to the www.jgenetgenomics.org
Jinyu Wu et al.: Evidence for Positive Darwinian Selection of Vip Gene in Bacillus thuringiensis
657
Fig. 3 Mapping of the residues (red space-fill) identified to be under positive selection on the Vip tertiary structure (A, C, and D) and the Cry domain II (B, pdb code 1ciy) Three loops in domain II of Cry protein are also shown in red.
toxin can help to develop hybrid toxins or novel toxins, with enhanced insecticidal activity or specificity [3, 4]. Vip proteins have been reported to have insecticidal properties, providing a potent broad spectrum of insect control. The accuracy and power of the maximum likelihood analysis to detect positive selection have been extensively tested, because it can account for variable selection pressures among different sites. This method is fairly reliable and robust [10, 30, 31], although the maximum parsimony based sliding window method is considered more conservative. As some sequences share high identities between each other, the 16 Vip3 sequences have been spliced into one large and one smaller data set for accounting potential influence. By analyzing a large data set and a small data set from Vip3 sequences, the likelihood ratio tests comparing the models M1a versus M2a, M0 versus M3, and M7 www.jgenetgenomics.org
versus M8 have provided significant evidence that nonsynonymous substitution rates vary widely across sites. In addition, no significant differences have been found in the sites identified to be under positive selection, between both data sets. The little disagreement may be because of the sample size and the approach implemented. It is indicated that the Naive Empirical Bayes approach could not reflect the uncertainty of the model parameters in a small data set and the Bayes Empirical Bayes approach is more powerful [20]. Using the maximum parsimony based sliding window method implemented in the SWAPSC program, the evidence of positive selection has also been addressed. These results give strong support to the evidence of positive selection and the positively selected sites, with confidence in the evolution of Vip proteins. Notably, when the positively selected sites (P > 90%) have been mapped, under the best approxima-
658
tion model M8 in both data sets, to the secondary working model (AAU89707), they were all located from site-705 to site-809 in the C-terminus. These probably reflect the differences of critical structural and functional roles of the N- and C-terminus and reveal the evolutionary processes acting on the Vip proteins. Although the function of the N- and C-terminus is still unknown, it is reasonable to think that the Vip proteins keep their function intact, but allow the diversification in some regions of sequence, during the longer evolution process. Moreover, in all the sites identified to be under positive selection, a majority of them are exposed and clustered in the loop regions according to their putative secondary structure. The two methods (maximum likelihood approach and the maximum parsimony based sliding window) that were employed, themselves do not select loop residues [10, 19]. Surface-exposed loops may be involved in receptor-recognition, and therefore selective pressures should lead to their diversification [32] . Furthermore, the partial tertiary structure of the Vip protein (AAU89707) predicted here, exhibits something similar to domain II of the Cry protein (Fig. 3) at the secondary structure level. Both are made of several loops and β-sheets, although the predicted Vip fold is not similar to the fold of the Cry domain II in anyway. Cry domain II consists of three anti-parallel β-sheets, each terminating in surface-exposed loops, which has been demonstrated to participate in receptor recognition and therefore to determine the insect specificity [2, 3]. The maximum likelihood approach has been previously employed to detect evidence of adaptive evolution in δ-endotoxin and has identified 20 positively selected residues, which are generally located on the three loops of domain II (unpublished data). Similarly, when those positively selected sites identified from Vip proteins to their tertiary structure have been mapped, most of them are also located on the exposed loops. Although a definite conclusion cannot be drawn without any further functional and structural information, it can be postulated that posi-
Journal of Genetics and Genomics
遗传学报
Vol.34 No.7 2007
tive selection may play a major role for their functional divergence in the evolutionary events of the Vip proteins, and the sites under positive selection may be related to the insect host range, similar to the Cry proteins. It was shown that gram-positive spore-forming entomopathogenic bacteria could produce a large variety of protein toxins to help them invade, infect, and finally kill their hosts, through their action on the insect midgut [33]. In this study, the Vip protein analyzed is one of such protein toxin. The results can be interpreted as indicating that the potential positive selection pressures driving the diversification of the Vip protein may be an attempt to adapt to the ‘arm race’ between the Vip protein and the targeted insects, or to enlarge their target range, thus resulting in the functional divergence among different Vip proteins. Applying statistical tests of molecular adaptation to nucleotide sequence data can help investigators identify specific amino acid substitutions for further experiments. It is believed that positively selected sites identified in well-sampled datasets, combined with structural and functional information, can provide a valuable framework to understand the mechanism of the Vip protein’s evolution. The positively selected sites identified in this work may be the main candidates for the functional determination of Vip proteins. Further studies on the mechanism of the Vip protein against its host may help us develop new effective insecticidal proteins and delay the insects’ resistance. Acknowledgments: We thank the colleagues in Beijing Genomics Institute, Chinese Academy of Sciences for helpful discussions. References 1 2
Hofte H, Whiteley HR. Insecticidal crystal proteins of Bacillus thuringiensis. Microbiol Rev, 1989, 53: 242−255. Schnepf E, Crickmore N, van Rie J, Lereclus D, Baum J, Feitelson J, Zeigler DR, Dean D H. Bacillus thuringiensis and its pesticidal crystal proteins. Microbiol Mol Biol Rev, 1998,
www.jgenetgenomics.org
Jinyu Wu et al.: Evidence for Positive Darwinian Selection of Vip Gene in Bacillus thuringiensis
3
4
5
6
7
8
9
10
11
12
13 14 15 16
62: 775−806. Saraswathy N, Kumar PA. Protein engineering of δ-endotoxins of Bacillus thuringiensis. Electronic J Biotech, 2004, 7: 178−188. Ferre J, van Rie J. Biochemistry and genetics of insect resistance to Bacillus thuringiensis. Annu Rev Entomol, 2002, 47: 501−533. Estruch JJ, Warren GW, Mullins MA, Nye GJ, Craig JA, Koziel MG. Vip3A, a novel Bacillus thuringiensis vegetative insecticidal protein with a wide spectrum of activities against lepidopteran insects. Proc Natl Acad Sci USA, 1996, 93: 5389−5394. Selvapandiyan A, Arora N, Rajagopal R, Jalali SK, Venkatesan T, Singh SP, Bhatnagar RK. Toxicity analysis of N- and C-terminus-deleted vegetative insecticidal protein from Bacillus thuringiensis. Appl Environ Microbiol, 2001, 67: 5855−5858. Crickmore N, Zeigler D R, Feitelson J, Schnepf E, van Rie J, Lereclus D, Baum J, Dean DH. Revision of the nomenclature for the Bacillus thuringiensis pesticidal crystal proteins. Microbiol Mol Biol Rev, 1998, 62: 807−813. Moellenbeck DJ, Peters ML, Bing JW, Rouse JR, Higgins L S, Sims L, Nevshemal T, Marshall L, Ellis RT, Bystrak PG, Lang BA, Stewart JL, Kouba K, Sondag V, Gustafson V, Nour K, Xu D, Swenson J, Zhang J, Czapla T, Schwab G, Jayne S, Stockhoff B A, Narva K, Schnepf HE, Stelman SJ, Poutre C, Koziel M, Duck N. Insecticidal proteins from Bacillus thuringiensis protect corn from corn rootworms. Nat Biotechnol, 2001, 19: 668−672. Creevey CJ, McInerney JO. An algorithm for detecting directional and non-directional positive selection, neutrality and negative selection in protein coding DNA sequences. Gene, 2002, 300: 43−51. Wong WS, Yang Z, Goldman N, Nielsen R. Accuracy and power of statistical methods for detecting adaptive evolution in protein coding sequences and for identifying positively selected sites. Genetics, 2004, 168: 1041−1051. Thompson JD, Gibson TJ, Plewniak F, Jeanmougin F, Higgins DG. The CLUSTAL_X windows interface: flexible strategies for multiple sequence alignment aided by quality analysis tools. Nucleic Acids Res, 1997, 25: 4876−4882. Xia X, Xie Z, Salemi M, Chen L, Wang Y. An index of substitution saturation and its application. Mol Phylogenet Evol, 2003, 26: 1−7. Xia X, Xie Z. DAMBE: software package for data analysis in molecular biology and evolution. J Hered, 2001, 92: 371−373. Swofford D L. PAUP*: Phylogenetic analysis using parsimony (*and other methods). Sinauer, 1998, Sunderland, MA. Posada D, Crandall KA. MODELTEST: testing the model of DNA substitution. Bioinformatics, 1998, 14: 817−818. Felsenstein J. PHYLIP−Phylogeny Inference Package. Cladistics, 1989, 5: 164−166.
www.jgenetgenomics.org
659
17 Nielsen R, Yang Z. Likelihood models for detecting positively selected amino acid sites and applications to the HIV-1 envelope gene. Genetics, 1998, 148: 929−936. 18 Yang Z. PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci, 1997, 13: 555−556. 19 Yang Z, Nielsen R. Codon-substitution models for detecting molecular adaptation at individual sites along specific lineages. Mol Biol Evol, 2002, 19: 908−917. 20 Yang Z, Wong WS, Nielsen R. Bayes empirical bayes inference of amino acid sites under positive selection. Mol Biol Evol, 2005, 22: 1107−1118. 21 Yang Z, Bielawski JP. Statistical methods for detecting molecular adaptation. Trends Ecol Evol, 2000, 15: 496−503. 22 Akaike H. A new look at the statistical model identification. IEEE Trans Autom Contr, 1974, 19: 716−723. 23 Fares MA. SWAPSC: sliding window analysis procedure to detect selective constraints. Bioinformatics, 2004, 20 2867−2868. 24 Cuff JA, Barton GJ. Evaluation and improvement of multiple sequence methods for protein secondary structure prediction. Proteins, 1999, 34: 508−519. 25 Berezin C, Glaser F, Rosenberg J, Paz I, Pupko T, Fariselli P, Casadio R, Ben-Tal N. ConSeq: the identification of functionally and structurally important residues in protein sequences. Bioinformatics, 2004, 20: 1322−1324. 26 Guex N, Peitsch MC. SWISS-MODEL and the Swiss-PdbViewer: an environment for comparative protein modeling. Electrophoresis, 1997, 18: 2714−2723. 27 Nicholls A, Sharp KA, Honig B. Protein folding and association: insights from the interfacial and thermodynamic properties of hydrocarbons. Proteins, 1991, 11: 281−296. 28 Burnham KP, Anderson DR. Model Selection and Multimodel Inference: a practical information-theoretic approach. Springer-Verlag New York, 2002. 29 Suzuki Y, Nei M. Simulation study of the reliability and robustness of the statistical methods for detecting positive selection at single amino acid sites. Mol Biol Evol, 2002, 19: 1865−1869. 30 Aagaard JE, Phillips P. Accuracy and power of the likelihood ratio test for comparing evolutionary rates among genes. J Mol Evol, 2005, 60: 426−433. 31 Yang Z. The power of phylogenetic comparison in revealing protein function. Proc Natl Acad Sci USA, 2005, 102: 3179−3180. 32 Jiggins FM, Hurst GD, Yang Z. Host-symbiont conflicts: positive selection on an outer membrane protein of parasitic but not mutualistic Rickettsiaceae. Mol Biol Evol, 2002, 19: 1341−1349. 33 de Maagd RA, Bravo A, Berry C, Crickmore N, Schnepf HE. Structure, diversity, and evolution of protein toxins from spore-forming entomopathogenic bacteria. Annu Rev Genet, 2003, 37: 409−433.
660
Journal of Genetics and Genomics
遗传学报
Vol.34 No.7 2007
苏云金杆菌 Vip 蛋白经历正向达尔文选择的证据 吴金雨1, *, 赵方庆2, *, 白 洁1, 邓 刚1, 秦 松2, 包其郁1 1. 生物医学信息研究所/浙江省医学遗传学重点实验室, 温州医学院, 温州 325000; 2. 中国科学院海洋研究所, 中国科学院, 青岛 266071 摘 要: 营养期杀虫蛋白(Vip)是苏云金杆菌在营养期所产生的一类新型杀虫蛋白, 代表了第二代转基因杀虫蛋白, 它能在一 定程度上克服许多害虫对δ-内毒素低敏感或者不敏感的缺陷。但是, 目前和已经深入研究的δ-内毒素相比较, 有关 Vip 蛋 白结构和功能关系方面的报道还甚少。本文采用最大似然方法和基于最大简约的滑窗分析对 Vip 蛋白的分子进化机制进行 了评价。结果发现 Vip 蛋白在进化过程当中经历了正选择, 并采用贝叶斯方法确定了 16 个正选择氨基酸残基。有意思的是 所有这些正选择残基都位于 Vip 蛋白 C 端从 705 到 809 的区域。当把这些正选择残基定位到二级结构和三级结构时, 发现 绝大部分正选择残基都暴露在 Vip 蛋白空间结构的表面并且聚集在环的区域。推测 Vip 蛋白分子进化的机制应该是受到了 正选择压力而不是功能约束的松弛。导致 Vip 蛋白 C 端多样性的潜在正选择压力可能是 Vip 蛋白为了在和目标昆虫之间竞 争取得优势, 或者是为了扩大 Vip 蛋白的杀虫范围。文中确定的经历了正选择残基很有可能是和昆虫宿主范围有关, 因此可 以为今后研究 Vip 蛋白的结构和功能提供相应的靶点。 关键词: 苏云金杆菌; 正选择; 滑窗分析; 最大似然; Vip 蛋白 作者简介: 吴金雨(1982−), 男, 浙江温州人, 硕士研究生, 研究方向: 生物信息学。E-mail:
[email protected]; 赵方庆(1980−), 男, 山东荷泽人, 博士研究生, 研究方向: 分子进化。E-mail:
[email protected] *表示并列第一作者。
www.jgenetgenomics.org