Computational Biology and Chemistry 52 (2014) 43–50
Genome-wide evidence of positive selection in Bacteroides fragilis

Sumio Yoshizaki a, Toshiaki Umemura c, Kaori Tanaka a,d, Kunitomo Watanabe d, Masahiro Hayashi d, Yoshinori Muto a,b,*

a United Graduate School of Drug Discovery and Medical Information Sciences, Gifu University, 1-1, Yanagido, Gifu 501-1193, Japan
b Department of Functional Bioscience, Gifu University School of Medicine, 1-1, Yanagido, Gifu 501-1193, Japan
c Graduate School of Medicine and Pharmaceutical Sciences, University of Toyama, 2630 Sugitani, Toyama 930-0194, Japan
d Division of Anaerobe Research, Life Science Research Center, Gifu University, 1-1, Yanagido, Gifu 501-1194, Japan
ARTICLE INFO

Article history: Received 2 May 2014; received in revised form 2 September 2014; accepted 2 September 2014; available online 6 September 2014

Keywords: Bacteroides fragilis; Molecular evolution; Positive selection; Pathogenicity

ABSTRACT

We used an evolutionary genomics approach to identify genes that are under lineage-specific positive selection in six species of the genus Bacteroides, including three strains of pathogenic Bacteroides fragilis. Using OrthoMCL, we identified 1275 orthologous gene clusters present in all eight Bacteroides genomes. A total of 52 genes were identified as under positive selection in the branch leading to the B. fragilis lineage, including a number of genes encoding cell surface proteins such as TonB-dependent receptor. Three-dimensional structural mapping of positively selected sites indicated that many residues under positive selection occur in the extracellular loops of the proteins. The adaptive changes in these positively selected genes might be related to dynamic interactions between the host immune systems and the surrounding intestinal environment. © 2014 Elsevier Ltd. All rights reserved.
1. Introduction

Bacteroides fragilis is a Gram-negative anaerobe and a component of the normal intestinal flora in humans. Although the viable cell number of B. fragilis in fecal isolates is 10- to 100-fold smaller than those of other intestinal Bacteroides sp. (Salyers, 1984), it is the pathogenic anaerobe most frequently isolated from intra-abdominal infections, abscesses, and blood (Finegold, 1989; Snydman et al., 2007). The pathogenic potential of B. fragilis has been linked to several virulence factors, such as the capsular polysaccharide (Kasper et al., 1977; Onderdonk et al., 1977), some proteases (Duerden, 1994) and the B. fragilis toxin (Sears et al., 2006). Moreover, factors contributing to the resistance of B. fragilis to oxidative stress and extreme aero-tolerance, each of which is an important virulence factor for extraintestinal infections, have also been reported (Sund et al., 2008). While these factors have been thought to be important for pathogenicity, at the present time their relative contributions are not known, and other possible mechanisms must be considered.

In the evolution of many microorganisms, positive selection and recombination are important evolutionary driving forces
* Corresponding author at: Department of Functional Bioscience, Gifu University School of Medicine, Gifu 501-1193, Japan. Tel.: +81 58 293 3241; fax: +81 58 293 3241. E-mail address: [email protected] (Y. Muto).
http://dx.doi.org/10.1016/j.compbiolchem.2014.09.001
1476-9271/© 2014 Elsevier Ltd. All rights reserved.
(Aguileta et al., 2009; Perfeito et al., 2007). There are many reports suggesting that positive selection contributes to the evolution of virulence genes in bacterial pathogens (Suzuki and Stanhope, 2012) such as Escherichia coli (Peek et al., 2001), Neisseria meningitidis (Andrews and Gojobori, 2004; Urwin et al., 2002), Pseudomonas aeruginosa (Smith et al., 2005), Streptococcus pneumoniae (Stanhope et al., 2008), and Helicobacter pylori (Ogura et al., 2007). Thus, positive selection for the fixation of advantageous point mutations is an important force in the adaptation of pathogenic microorganisms to different environmental niches, in terms of both optimizing infection processes and escaping host immune responses (Toft and Andersson, 2010). Genome-wide studies on positive selection and recombination in bacterial whole genomes have further contributed to a comprehensive understanding of the evolution of important pathogens, including Streptococcus spp. (Lefebure and Stanhope, 2007), Salmonella serotype (Soyer et al., 2009), Campylobacter spp. (Lefebure and Stanhope, 2009), Actinobacillus pleuropneumoniae (Xu et al., 2011), E. coli (Petersen et al., 2007), Staphylococcus (Guinane et al., 2010), and Mycobacterium tuberculosis (Zhang et al., 2011). No genome-wide analyses of positive selection in Bacteroides sp. have been reported to date.

To improve our understanding of the evolutionary dynamics and functional differentiation of Bacteroides sp., we performed full genome analyses for positive selection using the completed and published genome sequences for six Bacteroides sp., including three strains of B. fragilis. We focused on the evolutionary
S. Yoshizaki et al. / Computational Biology and Chemistry 52 (2014) 43–50
characterizations of core genome genes that are shared by the eight Bacteroides genomes. The results of our analysis of site- and lineage-specific selection patterns provide insights into the evolution of the core genomes in these Bacteroides sp. and information regarding the potential functional diversification of genes related to bacterial pathogenicity.

2. Materials and methods

2.1. Genome dataset and identification of orthologous genes

Eight available annotated Bacteroides genomes, representing six different species, were used in this study (Table 1). The genome sequences were downloaded from the Integrated Microbial Genomes (IMG) database in FASTA format (http://img.jgi.doe.gov/cgi-bin/w/main.cgi). Gene annotations with COG (Clusters of Orthologous Groups) functional classification were also retrieved from the IMG database. Protein coding sequences were extracted from FASTA files, and orthologs were determined using OrthoMCL (v1.4) (Li et al., 2003). Genes with premature stop codons or with a sequence shorter than 50 codons were excluded from the subsequent analyses. OrthoMCL uses reciprocal best BLAST scores in a normalized similarity matrix that is analyzed using an additional step of Markov Clustering to improve sensitivity and specificity. OrthoMCL was run with a BLAST E-value cut-off of 1e-05, and an inflation parameter of 1.5. The OrthoMCL output was used to construct a table describing the genome gene content. We used this table to plot Venn diagrams and to delimit the distribution of genes within the eight Bacteroides genomes included in this analysis. Venn diagrams were plotted with the Vennerable R package (http://r-forge.r-project.org/projects/vennerable). Core genes (core genome) were defined as the orthologous genes shared by all Bacteroides genomes.
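The core-gene definition above, together with the length and single-copy filters stated in the next paragraph, can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' pipeline: the function name `core_clusters`, the input layout (a dictionary of cluster members with gene lengths in codons) and the parameter name `min_len_ratio` are assumptions made for the example, and the removal of genes with premature stop codons or fewer than 50 codons is assumed to have happened upstream.

```python
from collections import Counter


def core_clusters(clusters, genomes, min_len_ratio=0.5):
    """Return IDs of clusters that pass the core-genome filters.

    clusters: {cluster_id: [(genome, gene_id, n_codons), ...]}  (hypothetical layout)
    genomes:  iterable with the names of all genomes in the study
    """
    genomes = set(genomes)
    kept = []
    for cluster_id, members in clusters.items():
        per_genome = Counter(genome for genome, _, _ in members)
        # Core: every genome contributes at least one gene to the cluster.
        if set(per_genome) != genomes:
            continue
        # Single copy: no genome contributes more than one gene.
        if any(count > 1 for count in per_genome.values()):
            continue
        lengths = [n_codons for _, _, n_codons in members]
        # Length filter: shortest gene must reach 50% of the longest gene.
        if min(lengths) < min_len_ratio * max(lengths):
            continue
        kept.append(cluster_id)
    return kept
```

Applied to the OrthoMCL gene content table, a filter of this kind yields the set of single-copy clusters on which the alignment and selection analyses operate.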
To increase the accuracy and power of the selection analyses, an ortholog cluster was excluded from the core genes if the length of any gene was less than 50% of the maximum length in the cluster or if the cluster contained more than one gene from any genome.

2.2. Alignment, recombination detection and phylogenetic inference

Core gene orthologs grouped in the same clusters were aligned using the program MUSCLE (Edgar, 2004) with default settings. Multiple sequence alignments were carried out on amino acid sequences from each orthologous group, followed by conversion to nucleotide sequence alignments using the PAL2NAL software package (Suyama et al., 2006). Since recombined fragments among aligned codon sequences have a profound effect on the detection of positive selection (Anisimova et al., 2003) and phylogenetic inference, we tested for recombination signals between sequences in the alignment of orthologous genes. The alignments were tested for intragenic recombination based on single breakpoint (SBP)
analysis and KH test in the HyPhy package (Kosakovsky Pond et al., 2006; Pond et al., 2005; Suzuki and Stanhope, 2012). To investigate the phylogenetic relationships of the Bacteroides sp., we concatenated the alignments of the 1214 ortholog clusters that remained after excluding the recombinant gene clusters from the core gene clusters. The resultant 1,199,271-nucleotide alignment was used to reconstruct a genome-wide tree (species tree) using PhyML (Phylogenetic Estimation Using Maximum Likelihood) (Guindon et al., 2010; Guindon and Gascuel, 2003) with the GTR + Gamma substitution model of nucleotide evolution and the Subtree Pruning-Regrafting (SPR) branch-swapping method. Branch support was calculated using the non-parametric Shimodaira-Hasegawa-like (SH-like) approximate likelihood ratio test (aLRT) as implemented in the PhyML program.

2.3. Analysis of positive selection

The maximum likelihood method was used to test for positive selection and to infer amino acid sites under positive selection using the codeml program of PAML version 4.5 (Yang, 2007). We used the branch-site models, which allow the ω ratio (the ratio of nonsynonymous to synonymous substitution rates) to vary among sites and lineages (Zhang et al., 2005). Branch-site model A assumes the following four classes of codon sites: class 0 with 0 < ω₀ < 1 in all branches; class 1 with ω₁ = 1 in all branches; class 2a with foreground ω₂ ≥ 1 but background 0 < ω₀ < 1; and class 2b with foreground ω₂ ≥ 1 but background ω₁ = 1. Null model A1 was the same as A but with the foreground ω₂ constrained to 1. The model allowing positive selection (model A) was tested using a likelihood ratio test (LRT) in which the test statistic was compared to a χ² distribution with one degree of freedom. Likelihoods were estimated based on the species tree, and each lineage leading to the six different Bacteroides sp. was tested as foreground. Genes for which significant positive selection was detected were inspected for alignment errors potentially affecting the results of this analysis.
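As a worked illustration of the likelihood ratio test described above, the following sketch converts the log-likelihoods of the null model A1 and the alternative model A into the statistic 2ΔlnL and a p-value from the χ² distribution with one degree of freedom. The function name `lrt_branch_site` and its arguments are assumptions for the example; the log-likelihood values themselves would come from codeml output. For one degree of freedom the χ² survival function has the closed form erfc(sqrt(x/2)), so only the standard library is needed.

```python
import math


def lrt_branch_site(lnl_null, lnl_alt):
    """Likelihood ratio test of model A (alternative) against model A1 (null).

    Returns (statistic, p_value), where statistic = 2 * (lnL_alt - lnL_null)
    and the p-value is the upper tail of a chi-square with 1 degree of freedom.
    """
    statistic = 2.0 * (lnl_alt - lnl_null)
    # Numerical optimisation can give a slightly negative value; clamp at zero.
    statistic = max(statistic, 0.0)
    # Chi-square survival function with 1 d.f.: P(X > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value
```

A statistic near 3.84 corresponds to p ≈ 0.05; the resulting p-values are what the multiple-testing correction is then applied to.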
If necessary, the alignments were manually modified and the codeml analysis repeated. Gene-specific trees were constructed for each positively selected gene, and the codeml analyses were re-run if the gene-specific tree differed from the species tree. A PAML analysis with gene-specific trees confirmed all positive selections detected using the species tree. Correction for multiple testing was performed using the procedure of Benjamini and Hochberg (1995). For all genes tested for positive selection, we calculated q-values from p-values using the R package QVALUE with the proportion of true null hypotheses set to 1 (π₀ = 1) (Storey and Tibshirani, 2003). A false discovery rate (FDR) of 2% was used as the threshold for the positive selection analyses. A binomial test was used to estimate the associations between each COG category and the frequency of positive selection. In the case of alternative models that allow for positive selection, we used the Bayes Empirical Bayes approach to calculate the posterior probability (PP) that each codon evolved under
Table 1
Bacterial strains used.

Bacteroides strain             GenBank accession No.   No. of CDS   Genome size (Mbp)   GC%
B. fragilis 638R               NC_016776               4417         5.373               43.4
B. fragilis NCTC9343           NC_003228               4403         5.241               43.1
B. fragilis YCH46              NC_006347               4730         5.310               43.2
B. helcogenes P36–108          NC_014933               3436         3.998               44.7
B. salanitronis DSM18170       NC_015164               3838         4.308               46.5
B. thetaiotaomicron VPI-5482   NC_004663               4917         6.293               42.9
B. vulgatus ATCC8482           NC_009614               4195         5.163               42.2
B. xylanisolvens XB1A          FP929033                4466         5.976               41.9

CDS: Coding sequence; GC%: Guanine plus cytosine content.
Fig. 3 shows the distribution of the orthologous clusters of genes among the eight Bacteroides genomes; 46% of the genes were present in only one genome and thus represent lineage-specific genes. After the lineage-specific genes, the largest class was the 13% of genes shared by all eight Bacteroides genomes, which comprise a coherent core genome (Fig. 3). To exclude the possibility that recombination or horizontal gene transfer within the set of core genome genes affected our phylogenetic and positive selection analyses, we tested genes for evidence of recombination among Bacteroides sp. using the single breakpoint analysis and KH test as implemented in HyPhy (Pond et al., 2005). This method compares a likelihood model that assumes a single recombination breakpoint, with a different topology on either side, against a model that assumes no recombination. In cases in which we found support for the model with recombination, we used the KH test for incongruence (Kishino and Hasegawa, 1989), as implemented in HyPhy, to determine whether the level of support was significant. The HyPhy analysis detected significant recombination breakpoints (p-value < 0.05) in only 61 of the 1275 core genome genes. These recombinant genes were not included in our positive selection analysis using the whole coding sequences. Instead, the recombinant gene alignments were broken into two gene fragments at the recombination breakpoint, and positive selection was assessed on each split alignment separately by using branch-site models as described below.
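The splitting of a recombinant alignment at its breakpoint can be sketched as follows. This is a minimal sketch under stated assumptions: the function name `split_codon_alignment` is hypothetical, the breakpoint is taken as a 0-based nucleotide column, and the cut is moved down to the nearest codon boundary so that both fragments stay in frame for codon models. The source does not say how codon boundaries were handled, so that rounding rule is our assumption.

```python
def split_codon_alignment(alignment, breakpoint):
    """Split a codon alignment into two in-frame fragments.

    alignment:  {sequence_name: aligned nucleotide string}
    breakpoint: 0-based nucleotide column reported by the recombination test
    Returns (left_fragment, right_fragment) as two dictionaries.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    n_columns = lengths.pop()
    if n_columns % 3 != 0:
        raise ValueError("alignment length must be a multiple of 3")
    # Assumption: round the cut down to the nearest codon boundary.
    cut = (breakpoint // 3) * 3
    if not 0 < cut < n_columns:
        raise ValueError("breakpoint leaves an empty fragment")
    left = {name: seq[:cut] for name, seq in alignment.items()}
    right = {name: seq[cut:] for name, seq in alignment.items()}
    return left, right
```

Each fragment can then be analysed with the branch-site models in the same way as an unsplit gene.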
positive selection (Yang et al., 2005). The codeml program was used to calculate the PP that each amino acid site came from the ω > 1 class. We used a cutoff of PP > 0.95 to identify sites under positive selection.

2.4. Protein tertiary structure analysis

To construct three-dimensional (3D) structure models of the proteins encoded by the genes that showed evidence of positive selection, homology modeling was performed using the Phyre2 (Protein Homology/analogY Recognition Engine) server (http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index) (Kelley and Sternberg, 2009). The amino acid sequences of the proteins were submitted to the Phyre2 server for modeling in the intensive mode, which is based on a profile–profile alignment algorithm using multiple templates. Six templates were selected based on heuristics to maximize confidence, percentage identity, and alignment coverage. The positively selected sites mapped onto the structures were visualized in the PyMOL Molecular Graphics System, version 1.3 (Schrödinger, LLC).

3. Results and discussion

3.1. Comparative analysis of the Bacteroides core genome

Using OrthoMCL, we identified 1275 orthologous genes present in all eight Bacteroides genomes, representing an initial definition of the core genome. The numbers of protein-coding genes per genome within the various species and strains of Bacteroides varied within the range of 3436–4917 (Table 1), but the gene composition of these genomes was much more variable. Based on the gene content table derived from the OrthoMCL output, we found that the three strains of B. fragilis share about 75% of their genes, whereas three different species of Bacteroides share a smaller fraction of their genes (Fig. 1). Our comparison of B. fragilis against two closely related pathogenic Bacteroides sp., B. vulgatus and B. thetaiotaomicron (Finegold, 1989), defined a common set of 2004 genes. On the other hand, the comparison of B. fragilis against two closely related non-pathogenic Bacteroides sp., B. xylanisolvens and B. salanitronis, defined a common set of 1497 genes (Fig. 1). These patterns of gene co-occurrence do not coincide with the evolutionary distances depicted in the phylogenetic tree (Fig. 2), but may have been somewhat influenced by the presence or absence of pathogenicity.
3.2. Positive selection in the Bacteroides core genome

We inferred a species tree of the eight Bacteroides sp. based on concatenated core genome genes excluding recombinant genes (Fig. 2). Using this species tree, we carried out the analysis of positive selection with branch-site models implemented in the PAML package (Zhang et al., 2005). Overall, evidence of positive selection was abundant across all Bacteroides lineages, but it was highly variable among lineages (Table 2). The number of genes under positive selection was greatest along the lineages leading to B. salanitronis and B. vulgatus. The 167 genes under positive selection on the B. salanitronis branch represent 13.1% of the core genome genes included in our analysis. The smallest numbers of genes determined to be under positive selection were on the B. xylanisolvens and B. thetaiotaomicron lineages. These results indicate that the number of genes under positive selection per lineage was roughly related to the branch length, and the low number of genes in the B. xylanisolvens
Fig. 1. Venn diagram showing the number of genes for three sets of three taxa. Taxa of the same species (B. fragilis) are shown at left, and taxa of different species are shown at center and right. The surfaces are approximately proportional to the number of genes. [Panels: left, B. fragilis NCTC9343, 638R and YCH46; center, B. fragilis NCTC9343, B. vulgatus ATCC 8482 and B. thetaiotaomicron VPI-5482; right, B. fragilis NCTC9343, B. xylanisolvens XB1A and B. salanitronis BL78.]
metabolism”, “Intracellular trafficking, secretion and vesicular transport” and “Defense mechanisms”, were significantly overrepresented among the total genes under positive selection (P < 0.05; one-sided binomial test). We also found that although positively selected genes in the COG category “Cell wall/membrane/envelope biogenesis” were not significantly enriched, many genes with positive selection in other COG categories were suggested to encode proteins localized on the surface/membrane according to the gene annotations (Table 3). Of the 52 positively selected genes, 17 genes (33%) were connected to the surface/membrane structures (Table 3). These findings are consistent with previous genome-wide studies which showed that many proteins that undergo positive selection are exposed on the cell surface or localized on the surface membrane (Orsi et al., 2008; Petersen et al., 2007; Xu et al., 2011). The positive selection and rapid evolution of genes involved in the cell surface/membrane are likely to be important to allow B. fragilis to adapt to different and rapidly changing environments, including competing bacteria in the human intestinal tract as well as effectors of innate and adaptive immune systems in hosts with pathogenic infections (Duerkop et al., 2009; Massari et al., 2003).
Fig. 2. A genome-scale estimate of the phylogenetic relationships among species of Bacteroides. The maximum likelihood tree was obtained after concatenating the 1214 core genome genes. The numbers at the branches show aLRT values using an SH-like calculation as implemented in the PhyML program. The inset shows the phylogenetic relationships of the B. fragilis lineage.
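The concatenation step behind this tree can be sketched as below. It is an illustrative sketch only: the function name `concatenate_alignments` and the dictionary-based alignment layout are assumptions, and the real analysis would read the per-gene alignments produced by MUSCLE and PAL2NAL before passing the concatenated matrix to PhyML.

```python
def concatenate_alignments(alignments, taxa):
    """Concatenate per-gene alignments into one supermatrix.

    alignments: list of {taxon: aligned sequence}, one dictionary per gene
    taxa:       ordered list of taxon names expected in every alignment
    Returns {taxon: concatenated sequence}.
    """
    parts = {taxon: [] for taxon in taxa}
    for index, alignment in enumerate(alignments):
        if set(alignment) != set(taxa):
            raise ValueError(f"alignment {index} does not contain exactly the expected taxa")
        widths = {len(seq) for seq in alignment.values()}
        if len(widths) != 1:
            raise ValueError(f"alignment {index} has sequences of unequal length")
        for taxon in taxa:
            parts[taxon].append(alignment[taxon])
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}
```

Requiring every taxon in every gene reflects the single-copy core-gene design, in which each cluster holds exactly one sequence per genome.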
3.3. Protein tertiary structure analysis
and B. thetaiotaomicron lineages may be the consequence of a short period of evolution. To assess the possible implications of positive selection for Bacteroides pathogenicity, we conducted a detailed investigation of the positively selected genes in the branch leading to the B. fragilis lineage. Based on the LRT statistic for the branch-site models (FDR < 2%), a total of 48 genes were identified as being under positive selection (Table 3). In addition, we detected positive selection in four recombinant genes using alignments of the gene fragments (Table 3). The sequence alignments of these positively selected genes are given in the supplementary data. As shown in Fig. 4, the assignment to COG functional classification indicated that genes in three of the COG categories, i.e., “Lipid transport and
To obtain more insight into the roles that positive selection might play, we mapped the positively selected amino acid sites on the 3D models of the proteins. Because many of the genes involved in the cell surface/membrane exhibited positive selection in the B. fragilis lineage, we focused on two representative proteins localized in the outer membrane. Fig. 5 shows the 3D structures of the TonB-dependent receptor (Cluster ID 1083 in Table 3) and outer membrane protein/Omp85 (Cluster ID 1117 in Table 3), which were based on homology modeling using the Phyre2 server. The TonB-dependent receptor, which was predicted to form a beta-barrel structure, showed strong evidence of positive selection with a low q-value (Table 3). According to our analysis using the Bayes Empirical Bayes approach, nine sites in the TonB-dependent receptor were inferred to have ω > 1 with high posterior probabilities (PP > 0.95) under branch-site model A: 212L, 267P, 391M, 424R, 473S, 537Q, 561R, 575P, and 610L.
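To map such a site list onto a structural model, the labels first have to be turned into residue numbers. The sketch below parses the nine sites reported above and builds a PyMOL-style residue selection expression (`resi` with `+`-separated numbers). The helper names `parse_sites` and `pymol_selection` are hypothetical, and the sketch assumes that residue numbering in the homology model matches the numbering used for the reported sites, which would need to be checked against the actual model.

```python
import re

# Site labels as reported for the TonB-dependent receptor (position + residue).
SITES = "212L, 267P, 391M, 424R, 473S, 537Q, 561R, 575P, 610L"


def parse_sites(text):
    """Return [(position, residue), ...] from labels such as '212L' or '212 L'."""
    return [(int(pos), aa) for pos, aa in re.findall(r"(\d+)\s*([A-Z])\b", text)]


def pymol_selection(sites):
    """Build a residue selection expression, e.g. 'resi 212+267'."""
    return "resi " + "+".join(str(pos) for pos, _ in sites)


sites = parse_sites(SITES)
selection = pymol_selection(sites)
```

The resulting string could be passed to a PyMOL `select` command to highlight the positively selected residues on the model.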
Fig. 3. The frequency of genes within the eight genomes included in this analysis (x-axis: number of genomes; y-axis: number of genes). Genes present in a single genome are considered lineage-specific genes, while those at the opposite end of the scale, that is, genes found in all eight genomes, represent the Bacteroides core genome.
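A distribution of this kind can be derived from a presence/absence table of ortholog clusters. The following is a minimal sketch, assuming a hypothetical input in which each cluster ID maps to the set of genomes that contain it; whether the published figure counts clusters or individual genes is not stated here, so counting clusters is an assumption of the example.

```python
from collections import Counter


def genome_frequency(clusters, n_genomes):
    """Percentage of clusters found in exactly k genomes, for k = 1..n_genomes.

    clusters: {cluster_id: set of genome names in which the cluster occurs}
    """
    counts = Counter(len(genomes) for genomes in clusters.values())
    total = sum(counts.values())
    if total == 0:
        return {k: 0.0 for k in range(1, n_genomes + 1)}
    return {k: 100.0 * counts.get(k, 0) / total for k in range(1, n_genomes + 1)}
```

In this representation, the k = 1 class corresponds to lineage-specific genes and the k = 8 class to the core genome.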
Table 2
The number of genes found under positive selection in Bacteroides lineages.

Lineage                        No. of genes with evidence for positive selection   Percentage in total core genome genes (%)
B. fragilisᵃ                   52ᵇ                                                 3.8
B. helcogenes P36–108          88                                                  6.9
B. salanitronis DSM18170       167                                                 13.1
B. thetaiotaomicron VPI-5482   10                                                  0.8
B. vulgatus ATCC8482           164                                                 12.9
B. xylanisolvens XB1A          7                                                   0.5

ᵃ Lineage leading to the three B. fragilis strains.
ᵇ Including positively selected genes detected among the recombinant genes.
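The percentages in Table 2 can be checked against the 1275 core genome genes with a few lines of arithmetic. In this sketch the counts are taken from the table, while the use of 48 (the whole-gene detections, excluding the four recombinant fragments) for B. fragilis is our assumption: it is the value that reproduces the tabulated 3.8%, whereas 52/1275 would round to 4.1%.

```python
CORE_GENES = 1275  # core genome genes identified with OrthoMCL

# Counts from Table 2; for B. fragilis the 48 whole-gene detections are
# used here (an assumption of this sketch, see the note above).
selected = {
    "B. fragilis": 48,
    "B. helcogenes": 88,
    "B. salanitronis": 167,
    "B. thetaiotaomicron": 10,
    "B. vulgatus": 164,
    "B. xylanisolvens": 7,
}


def percent_of_core(n_selected, total=CORE_GENES):
    """Percentage of core genome genes, rounded to one decimal place."""
    return round(100.0 * n_selected / total, 1)


percentages = {lineage: percent_of_core(n) for lineage, n in selected.items()}
```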
Table 3
Genes that show evidence of positive selection on the B. fragilis lineage.

Cluster ID  COG categoryᵃ  Gene annotation                                      2ΔlnLᵇ     q-value
170         [S]            Tetratricopeptide repeat protein                     37.49512   1.16E-06
462         [I]            Acyl-CoA dehydrogenase                               33.69269   3.47E-06
697         [T]            Two component system sensor histidine kinase         31.02918   9.65E-06
269         [M]            Lipoprotein                                          29.99908   1.24E-05
471         [C]            Ferredoxin oxidoreductase                            24.33866   0.00015903
787         [C]            Hypothetical protein                                 24.30007   0.00015903
1083        [P]            TonB dependent receptor                              23.01494   0.00026584
169         [P]            TonB dependent receptor                              22.20886   0.00035391
644         [S]            Membrane protein                                     20.10175   0.00094479
1136        [P]            Alkaline phosphatase                                 19.83070   0.00094577
268         [S]            Hypothetical protein                                 19.71620   0.00094577
726         [T]            Two-component system, sensor kinase                  19.15439   0.00116340
745         [C]            Na+-translocating NADH-quinone reductase subunit F   17.71081   0.00229070
168         [L]            DNA topoisomerase I                                  17.14196   0.00286911
832         [E]            Dipeptidase                                          16.41834   0.00381506
189         [J]            Polyribonucleotide nucleotidyltransferase            16.28441   0.00381506
978         [G]            Hypothetical protein                                 15.82599   0.00396907
535         [R]            Amidohydrolase                                       15.75123   0.00396907
904         [P]            Oxalate/formate antiporter                           15.67711   0.00396907
1205        [F]            Formyl transferase                                   15.66830   0.00396907
851         [I]            O-succinylbenzoate-CoA ligase                        15.61942   0.00396907
973         [G]            Glyceraldehyde 3-phosphate dehydrogenase             15.58608   0.00396907
340         [G]            Phosphoglucomutase                                   15.48465   0.00401333
1059        [L]            DNA polymerase III alpha subunit                     14.90654   0.00523281
465         [E]            Aminopeptidase C                                     14.77003   0.00540919
770         [K]            RNA-binding protein                                  14.30354   0.00643397
1206        [Q]            Acyl carrier protein                                 13.54749   0.00893195
596         [F]            Amidophosphoribosyltransferase                       13.52283   0.00893195
626         [S]            Glucose-1-phosphate adenylyltransferase              13.49570   0.00893195
1117        [M]            Outer membrane protein/Omp85                         13.38540   0.00917682
163         [U]            Protein-export transmembrane SecDF protein           13.23510   0.00964147
1114        [H]            Riboflavin biosynthesis protein                      13.17641   0.00965289
180         [E]            Aminopeptidase                                       12.93830   0.01014399
1153        [O]            Peptidyl-prolyl cis-trans isomerase                  12.92560   0.01014399
361         [M]            Hypothetical protein                                 12.64840   0.01145469
1245        [E]            Dipeptidyl peptidase IV                              12.57262   0.01162270
788         [P]            Sulfate adenylyltransferase subunit 1                12.44627   0.01166327
702         [M]            Penicillin-binding protein 1A                        12.42766   0.01166327
399         [I]            Cardiolipin synthetase                               12.30724   0.01215106
868         [O]            META domain protein                                  12.12239   0.01311172
611         [E]            Aminopeptidase                                       11.98312   0.01334565
311         [V]            ABC-2 type transporter                               11.97787   0.01334565
816         [M]            Transmembrane glycosyltransferase                    11.96647   0.01334565
423         [M]            Hypothetical protein                                 11.80581   0.01424485
1013        [C]            NADH-quinone oxidoreductase chain C/D                11.71577   0.01464575
506         [U]            Signal recognition particle protein                  11.61612   0.01514261
915         [U]            Tetratricopeptide repeat protein                     11.35812   0.01705632
946         [I]            YegS/BmrU family lipid kinase                        11.09953   0.01922939
186ᶜ        [R]            Hypothetical protein                                 25.71775   0.00011435
176ᶜ        [F]            Phosphoribosyl aminoimidazole carboxylase            14.93504   0.00523281
1095ᶜ       [V]            ABC transporter                                      13.62999   0.00859239
1192ᶜ       [J]            Translation initiation factor IF-2                   13.64192   0.00859239

ᵃ The abbreviations of the COG function categories were assigned based on Fig. 4.
ᵇ 2ΔlnL denotes the statistic of the likelihood ratio test.
ᶜ Positive selection was detected using alignment of the gene fragment.
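The q-values in Table 3 were obtained with the proportion of true null hypotheses fixed at 1, in which case the q-value of a test coincides with its Benjamini-Hochberg adjusted p-value. The sketch below shows that calculation in plain Python; the function name `bh_qvalues` is an assumption, and the authors used the R package QVALUE rather than this code.

```python
def bh_qvalues(pvalues):
    """Benjamini-Hochberg adjusted p-values (q-values with pi0 = 1).

    For the p-value of rank r among m tests (ascending order), the raw
    adjustment is p * m / r; a running minimum taken from the largest rank
    downwards makes the adjusted values monotone in the p-values.
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    qvalues = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, pvalues[index] * m / rank)
        qvalues[index] = running_min
    return qvalues
```

The running minimum is why neighbouring genes can share an identical q-value even though their test statistics differ, as several runs of equal q-values in Table 3 illustrate.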
Fig. 4. COG category distribution of positively selected genes in the branch leading to the B. fragilis lineage. The COG category codes are indicated on the abscissa. The proportion of genes (%) in each COG category is shown on the ordinate. Positively selected genes defined at 5% FDR were used in this analysis (n = 95). Filled bars indicate genes under positive selection in the branch leading to the B. fragilis lineage. Open bars are for all core genome genes (n = 1275). COG categories that are significantly enriched in the set of positively selected genes relative to all core genome genes are indicated by an asterisk (P < 0.05, binomial test).
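The enrichment test behind the asterisks can be illustrated with a one-sided binomial test. This is a sketch under stated assumptions: the null probability is taken to be the category's share of all core genome genes, the function name `binom_upper_tail` is hypothetical, and the category counts in the example (40 core genes, 8 selected genes) are invented for illustration because per-category counts are not given in the text.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """One-sided binomial p-value: P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1.0 - p) ** (n - i) for i in range(k, n + 1))


# Hypothetical example: a COG category holding 40 of the 1275 core genes
# contributes 8 of the 95 positively selected genes.
background = 40 / 1275
p_value = binom_upper_tail(8, 95, background)
```

A category would be flagged as enriched when this upper-tail probability falls below 0.05.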
We mapped nine positively selected sites onto the 3D structure of the TonB-dependent receptor (Fig. 5A). Seven of the nine amino acids identified as having experienced positive selection were located in extracellular loops of the protein; only two positively selected amino acid residues were located in the β-strand facing the membrane boundary. In Omp85, which was predicted to have a beta-barrel structure positioned in the outer membrane, only one amino acid site was inferred to be positively selected. The positively selected site in Omp85 (829A) was also located in the extracellular loop (Fig. 5B).
As was elucidated some years ago (Schauer et al., 2008), TonB-dependent receptors play important roles in the physiological uptake of various nutrients such as iron complexes and vitamin B12. Since the survival of pathogenic bacteria in their respective hosts depends on their ability to compete for nutrients such as iron (Miethke and Marahiel, 2007), TonB-dependent receptors might be important for pathogenicity (Koebnik, 2005). Another study demonstrated that a TonB-dependent receptor of B. fragilis binds to plasmatic fibronectin, suggesting the importance of this receptor as an adhesion molecule to host tissue (Pauer et al., 2009). The
Fig. 5. Three-dimensional structures of positively selected proteins. Structural models of the TonB-dependent receptor (A) and Omp85 (B) were constructed by using the Phyre2 server. Sites under positive selection identified with the branch-site model are highlighted in the predicted structure with their side chains shown (red spheres). (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
evolutionary analysis performed in the present study indicated that many residues under positive selection in the TonB-dependent receptor occur in the extracellular loops. It is thus interesting to speculate that the positively selected amino acids might be implicated in efficient nutrient recognition and host cell attachment during pathogenesis. In addition, Omp85, a surface antigen, was reported to interact with host immune systems and to be positively selected in E. coli (Fitzpatrick and McInerney, 2005). Together, these observations and the present results support the hypothesis that positively selected proteins in B. fragilis most likely interact with host immune systems or the surrounding intestinal environment and contribute greatly to the pathogenicity of this anaerobic bacterium.

4. Conclusion

Our genome-wide analysis defined the core-genome genes of the genus Bacteroides, including three B. fragilis pathogenic strains. Based on maximum likelihood codon substitution models, we identified lineage-specific positive selection in a wide range of genes from Bacteroides sp., indicating that positive selection contributed to the evolution of the Bacteroides core genome. In particular, genes encoding proteins associated with the surface/membrane, such as TonB-dependent receptor and Omp85, are the primary targets of positive selection. The adaptive changes in these positively selected genes might be related to dynamic interactions with the host immune and defense systems. Although further experimental evidence is needed to test our functional predictions on adaptive genes, the list of genes identified as being under positive selection may provide targets for further research into the mechanisms of the host-pathogen interactions in B. fragilis.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.compbiolchem.2014.09.001.
References

Aguileta, G., Refregier, G., Yockteng, R., Fournier, E., Giraud, T., 2009. Rapidly evolving genes in pathogens: methods for detecting positive selection and examples among fungi, bacteria, viruses and protists. Infect. Genet. Evol. 9, 656–670.
Andrews, T.D., Gojobori, T., 2004. Strong positive selection and recombination drive the antigenic variation of the PilE protein of the human pathogen Neisseria meningitidis. Genetics 166, 25–32.
Anisimova, M., Nielsen, R., Yang, Z., 2003. Effect of recombination on the accuracy of the likelihood method for detecting positive selection at amino acid sites. Genetics 164, 1229–1236.
Benjamini, Y., Hochberg, Y., 1995. Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. Roy. Stat. Soc. B 57, 289–300.
Duerden, B.I., 1994. Virulence factors in anaerobes. Clin. Infect. Dis. 18 (Suppl 4), S253–259.
Duerkop, B.A., Vaishnava, S., Hooper, L.V., 2009. Immune responses to the microbiota at the intestinal mucosal surface. Immunity 31, 368–376.
Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797.
Finegold, S., 1989. Anaerobic Infections in Humans. Access Online via Elsevier.
Fitzpatrick, D.A., McInerney, J.O., 2005. Evidence of positive Darwinian selection in Omp85, a highly conserved bacterial outer membrane protein essential for cell viability. J. Mol. Evol. 60, 268–273.
Guinane, C.M., Ben Zakour, N.L., Tormo-Mas, M.A., Weinert, L.A., Lowder, B.V., Cartwright, R.A., Smyth, D.S., Smyth, C.J., Lindsay, J.A., Gould, K.A., Witney, A., Hinds, J., Bollback, J.P., Rambaut, A., Penades, J.R., Fitzgerald, J.R., 2010. Evolutionary genomics of Staphylococcus aureus reveals insights into the origin and molecular basis of ruminant host adaptation. Genome Biol. Evol. 2, 454–466.
Guindon, S., Dufayard, J.F., Lefort, V., Anisimova, M., Hordijk, W., Gascuel, O., 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321.
Guindon, S., Gascuel, O., 2003. A simple, fast, and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52, 696–704.
Kasper, D.L., Onderdonk, A.B., Bartlett, J.G., 1977. Quantitative determination of the antibody response to the capsular polysaccharide of Bacteroides fragilis in an animal model of intraabdominal abscess formation. J. Infect. Dis. 136, 789–795.
Kelley, L.A., Sternberg, M.J., 2009. Protein structure prediction on the Web: a case study using the Phyre server. Nat. Protoc. 4, 363–371.
Kishino, H., Hasegawa, M., 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J. Mol. Evol. 29, 170–179.
Koebnik, R., 2005. TonB-dependent trans-envelope signalling: the exception or the rule? Trends Microbiol. 13, 343–347.
Kosakovsky Pond, S.L., Posada, D., Gravenor, M.B., Woelk, C.H., Frost, S.D., 2006. GARD: a genetic algorithm for recombination detection. Bioinformatics 22, 3096–3098.
Lefebure, T., Stanhope, M.J., 2007. Evolution of the core and pan-genome of Streptococcus: positive selection, recombination, and genome composition. Genome Biol. 8, R71.
Lefebure, T., Stanhope, M.J., 2009. Pervasive, genome-wide positive selection leading to functional divergence in the bacterial genus Campylobacter. Genome Res. 19, 1224–1232.
Li, L., Stoeckert Jr., C.J., Roos, D.S., 2003. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome Res. 13, 2178–2189.
Massari, P., Ram, S., Macleod, H., Wetzler, L.M., 2003. The role of porins in neisserial pathogenesis and immunity. Trends Microbiol. 11, 87–93.
Miethke, M., Marahiel, M.A., 2007. Siderophore-based iron acquisition and pathogen control. Microbiol. Mol. Biol. Rev. 71, 413–451.
Ogura, M., Perez, J.C., Mittl, P.R., Lee, H.K., Dailide, G., Tan, S., Ito, Y., Secka, O., Dailidiene, D., Putty, K., Berg, D.E., Kalia, A., 2007. Helicobacter pylori evolution: lineage-specific adaptations in homologs of eukaryotic Sel1-like genes. PLoS Comput. Biol. 3, e151.
Onderdonk, A.B., Kasper, D.L., Cisneros, R.L., Bartlett, J.G., 1977. The capsular polysaccharide of Bacteroides fragilis as a virulence factor: comparison of the pathogenic potential of encapsulated and unencapsulated strains. J. Infect. Dis. 136, 82–89.
Orsi, R.H., Sun, Q., Wiedmann, M., 2008. Genome-wide analyses reveal lineage specific contributions of positive selection and recombination to the evolution of Listeria monocytogenes. BMC Evol. Biol. 8, 233.
Pauer, H., Ferreira Ede, O., dos Santos-Filho, J., Portela, M.B., Zingali, R.B., Soares, R.M., Domingues, R.M., 2009. A TonB-dependent outer membrane protein as a Bacteroides fragilis fibronectin-binding molecule. FEMS Immunol. Med. Microbiol. 55, 388–395.
Peek, A.S., Souza, V., Eguiarte, L.E., Gaut, B.S., 2001. The interaction of protein structure, selection, and recombination on the evolution of the type 1 fimbrial major subunit (fimA) from Escherichia coli. J. Mol. Evol. 52, 193–204.
Perfeito, L., Fernandes, L., Mota, C., Gordo, I., 2007. Adaptive mutations in bacteria: high rate and small effects. Science 317, 813–815.
Petersen, L., Bollback, J.P., Dimmic, M., Hubisz, M., Nielsen, R., 2007. Genes under positive selection in Escherichia coli. Genome Res. 17, 1336–1343.
Pond, S.L., Frost, S.D., Muse, S.V., 2005. HyPhy: hypothesis testing using phylogenies. Bioinformatics 21, 676–679.
Salyers, A., 1984. Bacteroides of the human lower intestinal tract. Annu. Rev. Microbiol. 38, 293–313.
Schauer, K., Rodionov, D.A., de Reuse, H., 2008. New substrates for TonB-dependent transport: do we only see the ‘tip of the iceberg’? Trends Biochem. Sci. 33, 330–338.
Sears, C.L., Buckwold, S.L., Shin, J.W., Franco, A.A., 2006. The C-terminal region of Bacteroides fragilis toxin is essential to its biological activity. Infect. Immun. 74, 5595–5601.
Smith, E.E., Sims, E.H., Spencer, D.H., Kaul, R., Olson, M.V., 2005. Evidence for diversifying selection at the pyoverdine locus of Pseudomonas aeruginosa. J. Bacteriol. 187, 2138–2147.
Snydman, D.R., Jacobus, N.V., McDermott, L.A., Ruthazer, R., Golan, Y., Goldstein, E.J., Finegold, S.M., Harrell, L.J., Hecht, D.W., Jenkins, S.G., Pierson, C., Venezia, R., Yu, V., Rihs, J., Gorbach, S.L., 2007. National survey on the susceptibility of Bacteroides fragilis group: report and analysis of trends in the United States from 1997 to 2004. Antimicrob. Agents Ch. 51, 1649–1655.
Soyer, Y., Orsi, R.H., Rodriguez-Rivera, L.D., Sun, Q., Wiedmann, M., 2009. Genome wide evolutionary analyses reveal serotype specific patterns of positive selection in selected Salmonella serotypes. BMC Evol. Biol. 9, 264.
Stanhope, M.J., Lefebure, T., Walsh, S.L., Becker, J.A., Lang, P., Pavinski Bitar, P.D., Miller, L.A., Italia, M.J., Amrine-Madsen, H., 2008. Positive selection in penicillin-binding proteins 1a, 2b, and 2x from Streptococcus pneumoniae and its correlation with amoxicillin resistance development. Infect. Genet. Evol. 8, 331–339.
Storey, J.D., Tibshirani, R., 2003. Statistical significance for genomewide studies. Proc. Natl. Acad. Sci. U. S. A. 100, 9440–9445.
Sund, C.J., Rocha, E.R., Tzianabos, A.O., Wells, W.G., Gee, J.M., Reott, M.A., O'Rourke, D.P., Smith, C.J., 2008. The Bacteroides fragilis transcriptome response to oxygen and H2O2: the role of OxyR and its effect on survival and virulence. Mol. Microbiol. 67, 129–142.
Suyama, M., Torrents, D., Bork, P., 2006. PAL2NAL: robust conversion of protein sequence alignments into the corresponding codon alignments. Nucleic Acids Res. 34, W609–612.
Suzuki, H., Stanhope, M.J., 2012. Functional bias of positively selected genes in Streptococcus genomes. Infect. Genet. Evol. 12, 274–277.
Toft, C., Andersson, S.G., 2010. Evolutionary microbial genomics: insights into bacterial host adaptation. Nat. Rev. Genet. 11, 465–475.
Urwin, R., Holmes, E.C., Fox, A.J., Derrick, J.P., Maiden, M.C., 2002. Phylogenetic evidence for frequent positive selection and recombination in the meningococcal surface antigen PorB. Mol. Biol. Evol. 19, 1686–1694.
Xu, Z., Chen, H., Zhou, R., 2011. Genome-wide evidence for positive selection and recombination in Actinobacillus pleuropneumoniae. BMC Evol. Biol. 11, 203.
Yang, Z., 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591.
Yang, Z., Wong, W.S., Nielsen, R., 2005. Bayes empirical Bayes inference of amino acid sites under positive selection. Mol. Biol. Evol. 22, 1107–1118.
Zhang, J., Nielsen, R., Yang, Z., 2005. Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol. Biol. Evol. 22, 2472–2479.
Zhang, Y., Zhang, H., Zhou, T., Zhong, Y., Jin, Q., 2011. Genes under positive selection in Mycobacterium tuberculosis. Comput. Biol. Chem. 35, 319–322.