Towards a genome based taxonomy of Mycoplasmas


Infection, Genetics and Evolution 11 (2011) 1798–1804


Infection, Genetics and Evolution journal homepage: www.elsevier.com/locate/meegid

Cristiane C. Thompson a,⇑, Nayra M. Vieira b, Ana Carolina P. Vicente a, Fabiano L. Thompson b

a Laboratory of Molecular Genetics of Microorganisms, Oswaldo Cruz Institute, FIOCRUZ, Rio de Janeiro, Brazil
b Department of Marine Biology, Institute of Biology, Federal University of Rio de Janeiro, UFRJ, Brazil

Article info

Article history: Received 25 March 2011; received in revised form 25 July 2011; accepted 27 July 2011; available online 5 August 2011

Keywords: Mycoplasmas; Taxonomy; Genomes; Comparative genomics

Abstract

Mycoplasmas are Gram-positive wall-less bacteria with a wide environmental and host distribution, causing disease in man and in (wild and farmed) animals. The aim of this study was to analyze the usefulness of a genomic taxonomic approach in Mycoplasma systematics. Multilocus Sequence Analysis (MLSA), average amino acid identity (AAI), and Karlin genomic signature allowed a clear identification of species. For instance, Mycoplasma pneumoniae and Mycoplasma genitalium had only 71% MLSA similarity, 67% AAI, and 88 for Karlin genomic signature. Codon usage (Nc) of the phylogenetically distantly related species Mycoplasma conjunctivae and Mycoplasma gallisepticum was identical, in spite of clear differences in MLSA, AAI, and Karlin, suggesting that these two species were subjected to convergent adaptation due to similar environmental conditions. We suggest that a Mycoplasma species may be better defined based on genomic features. In our hands, a Mycoplasma species is defined as a group of strains that share ≥97% DNA identity in MLSA, ≥93.9% AAI and ≤8 in Karlin genomic signature. This new definition may be useful to advance the taxonomy of Mycoplasmas.

© 2011 Elsevier B.V. All rights reserved.

1. Introduction

Mycoplasmas are among the smallest and simplest prokaryotes, having only the minimal cellular machinery required for self-replication and survival. They appear to have evolved from Gram-positive bacteria by a process of degenerative evolution towards genome reduction and the loss of a cell wall (Woese et al., 1980; Neimark, 1986). Mycoplasmas are widespread in nature as parasites of humans, mammals, reptiles, fish, arthropods, and plants. They may be symbionts of isopods (Eberl, 2010), songbirds (States et al., 2009), wild and reared fish (Holben et al., 2002), and deep sea Lophelia corals of the Gulf of Mexico and Norwegian fjords (Kellogg et al., 2009). In spite of their reduced genomes, Mycoplasmas have highly versatile transcriptomes (with the production of different transcripts from a single gene and antisense ncRNAs) (Güell et al., 2009) and proteomes (with multifunctional enzymes, posttranslational modifications, and signaling molecules) (Kühner et al., 2009). The metabolic network appears to be more linear in Mycoplasmas than in other more complex bacteria. The multifunctionality of enzymes in Mycoplasmas may be a consequence of gene loss and genome reduction. Sequence comparisons among complete Mycoplasma genomes indicated that the 580-Kb genome of Mycoplasma genitalium arose by minimization of the 816-Kb genome of the human respiratory pathogen Mycoplasma pneumoniae (Stein and Baseman, 2006).

⇑ Corresponding author. Tel./fax: +55 21 38658168. E-mail address: thompson@ioc.fiocruz.br (C.C. Thompson). 1567-1348/$ - see front matter © 2011 Elsevier B.V. All rights reserved. doi:10.1016/j.meegid.2011.07.020

Many Mycoplasma species are pathogenic for humans, animals, plants, and insects (Maniloff, 2002). In addition, Mycoplasmas have been a problem as intracellular contaminants in human cell therapy and as pathogens in animal production (poultry and swine farming). Thus, rapid diagnostics and identification of Mycoplasmas are crucial for various activities. Due to the ecological importance of Mycoplasmas, several studies have been performed recently on the diversity of these microbes (Spergser et al., 2010). Online tools for the identification of Mycoplasma-related bacteria have also been developed (Zhao et al., 2009). Identification of Mycoplasmas is based on the current taxonomic standards (Stackebrandt et al., 2002). The species definition and identification for Mycoplasma follow the general species definition, using a combination of 16S rRNA gene sequence analyses, DNA–DNA Hybridization (DDH), serology and phenotypic data. Distinguishing Mycoplasmas by their metabolic characteristics has generally been of limited phylogenetic and taxonomic value. A few exceptions include the ability to ferment glucose, to hydrolyze arginine and urea, and the dependence on cholesterol for growth and anaerobiosis. Cellular localization of reduced nicotinamide adenine dinucleotide (NADH) oxidase activity has also been of help in the classification of Mycoplasmas at and above the genus level (Razin, 2006). Since the 1970s, serology has been established as the most important tool for defining Mycoplasma species (ICSB Subcommittee on the Taxonomy of Mycoplasmatales, 1972; ICSB Subcommittee on the Taxonomy of Mollicutes, 1979, 1995; Brown et al., 2007). Serological characterization is one of the most widely used approaches for identification. These methods are in agreement with DNA–DNA hybridization data and with 16S rRNA


sequence data. Currently, there are 123 species described in the genus Mycoplasma (http://www.bacterio.cict.fr/index.html). Differentiation of closely related species using 16S rRNA gene sequences is very difficult because Mycoplasma species may have up to 100% 16S rRNA gene similarity. For instance, the pairs M. pneumoniae and M. genitalium, and Mycoplasma mycoides and Mycoplasma capricolum, have 98% and 99.8% 16S rRNA sequence similarity, respectively. Serology is also very cumbersome, requiring special reagents and the expertise of a few international laboratories. The high genomic and phenotypic diversity of Mycoplasmas may also result in cross-reaction or failed identification based on serology. The advent of rapid genome sequencing and automated genome analyses has opened up a new perspective for the taxonomy of Mycoplasmas. Genome sequences may allow rapid identification and evolutionary inferences. Several recent studies have used whole-genome analysis to determine the taxonomic relationships among bacterial species (Coenye and Vandamme, 2003, 2004; Richter and Rosselló-Móra, 2009; Thompson et al., 2009, 2011). In order to test the feasibility of genomic taxonomy in Mycoplasmas, several genomic markers were analyzed in a collection of 46 genomes. The availability of whole genome sequences of several closely related species, such as M. pneumoniae and M. genitalium, formed an ideal test case for the establishment of the genomic taxonomy of Mycoplasmas. Disclosing species-specific patterns for the different genome-wide markers would reinforce their usefulness in Mycoplasma taxonomy. The aim of this study was to analyze the usefulness of the genomic taxonomic approach in Mycoplasmas, using information from Multilocus Sequence Analysis (MLSA), average amino acid identity (AAI), Karlin genomic signature, codon usage and Genome-to-Genome distances (GGD) for species identification.


2. Material and methods

2.1. Genome sequence data

We analyzed 46 complete genome sequences of Mycoplasmas that were publicly available for download by June 2nd, 2011 from the National Center for Biotechnology Information (NCBI) under the project accession numbers indicated in Table 1. The following analyses were performed according to Thompson et al. (2009) and are briefly described below.

2.2. 16S rRNA gene sequences and Multilocus Sequence Analysis (MLSA)

The 16S rRNA gene sequences and the gene sequences used for MLSA were obtained from GenBank (NCBI). The MLSA approach was based on the concatenated sequences of four housekeeping genes (adk, efp, gmk and gyrB) that were used in other taxonomic studies. The concatenated sequences were aligned by the CLUSTALX and MAFFT alignment methods. Phylogenetic analyses were conducted using MEGA version 5. The phylogenetic inference was based on the neighbor-joining genetic distance method (NJ) and the maximum likelihood method (ML). Distance estimations were obtained by the Kimura two-parameter model. The reliability of each tree topology was checked by 2000 bootstrap replications (Felsenstein, 1985).

2.3. Average amino acid identity (AAI)

The AAI was calculated as described previously (Konstantinidis and Tiedje, 2005). Conserved genes between a pair of genomes were determined by whole-genome pairwise sequence comparisons using the BLAST algorithm (Altschul et al., 1997). For these comparisons, all protein-coding sequences (CDSs) from one genome were searched against the genomic sequence of the other genome. The genetic relatedness between a pair of genomes was measured by the average amino acid identity of all conserved genes between the two genomes as computed by the BLAST algorithm.

2.4. Determination of dinucleotide relative abundance values

We determined the dinucleotide relative abundance value for each genome. Mononucleotide and dinucleotide frequencies were calculated using COMPSEQ (EMBOSS). Dinucleotide relative abundances ($\rho^*_{XY}$) were calculated using the equation $\rho^*_{XY} = f_{XY}/(f_X f_Y)$, where $f_{XY}$ denotes the frequency of dinucleotide XY, and $f_X$ and $f_Y$ denote the frequencies of X and Y, respectively (Karlin et al., 1997; Karlin, 1998). The difference in genome signature between two sequences is expressed by the genomic dissimilarity ($\delta^*$), which is the average absolute dinucleotide relative abundance difference between two sequences. The dissimilarities in relative abundance of dinucleotides between both sequences were calculated using the equation described by Karlin et al. (1997): $\delta^*(f,g) = \frac{1}{16}\sum_{XY} |\rho^*_{XY}(f) - \rho^*_{XY}(g)|$ (multiplied by 1000 for convenience), where the sum extends over all dinucleotides.

2.5. Codon usage bias

Codon usage bias was calculated for each genome. The effective number of codons used in a sequence (Nc) (Wright, 1990) was calculated using CHIPS (EMBOSS). Nc values range from 20 (in an extremely biased genome where only one codon is used per amino acid) to 61 (all synonymous codons are used with equal probability) (Wright, 1990).

2.6. Genome-to-Genome Distance Calculator (GGDC)

The genome distance was calculated using the Genome-To-Genome Distance Calculator (GGDC) (Auch et al., 2010). Distances between a pair of genomes were determined by whole-genome pairwise sequence comparisons using BLAST (Altschul et al., 1990). For these comparisons, algorithms were used to determine high-scoring segment pairs (HSPs) for inferring intergenomic distances for species delimitation. The corresponding distance threshold can be used for species delimitation. Any distance value above the threshold can be regarded as an indication that the two genomes analyzed represent two distinct species. Distances are calculated by (i) comparing two genomes using the chosen program to obtain HSPs/MUMs and (ii) inferring distances from the set of HSPs/MUMs using three distinct formulas. Next, the distances are transformed to values analogous to DNA–DNA Hybridizations (DDH). These DDH estimates are based on an empirical reference dataset comprising real DDH values and genome sequences. The regression-based DDH estimate uses parameters from a robust-line fit, whereas the threshold-based DDH estimate applies the distance threshold leading to the lowest error ratio in predicting whether DDH is ≥70% or <70%.

3. Results

3.1. General features

The complete genomes of the Mycoplasmas comprised a single chromosome (Table 1). The estimated size of the Mycoplasma genomes ranged from 580 Kb (M. genitalium) to 1,360 Kb (M. penetrans). Mycoplasma genomes present low GC content. The average GC content of Mycoplasma genomes ranged from 21% to 40%. The number of CDS and the percentage of coding regions varied from 475 to 1545 and from 64% to 92%, respectively (Table 1). Genome sizes were variable not only between genera and within the same genus but also among strains of the same species. Mycoplasma species had high intergenus, intragenus and intraspecies genome size and GC content variability, indicating clearly the heterogeneity in this bacterial group. One of the reasons for this variability could be associated with the frequent occurrence of repetitive elements, consisting of segments of protein coding genes, differing in size and number, or insertion sequence (IS) elements (Razin and Hayflick, 2010).

Table 1 Genomic features of Mycoplasma genomes. ND (not determined).

| Organism | Accession no. | Genome size (nt) | GC (mol%) | No. of CDS | % Coding region |
|---|---|---|---|---|---|
| Acholeplasma laidlawii PG8A | CP000896 | 1,496,992 | 31 | 1380 | 90 |
| Candidatus Phytoplasma australiense | AM422018 | 879,959 | 27 | 684 | 64 |
| Candidatus Phytoplasma mali | CU469464 | 601,943 | 21 | 536 | 76 |
| Mesoplasma florum L1 | AE017263 | 793,224 | 27 | 682 | 92 |
| Mycoplasma agalactiae | FP671138 | 1,006,702 | 29 | 813 | 87 |
| Mycoplasma agalactiae PG2 | CU179680.1 | 877,438 | 29 | 742 | 87 |
| Mycoplasma alligatoris A21JP2 | ADNC00000000 | 973,030 | 26 | 803 | 89 |
| Mycoplasma arthritidis 158L31 | CP001047.1 | 820,453 | 30 | 631 | 88 |
| Mycoplasma bovis PG45 | CP002188 | 1,003,404 | 29 | 765 | 82 |
| Mycoplasma capricolum subsp. capricolum ATCC 27343 | CP000123. | 1,010,023 | 23 | 812 | 88 |
| Mycoplasma capricolum subsp. capripneumoniae M1601 | CM001150 | 1,018,102 | ND | ND | ND |
| Mycoplasma conjunctivae HRC/581 | FM864216.2 | 846,214 | 28 | 692 | 90 |
| Mycoplasma crocodyli MP145 | CP001991 | 934,379 | 26 | 689 | 81 |
| Mycoplasma fermentans JER | CP001995 | 977,524 | 26 | 797 | 85 |
| Mycoplasma fermentans M64 | CP002458 | 1,118,751 | 26 | 1050 | 88 |
| Mycoplasma fermentans PG18 | AP009608 | 1,004,014 | ND | ND | ND |
| Mycoplasma gallisepticum F | CP001873 | 977,612 | 31 | ND | ND |
| Mycoplasma gallisepticum R (high) | CP001872 | 1,012,027 | 31 | ND | ND |
| Mycoplasma gallisepticum R (low) | AE015450.2 | 1,012,800 | 31 | 763 | 87 |
| Mycoplasma genitalium G37 | L43967.2 | 580,076 | 31 | 475 | 90 |
| Mycoplasma haemofelis Ohio2 | CP002808 | 1,155,937 | ND | ND | ND |
| Mycoplasma haemofelis Langford1 | FR773153 | 1,147,259 | 38 | 1545 | 94 |
| Mycoplasma hominis | FP236530 | 665,445 | 27 | 523 | 88 |
| Mycoplasma hyopneumoniae J | AE017243.1 | 897,405 | 28 | 657 | 87 |
| Mycoplasma hyopneumoniae 168 | CP002274 | 925,576 | ND | ND | ND |
| Mycoplasma hyopneumoniae 232 | AE017332.1 | 892,758 | 28 | 691 | 89 |
| Mycoplasma hyopneumoniae 7448 | AE017244.1 | 920,079 | 28 | 657 | 85 |
| Mycoplasma hyorhinis HUB-1 | CP002170.1 | 839,615 | 25 | 654 | 84 |
| Mycoplasma hyorhinis MCLD | CP002669 | 829,709 | ND | ND | ND |
| Mycoplasma leachii PG50 | CP002108 | 1,008,951 | 23 | 882 | 88 |
| Mycoplasma mobile 163K | AE017308.1 | 777,079 | 24 | 633 | 90 |
| Mycoplasma mycoides subsp. capri LC95010 | FQ377874 | 1,153,998 | 23 | 922 | 90 |
| Mycoplasma mycoides subsp. capri GM12 | CP001621 | 1,089,202 | 24 | ND | ND |
| Mycoplasma mycoides subsp. mycoides SC PG1 | BX293980.2 | 1,211,703 | 23 | 1017 | 81 |
| Mycoplasma mycoides subsp. mycoides SC Gladyslade | CP002107 | 1,193,808 | ND | ND | ND |
| Mycoplasma penetrans HF2 | BA000026.2 | 1,358,633 | 25 | 1037 | 88 |
| Mycoplasma pneumoniae FH | CP002077 | 811,088 | ND | ND | ND |
| Mycoplasma pneumoniae M129 | U00089.2 | 816,394 | 40 | 689 | 87 |
| Mycoplasma pulmonis UAB CTIP | AL445566.1 | 963,879 | 26 | 782 | 89 |
| Mycoplasma suis Illinois | CP002525 | 742,431 | 31 | 844 | 88 |
| Mycoplasma synoviae 53 | AE017245.1 | 799,476 | 28 | 659 | 88 |
| Ureaplasma parvum ATCC 27815 | CP000942 | 751,679 | 25 | 609 | 91 |
| Ureaplasma parvum ATCC 700970 | AF222894 | 751,719 | 25 | 614 | 91 |
| Ureaplasma urealyticum ATCC27618 | AAYN00000000 | 881,828 | 25 | 673 | 90 |
| Ureaplasma urealyticum ATCC33695 | AAZS00000000 | 886,725 | 26 | 612 | 84 |
| Ureaplasma urealyticum ATCC33699 | CP001184 | 874,478 | 25 | 646 | 89 |

3.2. Phylogenetic reconstructions by 16S rRNA and MLSA

We selected both conserved and variable single copy genes that belong to different functional groups and have been used in other taxonomic studies. Phylogenetic trees based on 16S rRNA gene sequences (Fig. 1; Supplementary Fig. 1) and the MLSA approach (Fig. 2; Supplementary Fig. 2) were constructed using the ML and NJ (Supplementary Figs. 1 and 2) methods. The trees based on 16S rRNA gene sequences and MLSA showed similar topology in the two methods. Bootstrap analysis indicated that most branches

were highly significant. The phylogenetic reconstruction indicated a clear separation of the main clusters (bovis, synoviae, hominis, hyopneumoniae, mycoides, haemofelis and pneumoniae) (Figs. 1 and 2). The pairs of species M. agalactiae and M. bovis, M. hominis and M. arthritidis, M. mycoides and M. capricolum, M. mycoides and M. leachii, M. capricolum and M. leachii, and M. pneumoniae and M. genitalium had 99.8%, 96%, 99.4%, 99.8%, 99.9%, and 98% 16S rRNA sequence similarity, respectively (Fig. 1). The respective MLSA values for these pairs were 84%, 74%, 96%, 95.6%, 99% and 71% (Fig. 2). The MLSA intraspecies similarity was at least 97%. According to the phylogenetic reconstructions, the genus Mesoplasma appeared to be nested within Mycoplasma, putting in question its taxonomic validity. The genera Acholeplasma and Candidatus Phytoplasma appeared at the outskirts of the phylogenetic tree, whereas the genus Ureaplasma formed a separate branch distinct from the genus Mycoplasma. The genus Mycoplasma appeared to be paraphyletic.

Fig. 1. Phylogenetic tree of the 16S rRNA gene based on the maximum likelihood method. Distance estimation was obtained by the Kimura 2-parameter model. Bootstrap percentages after 2,000 replications are shown. Scale bar, estimated sequence divergence.

Fig. 2. Phylogenetic tree based on concatenated sequences of the adk, efp, gmk and gyrB genes using the maximum likelihood method. Distance estimation was obtained by the Kimura 2-parameter model. Bootstrap percentages after 2,000 replications are shown. Scale bar, estimated sequence divergence.

3.3. Average amino acid identity (AAI)

The AAI among Mycoplasma species varied considerably (between 47% and 90%, Supplementary table) and the intraspecies similarity was at least 93.9% (Fig. 3). The AAI for the pairs M. agalactiae and M. bovis, M. alligatoris and M. crocodyli, M. hominis and M. arthritidis, M. mycoides and M. leachii, M. mycoides and M. capricolum, M. leachii and M. capricolum, and M. pneumoniae and M. genitalium was at maximum 86%, 72%, 64%, 89%, 85%, 90%, and 67%, respectively. M. mycoides subsp. mycoides and M. mycoides subsp. capri showed at maximum 92% AAI. The AAI within the genera Ureaplasma and Candidatus Phytoplasma was 89–90% and 57%, respectively. There was a clear separation of related species and sub-species by means of AAI comparisons (Fig. 3).
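As an illustration of how an AAI value such as those reported above could be derived from pairwise BLAST results, the following is a minimal Python sketch and not the authors' pipeline. The `Hit` record, the function name, and the conservation cut-offs (30% identity, 70% aligned fraction) are assumptions introduced here for illustration; the article itself only states that AAI is the average identity of all conserved genes between two genomes.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Hit:
    """One BLAST hit of a query CDS against the other genome (hypothetical record)."""
    query: str
    percent_identity: float   # amino acid identity of the alignment, in %
    aligned_fraction: float   # fraction of the query covered by the alignment


def average_amino_acid_identity(hits, min_identity=30.0, min_aligned_fraction=0.7):
    """Average the identity of the best hit per query CDS.

    The two cut-offs are illustrative assumptions used to decide which
    genes count as "conserved"; they are not taken from the article.
    """
    best = {}
    for hit in hits:
        if hit.percent_identity < min_identity:
            continue
        if hit.aligned_fraction < min_aligned_fraction:
            continue
        current = best.get(hit.query)
        if current is None or hit.percent_identity > current.percent_identity:
            best[hit.query] = hit
    if not best:
        raise ValueError("no conserved genes passed the filters")
    return mean(h.percent_identity for h in best.values())


# Toy input: cds3 fails the identity cut-off, cds4 fails the coverage cut-off.
hits = [
    Hit("cds1", 95.0, 0.98),
    Hit("cds1", 60.0, 0.90),
    Hit("cds2", 93.0, 0.85),
    Hit("cds3", 25.0, 0.95),
    Hit("cds4", 91.0, 0.50),
]
aai = average_amino_acid_identity(hits)
above_intraspecies_limit = aai >= 93.9  # intraspecies AAI limit reported in this study
```

With the toy hits, only the best hits of `cds1` and `cds2` are averaged, so the result can be compared directly with the 93.9% intraspecies limit shown in Fig. 3.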

Fig. 3. Taxonomic resolution of AAI (left Y axis, %) and Karlin genome dissimilarity [δ*(f,g)] (right Y axis) of Mycoplasmas. Solid lines, AAI. Dashed lines, Karlin. The AAI value 93.9% and the Karlin value 7.6 indicate the limit for intraspecies pairwise comparisons. A (Acholeplasma), CP (Candidatus Phytoplasma), M (Mycoplasma), Meso (Mesoplasma), U (Ureaplasma).
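The Karlin genome dissimilarity plotted in Fig. 3 follows from the dinucleotide relative abundance formulas given in Section 2.4. The sketch below is a plain-Python stand-in, not the COMPSEQ (EMBOSS) workflow used in the study; the function names are hypothetical, and the sketch assumes the frequencies are counted on the given strand only, since the article does not state whether the reverse complement is included.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"


def dinucleotide_relative_abundance(seq):
    """Return rho*_XY = f_XY / (f_X * f_Y) for all 16 dinucleotides."""
    seq = seq.upper()
    mono = Counter(b for b in seq if b in BASES)
    di = Counter(
        seq[i:i + 2]
        for i in range(len(seq) - 1)
        if seq[i] in BASES and seq[i + 1] in BASES
    )
    n_mono = sum(mono.values())
    n_di = sum(di.values())
    if n_di == 0 or any(mono[b] == 0 for b in BASES):
        raise ValueError("sequence must contain all four bases")
    rho = {}
    for x, y in product(BASES, repeat=2):
        f_xy = di[x + y] / n_di
        f_x = mono[x] / n_mono
        f_y = mono[y] / n_mono
        rho[x + y] = f_xy / (f_x * f_y)
    return rho


def karlin_dissimilarity(seq_f, seq_g):
    """delta*(f, g): mean absolute difference of rho* values, times 1000."""
    rho_f = dinucleotide_relative_abundance(seq_f)
    rho_g = dinucleotide_relative_abundance(seq_g)
    return 1000 * sum(abs(rho_f[k] - rho_g[k]) for k in rho_f) / 16


# Toy sequences with the same base composition but different dinucleotide usage.
genome_a = "ACGT" * 50
genome_b = "AACCGGTT" * 25
delta = karlin_dissimilarity(genome_a, genome_b)
```

Because both toy sequences have identical mononucleotide frequencies, any non-zero `delta` here comes purely from dinucleotide ordering, which is the signal the genomic signature is meant to capture.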

3.4. Karlin genomic dissimilarity

The genomic dissimilarity values of Mycoplasmas varied between 0.2 and 267. The intraspecies variation was at maximum 8 (Fig. 3). The pairs of species M. agalactiae and M. bovis, M. alligatoris and M. crocodyli, M. hominis and M. arthritidis, M. mycoides and M. leachii, M. mycoides and M. capricolum, M. leachii and M. capricolum, and M. pneumoniae and M. genitalium had at least 27, 55, 84, 14, 14, 7 and 88 of genomic dissimilarity, respectively (Fig. 3). M. leachii and M. capricolum had a low genomic signature value (around 7). M. mycoides subsp. mycoides and M. mycoides subsp. capri had at least 10 of genomic dissimilarity. The genomic signature values within the genera Ureaplasma and Candidatus Phytoplasma were 29 and 62, respectively.

3.5. Codon usage bias (Nc)

Codon usage bias varied considerably among Mycoplasmas. The Nc values within Mycoplasma ranged from 31 to 51, and the intraspecies Nc values in M. fermentans, M. gallisepticum, M. haemofelis, M. hyopneumoniae and M. mycoides were 36, 41, 47, 44 and 31, respectively. The Nc values of M. pneumoniae and M. genitalium were 41 and 51, respectively. Those of M. hominis and M. arthritidis were 37 and 43, while M. mycoides subsp. mycoides and M. capricolum subsp. capricolum both had an Nc of 31. Some phylogenetically distantly related species, e.g. M. conjunctivae and M. gallisepticum, had an identical Nc value (40). The Nc values of Acholeplasma, Candidatus Phytoplasma, Mesoplasma and Ureaplasma species were 39, 33–40, 33 and 34, respectively. The Nc did not clearly differentiate closely related from distantly related Mycoplasma species.

3.6. Genome distance analysis

The GGD was calculated only for closely related species that cannot be differentiated by 16S rRNA (Fig. 1). Based on GGD analysis the pairs M. agalactiae and M. bovis, M. alligatoris and M.

crocodyli, M. hominis and M. arthritidis, M. mycoides and M. leachii, M. mycoides and M. capricolum, M. leachii and M. capricolum, and M. pneumoniae and M. genitalium showed GGD values lower than 70% DDH similarity, analogous to DDH values obtained by classical hybridization methodologies. The GGD values for these pairs of species were lower than the threshold values for species delimitation (Supplementary material). M. mycoides subsp. mycoides and M. mycoides subsp. capri also had GGD and DDH values lower than 70%.

4. Discussion

The genomic taxonomic approach (MLSA, AAI, Karlin genomic signature and GGD) differentiated the sister species M. agalactiae and M. bovis, M. alligatoris and M. crocodyli, M. hominis and M. arthritidis, M. mycoides and M. capricolum, M. mycoides and M. leachii, and M. pneumoniae and M. genitalium. M. mycoides, M. capricolum and M. leachii belong to the M. mycoides cluster that is pathogenic for ruminants (Cottew et al., 1987). Because the different members of the M. mycoides cluster share many genotypic (e.g. identical 16S rRNA sequences and high DDH similarity values) and phenotypic traits, their classification and evolutionary relationships have been difficult to establish using the classic polyphasic approach. Although M. mycoides, M. capricolum and M. leachii are considered sister species, and M. mycoides subsp. mycoides and M. mycoides subsp. capri are considered subspecies of M. mycoides, they were clearly distinguishable by our genomic taxonomic approach. Two groups of Mycoplasma were disclosed in this study. One group (e.g. M. pneumoniae and M. genitalium or the M. mycoides group) had different Nc, genomic signature and AAI values, but its members were highly related based on 16S rRNA sequences. This group comprises Mycoplasmas involved in different types of disease in man and ruminants. In this case, the new genomic taxonomic approach provides high resolution for species identification. The second group (M. conjunctivae and M. gallisepticum) comprises Mycoplasmas that are distantly related by 16S rRNA, MLSA, AAI and Karlin genomic signature, but have identical Nc. The identical Nc of M. conjunctivae and M. gallisepticum, two distantly related species, may suggest convergent adaptation to similar environmental conditions. Niche adaptation may have shaped the genomes of these Mycoplasmas. Selective constraints can drive convergent evolution, producing bacterial species that are functionally similar despite their different evolutionary histories (Philippot et al., 2010). A bacterial species is defined as a group of strains (including the type strain) having >70% DDH similarity, <5 °C ΔTm, <5% mol G + C difference of total genomic DNA, and >97% 16S rRNA identity (Stackebrandt and Goebel, 1994). We suggest that Mycoplasma species may be identified by whole-genome sequence analysis. In our hands, a Mycoplasma species may be better defined based on genomic features. A Mycoplasma species may be defined as a group of strains that share ≥97% DNA identity in MLSA, ≥93.9% AAI and ≤8 in Karlin genomic signature. This definition may advance the field of Mycoplasma taxonomy, extending previous taxonomic work (Manso-Silván et al., 2007; Manso-Silvan et al., 2011). This approach will allow researchers and end-users of taxonomy to uncover and characterize large collections of Mycoplasma isolates using genomic features and automatic algorithms for genome pattern recognition, with the possibility of extending it to genomic surveys and rapid laboratory diagnostics.

5. Authors' contribution

CCT carried out the computational analyses, phylogenetic and statistical analyses, and wrote the manuscript. ACPV participated in the discussion and in the draft of the manuscript. FLT participated in the discussion of the paper and wrote the manuscript. All the authors have read and approved the final manuscript.

Acknowledgements

CCT acknowledges a CAPES post-doc fellowship.
CCT and ACPV acknowledge the financial support of the Oswaldo Cruz Institute (IOC-FIOCRUZ). FLT acknowledges grants from CNPq, FAPERJ, IFS, and FUJB. We thank Prof. Tom Coenye for fruitful discussions and two anonymous referees for their remarks.

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.meegid.2011.07.020.

References

Altschul, S.F., Madden, T.L., Schaffer, A.A., Zhang, J., Zhang, Z., Miller, W., Lipman, D.J., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Auch, A.F., von Jan, M., Klenk, H.P., Göker, M., 2010. Digital DNA–DNA hybridization for microbial species delineation by means of genome-to-genome sequence comparison. Stand. Genomic Sci. 2, 117–134.
Brown, D.R., Whitcomb, R.F., Bradbury, J.M., 2007. Revised minimal standards for description of new species of the class Mollicutes (division Tenericutes). Int. J. Syst. Evol. Microbiol. 57, 2703–2719.
Coenye, T., Vandamme, P., 2003. Extracting phylogenetic information from whole-genome sequencing projects: the lactic acid bacteria as a test case. Microbiology 149, 3507–3517.
Coenye, T., Vandamme, P., 2004. A genomic perspective on the relationship between the Aquificales and the epsilon-Proteobacteria. Syst. Appl. Microbiol. 27, 313–322.
Cottew, G.S., Breard, A., DaMassa, A.J., Ernø, H., Leach, R.H., Lefevre, P.C., Rodwell, A.W., Smith, G.R., 1987. Taxonomy of the Mycoplasma mycoides cluster. Isr. J. Med. Sci. 23, 632–635.

Eberl, R., 2010. Sea-land transitions in isopods: pattern of symbiont distribution in two species of intertidal isopods Ligia pallasii and Ligia occidentalis in the Eastern Pacific. Symbiosis 51, 107–116.
Felsenstein, J., 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39, 783–791.
Güell, M., van Noort, V., Yus, E., Chen, W.H., Leigh-Bell, J., Michalodimitrakis, K., Yamada, T., Arumugam, M., Doerks, T., Kühner, S., Rode, M., Suyama, M., Schmidt, S., Gavin, A.C., Bork, P., Serrano, L., 2009. Transcriptome complexity in a genome-reduced bacterium. Science 326, 1268–1271.
Holben, W.E., Williams, P., Gilbert, M.A., Saarinen, M., Särkilahti, L.K., Apajalahti, J.H., 2002. Phylogenetic analysis of intestinal microflora indicates a novel Mycoplasma phylotype in farmed and wild salmon. Microb. Ecol. 44, 175–185.
Karlin, S., 1998. Global dinucleotide signatures and analysis of genomic heterogeneity. Curr. Opin. Microbiol. 1, 598–610.
Karlin, S., Mrazek, J., Campbell, A.M., 1997. Compositional biases of bacterial genomes and evolutionary implications. J. Bacteriol. 179, 3899–3913.
Kellogg, C.A., Lisle, J.T., Galkiewicz, J.P., 2009. Culture-independent characterization of bacterial communities associated with the cold-water coral Lophelia pertusa in the northeastern Gulf of Mexico. Appl. Environ. Microbiol. 75, 2294–2303.
Konstantinidis, K.T., Tiedje, J.M., 2005. Towards a genome-based taxonomy for prokaryotes. J. Bacteriol. 187, 6258–6264.
Kühner, S., van Noort, V., Betts, M.J., Leo-Macias, A., Batisse, C., Rode, M., Yamada, T., Maier, T., Bader, S., Beltran-Alvarez, P., Castaño-Diez, D., Chen, W.H., Devos, D., Güell, M., Norambuena, T., Racke, I., Rybin, V., Schmidt, A., Yus, E., Aebersold, R., Herrmann, R., Böttcher, B., Frangakis, A.S., Russell, R.B., Serrano, L., Bork, P., Gavin, A.C., 2009. Proteome organization in a genome-reduced bacterium. Science 326, 1235–1240.
Maniloff, J., 2002. Phylogeny and evolution. In: Razin, S., Herrmann, R. (Eds.), Molecular Biology and Pathogenicity of Mycoplasmas. Kluwer Academic/Plenum Publishers, New York, pp. 31–43.
Manso-Silván, L., Perrier, X., Thiaucourt, F., 2007. Phylogeny of the Mycoplasma mycoides cluster based on analysis of five conserved protein-coding sequences and possible implications for the taxonomy of the group. Int. J. Syst. Evol. Microbiol. 57, 2247–2258.
Manso-Silvan, L., Dupuy, V., Chu, Y., Thiaucourt, F., 2011. Multi-locus sequence analysis of Mycoplasma capricolum subsp. capripneumoniae for the molecular epidemiology of contagious caprine pleuropneumonia. Vet. Res. 42, 86.
Neimark, H.C., 1986. Origin and evolution of wall-less prokaryotes. In: Madoff, S. (Ed.), The bacterial L-Forms. Marcel Dekker Inc., New York, pp. 21–42.
Philippot, L., Andersson, S.G., Battin, T.J., Prosser, J.I., Schimel, J.P., Whitman, W.B., Hallin, S., 2010. The ecological coherence of high bacterial taxonomic ranks. Nat. Rev. Microbiol. 8, 523–529.
Razin, S., 2006. The Genus Mycoplasma and Related Genera (Class Mollicutes). In: Prokaryotes 4, 836–904.
Razin, S., Hayflick, L., 2010. Highlights of mycoplasma research – an historical perspective. Biologicals 38, 183–190.
Richter, M., Rosselló-Móra, R., 2009. Shifting the genomic gold standard for the prokaryotic species definition. Proc. Natl. Acad. Sci. U.S.A. 106, 19126–19131.
Spergser, J., Langer, S., Muck, S., Macher, K., Szostak, M., Rosengarten, R., Busse, H.J., 2010. Mycoplasma mucosicanis sp. nov. isolated from the mucosa of dogs. Int. J. Syst. Evol. Microbiol. 61, 716–721.
Stackebrandt, E., Goebel, B.M., 1994. A place for DNA–DNA reassociation and 16S ribosomal-RNA sequence analysis in the present species definition in bacteriology. Int. J. Syst. Bacteriol. 49, 846–849.
Stackebrandt, E., Frederiksen, W., Garrity, G.M., Grimont, P.A., Kämpfer, P., Maiden, M.C., Nesme, X., Rosselló-Mora, R., Swings, J., Trüper, H.G., Vauterin, L., Ward, A.C., Whitman, W.B., 2002. Report of the ad hoc committee for the re-evaluation of the species definition in bacteriology. Int. J. Syst. Evol. Microbiol. 52, 1043–1047.
States, S.L., Hochachka, W.M., Dhondt, A.A., 2009. Spatial variation in an avian host community: implications for disease dynamics. EcoHealth 6, 540–545.
Stein, M.A., Baseman, J.B., 2006. The Evolving Saga of Mycoplasma genitalium. Clin. Microbiol. Newslett. 28, 18–41.
Thompson, C.C., Vicente, A.C., Souza, R.C., Vasconcelos, A.T., Vesth, T., Alves Jr., N., Ussery, D.W., Iida, T., Thompson, F.L., 2009. Genomic taxonomy of Vibrios. BMC Evol. Biol. 9, 258.
Thompson, F.L., Thompson, C.C., Dias, G.M., Naka, H., Dubay, C., Crosa, J.H., 2011. The genus Listonella MacDonell and Colwell 1986 is a later heterotypic synonym of the genus Vibrio Pacini 1854 (Approved Lists 1980). Int. J. Syst. Evol. Microbiol. doi:10.1099/ijs.0.030015-0.
Woese, C.R., Maniloff, J., Zablen, L.B., 1980. Phylogenetic analysis of the mycoplasmas. Proc. Natl. Acad. Sci. U.S.A. 77, 494–498.
Wright, F., 1990. The 'effective number of codons' used in a gene. Gene 87, 23–29.
Zhao, Y., Wei, W., Lee, I.M., Shao, J., Suo, X., Davis, R.E., 2009. Construction of an interactive online phytoplasma classification tool, iPhyClassifier, and its application in analysis of the peach X-disease phytoplasma group (16SrIII). Int. J. Syst. Evol. Microbiol. 59, 2582–2593.