Noncontiguous finished genome sequence and description of Enterococcus massiliensis sp. nov.

Noncontiguous finished genome sequence and description of Enterococcus massiliensis sp. nov.

Accepted Manuscript Non-contiguous finished genome sequence and description of Enterococcus massiliensis sp. nov. Stéphanie Le Page, Teresa Cimmino, A...

1MB Sizes 0 Downloads 57 Views

Accepted Manuscript Non-contiguous finished genome sequence and description of Enterococcus massiliensis sp. nov. Stéphanie Le Page, Teresa Cimmino, Amadou Togo, Matthieu Million, Caroline Michelle, Saber Khelaifia, Jean-Christophe Lagier, Didier Raoult, Jean-Marc Rolain PII:

S2052-2975(16)30025-7

DOI:

10.1016/j.nmni.2016.04.011

Reference:

NMNI 163

To appear in:

New Microbes and New Infections

Received Date: 4 February 2016 Revised Date:

19 April 2016

Accepted Date: 28 April 2016

Please cite this article as: Le Page S, Cimmino T, Togo A, Million M, Michelle C, Khelaifia S, Lagier JC, Raoult D, Rolain J-M, Non-contiguous finished genome sequence and description of Enterococcus massiliensis sp. nov., New Microbes and New Infections (2016), doi: 10.1016/j.nmni.2016.04.011. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.

ACCEPTED MANUSCRIPT 1

Taxonogenomics: reporting the genome of a new organism

2

Full-length title: Non-contiguous finished genome sequence and description of Enterococcus

4

massiliensis sp. nov.

5

Short title (for the running head):

6

Author list: Stéphanie Le Page1, Teresa Cimmino1, Amadou Togo1, Matthieu Million1, ,

7

Caroline Michelle1, Saber Khelaifia1, Jean-Christophe Lagier1, Didier Raoult1 and Jean-Marc

8

Rolain*1

SC

RI PT

3

9

Affiliations: 1 Aix-Marseille Univ., URMITE UM 63 CNRS 7278 IRD 198 INSERM U1905,

11

IHU Méditerranée Infection, Facultés de Médecine et de Pharmacie, 27 boulevard Jean

12

Moulin, 13385 Marseille CEDEX 05, France

M AN U

10

13

* Corresponding author:

TE D

14 15

17

EP

16

Text word count: 1540 Words

19

Abstract word count: 90 words

20

Number of references: 19

21

Number of tables: 3

22

Number of figures: 3

23

Key words: Enterococcus massiliensis, genome, taxono-genomics, culturomics

AC C

18

24

1

ACCEPTED MANUSCRIPT

ABSTRACT

25

Enterococcus massiliensis strain sp. nov. (=CSUR P1927 = DSM 100308) is a new species

27

within the genus Enterococcus. This strain was first isolated from a fresh stool sample of a

28

man during culturomics study of intestinal microflora. Enterococcus massiliensis is a Gram-

29

positive cocci, facultative anaerobic and motile. E. massiliensis is negative for mannitol and

30

positive for beta-galactosidase contrary to E. gallinarum. The complete genome sequence is

31

2,712, 841 bp in length with a GC content of 39.6% and contains 2,617 protein-coding genes

32

and 70 RNA genes, including 9 rRNA genes.

SC

RI PT

26

33

M AN U

34

Abbreviations

36

CSUR: Collection de souches de l’Unité des Rickettsies

37

URMITE: Unité des Maladies Infectieuses et Tropicales Emergentes

38

DSM: Deutsche Sammlung von Mikroorganismen

39

MALDI-TOF MS: Matrix-Assisted laser-desorption/ionization time-of-flight mass

40

spectrometry

41

dDDH: digital DNA-DNA hybridization

43 44

EP

AC C

42

TE D

35

2

ACCEPTED MANUSCRIPT INTRODUCTION

46

Enterococcus massiliensis sp. nov. strain AM1T (=CSUR P1927 = DSM 100308) belongs to

47

the genus Enterococcus. This bacterium is a Gram-positive, facultative anaerobic, motile and

48

unpigmented. It was isolated from a fresh stool sample of a human in Marseille as part of

49

“culturomics” study (1). Currently, a polyphasic approach that combines proteomic by

50

MALDI-TOF spectra analysis (2), genomic data with 16S rRNA sequence identity and

51

phylogeny (3), genomic G + C content diversity and DNA-DNA hybridization (DDH) and

52

phenotypic characterization is used to describe new bacterial species (4).

53

Enterococcus are classified in the genus of Streptococcus because of the presence of the D

54

antigen until 1984 (Thiercelin and Jouhaud) (5) . But, after analysis of the genome of

55

Streptococcus faecalis and S. faecium these strains have been transferred to the genus

56

Enterococcus (6). Members of the genus Enterococcus are components of the intestinal flora

57

of humans and animals. There are opportunistic pathogens with 2 principal strains:

58

Enterococcus faecalis and Enterococcus faecium responsible for nosocomial infections (7,8).

59

Here we present a summary classification and a set of features for E. massiliensis sp. nov.

60

strain AM1T together with the description of the complete genome sequence and annotation.

61

These characteristics support the circumscription of the species E. massiliensis.

62

Organism information

63

A stool sample was collected in 2015 from a voluntary patient as a negative control and

64

isolated on Columbia agar supplemented with 5% sheep blood (bioMérieux, Marcy-l’Étoile,

65

France) in aerobic and anaerobic condition using GasPak™ EZ Anaerobe Container System

66

Sachets (BD) at 37°. Enterococcus massiliensis was sequenced as part of a culturomics study

67

aiming to isolate all bacterial species colonizing the human gut (9). Enterococcus massiliensis

68

strain AM1T (GenBank accession number LN833866) exhibited a 97% 16S rRNA nucleotide

69

sequence similary with Enterococcus gallinarum (JF915769) , the phylogenetically-closest

AC C

EP

TE D

M AN U

SC

RI PT

45

3

ACCEPTED MANUSCRIPT validly pubished bacterial species (Fig.1) after compararison with NCBI database. This value

71

is lower than 98.7% 16S rRNA gene sequence similarity set as a threshold recommended by

72

Stackebrandt and Ebers to delineate a new species without carrying out DNA-DNA

73

hybridization.

74

Growth occurred between 25°C and 37°C, but optimal growth was observed at 37°C, 24

75

hours after inoculation. Colonies were smooth, withish and approximately 1 mm in diameter

76

on 5% sheep blood-enriched agar (Biomerieux). Growth of the strain was tested under

77

anaerobic and microaerophilic conditions using GasPak EZ Anaerobe pouch (Becton,

78

Dickinson and Company) and CampyGen Compact (Oxoid) systems respectively, and in

79

aerobic conditions, with or without 5% of CO2. Growth was achieved under aerobic (with and

80

without CO2), microaerophilic and anaerobic conditions. Gram staining showed Gram-

81

positive cocci without sporulation (Fig.2A). A motility test was positive and realized with

82

API® M Medium (Biomerieux, Marcy l’Etoile, France) a semi-solid medium with an

83

inoculation performed by swabbing one colony into the medium. After 24 hours of

84

incubation, the growth of E. massiliensis was away from this stabbed line, characterisitic of

85

the positive motility. Cells grown on agar exhibit a mean diameter of 0.5 µm and a mean

86

length ranging from 1.1 to 1.3 µm (mean 1.2 µm) determined by negative staining

87

transmission electron microscopy (Fig.2B).

88

Differential phenotypic characteristics using API 50CH and API Zym system (Biomerieux)

89

between E. massiliensis sp. nov. AM1T and others Enterococcus species (9) are presented in

90

Table 1. Antibiotic Susceptibility testing was performed by the disk diffusion method on

91

Mueller-Hinton agar with blood (Biomerieux, Marcy l’Etoile, France). E. massiliensis is

92

susceptible to vancomycin, teicoplanine, linezolid, gentamicin, ciprofloxacin, doxycyclin,

93

rifampicin and pristinamycin and resistant or intermediate to penicillin G, oxacillin,

AC C

EP

TE D

M AN U

SC

RI PT

70

4

ACCEPTED MANUSCRIPT ceftriaxone, cefoxitin, trimethoprim/sulfamethoxazole, fosfomycin, erythromycin and

95

clindamycin.

96

Extended features descriptions

97

Matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis

98

was carried out as previously described (2) using a Microflex spectrometer (Bruker Daltonics,

99

Leipzig, Germany). Twelve distinct deposits were done for strain AM1T from 12 isolated

100

colonies. Twelve distinct deposits were done for strain AM1T from 12 isolated colonies.

101

Spectra were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed

102

by standard pattern matching against 7,765 bacterial spectra including 92 spectra from 31

103

Enterococcus species, in the BioTyper database. Interpretation of scores was as follows: a

104

score > 2 enabled the identification at the species level, a score > 1.7 but < 2 enabled the

105

identification at the genus level; and a score < 1.7 did not enable any identification (These

106

scores were established by the manufacturer Bruker Daltonics). For strain AM1T, no

107

significant MALDI-TOF score was obtained against the Bruker database thus suggesting that

108

our isolate was a new species. We incremented our database with the spectrum from strain

109

AM1T (Fig.3).

110

Genome sequencing information

111

Enterococcus massiliensis sp. nov. (GenBank accession number CVRN00000000 ) is the 54

112

species described within Enterococcus genus.

113

After DNA extraction by phenol-chloroform method, genomic DNA of E. massiliensis was

114

sequenced on a MiSeq instrument (Illumina Inc, San Diego, CA, USA) using paired end and

115

mate pair strategies.

116

For genome annotation, Open Reading Frames (ORFs) were predicted using Prodigal (10)

117

with default parameters but the predicted ORFs were excluded if they spanned a sequencing

118

gap region. The predicted bacterial protein sequences were searched for against the GenBank

AC C

EP

TE D

M AN U

SC

RI PT

94

5

ACCEPTED MANUSCRIPT database (11) and the Clusters of Orthologous Groups (COG) databases using BLASTP. The

120

tRNAScanSE tool (12) was used to find tRNA genes, whereas ribosomal RNAs were detected

121

using RNAmmer (13) and BLASTn against the GenBank database.

122

The ARG-ANNOT database for acquired antibiotic resistance genes (ARGs) was employed

123

for a BLAST search using the Bio-Edit interface (14). The assembled sequences were

124

searched against the ARG database under moderately stringent conditions (e-value of 10−5 )

125

for the in silico ARG prediction. E. massiliensis present Lsa gene, encoding a putative ABC

126

protein Lsa with an identity to 72% with Lsa family ABC-F of E. faecalis in NCBI that

127

confirm phenotypically the resistance to clindamycin.

128

Analysis of presence of polyketide synthase (PKS) and Non-Ribosomal Polyketide Synthesis

129

(NRPS) were performed by discriminating the gene with large size using a database realized

130

in our laboratory, predicted proteins were compared against non redundant (nr) GenBank

131

database using blastp and finally examined using antiSMASH (15). The analysis of genome

132

revealed the absence of NRPKs and PKS. Lipoprotein signal peptides and number of

133

transmembrane helices were predicted using SignalP (16) and TMHMM (17), respectively.

134

ORFans were identified if their BLASTP E-value was lower than 10-3 for alignment length

135

greater than 80 amino acids.

136

Moreover, we used Genome-to-Genome Distance calculator (GGDC) web server available at

137

(http://ggdc.dsmz.de) to estimate the overall similarity among the compared genomes and to

138

replace the wet-lab DNA-DNA hydridization (DDH) by a digital DDH (dDDH) (18-19).

139

GGDC 2.0 BLAST+ was chosen as alignment method and the recommended formula 2 was

140

taken into account to interpret the results.

141

We compared the genome of E. massiliensis with 9 other genomes of Enterococcus strains.

142

The genome is 2,712,841 bp long (one chromosome, no plasmid) with a GC content of 39.6%

143

(Table 2). The properties and statistics of the genome are summarized in Table 2. The draft

AC C

EP

TE D

M AN U

SC

RI PT

119

6

ACCEPTED MANUSCRIPT genome of E. massiliensis is smaller than those of E. moraviensis, E. haemoperoxidus, E.

145

caccae, E. casseliflavus, E. gallinarum and E. faecalis ( 3.60, 3.58, 3.56, 3.43, 3.16 and 2.96

146

Mb respectively), but larger than those of E. saccharolyticus, E. columbae, E. cecorum and E.

147

sulfureus ( 2.60, 2.58, 2.34 and 2.31 respectively). The G + C content of E. massiliensis is

148

lower than those of E. casseliflavus and E. gallinarum ( 42.8 and 40.7) but greater than those

149

of E. moraviensis, E. haemoperoxidus, E. caccae, E. saccharolyticus, E. columbae, E.

150

cecorum, E. sulfureus and E. faecalis (39.6, 36.1, 35.7, 35.8, 36.9, 36.6, 36.4, 38.0 and 37.5

151

respectively). Of the 2687 predicted chromosomal genes, 2617 were protein-coding genes and

152

70 were RNAs including 61 tRNAs and 9 rRNAs (5S = 4, 23S = 2, 16S = 3). A total of 1889

153

genes (72.2%) were assigned to a putative function described in Fig.3 and Table 3. 71 genes

154

were identified as ORFans (2.71%) and the remaining genes were annotated as hypothetical

155

proteins. The distribution of genes into COGs functional categories is presented in Table 3.

M AN U

SC

RI PT

144

156

TE D

157 158

CONCLUSION/PERSPECTIVES

160

On the basis of phenotypic, phylogenetic and genomic analyses, we formally propose the

161

creation of Enterococcus massiliensis sp. nov. AM1T. This strain was isolated in Marseille,

162

France.

164

AC C

163

EP

159

165

Taxonomic and nomenclatural proposals

166

Description of Enterococcus massiliensis sp. nov.

167

Enterococcus massiliensis (massiliensis because this strain was isolated in Massilia , the Latin

168

name of Marseille, where the strain was sequenced).

7

ACCEPTED MANUSCRIPT Colonies were whitish and approximately 1 mm diameter on 5% sheep blood-enriched agar.

170

Cells are Gram-positive, non haemolytic,facultative anaerobic with a mean length of 1.2 µm

171

and a mean diameter of 0.6 µm. Growth occurred between 25°C to 37°C, but optimal growth

172

was observed at 37°C. Alkaline phosphatase, esterase (C4), esterase lipase (C8), leucine

173

arylamidase, acid phosphatase, β-galactosidase and N-acetyl-β-glucosaminidase activities

174

were present. Esculine activity was also positive but catalase, oxydase, β-galactosidase and N-

175

acetyl-β-glucosaminidase were negative. Positive reaction were obtained for D-ribose, D-

176

glucose, D-fructose, D-mannose and N-acetylglucosamine. E. massiliensis was susceptible to

177

vancomycin, teicoplanine, linezolid, gentamicin, ciprofloxacin, doxycyclin, rifampicin and

178

pristinamycin, but resistant to for trimethoprim/sulfamethoxazole, fosfomycin, erythromycin

179

and clindamycin.

180

The G+C content of the genome is 39.6%. The 16S rRNA and genome sequences are

181

deposited in GenBank under accession numbers LN833866 and CVRN00000000,

182

respectively. The type strain AM1T (=CSUR P1927 = DSM 100308) was isolated from a fresh

183

stool sample of a patient in Marseille, France.

SC

M AN U

TE D

EP AC C

184

RI PT

169

8

ACCEPTED MANUSCRIPT

ACKNOWLEDGMENTS

185 186 187

We are very grateful to Xegen company for automating the genome annotation process.

188

190

RI PT

189

CONFLICT OF INTEREST AND FINANCIAL DISCLOSURE

191

No potential conflict of interest or financial disclosure for all authors.

SC

192 193

M AN U

194 195 196 197

201 202 203 204 205 206 207

EP

200

AC C

199

TE D

198

208 209 210 211 212

9

ACCEPTED MANUSCRIPT

FIGURE LEGEND

213 214

Fig. 1. A consensus phylogenetic tree highlighting the position of Enterococcus

216

massiliensis relative to other type strains within the genus Enterococcus by 16S. In

217

brackets are indicated the GenBank accession numbers. Sequences were aligned using

218

CLUSTALW, and phylogenetic inferences obtained using the maximum-likehood method in

219

the MEGA6 software package. Numbers at the nodes are percentages of bootstrap values

220

from 1000 replicates that support the nodes. Streptococcus pneumoniae and Staphylococcus

221

aureus were used as outgroups. The scale bar represents 1% nucleotide sequence divergence.

222

Fig. 2. 2(A). Gram of Enterococcus massiliensis and 2(B). Transmission electron micrograph

223

of E. massiliensis strain, taken using a Technai G20 Cryo (FEI) at an operating voltage of 200

224

kV with magnification 1000x.

225

Fig. 3. Graphical circular map of chromosome. From outside to center: genes on forward

226

strand (colored by COGs categories explain in Table 3), genes on reverse strand (colored by

227

COGs categories), RNA genes (tRNAs green, rRNAs red), G+C content, G+C skew.

230 231 232 233

SC

M AN U

TE D

EP

229

AC C

228

RI PT

215

234 235 236 237

10

ACCEPTED MANUSCRIPT TABLE LEGENDE

238 239

Table 1. Differential characteristics of Enterococcus massiliensis sp AM1, E. faecalis, E.

241

casseliflavus, E. gallinarum, E. haemoperoxidus, E. cecorum, E. sulfureus and E. caccae.

242

Table 2. Nucleotide content and gene count levels of the genome

243

Table 3. Number of genes associated with the 25 general COG functional categories.

244

SC

245

RI PT

240

246

M AN U

247

AC C

EP

TE D

248

11

ACCEPTED MANUSCRIPT

REFERENCES

249

250

(1)

Lagier JC, Hugon P, Khelaifia S, Fournier PE, La Scola B, and Raoult D. The rebirth of culture in microbiology through the example of culturomics to study human gut

252

microbiota. Clin. Microbiol. Rev 2015 Jan; 28(1): 237–64.

253

(2)

RI PT

251

SengP, Drancourt M, Gouriet F, La Scola B, Fournier PE, Rolain JM, and Raoult D. Ongoing revolution in bacteriology: routine identification of bacteria by matrix-assisted

255

laser desorption ionization time-of-flight mass spectrometry. Clin Infect Dis Aug 2009

256

;49(4): 543–51. (3)

Microbiol Today 2015; 152-155.

258 259

Stackebrandt E EJ. Taxonomic parameters revisited: tarnished gold standards.

(4)

M AN U

257

SC

254

Rosselo-Mora R. DNA-DNA Reassociation Methods Applied to Microbial Taxonomy and Thier Critical Evaluation. In: Stackebrandt E (ed), Molecular identification,

261

Systematics, and population Structure of Prokaryotes. Springer 2015; Berlin. 23-50. (5)

granulations peripheriques et microblastes. Comptes Rendus des Seances la So 1903.

263 264

Thiercelin E and Jouhaud L. Reproduction de l’enterocoque; taches centrales;

(6)

Schleifer KH and Kilpper-Balz R.Transfer of Streptococcus faecalis and Streptococcus

EP

262

TE D

260

faecium to the Genus Enterococcus nom. rev. as Enterococcus faecalis comb. nov. and

268

AC C

265

269

2012 ;16(8): 1039–44.

Enterococcus faecium comb. nov. Int J Syst Bacteriol Jan 1984 ; 34(1): 31–34.

266 267

270 271

(7)

Bereket W, Hemalatha K, Getenet B, Wondwossen T, Solomon A, Zeynudin A, and

Kannan S. Update on bacterial nosocomial infections. Eur. Rev. Med. Pharmacol. Sci

(8)

Miller WR, Munita JM and Arias CA. Mechanisms of antibiotic resistance in enterococci. Expert Rev. Anti. Infect. Ther 2014; 12, 1221–36.

272

12

ACCEPTED MANUSCRIPT 273

(9)

Lagier JC, Armougom F, Million M, Hugon P, Pagnier I, Robert C, Bittar F, Fournous

274

G, Gimenez G, Maraninchi M, Trape JF, V Koonin E, La Scola B, and Raoult D.

275

Microbial culturomics: paradigm shift in the human gut microbiome study. Clin

276

Microbiol Infect 2012 Dec; 18(12): 1185–93. (10) Prodigal. [Online]. Available: http://prodigal.oml.gov/.

278

(11) GenBank database. [Online]. Available: http://www.ncbi.nlm.nih.gov/genbank.

RNA genes in genomic sequence. Nucleic Acids Res Mar 1997; 25(5): 955–64.

280 281

Lowe TM and Eddy SR. tRNAscan-SE: a program for improved detection of transfer

SC

(12)

(13)

Lagesen K, Hallin P, Rodland EA, Staerfeldt HH, Rognes T, and Ussery DW.

M AN U

279

RI PT

277

282

RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids

283

Res 2007 Jan; 35(9): 3100–8.

284

(14)

Gupta SK, Padmanabhan BR, Diene SM, Lopez-Rojas R, Kempf M, Landraud L, Rolain JM. ARG-ANNOT, a new bioinformatic tool to discover antibiotic resistance

286

genes in bacterial genomes. Antimicrob. Agents Chemother 2014; 58, 212–20.

287

(15) Weber T, Blin K, Duddela S, Krug D, Kim HU, Bruccoleri R, Lee SY, Fischbach MA,

288

Müller R, Wohlleben W, Breitling R, Takano E and Medema MH. antiSMASH 3.0--a

289

comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic Acids Res 2015; 43, W237–43.

290 291

(16)

Bendtsen JD, Nielsen H, von Heijne G, and Brunak S. Improved prediction of signal

peptides: SignalP 3.0. J Mol Biol 2004 Jul;340(4): 783–95.

292 293

AC C

EP

TE D

285

(17)

Krogh A, Larsson B, von Heijne G, and Sonnhammer EL. Predicting transmembrane

294

protein topology with a hidden Markov model: application to complete genomes. J Mol

295

Biol 2001 Jan; 305(3):567–80

13

ACCEPTED MANUSCRIPT 296

(18)

Meier-Kolthoff JP, Auch AF, Klenk HP, and Göker M. Genome sequence-based

297

species delimitation with confidence intervals and improved distance functions. BMC

298

Bioinformatics 2013 Jan; 14:p. 60.

299

(19)

Meier-Kolthoff JP, Klenk HP, and Göker M. Taxonomic use of DNA G+C content and DNA-DNA hybridization in the genomic age. Int J Syst Evol Microbiol 2014 Feb; 64(

301

2):352–6.

RI PT

300

SC

302

AC C

EP

TE D

M AN U

303

14

ACCEPTED MANUSCRIPT

E. casseliflavus

E. gallinarum

E. haemoperoxidus

E. cecorum

E. sulfureus

E. caccae

Facultative anaerobic Positive Motile -

Facultative anaerobic Positive -

Facultative anaerobic Positive Motile +

Facultative anaerobic Positive Motile -

Facultative anaerobic Positive -

Facultative anaerobic Positive -

Facultative anaerobic Positive +

Facultative anaerobic Positive -

+ +

-

+

+

na na na +

+ -

na na na +

na na na +

+ + + + 39.6 Human stool

+ + + 37.3 Intestines mammals

+ + + + + 40.7 Intestines mammals

+ + 35.8 Water

+ + 36.3 Commensal chicken

+ + 37.8 Plants

+ 35.8 Human stool

M AN U

TE D

+ + V + + + 42.7 Intestines mammals

SC

E. faecalis

EP

Gram stain Motility Pigment Production of Alkaline phosphatase Catalase Oxydase β-glucuronidase α-galactosidase β-galactosidase N-acetyl-glucosamine Acide form Mannitol Sorbose L-arabinose Sorbitol D-Rafinose Xylose D-Trehalose G + C content (%) Habitat

E. massiliensis

AC C

Properties Cell diameter (µm) Oxygen requirement

RI PT

Table 1. Differential characteristics of Enterococcus massiliensis sp AM1, E. faecalis, E. casseliflavus, E. gallinarum, E. haemoperoxidus, E. cecorum, E. sulfureus and E. caccae.

+ : positive result ; - : negative results ; v : variable, na : data not available.

ACCEPTED MANUSCRIPT Table 2. Nucleotide content and gene count levels of the genome % of totala 100 39.6 88.77 100 2.60 97.39 72.18 71.19 9.55 24.07

RI PT

Value 2,712,841 1,075,567 2,408,151 2687 70 2617 1889 1863 250 630

SC

Attribute Genome size (bp) DNA G + C content (bp) DNA coding region (bp) Total genes RNA genes Protein-coding genes Genes with function prediction Genes assigned to COGs Genes with peptide signals Genes with transmembrane helices

a

AC C

EP

TE D

M AN U

The total is based on either the size of the genome in base paris or the total number of protein coding genes in the annotated genome.

ACCEPTED MANUSCRIPT Table 3. Number of genes associated with the 25 general COG functional categories.

M AN U

SC

RI PT

description Rna processing and modification Chromatin structure and dynamics Energy production and conversion Cell cycle control, cell division, chromosome partitioning Amino acid transport and metabolism Nucleotide transport and metabolism Carbohydrate transport and metabolism Coenzyme transport and metabolism Lipid transport and metabolism Translation, ribosomal structure and biogenesis Transcription Replication, recombination and repair Cell wall/membrane/envelope biogenesis Cell motility Posttranslational modification, protein turnover, chaperones Inorganic ion transport and metabolism Secondary metabolites biosynthesis, transport and catabolism General function prediction only Function unknown Signal transduction mechanisms Intracellular trafficking, secretion, and vesicular transport Defense mechanisms Extracellular structures Nuclear structure Cytoskeleton Not in COGs

TE D

% agea 0 0 2.98 0.84 6.38 2.56 9.48 1.76 2.03 5.88 7.15 5.98 3.40 0.19 2.06 3.97 0.76 9.94 7.26 2.25 0.91 2.29 0 0 0 28.81

the total is based on the total number of protein coding genes in the annotated genome.

AC C

a

Value 0 0 78 22 167 67 248 46 53 154 187 155 89 5 54 104 20 260 190 59 24 60 0 0 0 754

EP

Color of COG class COG class A B C D E F G H I J K L M N O P Q R S T U V W Y Z -

ACCEPTED MANUSCRIPT

AC C

EP

TE D

M AN U

SC

RI PT

Fig. 1

ACCEPTED MANUSCRIPT

Fig. 2 B

AC C

EP

TE D

M AN U

SC

RI PT

A

ACCEPTED MANUSCRIPT

AC C

EP

TE D

M AN U

SC

RI PT

Fig. 3