Accepted Manuscript

Non-contiguous finished genome sequence and description of Enterococcus massiliensis sp. nov.

Stéphanie Le Page, Teresa Cimmino, Amadou Togo, Matthieu Million, Caroline Michelle, Saber Khelaifia, Jean-Christophe Lagier, Didier Raoult, Jean-Marc Rolain

PII: S2052-2975(16)30025-7
DOI: 10.1016/j.nmni.2016.04.011
Reference: NMNI 163
To appear in: New Microbes and New Infections
Received Date: 4 February 2016
Revised Date: 19 April 2016
Accepted Date: 28 April 2016
Please cite this article as: Le Page S, Cimmino T, Togo A, Million M, Michelle C, Khelaifia S, Lagier JC, Raoult D, Rolain J-M, Non-contiguous finished genome sequence and description of Enterococcus massiliensis sp. nov., New Microbes and New Infections (2016), doi: 10.1016/j.nmni.2016.04.011. This is a PDF file of an unedited manuscript that has been accepted for publication. As a service to our customers we are providing this early version of the manuscript. The manuscript will undergo copyediting, typesetting, and review of the resulting proof before it is published in its final form. Please note that during the production process errors may be discovered which could affect the content, and all legal disclaimers that apply to the journal pertain.
Taxonogenomics: reporting the genome of a new organism

Full-length title: Non-contiguous finished genome sequence and description of Enterococcus massiliensis sp. nov.

Short title (for the running head):

Author list: Stéphanie Le Page1, Teresa Cimmino1, Amadou Togo1, Matthieu Million1, Caroline Michelle1, Saber Khelaifia1, Jean-Christophe Lagier1, Didier Raoult1 and Jean-Marc Rolain*1

Affiliations: 1 Aix-Marseille Univ., URMITE UM 63 CNRS 7278 IRD 198 INSERM U1905, IHU Méditerranée Infection, Facultés de Médecine et de Pharmacie, 27 boulevard Jean Moulin, 13385 Marseille CEDEX 05, France

* Corresponding author:

Text word count: 1540 words
Abstract word count: 90 words
Number of references: 19
Number of tables: 3
Number of figures: 3
Key words: Enterococcus massiliensis, genome, taxono-genomics, culturomics
ABSTRACT

Enterococcus massiliensis sp. nov. strain AM1T (=CSUR P1927 = DSM 100308) is a new species within the genus Enterococcus. This strain was first isolated from a fresh stool sample of a man during a culturomics study of the intestinal microflora. E. massiliensis is a Gram-positive coccus, facultatively anaerobic and motile. E. massiliensis is negative for mannitol and positive for beta-galactosidase, contrary to E. gallinarum. The complete genome sequence is 2,712,841 bp in length with a GC content of 39.6% and contains 2,617 protein-coding genes and 70 RNA genes, including 9 rRNA genes.

Abbreviations

CSUR: Collection de souches de l’Unité des Rickettsies
URMITE: Unité des Maladies Infectieuses et Tropicales Emergentes
DSM: Deutsche Sammlung von Mikroorganismen
MALDI-TOF MS: Matrix-assisted laser-desorption/ionization time-of-flight mass spectrometry
dDDH: digital DNA-DNA hybridization
INTRODUCTION

Enterococcus massiliensis sp. nov. strain AM1T (=CSUR P1927 = DSM 100308) belongs to the genus Enterococcus. This bacterium is Gram-positive, facultatively anaerobic, motile and unpigmented. It was isolated from a fresh stool sample of a human in Marseille as part of a “culturomics” study (1). Currently, a polyphasic approach that combines proteomic analysis by MALDI-TOF spectra (2), genomic data with 16S rRNA sequence identity and phylogeny (3), genomic G + C content diversity and DNA-DNA hybridization (DDH), and phenotypic characterization is used to describe new bacterial species (4).

Until 1984, enterococci were classified in the genus Streptococcus because of the presence of the D antigen (Thiercelin and Jouhaud) (5). After analysis of the genomes of Streptococcus faecalis and S. faecium, these species were transferred to the genus Enterococcus (6). Members of the genus Enterococcus are components of the intestinal flora of humans and animals. They are opportunistic pathogens, with two principal species, Enterococcus faecalis and Enterococcus faecium, responsible for nosocomial infections (7,8).

Here we present a summary classification and a set of features for E. massiliensis sp. nov. strain AM1T together with the description of the complete genome sequence and annotation. These characteristics support the circumscription of the species E. massiliensis.
Organism information

A stool sample was collected in 2015 from a volunteer patient as a negative control, and the strain was isolated on Columbia agar supplemented with 5% sheep blood (bioMérieux, Marcy-l’Étoile, France) under aerobic and anaerobic conditions, the latter using GasPak™ EZ Anaerobe Container System Sachets (BD), at 37°C. Enterococcus massiliensis was sequenced as part of a culturomics study aiming to isolate all bacterial species colonizing the human gut (9). Enterococcus massiliensis strain AM1T (GenBank accession number LN833866) exhibited a 97% 16S rRNA nucleotide sequence similarity with Enterococcus gallinarum (JF915769), the phylogenetically closest validly published bacterial species (Fig.1), after comparison with the NCBI database. This value is lower than the 98.7% 16S rRNA gene sequence similarity threshold recommended by Stackebrandt and Ebers to delineate a new species without carrying out DNA-DNA hybridization.
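The threshold comparison described above can be sketched in Python. This is a minimal illustration only: the function names are hypothetical, and the identity calculation over a pre-aligned pair of sequences is a simplification that stands in for the NCBI database comparison actually used.

```python
# Cutoff cited in the text (Stackebrandt and Ebers), in percent 16S rRNA similarity.
SPECIES_THRESHOLD = 98.7


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap.

    Assumption: both strings come from the same pairwise alignment and use '-'
    for gaps. This is an illustrative metric, not the exact NCBI calculation.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" or b == "-":
            continue  # skip columns containing a gap
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


def below_species_threshold(similarity_percent: float) -> bool:
    """True when the similarity is lower than the 98.7% threshold."""
    return similarity_percent < SPECIES_THRESHOLD
```

With the 97% similarity reported for strain AM1T against E. gallinarum, `below_species_threshold(97.0)` returns `True`.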

Growth occurred between 25°C and 37°C, with optimal growth observed at 37°C, 24 hours after inoculation. Colonies were smooth, whitish and approximately 1 mm in diameter on 5% sheep blood-enriched agar (bioMérieux). Growth of the strain was tested under anaerobic and microaerophilic conditions using GasPak EZ Anaerobe pouch (Becton, Dickinson and Company) and CampyGen Compact (Oxoid) systems, respectively, and under aerobic conditions, with or without 5% CO2. Growth was achieved under aerobic (with and without CO2), microaerophilic and anaerobic conditions. Gram staining showed Gram-positive cocci without sporulation (Fig.2A). The motility test, performed with API® M Medium (bioMérieux, Marcy l’Etoile, France), a semi-solid medium inoculated by stabbing one colony into the medium, was positive: after 24 hours of incubation, growth of E. massiliensis had spread away from the stab line, which is characteristic of positive motility. Cells grown on agar exhibit a mean diameter of 0.5 µm and a length ranging from 1.1 to 1.3 µm (mean 1.2 µm), as determined by negative staining transmission electron microscopy (Fig.2B).

Differential phenotypic characteristics between E. massiliensis sp. nov. AM1T and other Enterococcus species (9), obtained using the API 50CH and API Zym systems (bioMérieux), are presented in Table 1. Antibiotic susceptibility testing was performed by the disk diffusion method on Mueller-Hinton agar with blood (bioMérieux, Marcy l’Etoile, France). E. massiliensis is susceptible to vancomycin, teicoplanin, linezolid, gentamicin, ciprofloxacin, doxycycline, rifampicin and pristinamycin, and resistant or intermediate to penicillin G, oxacillin, ceftriaxone, cefoxitin, trimethoprim/sulfamethoxazole, fosfomycin, erythromycin and clindamycin.

Extended features descriptions

Matrix-assisted laser-desorption/ionization time-of-flight (MALDI-TOF) MS protein analysis was carried out as previously described (2) using a Microflex spectrometer (Bruker Daltonics, Leipzig, Germany). Twelve distinct deposits were made for strain AM1T from 12 isolated colonies. Spectra were imported into the MALDI BioTyper software (version 2.0, Bruker) and analyzed by standard pattern matching against 7,765 bacterial spectra, including 92 spectra from 31 Enterococcus species, in the BioTyper database. Scores were interpreted as follows: a score > 2 enabled identification at the species level; a score > 1.7 but < 2 enabled identification at the genus level; and a score < 1.7 did not enable any identification (these scores were established by the manufacturer, Bruker Daltonics). For strain AM1T, no significant MALDI-TOF score was obtained against the Bruker database, suggesting that our isolate was a new species. We added the spectrum from strain AM1T to our database (Fig.3).
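The score interpretation rule above can be written as a small Python function. This is a sketch, not part of the BioTyper software: the function name is hypothetical, and because the text does not say how scores of exactly 2 or 1.7 are handled, assigning them to the higher category is an assumption.

```python
def interpret_biotyper_score(score: float) -> str:
    """Map a MALDI-TOF log score to the identification level described in the text.

    Assumption: boundary values (exactly 2.0 or 1.7) go to the higher category;
    the manuscript only states the strict inequalities.
    """
    if score >= 2.0:
        return "species"
    if score >= 1.7:
        return "genus"
    return "no identification"
```

Under this rule, a best-match score below 1.7 against every reference spectrum gives "no identification", which is the outcome reported for strain AM1T.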

Genome sequencing information

Enterococcus massiliensis sp. nov. (GenBank accession number CVRN00000000) is the 54th species described within the genus Enterococcus.

After DNA extraction by the phenol-chloroform method, genomic DNA of E. massiliensis was sequenced on a MiSeq instrument (Illumina Inc, San Diego, CA, USA) using paired-end and mate-pair strategies.

For genome annotation, open reading frames (ORFs) were predicted using Prodigal (10) with default parameters, but predicted ORFs were excluded if they spanned a sequencing gap region. The predicted bacterial protein sequences were searched against the GenBank database (11) and the Clusters of Orthologous Groups (COG) database using BLASTP. The tRNAScanSE tool (12) was used to find tRNA genes, whereas ribosomal RNAs were detected using RNAmmer (13) and BLASTn against the GenBank database.

The ARG-ANNOT database for acquired antibiotic resistance genes (ARGs) was employed for a BLAST search using the Bio-Edit interface (14). The assembled sequences were searched against the ARG database under moderately stringent conditions (e-value of 10^-5) for the in silico ARG prediction. E. massiliensis carries an lsa gene encoding a putative ABC protein, Lsa, with 72% identity to the Lsa family ABC-F protein of E. faecalis in NCBI, which is consistent with the phenotypic resistance to clindamycin.
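The e-value cutoff used for the ARG search can be illustrated with a short Python filter over BLAST tabular output. This is a sketch under stated assumptions: it presumes the standard 12-column tabular format of BLAST+ (`-outfmt 6`), whereas the search reported here was run through the Bio-Edit interface; the function name and any sequence identifiers passed to it are hypothetical.

```python
# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(tabular_text: str, max_evalue: float = 1e-5) -> list:
    """Keep hits whose e-value does not exceed max_evalue (10^-5 in the text)."""
    hits = []
    for line in tabular_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, line.split("\t")))
        if float(hit["evalue"]) <= max_evalue:
            hits.append(hit)
    return hits
```

Each retained hit is a dictionary keyed by column name, so the percent identity of a candidate resistance gene can be read as `float(hit["pident"])`.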

The presence of polyketide synthases (PKS) and non-ribosomal peptide synthetases (NRPS) was analyzed by selecting large genes using a database built in our laboratory; the predicted proteins were then compared against the non-redundant (nr) GenBank database using BLASTP and finally examined using antiSMASH (15). The genome analysis revealed the absence of NRPS and PKS. Lipoprotein signal peptides and the number of transmembrane helices were predicted using SignalP (16) and TMHMM (17), respectively. ORFans were identified if their BLASTP E-value was lower than 10^-3 for alignment length greater than 80 amino acids.

Moreover, we used the Genome-to-Genome Distance Calculator (GGDC) web server available at (http://ggdc.dsmz.de) to estimate the overall similarity among the compared genomes and to replace wet-lab DNA-DNA hybridization (DDH) by digital DDH (dDDH) (18,19). With GGDC 2.0, BLAST+ was chosen as the alignment method and the recommended formula 2 was used to interpret the results.
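For readers unfamiliar with formula 2, the underlying distance can be sketched as follows. To our understanding, GGDC formula 2 relates the number of identical positions to the total length of the high-scoring segment pairs (HSPs); the Python function below (hypothetical name) computes only that raw distance from a list of HSPs and does not reproduce the statistical model the server uses to convert distances into dDDH estimates.

```python
def formula2_distance(hsps) -> float:
    """Distance of the 'identities / HSP length' type.

    hsps: iterable of (identities, hsp_length) pairs, one per HSP.
    Assumption: distance = 1 - (sum of identities / sum of HSP lengths).
    """
    hsp_list = list(hsps)
    total_identities = sum(identities for identities, _ in hsp_list)
    total_length = sum(length for _, length in hsp_list)
    if total_length == 0:
        raise ValueError("no HSPs to compare")
    return 1.0 - total_identities / total_length
```

Because this quantity depends only on the aligned regions and not on total genome length, it is less affected by incomplete draft assemblies, which is presumably why it is the recommended choice for draft genomes such as this one.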

We compared the genome of E. massiliensis with 9 other genomes of Enterococcus strains. The genome is 2,712,841 bp long (one chromosome, no plasmid) with a GC content of 39.6%. The properties and statistics of the genome are summarized in Table 2. The draft genome of E. massiliensis is smaller than those of E. moraviensis, E. haemoperoxidus, E. caccae, E. casseliflavus, E. gallinarum and E. faecalis (3.60, 3.58, 3.56, 3.43, 3.16 and 2.96 Mb, respectively), but larger than those of E. saccharolyticus, E. columbae, E. cecorum and E. sulfureus (2.60, 2.58, 2.34 and 2.31 Mb, respectively). The G + C content of E. massiliensis is lower than those of E. casseliflavus and E. gallinarum (42.8 and 40.7%, respectively) but greater than those of E. moraviensis, E. haemoperoxidus, E. caccae, E. saccharolyticus, E. columbae, E. cecorum, E. sulfureus and E. faecalis (39.6, 36.1, 35.7, 35.8, 36.9, 36.6, 36.4, 38.0 and 37.5, respectively). Of the 2687 predicted chromosomal genes, 2617 were protein-coding genes and 70 were RNAs, including 61 tRNAs and 9 rRNAs (5S = 4, 23S = 2, 16S = 3). A total of 1889 genes (72.2%) were assigned a putative function (Fig.3 and Table 3). Seventy-one genes were identified as ORFans (2.71%) and the remaining genes were annotated as hypothetical proteins. The distribution of genes into COG functional categories is presented in Table 3.
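The percentages quoted above follow directly from the counts in Table 2. The short Python check below recomputes a few of them as an illustration; the constant and function names are ours, and the denominators follow the table footnote (genome size in bp for nucleotide attributes, number of protein-coding genes for gene attributes).

```python
# Counts taken from Table 2 and the text.
GENOME_SIZE_BP = 2_712_841
GC_BP = 1_075_567
CODING_BP = 2_408_151
PROTEIN_CODING_GENES = 2617
GENES_WITH_FUNCTION = 1889
GENES_IN_COGS = 1863
ORFANS = 71


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, expressed in percent."""
    return 100.0 * part / whole


gc_content = percent(GC_BP, GENOME_SIZE_BP)                        # about 39.6
coding_density = percent(CODING_BP, GENOME_SIZE_BP)                # about 88.77
with_function = percent(GENES_WITH_FUNCTION, PROTEIN_CODING_GENES) # about 72.18
in_cogs = percent(GENES_IN_COGS, PROTEIN_CODING_GENES)             # about 71.19
orfans = percent(ORFANS, PROTEIN_CODING_GENES)                     # about 2.71
```

The same function applied to the per-category counts in Table 3, again divided by 2617, gives the percentage column of that table.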

CONCLUSION/PERSPECTIVES

On the basis of phenotypic, phylogenetic and genomic analyses, we formally propose the creation of Enterococcus massiliensis sp. nov. AM1T. This strain was isolated in Marseille, France.

Taxonomic and nomenclatural proposals

Description of Enterococcus massiliensis sp. nov.

Enterococcus massiliensis (massiliensis because this strain was isolated in Massilia, the Latin name of Marseille, where the strain was sequenced).

Colonies were whitish and approximately 1 mm in diameter on 5% sheep blood-enriched agar. Cells are Gram-positive, non-haemolytic and facultatively anaerobic, with a mean length of 1.2 µm and a mean diameter of 0.6 µm. Growth occurred between 25°C and 37°C, with optimal growth observed at 37°C. Alkaline phosphatase, esterase (C4), esterase lipase (C8), leucine arylamidase, acid phosphatase, β-galactosidase and N-acetyl-β-glucosaminidase activities were present. Esculin activity was also positive, but catalase, oxidase, β-galactosidase and N-acetyl-β-glucosaminidase were negative. Positive reactions were obtained for D-ribose, D-glucose, D-fructose, D-mannose and N-acetylglucosamine. E. massiliensis was susceptible to vancomycin, teicoplanin, linezolid, gentamicin, ciprofloxacin, doxycycline, rifampicin and pristinamycin, but resistant to trimethoprim/sulfamethoxazole, fosfomycin, erythromycin and clindamycin.

The G+C content of the genome is 39.6%. The 16S rRNA and genome sequences are deposited in GenBank under accession numbers LN833866 and CVRN00000000, respectively. The type strain AM1T (=CSUR P1927 = DSM 100308) was isolated from a fresh stool sample of a patient in Marseille, France.

ACKNOWLEDGMENTS

We are very grateful to Xegen company for automating the genome annotation process.

CONFLICT OF INTEREST AND FINANCIAL DISCLOSURE

No potential conflict of interest or financial disclosure for all authors.

FIGURE LEGENDS

Fig. 1. A consensus phylogenetic tree highlighting the position of Enterococcus massiliensis relative to other type strains within the genus Enterococcus, based on 16S rRNA sequences. GenBank accession numbers are indicated in brackets. Sequences were aligned using CLUSTALW, and phylogenetic inferences were obtained using the maximum-likelihood method in the MEGA6 software package. Numbers at the nodes are percentages of bootstrap values from 1000 replicates that support the nodes. Streptococcus pneumoniae and Staphylococcus aureus were used as outgroups. The scale bar represents 1% nucleotide sequence divergence.

Fig. 2. (A) Gram staining of Enterococcus massiliensis. (B) Transmission electron micrograph of the E. massiliensis strain, taken using a Tecnai G20 Cryo (FEI) at an operating voltage of 200 kV with magnification 1000x.

Fig. 3. Graphical circular map of the chromosome. From outside to center: genes on the forward strand (colored by COG categories, explained in Table 3), genes on the reverse strand (colored by COG categories), RNA genes (tRNAs green, rRNAs red), G+C content, G+C skew.

TABLE LEGENDS

Table 1. Differential characteristics of Enterococcus massiliensis strain AM1T, E. faecalis, E. casseliflavus, E. gallinarum, E. haemoperoxidus, E. cecorum, E. sulfureus and E. caccae.

Table 2. Nucleotide content and gene count levels of the genome.

Table 3. Number of genes associated with the 25 general COG functional categories.

REFERENCES

(1) Lagier JC, Hugon P, Khelaifia S, Fournier PE, La Scola B, Raoult D. The rebirth of culture in microbiology through the example of culturomics to study human gut microbiota. Clin Microbiol Rev 2015 Jan; 28(1): 237–64.

(2) Seng P, Drancourt M, Gouriet F, La Scola B, Fournier PE, Rolain JM, Raoult D. Ongoing revolution in bacteriology: routine identification of bacteria by matrix-assisted laser desorption ionization time-of-flight mass spectrometry. Clin Infect Dis 2009 Aug; 49(4): 543–51.

(3) Stackebrandt E, Ebers J. Taxonomic parameters revisited: tarnished gold standards. Microbiol Today 2015; 152–155.

(4) Rosselo-Mora R. DNA-DNA Reassociation Methods Applied to Microbial Taxonomy and Their Critical Evaluation. In: Stackebrandt E (ed), Molecular Identification, Systematics, and Population Structure of Prokaryotes. Springer 2015; Berlin. 23–50.

(5) Thiercelin E, Jouhaud L. Reproduction de l’enterocoque; taches centrales; granulations peripheriques et microblastes. Comptes Rendus des Seances la So 1903.

(6) Schleifer KH, Kilpper-Balz R. Transfer of Streptococcus faecalis and Streptococcus faecium to the Genus Enterococcus nom. rev. as Enterococcus faecalis comb. nov. and Enterococcus faecium comb. nov. Int J Syst Bacteriol 1984 Jan; 34(1): 31–34.

(7) Bereket W, Hemalatha K, Getenet B, Wondwossen T, Solomon A, Zeynudin A, Kannan S. Update on bacterial nosocomial infections. Eur Rev Med Pharmacol Sci 2012; 16(8): 1039–44.

(8) Miller WR, Munita JM, Arias CA. Mechanisms of antibiotic resistance in enterococci. Expert Rev Anti Infect Ther 2014; 12: 1221–36.

(9) Lagier JC, Armougom F, Million M, Hugon P, Pagnier I, Robert C, Bittar F, Fournous G, Gimenez G, Maraninchi M, Trape JF, Koonin EV, La Scola B, Raoult D. Microbial culturomics: paradigm shift in the human gut microbiome study. Clin Microbiol Infect 2012 Dec; 18(12): 1185–93.

(10) Prodigal. [Online]. Available: http://prodigal.ornl.gov/.

(11) GenBank database. [Online]. Available: http://www.ncbi.nlm.nih.gov/genbank.

(12) Lowe TM, Eddy SR. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 1997 Mar; 25(5): 955–64.

(13) Lagesen K, Hallin P, Rodland EA, Staerfeldt HH, Rognes T, Ussery DW. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res 2007 Jan; 35(9): 3100–8.

(14) Gupta SK, Padmanabhan BR, Diene SM, Lopez-Rojas R, Kempf M, Landraud L, Rolain JM. ARG-ANNOT, a new bioinformatic tool to discover antibiotic resistance genes in bacterial genomes. Antimicrob Agents Chemother 2014; 58: 212–20.

(15) Weber T, Blin K, Duddela S, Krug D, Kim HU, Bruccoleri R, Lee SY, Fischbach MA, Müller R, Wohlleben W, Breitling R, Takano E, Medema MH. antiSMASH 3.0--a comprehensive resource for the genome mining of biosynthetic gene clusters. Nucleic Acids Res 2015; 43: W237–43.

(16) Bendtsen JD, Nielsen H, von Heijne G, Brunak S. Improved prediction of signal peptides: SignalP 3.0. J Mol Biol 2004 Jul; 340(4): 783–95.

(17) Krogh A, Larsson B, von Heijne G, Sonnhammer EL. Predicting transmembrane protein topology with a hidden Markov model: application to complete genomes. J Mol Biol 2001 Jan; 305(3): 567–80.

(18) Meier-Kolthoff JP, Auch AF, Klenk HP, Göker M. Genome sequence-based species delimitation with confidence intervals and improved distance functions. BMC Bioinformatics 2013 Jan; 14: 60.

(19) Meier-Kolthoff JP, Klenk HP, Göker M. Taxonomic use of DNA G+C content and DNA-DNA hybridization in the genomic age. Int J Syst Evol Microbiol 2014 Feb; 64(2): 352–6.

Table 1. Differential characteristics of Enterococcus massiliensis strain AM1T, E. faecalis, E. casseliflavus, E. gallinarum, E. haemoperoxidus, E. cecorum, E. sulfureus and E. caccae.

Species (columns): E. massiliensis, E. faecalis, E. casseliflavus, E. gallinarum, E. haemoperoxidus, E. cecorum, E. sulfureus, E. caccae

Properties (rows): Cell diameter (µm); Oxygen requirement; Gram stain; Motility; Pigment; Production of: Alkaline phosphatase, Catalase, Oxidase, β-glucuronidase, α-galactosidase, β-galactosidase, N-acetyl-glucosamine; Acid from: Mannitol, Sorbose, L-arabinose, Sorbitol, D-Raffinose, Xylose, D-Trehalose; G + C content (%); Habitat

Facultative anaerobic Positive Motile -
Facultative anaerobic Positive -
Facultative anaerobic Positive Motile +
Facultative anaerobic Positive Motile -
Facultative anaerobic Positive -
Facultative anaerobic Positive -
Facultative anaerobic Positive +
Facultative anaerobic Positive -
+ +
-
+
+
na na na +
+ -
na na na +
na na na +
+ + + + 39.6 Human stool
+ + + 37.3 Intestines mammals
+ + + + + 40.7 Intestines mammals
+ + 35.8 Water
+ + 36.3 Commensal chicken
+ + 37.8 Plants
+ 35.8 Human stool
+ + V + + + 42.7 Intestines mammals

+ : positive result; - : negative result; v : variable; na : data not available.

Table 2. Nucleotide content and gene count levels of the genome

Attribute                            Value        % of total (a)
Genome size (bp)                     2,712,841    100
DNA G + C content (bp)               1,075,567    39.6
DNA coding region (bp)               2,408,151    88.77
Total genes                          2687         100
RNA genes                            70           2.60
Protein-coding genes                 2617         97.39
Genes with function prediction       1889         72.18
Genes assigned to COGs               1863         71.19
Genes with peptide signals           250          9.55
Genes with transmembrane helices     630          24.07

(a) The total is based on either the size of the genome in base pairs or the total number of protein-coding genes in the annotated genome.

Table 3. Number of genes associated with the 25 general COG functional categories.

COG class   Value   % age (a)   Description
A           0       0           RNA processing and modification
B           0       0           Chromatin structure and dynamics
C           78      2.98        Energy production and conversion
D           22      0.84        Cell cycle control, cell division, chromosome partitioning
E           167     6.38        Amino acid transport and metabolism
F           67      2.56        Nucleotide transport and metabolism
G           248     9.48        Carbohydrate transport and metabolism
H           46      1.76        Coenzyme transport and metabolism
I           53      2.03        Lipid transport and metabolism
J           154     5.88        Translation, ribosomal structure and biogenesis
K           187     7.15        Transcription
L           155     5.98        Replication, recombination and repair
M           89      3.40        Cell wall/membrane/envelope biogenesis
N           5       0.19        Cell motility
O           54      2.06        Posttranslational modification, protein turnover, chaperones
P           104     3.97        Inorganic ion transport and metabolism
Q           20      0.76        Secondary metabolites biosynthesis, transport and catabolism
R           260     9.94        General function prediction only
S           190     7.26        Function unknown
T           59      2.25        Signal transduction mechanisms
U           24      0.91        Intracellular trafficking, secretion, and vesicular transport
V           60      2.29        Defense mechanisms
W           0       0           Extracellular structures
Y           0       0           Nuclear structure
Z           0       0           Cytoskeleton
-           754     28.81       Not in COGs

(a) The total is based on the total number of protein-coding genes in the annotated genome.

Fig. 1

Fig. 2 (panels A and B)

Fig. 3