Permanent draft genome sequence of Geobacillus thermocatenulatus strain GS-1


Marine Genomics 18 (2014) 129–131


Marine Genomics journal homepage: www.elsevier.com/locate/margen

Genomics/Technical resources

Beiwen Zheng a,1, Fan Zhang b,1, Lujun Chai b, Gaoming Yu c, Fuchang Shu d, Zhengliang Wang d, Sanbao Su d, Tingsheng Xiang d, Zhongzhi Zhang e, DuJie Hou b, Yuehui She d,⁎

a State Key Laboratory for Diagnosis and Treatment of Infectious Diseases, Zhejiang University, Hangzhou, China
b The Key Laboratory of Marine Reservoir Evolution and Hydrocarbon Accumulation Mechanism, School of Energy Resources, China University of Geosciences, Beijing, China
c College of Petroleum Engineering, Yangtze University, Jingzhou, China
d College of Chemistry and Environmental Engineering, Yangtze University, Jingzhou, China
e State Key Laboratory of Heavy Oil Processing, China University of Petroleum, Beijing, China

Article info

Article history:
Received 12 August 2014
Received in revised form 18 September 2014
Accepted 18 September 2014
Available online 1 October 2014

Keywords: Geobacillus thermocatenulatus; Thermophile; Hydrocarbon degradation; Genome

Abstract

Geobacillus thermocatenulatus strain GS-1 is a thermophilic bacillus with a growth optimum at 60 °C that is capable of degrading alkanes. It was isolated from the formation water of a high-temperature deep oil reservoir in Qinghai oilfield, China. Here, we report the draft genome sequence with an estimated assembly size of 3.5 Mb. A total of 3371 protein-coding sequences were detected in the genome, including genes encoding monooxygenase, alcohol dehydrogenase, aldehyde dehydrogenase, fatty acid-CoA ligase, acyl-CoA dehydrogenase, enoyl-CoA hydratase, hydroxyacyl-CoA dehydrogenase and thiolase, which are involved in the alkane degradation pathway. Our results may provide insights into the genetic basis of the adaptation of this strain to high-temperature oilfield ecosystems. © 2014 Published by Elsevier B.V.

⁎ Corresponding author. Tel./fax: +86 716 8060650. E-mail address: [email protected] (Y. She).
1 These authors contributed equally to this work.

http://dx.doi.org/10.1016/j.margen.2014.09.005
1874-7787/© 2014 Published by Elsevier B.V.

1. Introduction

Geobacillus is a genus of Gram-positive, spore-forming, rod-shaped, aerobic or facultatively anaerobic bacteria. A total of 56 strains were assigned to the genus Geobacillus on the basis of phenotypic and 16S rRNA gene sequence analysis (Coorevits et al., 2012). Members of Geobacillus have been isolated from various freshwater and marine systems and have attracted interest for their potential industrial applications (Zhang et al., 2010; Selim, 2012; Garg et al., 2012; McMullan et al., 2004).

Geobacillus thermocatenulatus strain GS-1 was isolated from a formation water sample of Qinghai oilfield, China (38°16′N–90°95′E) by direct isolation of crude-oil-degrading strains. It grows between 25 °C and 65 °C (optimum 60 °C) and can use lactose, rhamnose, sorbitol, glycerol, tetradecane and hexadecane as a sole carbon source. Colonies grown on LB plates are butyrous, round and raised with entire margins; cells are 0.3–0.9 μm in diameter and 3–10 μm long. Sequence analysis of the 16S rRNA gene indicated that strain GS-1 grouped into the same branch as the G. thermocatenulatus type strain DSM 730T (Supplementary materials). To date, the genomes of some Geobacillus representatives have been sequenced and published; however, the genome of G. thermocatenulatus remains unknown (Feng et al., 2007; Bhalla et al., 2013). To further elucidate the hydrocarbon degradation pathways and the mechanism of thermophilic adaptation to high temperature in G. thermocatenulatus strain GS-1, we determined the permanent draft genome sequence of G. thermocatenulatus strain GS-1 (=CGMCC 5644).

The genomic DNA of this strain was isolated using the DNeasy Blood & Tissue Kit (Qiagen, Germany). Sequencing was performed on an Illumina HiSeq 2000 genomic sequencer at BGI (Shenzhen, China) with a 2 × 100 paired-end sequencing strategy. The shotgun library was constructed as a 500 bp-span paired-end library. All clean reads were assembled into scaffolds using Velvet version 1.2.07 (Zerbino and Birney, 2008), and the PAGIT flow was used to extend the initial contigs and correct sequencing errors (Swain et al., 2012). Gene prediction was carried out using Glimmer 3.0 (Delcher et al., 2007). Ribosomal RNA genes were detected using the RNAmmer 1.2 software (Lagesen et al., 2007) and transfer RNAs using tRNAscan-SE version 1.21 (Lowe and Eddy, 1997). The KAAS server (http://www.genome.jp/kegg/kaas/) was used to assign translated amino acid sequences to KEGG Orthology groups (Kanehisa et al., 2008). Translated genes were aligned against the COG database using NCBI blastp (Tatusov et al., 2001). Signal peptides were identified by SignalP version 4.1 (http://www.cbs.dtu.dk/services/SignalP/). TMHMM 2.0 (http://www.cbs.dtu.dk/services/TMHMM/) was used to identify genes with transmembrane helices. Orthology identification was carried out by a modified version of the method introduced by Lerat et al. (2003) (Supplementary materials).

The draft genome sequence of G. thermocatenulatus strain GS-1 revealed a genome size of 3,519,600 bp and a G + C content of 52.1% (155 scaffolds with an N50 length of 72,438 bp). These scaffolds contain 3371 coding sequences (CDSs), 74 tRNAs and 9 rRNAs. A total of 1389 protein-coding genes were annotated as proteins of putative function or as hypothetical proteins, and 2564 genes were categorized into COG functional groups (including putative or hypothetical genes). The properties and the statistics of the genome are summarized in Table 1.

As a thermophilic bacterium, GS-1 responds to heat stress by inducing heat shock proteins, which remove or refold damaged proteins. Among the protein-coding genes of strain GS-1, several genes encoding molecular chaperones were found, including the dnaK operon comprising genes encoding DnaJ–DnaK–GrpE and the HrcA regulator, GroEL, heat-shock proteins Hsp20 and Hsp33, and a protein disaggregation chaperone. Genes encoding ATP-dependent heat shock-responsive proteases such as Clp and Lon were also found.

Putative genes encoding monooxygenase, alcohol dehydrogenase, aldehyde dehydrogenase, fatty acid-CoA ligase, acyl-CoA dehydrogenase, enoyl-CoA hydratase, hydroxyacyl-CoA dehydrogenase and thiolase were detected in the genome. This confirmed the presence of an oxidation pathway for the degradation of long-chain alkanes (Feng et al., 2007) and is consistent with the phenotype of crude-oil degradation.

Comparison of the GS-1 genome with Geobacillus thermodenitrificans NG80-2, Geobacillus stearothermophilus NUB3621, Geobacillus thermoglucosidasius C56-YS93 and Geobacillus thermoleovorans CCB_US3_UF5 revealed the presence of a large core genome (Fig. 1): these five Geobacillus strains shared 2084 CDSs. A particular overlap between G. thermocatenulatus GS-1 and G. thermoleovorans CCB_US3_UF5 became evident. These two chromosomes shared 331 orthologous CDSs exclusively and a further 206 CDSs jointly with G. stearothermophilus NUB3621, while the chromosome of G. thermocatenulatus GS-1 overlapped less with the G. thermodenitrificans NG80-2 chromosome, with which it shared 30 orthologous CDSs exclusively. In addition, 775 CDSs from the GS-1 genome were classified as unique.

Our genomic data of strain GS-1 will provide a vast pool of genes involved in hydrocarbon degradation and an excellent platform for further improvement of this organism for potential application in bioremediation of oil-polluted environments.

Table 1
Genome features.

Feature                              G. thermocatenulatus strain GS-1
Size (bp)                            3,519,600
Contigs                              155
G + C content (bp)                   1,833,910
Coding region (bp)                   3,064,743
Protein-coding genes                 3371
tRNA genes                           74
rRNA genes                           9
Genes assigned to COGs               2564
Genes with signal peptides           159
Genes with transmembrane helices     942
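The summary figures reported for the GS-1 assembly (Table 1) can be cross-checked with simple arithmetic. The Python sketch below is illustrative only and is not part of the original analysis: it recomputes the G + C percentage from the base counts in Table 1, derives the coding density and the share of protein-coding genes assigned to COGs, and shows how an N50 value is obtained from a list of scaffold lengths. The scaffold lengths in `toy_scaffolds` are a made-up example, because the individual scaffold lengths of the GS-1 assembly are not listed in this article.

```python
# Values copied from Table 1 of the article.
GENOME_SIZE_BP = 3_519_600
GC_BP = 1_833_910
CODING_BP = 3_064_743
PROTEIN_CODING_GENES = 3371
GENES_IN_COGS = 2564


def percent(part, whole):
    """Return part/whole as a percentage."""
    return 100.0 * part / whole


def n50(lengths):
    """Length of the sequence at which the cumulative sum of lengths,
    taken from longest to shortest, first reaches half of the total."""
    if not lengths:
        raise ValueError("need at least one sequence length")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


gc_percent = percent(GC_BP, GENOME_SIZE_BP)
coding_density = percent(CODING_BP, GENOME_SIZE_BP)
cog_share = percent(GENES_IN_COGS, PROTEIN_CODING_GENES)

print(f"G + C content:  {gc_percent:.1f}%")      # 52.1%, as stated in the text
print(f"Coding density: {coding_density:.1f}%")  # 87.1%
print(f"Genes in COGs:  {cog_share:.1f}%")       # 76.1%

# Hypothetical scaffold lengths (NOT the GS-1 scaffolds), total = 150.
toy_scaffolds = [50, 40, 30, 20, 10]
print("Toy N50:", n50(toy_scaffolds))            # 40
```

The recomputed G + C value agrees with the 52.1% given in the text. Applied to the 155 real scaffold lengths, the same `n50` function would be expected to reproduce the reported N50 of 72,438 bp.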

2. Nucleotide sequence accession numbers

This whole genome sequence project is deposited in DDBJ/EMBL/GenBank under the accession JFHZ00000000.

Acknowledgments

This study was sponsored by the National Natural Science Foundation of China (Grant Nos. 81301461, 50974022, and 51074029), the 863 Program of the Ministry of Science and Technology of the People's Republic of China (Grant Nos. 2008AA06Z204 and 2013AA064402), and the Zhejiang Provincial Natural Science Foundation of China (Grant No. LQ13H190002). The authors wish to thank the technical personnel in the oilfield under study for kindly collecting samples.

Appendix A. Supplementary data

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.margen.2014.09.005.

Fig. 1. Comparative analysis of the complete genome sequences of G. thermoglucosidasius C56-YS93, G. thermodenitrificans NG80-2, G. stearothermophilus NUB3621 and G. thermoleovorans CCB_US3_UF5 and the draft genome of G. thermocatenulatus GS-1. Numbers inside the Venn diagram indicate the number of genes shared among the indicated genomes.
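The region counts in a Venn diagram such as Fig. 1 follow directly from a table of ortholog groups and the genomes in which each group occurs. The Python sketch below shows that counting step under stated assumptions: the ortholog groups are taken as already computed (the article's own orthology method, a modification of Lerat et al. (2003), is described in the Supplementary materials and is not reproduced here), and the group identifiers and memberships in `toy_groups` are hypothetical, not data from the GS-1 analysis.

```python
from collections import Counter

# Short strain labels for the five genomes compared in Fig. 1.
GENOMES = ("GS-1", "NG80-2", "NUB3621", "C56-YS93", "CCB_US3_UF5")


def venn_regions(ortholog_groups):
    """Count ortholog groups per exact combination of genomes.

    Each distinct combination corresponds to one region of the Venn
    diagram: the full set is the core genome, a single-genome set
    holds that genome's unique groups.
    """
    counts = Counter()
    for members in ortholog_groups.values():
        counts[frozenset(members)] += 1
    return counts


# Hypothetical ortholog groups (illustration only).
toy_groups = {
    "OG0001": set(GENOMES),                             # core
    "OG0002": set(GENOMES),                             # core
    "OG0003": {"GS-1", "CCB_US3_UF5"},                  # shared exclusively by two
    "OG0004": {"GS-1", "CCB_US3_UF5", "NUB3621"},       # shared by three
    "OG0005": {"GS-1", "NG80-2"},
    "OG0006": {"GS-1"},                                 # unique to GS-1
}

regions = venn_regions(toy_groups)
print("core:", regions[frozenset(GENOMES)])                                  # 2
print("GS-1 + CCB_US3_UF5 only:", regions[frozenset({"GS-1", "CCB_US3_UF5"})])  # 1
print("unique to GS-1:", regions[frozenset({"GS-1"})])                       # 1
```

Keying the counter on a `frozenset` of genome labels means that "shared exclusively" is handled automatically: a group found in a third genome lands in a different region rather than being double counted.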


References

Bhalla, A., Kainth, A.S., Sani, R.K., 2013. Draft genome sequence of lignocellulose-degrading thermophilic bacterium Geobacillus sp. strain WSUCF1. Genome Announc. 1.
Coorevits, A., Dinsdale, A.E., Halket, G., Lebbe, L., De Vos, P., Van Landschoot, A., Logan, N.A., 2012. Taxonomic revision of the genus Geobacillus: emendation of Geobacillus, G. stearothermophilus, G. jurassicus, G. toebii, G. thermodenitrificans and G. thermoglucosidans (nom. corrig., formerly ‘thermoglucosidasius’); transfer of Bacillus thermantarcticus to the genus as G. thermantarcticus comb. nov.; proposal of Caldibacillus debilis gen. nov., comb. nov.; transfer of G. tepidamans to Anoxybacillus as A. tepidamans comb. nov.; and proposal of Anoxybacillus caldiproteolyticus sp. nov. Int. J. Syst. Evol. Microbiol. 62, 1470–1485.
Delcher, A.L., Bratke, K.A., Powers, E.C., Salzberg, S.L., 2007. Identifying bacterial genes and endosymbiont DNA with Glimmer. Bioinformatics 23, 673–679.
Feng, L., Wang, W., Cheng, J., Ren, Y., Zhao, G., Gao, C., Tang, Y., Liu, X., Han, W., Peng, X., Liu, R., Wang, L., 2007. Genome and proteome of long-chain alkane degrading Geobacillus thermodenitrificans NG80-2 isolated from a deep-subsurface oil reservoir. Proc. Natl. Acad. Sci. U. S. A. 104, 5602–5607.
Garg, N., Tang, W., Goto, Y., Nair, S.K., van der Donk, W.A., 2012. Lantibiotics from Geobacillus thermodenitrificans. Proc. Natl. Acad. Sci. U. S. A. 109, 5241–5246.
Kanehisa, M., Araki, M., Goto, S., Hattori, M., Hirakawa, M., Itoh, M., Katayama, T., Kawashima, S., Okuda, S., Tokimatsu, T., Yamanishi, Y., 2008. KEGG for linking genomes to life and the environment. Nucleic Acids Res. 36, D480–D484.
Lagesen, K., Hallin, P., Rødland, E.A., Stærfeldt, H.-H., Rognes, T., Ussery, D.W., 2007. RNAmmer: consistent and rapid annotation of ribosomal RNA genes. Nucleic Acids Res. 35, 3100–3108.
Lerat, E., Daubin, V., Moran, N.A., 2003. From gene trees to organismal phylogeny in prokaryotes: the case of the gamma-Proteobacteria. PLoS Biol. 1, E19.
Lowe, T.M., Eddy, S.R., 1997. tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 25, 955–964.
McMullan, G., Christie, J.M., Rahman, T.J., Banat, I.M., Ternan, N.G., Marchant, R., 2004. Habitat, applications and genomics of the aerobic, thermophilic genus Geobacillus. Biochem. Soc. Trans. 32, 214–217.
Selim, S.A., 2012. Novel thermostable and alkalitolerant amylase production by Geobacillus stearothermophilus HP 3. Nat. Prod. Res. 26, 1626–1630.
Swain, M.T., Tsai, I.J., Assefa, S.A., Newbold, C., Berriman, M., Otto, T.D., 2012. A post-assembly genome-improvement toolkit (PAGIT) to obtain annotated genomes from contigs. Nat. Protoc. 7, 1260–1284.
Tatusov, R.L., Natale, D.A., Garkavtsev, I.V., Tatusova, T.A., Shankavaram, U.T., Rao, B.S., Kiryutin, B., Galperin, M.Y., Fedorova, N.D., Koonin, E.V., 2001. The COG database: new developments in phylogenetic classification of proteins from complete genomes. Nucleic Acids Res. 29, 22–28.
Zerbino, D.R., Birney, E., 2008. Velvet: algorithms for de novo short read assembly using de Bruijn graphs. Genome Res. 18, 821–829.
Zhang, Z.G., Yi, Z.L., Pei, X.Q., Wu, Z.L., 2010. Improving the thermostability of Geobacillus stearothermophilus xylanase XT6 by directed evolution and site-directed mutagenesis. Bioresour. Technol. 101, 9272–9278.