Computational Biology and Chemistry 36 (2012) 62–70
Structural characteristics of genomic islands associated with GMP synthases as integration hotspot among sequenced microbial genomes
Lei Song a,b, Yuting Pan b, Sihong Chen b, Xuehong Zhang a,∗
a State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, PR China
b College of Life and Environmental Sciences, Shanghai Normal University, Shanghai 200234, PR China
Article info
Article history: Received 15 March 2011; received in revised form 23 December 2011; accepted 2 January 2012
Keywords: Genomic island (GI); GMP synthase gene (guaA); Integration hotspot; P4 integrase; AlpA
Abstract
tRNA, tmRNA and some small RNA genes are recognized as general integration hotspots of genomic islands (GIs). Here the GMP synthase gene (guaA) is identified for the first time as an insertion hotspot of foreign DNA fragments. Thirty-four islands integrated into guaA genes were identified in 987 completely sequenced archaeal and bacterial genomes. These alien islands were widely distributed among host strains belonging to Proteobacteria, Firmicutes and Actinobacteria. Analysis of the structural characteristics of these GIs is important for further determining island mobility and transfer into suitable hosts. The putative functional integrases encoded by guaA-associated islands were mainly phage P4 integrases, followed by phage PhiLC3 integrases. Interestingly, the island-encoded AlpA gene lies close to the P4 integrase gene and is deduced to be a positive transcriptional regulator of P4 integrase, whereas the XRE protein gene lies close to the PhiLC3 integrase gene and may be a negative transcriptional regulator of PhiLC3 integrase. An 8-bp consensus sequence (5′-GAGTGGGA-3′) within the direct repeats of these GIs is inferred to be the cutting site of the P4 integrases encoded by guaA-associated islands, in which the third nucleotide (G) is the key site. A large-scale investigation of the content of GMP synthase gene hotspots may be useful for finding important functional islands within members of many key bacterial species and for transferring useful islands into more suitable hosts. © 2012 Elsevier Ltd. All rights reserved.
∗ Corresponding author at: State Key Laboratory of Microbial Metabolism, School of Life Sciences and Biotechnology, Shanghai Jiao Tong University, Shanghai 200240, PR China. E-mail address: [email protected] (X. Zhang).
1476-9271/$ – see front matter © 2012 Elsevier Ltd. All rights reserved. doi:10.1016/j.compbiolchem.2012.01.001

1. Introduction

Horizontally acquired genomic islands (GIs) are regarded as playing critical roles in prokaryotic evolution. A number of island-encoded, strain-specific genes potentially allow host strains to exploit entirely new niches through the acquisition of new virulence factors, metabolic pathways, antimicrobial resistance mechanisms or cell signaling systems (Juhas et al., 2009). GIs often share some basic structural features: sizes ranging from 10 to 200 kb, direct repeats (DRs), mobility genes (such as integrases, transposases or recombinases), instability with spontaneous excision from the genome, abnormal GC content, dinucleotide bias, and atypical codon usage. GIs frequently insert at recognized 'hotspots', such as tRNA/tmRNA genes and small non-coding RNA genes (Mantri and Williams, 2004; Sridhar and Rafi, 2007). In Escherichia coli, 43% of all integration sites are associated with moderate-copy tRNA genes (Mantri and Williams, 2004).

It is known that the island-encoded integrase catalyzes recombination between a site attP in the circular pre-island and the target site attB in the chromosome. Most integrases specify an attB that frequently lies within the 3′-end of a tRNA/tmRNA gene (Reiter et al., 1989; Williams, 2002). Recently the 3′-end of the guanosine monophosphate (GMP) synthetase gene (guaA) has been reported as an insertion site of foreign DNA fragments. A 16-kb guaA-associated island, SaPIbov of Staphylococcus aureus RF122, encodes toxic shock syndrome toxin and a bovine variant of staphylococcal enterotoxin C (Fitzgerald et al., 2001). The pathogenicity island SaPIbov2 of S. aureus V329, which codes for biofilm-associated proteins, can be excised to form a circular element and integrate into the 3′-end of a guaA gene (Ubeda et al., 2003). We recently identified a Pseudomonas aeruginosa PA14 island adjacent to the guaA gene that is involved in the uptake of mercury ions (Song and Zhang, 2009). Similarly, some prophages are site-specifically integrated at the 3′-ends of guaA genes, such as bacteriophage V10 (Perry et al., 2009), Salmonella enterica serovar Anatum (Group E1) phage 15 (Kropinski et al., 2007), and a prophage of E. coli APEC O1 (Johnson et al., 2007). Interestingly, a CP4 integrase gene is frequently found downstream of guaA in several bacteria, including Brucella suis, Shigella flexneri, Rhodopseudomonas palustris, Listeria monocytogenes and
Streptococcus pneumoniae (Lavigne et al., 2005). We therefore hypothesize that guaA loci may serve as integration hotspots for large foreign DNA fragments, site-specifically recognized by functional integrases that promote island excision and insertion.

The guaA gene encodes GMP synthase, which converts xanthosine monophosphate to guanosine monophosphate in the purine nucleotide synthesis pathway. The guaA gene is generally present as a single-copy housekeeping gene in the bacterial chromosome backbone (Zimmer et al., 2002). Since inactivation of GMP synthase may be fatal for the organism (Zimmer et al., 2002), an island inserted into the guaA gene should be bounded by DRs derived from the 3′-end sequence of guaA, so that the gene remains intact. It is therefore helpful to identify guaA-associated GIs by searching for the 3′-end sequences of guaA genes within the complete sequences of bacterial genomes. Many recently developed in silico tools, web servers and databases aim to exploit the expanding flow of high-throughput genome sequencing data to reveal the nature of GIs (Langille et al., 2010). However, currently available free tools often focus on GIs inserted at tRNA/tmRNA gene sites rather than within protein-coding regions. In this study we employed a simple strategy to identify GIs adjacent to guaA genes. After examination of 987 completely sequenced bacterial genomes currently available in GenBank, 34
islands of 31 species were identified. We examined the consensus sequences among the DRs and the island-harbored integrase genes. Analysis of the structural characteristics of the identified guaA-associated GIs may be helpful for studying their functions, regulation and mobility.

2. Materials and methods

A BLASTN-based strategy was employed to determine whether the guaA genes of sequenced archaeal and bacterial strains are disrupted by potential GIs (Fig. 1). In brief, the 15-bp fragment at the 3′-end of each of the 987 guaA genes was extracted from GenBank. Each 15-bp fragment was then used as a query in a similarity search against the complete sequence of the genome containing that guaA gene, using BLASTN with default NCBI BLASTN parameters except for MegaBlast. The 987 complete archaeal and bacterial genomes available at GenBank in April 2010 were examined. A chromosome fragment was selected as a 'candidate anomalous region' if it was smaller than 500 kb and bounded by DRs associated with the 3′-end segment of the guaA gene. Finally, candidate regions that possessed the typical island features (coding for an integrase, transposase or recombinase, and showing unusual G + C content and dinucleotide distribution bias) were identified as guaA-associated islands. The G + C content and dinucleotide distribution bias were calculated using the δρ-WEB tool (van Passel et al., 2005). In addition, some genomes carrying a guaA-integrated island were compared with sequenced, closely related genomes of the same species using WebACT (http://www.webact.org/WebACT/generate) (Abbott et al., 2007). The 23 strains that contain a guaA-integrated GI coding for a P4 integrase together carry 99 P4 integrases. The phylogenetic tree of all 99 P4 integrases was built with MEGA 4.0.2 using the neighbor-joining method and the default parameters for genetic distance calculation (Kumar et al., 2008).

Fig. 1. Flowchart depicting the strategy used to identify GIs inserted into the 3′-end of the GMP synthase gene (guaA). Steps shown: (1) extract the 15-bp fragment at the 3′-end of each of the 987 guaA genes in GenBank; (2) align each 15-bp fragment with the complete sequence of the genome that contains it; (3) if a region is bounded by DR sequences associated with the guaA gene and is < 500 kb, it is a candidate anomalous region (otherwise: over); (4) if the region codes for an integrase homologue, continue (otherwise: over); (5) if the region possesses unusual G + C content and dinucleotide distribution bias, it is a guaA-associated GI (otherwise: over).
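The repeat-search step of this strategy can be illustrated with a minimal sketch. It is not the authors' pipeline: it swaps BLASTN for exact string matching, assumes the guaA gene lies on the forward strand, and uses hypothetical function names and 0-based, end-exclusive coordinates chosen only for this example.

```python
def find_direct_repeats(genome: str, gene_end: int, k: int = 15):
    """Return the k-mer at the 3'-end of a forward-strand gene and all of its
    exact occurrences in the genome (0-based start positions).

    Exact matching is an assumption of this sketch; the paper used BLASTN.
    """
    genome = genome.upper()
    query = genome[gene_end - k:gene_end]
    hits = []
    pos = genome.find(query)
    while pos != -1:
        hits.append(pos)
        pos = genome.find(query, pos + 1)
    return query, hits


def candidate_regions(hits, k: int = 15, max_size: int = 500_000):
    """Pair consecutive repeat copies and keep spans shorter than max_size.

    Each region is (start of left repeat, end of right repeat, size in bp).
    """
    regions = []
    for left, right in zip(hits, hits[1:]):
        size = right + k - left
        if size < max_size:
            regions.append((left, right + k, size))
    return regions


# Toy genome: backbone + gene ending in a 15-mer + "island" + second copy of the 15-mer.
toy = "ACGTACGTAC" + "ATGAAACCC" + "ATCGAGTGGGAGTGA" + "TTGGCCAATTGGCCAA" + "ATCGAGTGGGAGTGA" + "GGGTTT"
query, hits = find_direct_repeats(toy, gene_end=34)
print(query, hits, candidate_regions(hits))
```

A region returned here would only be a candidate; the integrase and compositional checks described in the text would still have to follow.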
3. Results and discussion

3.1. GMP synthase genes are hotspots for integration of alien genomic islands

After examination of 987 completely sequenced archaeal and bacterial genomes, 34 GIs with integration sites at guaA genes were identified (34/987 = 3.44%) (Table 1). Thirty-one integrases longer than 300 amino acid residues may be intact and functional, whereas three remnant breaking–rejoining type integrases may be non-functional (Table 2). About 44.1% (15/34) of the inserted DNA fragments differed by more than 3% in G + C content from their host genomes. When the disparity was lower than 3%, the percentage of genomic fragments with lower genomic dissimilarity values δ* (the average dinucleotide relative abundance difference) was in most cases over 90%, indicating that these GIs were also reliably identified. Thirty-one of the GIs are identified here for the first time; the exceptions are RF122GIguaA (Fitzgerald et al., 2001), APECO1GIguaA (Johnson et al., 2007), and PA14GIguaA (Song and Zhang, 2009).

In addition, eleven genome sequences carrying guaA-integrated islands were aligned against the complete sequences of closely related genomes of the same species (Supplemental Fig. S1): Bacillus thuringiensis str. Al Hakam, Desulfitobacterium hafniense DCB-2, E. coli APEC O1, E. coli IAI39, E. coli O127:H6 str. E2348/69, Lactobacillus casei ATCC 334, P. aeruginosa UCBPP-PA14, S. flexneri 2a str. 2457T, S. aureus RF122, S. enterica serovar Paratyphi C strain RKS4594 and S. enterica serovar Choleraesuis str. SC-B67. For nine of these GIs, the closely related genomes carry no foreign fragment at the 3′-end of their guaA gene. For PA14GIguaA and ATCC334GIguaA, the closely related genomes also carry a foreign fragment at the 3′-end of their guaA gene.
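The 15/34 figure can be checked directly from the G + C values listed in Table 1. The short sketch below does only that; the variable and function names are illustrative, and the pairs are (island GC %, genome GC %) in the row order of Table 1.

```python
# (island GC %, genome GC %) for the 34 islands, in the row order of Table 1.
GC_PAIRS = [
    (61.5, 66.2), (62.1, 68.5), (32.5, 35.4), (64.7, 65.1), (60.3, 62.8),
    (24.0, 28.9), (60.7, 59.2), (38.2, 47.5), (63.3, 66.8), (48.9, 50.5),
    (49.2, 50.6), (50.2, 50.6), (33.5, 41.5), (58.9, 59.1), (64.0, 63.8),
    (44.6, 46.6), (57.3, 48.5), (60.7, 62.4), (69.9, 71.3), (61.0, 66.3),
    (62.5, 64.7), (62.2, 65.4), (57.5, 50.9), (31.4, 32.8), (31.3, 32.8),
    (61.8, 67.5), (65.3, 64.8), (40.7, 44.9), (55.7, 54.5), (62.9, 64.2),
    (46.7, 52.2), (46.7, 52.2), (67.2, 67.8), (63.3, 60.2),
]


def count_disparate(pairs, threshold=3.0):
    """Count islands whose GC content differs from the genome by more than threshold (% points)."""
    return sum(1 for island, genome in pairs if abs(island - genome) > threshold)


n = count_disparate(GC_PAIRS)
print(f"{n}/{len(GC_PAIRS)} = {n / len(GC_PAIRS):.1%}")  # 15/34 = 44.1%
```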
Integration at tRNA/tmRNA gene sites was first examined in 106 completely sequenced genomes by Mantri and Williams (2004). In that study, 129 islands integrated into tRNA genes were identified and archived in the Islander database (Mantri and Williams, 2004). Assuming 80 tRNA genes per bacterial genome, the per-site frequency of GI insertion is about 1.5% (129/106/80), which corresponds to roughly one occupied tRNA site per genome. This is lower than the 3.4% (34/987/1) observed for integration at the guaA gene site, since a bacterial genome generally has a single copy of the guaA gene. The guaA gene site is therefore one of the integration hotspots of alien DNA fragments.
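The comparison above is a per-site normalization, which the following few lines reproduce. The figure of 80 tRNA genes per genome is the assumption stated in the text, and the function name is illustrative.

```python
def per_site_insertion_frequency(islands: int, genomes: int, sites_per_genome: int) -> float:
    """Islands observed per candidate integration site, averaged over all genomes."""
    return islands / (genomes * sites_per_genome)


trna = per_site_insertion_frequency(129, 106, 80)  # tRNA sites (Mantri and Williams, 2004)
guaa = per_site_insertion_frequency(34, 987, 1)    # single-copy guaA site (this study)
print(f"tRNA: {trna:.1%}  guaA: {guaa:.1%}  ratio: {guaa / trna:.1f}")
```

Under these assumptions a guaA site is occupied a little more than twice as often as an average tRNA site.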
Fig. 2. Inferred phylogenetic relationship, based on amino acid sequences, of all 99 P4 integrases from the strains that contain a guaA-associated GI coding for a P4 integrase. The tree was constructed using a MEGA 4 alignment and the neighbor-joining method. Bootstrap percentage values (500 replicates) are shown at the nodes. The scale at the base represents inferred evolutionary distance. The 23 P4 integrases of GIs associated with the guaA gene are highlighted by their gene loci (GMP).
Table 1
GMP synthase gene-associated genomic islands identified in 34 bacterial genomes.

Identifier | Strain | Coordinates | Size (kb) | GMP synthase gene | Integrase gene | GC content (%) [genome G + C content (%)] | δ* × 10³ (a) | Genome fragments with lower δ* (%) (b)
JS42GIguaA | Acidovorax sp. JS42 | 2689174–2711100 | 21.9 | Ajs_2547 | Ajs_2546 | 61.5 [66.2] | 87.5 | 98.5
AAC00-1GIguaA | Acidovorax avenae subsp. citrulli AAC00-1 | 3706960–3723069 | 16.1 | Aave_3367 | Aave_3366 | 62.1 [68.5] | 53.9 | 88.5
HakamGIguaA | Bacillus thuringiensis Al Hakam | 276434–283405 | 7.0 | BALH_0254 | BALH_0256 | 32.5 [35.4] | 67.0 | 89.6
EbN1GIguaA | Aromatoleum aromaticum EbN1 | 3904095–3936978 | 32.9 | ebA6645 | ebA6644 | 64.7 [65.1] | 69.9 | 98.5
LB400GIguaA | Burkholderia xenovorans LB400 | 2910483–3018717 | 108.2 | Bxe_A1708 | Bxe_A1709 | 60.3 [62.8] | 57.0 | 97.8
novyiNTGIguaA | Clostridium novyi NT | 2132263–2139838 | 7.6 | NT01CX_0459 | NT01CX_0453 | 24.0 [28.9] | 55.3 | 72.9
RCBGIguaA | Dechloromonas aromatica RCB | 2377183–2518656 | 141.5 | Daro_2337 | Daro_2336 | 60.7 [59.2] | 25.2 | 96.8
DCB-2GIguaA | Desulfitobacterium hafniense DCB-2 | 1536170–1558619 | 22.4 | Dhaf_1438 | Dhaf_1439 | 38.2 [47.5] | 48.1 | 94.5
TPSYGIguaA | Acidovorax ebreus TPSY | 1324895–1395675 | 70.8 | Dtpsy_1261 | Dtpsy_1262 | 63.3 [66.8] | 70.9 | 100
APECO1GIguaA | Escherichia coli APEC O1 | 2739318–2779100 | 39.8 | APECO1_4019 | APECO1_4020 | 48.9 [50.5] | 50.9 | 98.5
IAI39GIguaA | Escherichia coli IAI39 | 2751281–2793176 | 41.9 | ECIAI39_2705 | ECIAI39_2704 | 49.2 [50.6] | 52.3 | 97.5
E2348/69GIguaA | Escherichia coli O127:H6 str. E2348/69 | 2812234–2852546 | 40.3 | E2348C_2782 | E2348C_2781 | 50.2 [50.6] | 59.7 | 97.6
ATCC33656GIguaA | Eubacterium rectale ATCC 33656 | 567379–585806 | 18.4 | EUBREC_0620 | EUBREC_0624 | 33.5 [41.5] | 51.2 | 81.3
CGDNIH1GIguaA | Granulibacter bethesdensis CGDNIH1 | 1288899–1314739 | 25.8 | GbCGDNIH1_1150 | GbCGDNIH1_1151 | 58.9 [59.1] | 82.5 | 99.0
CH34GIguaA | Cupriavidus metallidurans CH34 | 1585274–1684371 | 99.1 | Rmet_1463 | Rmet_1465 | 64.0 [63.8] | 30.5 | 97.4
ATCC334GIguaA | Lactobacillus casei ATCC 334 | 1903845–1950831 | 47.0 | LSEI_1979 | LSEI_1978 | 44.6 [46.6] | 56.5 | 100
C91GIguaA | Nitrosomonas eutropha C91 | 2377779–2389965 | 12.2 | Neut_2251 | Neut_2252 | 57.3 [48.5] | 50.2 | 89.0
OM5GIguaA | Oligotropha carboxidovorans OM5 | 2342146–2404461 | 62.3 | OCAR_6467 | OCAR_6465 | 60.7 [62.4] | 23.8 | 76.7
HLK1GIguaA | Phenylobacterium zucineum HLK1 | 1607476–1728394 | 120.9 | PHZ_c1417 | PHZ_c1418 | 69.9 [71.3] | 22.3 | 97.0
PA14GIguaA | Pseudomonas aeruginosa UCBPP-PA14 | 1300805–1327539 | 26.7 | PA14_15340 | PA14_15350 | 61.0 [66.3] | 46.22 | 95.9
ympGIguaA | Pseudomonas mendocina ymp | 3776136–3844978 | 68.8 | Pmen_3485 | Pmen_3484 | 62.5 [64.7] | 31.7 | 97.3
ATCC11170GIguaA | Rhodospirillum rubrum ATCC 11170 | 411023–429000 | 18.0 | Rru_A0355 | Rru_A0354 | 62.2 [65.4] | 39.0 | 87.2
2457TGIguaA | Shigella flexneri 2a 2457T | 2598541–2614013 | 15.5 | S2725 | S2723 | 57.5 [50.9] | 51.4 | 92.6
RF122GIguaA | Staphylococcus aureus RF122 | 388732–404695 | 16.0 | SAB0341 | SAB0342c | 31.4 [32.8] | 46.6 | 83.6
JCSC1435GIguaA | Staphylococcus haemolyticus JCSC1435 | 2580925–2593528 | 12.6 | SH2582 | SH2581 | 31.3 [32.8] | 44.8 | 78.9
Py2GIguaA | Xanthobacter autotrophicus Py2 | 470761–487187 | 16.4 | Xaut_0445 | Xaut_0444 | 61.8 [67.5] | 74.3 | 96.9
306GIguaA | Xanthomonas axonopodis pv. citri 306 | 2555103–2675457 | 120.3 | XAC2287 | XAC2286 | 65.3 [64.8] | 60.6 | 100
Ex25GIguaA | Vibrio sp. Ex25 | 3019105–3031622 | 12.5 | VEA_004340 | VEA_004339 | 40.7 [44.9] | 90.3 | 99.6
Ech1591GIguaA | Dickeya zeae Ech1591 | 1301463–1330416 | 28.9 | Dd1591_1128 | Dd1591_1130 | 55.7 [54.5] | 42.6 | 91.0
DSM2243GIguaA | Eggerthella lenta DSM 2243 | 1770854–1824926 | 54.1 | Elen_1531 | Elen_1527 | 62.9 [64.2] | 100.3 | 100
RKS4594GIguaA | Salmonella enterica serovar Paratyphi C RKS4594 | 1211286–1214588 | 3.3 | SPC_1145 | SPC_1146 | 46.7 [52.2] | 94.9 | 92.4
SC-B67GIguaA | Salmonella enterica serovar Choleraesuis SC-B67 | 2640258–2643560 | 3.3 | SC2508 | SC2507 | 46.7 [52.2] | 95.2 | 93.3
B510GIguaA | Azospirillum sp. B510 | 1285264–1339811 | 54.5 | AZL_011710 | AZL_011700 | 67.2 [67.8] | 29.2 | 91.7
DSM20476GIguaA | Slackia heliotrinireducens DSM 20476 | 2099834–2152220 | 52.4 | Shel_19190 | Shel_19160 | 63.3 [60.2] | 83.7 | 100

The δ* value was calculated with the δρ-WEB tool (http://deltarho.amc.nl) (van Passel et al., 2005). The high δ* values of these fragments indicate a likely heterologous origin.
(a) The value δ* denotes the dinucleotide relative abundance difference between the island fragment and the complete genome.
(b) The percentage distribution of δ* is plotted using the δρ-WEB tool with random host genomic fragments of equal length as input sequences (van Passel et al., 2005).
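For readers who want to see what a δ* value measures, here is a minimal sketch. It assumes the commonly used Karlin-style definition (symmetrized dinucleotide relative abundances ρ*, averaged absolute difference over the 16 dinucleotides); the paper itself obtained its values from the δρ-WEB tool, whose internals are not described in the text, so this is an illustration rather than a reimplementation. Table 1 reports δ* multiplied by 10³.

```python
from itertools import product

_COMP = str.maketrans("ACGT", "TGCA")
_DINUCS = ["".join(p) for p in product("ACGT", repeat=2)]


def rho_star(seq: str) -> dict:
    """Dinucleotide relative abundances f_xy / (f_x * f_y), computed on the
    sequence joined with its reverse complement (strand-symmetrized)."""
    s = seq.upper()
    # "N" separator prevents a spurious dinucleotide at the junction.
    both = s + "N" + s.translate(_COMP)[::-1]
    mono = {b: both.count(b) for b in "ACGT"}
    di = dict.fromkeys(_DINUCS, 0)
    for i in range(len(both) - 1):
        pair = both[i:i + 2]
        if pair in di:  # skips pairs containing N or other symbols
            di[pair] += 1
    n_mono, n_di = sum(mono.values()), sum(di.values())
    if n_di == 0:
        raise ValueError("sequence too short or contains no A/C/G/T dinucleotides")
    rho = {}
    for pair, count in di.items():
        fx, fy = mono[pair[0]] / n_mono, mono[pair[1]] / n_mono
        rho[pair] = (count / n_di) / (fx * fy) if fx and fy else 0.0
    return rho


def delta_star(seq_a: str, seq_b: str) -> float:
    """Average absolute difference of rho* over the 16 dinucleotides."""
    ra, rb = rho_star(seq_a), rho_star(seq_b)
    return sum(abs(ra[p] - rb[p]) for p in _DINUCS) / 16
```

In use, `delta_star(island_sequence, genome_sequence) * 1000` would give a number on the same scale as the δ* × 10³ column, under the stated assumption about the definition.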
Table 2
Analysis of the flanking direct repeats, integrase types and synergistic genes with the integrase genes.

Island | Putative function | attL/attR (DRs) (a) | Integrase type | Putative island-coding synergic gene with integrase gene
JS42GIguaA | – | tcactcccactcgat/tcactcccactcgat | P4 integrase | SMC (Ajs_2527)
AAC00-1GIguaA | Type I restriction-modification system | tcactcccactccatc/tcactcccactcgatc | P4 integrase | SMC (Aave_3357)
EbN1GIguaA | – | aatcactcccactcgat/aatcattcccactcgat | P4 integrase | SMC (ebA6581)
LB400GIguaA | Multidrug resistance | tcactcccactcgattgtc/tcattcccactcgattgtc | P4 integrase | SMC (Bxe_A1800)
E2348/69GIguaA | Prophage related | atcattcccactcaat/atcattcccactcaat | P4 integrase | SMC (E2348C_2742)
CGDNIH1GIguaA | Antibiotic-resistance | gaacaattgagtgggaatgattt/gaacaatcgagtgggaatgattt | P4 integrase | SMC (GbCGDNIH1_1178)
RCBGIguaA | Heavy metal resistance | attattcccactcgatcgt/attattcccactcgatcgt | P4 integrase | ParB (Daro_2219), ParM (Daro_2217)
TPSYGIguaA | Antibiotic-resistance | atcgagtgggagtga/atcgagtgggagtga | P4 integrase | ParA (Dtpsy_1289), ParB (Dtpsy_1277), Relaxase (Dtpsy_1293)
B510GIguaA | – | tcattcccactcgat/tcattcccactcgat | P4 integrase | Soj (ParA) (AZL_011510), ParA (AZL_011530), ParB (AZL_011420)
CH34GIguaA | Carbon fixation | gacgatcgagtgggagtga/gatgatcgagtgggagtga | P4 integrase | ParA (Rmet_1486), ParB (Rmet_1478), Relaxases (Rmet_1489)
306GIguaA | – | tcattcccactcgat/tcactcccactcgat | P4 integrase | Soj (ParA) (XAC2205), ParB (XAC2206)
OM5GIguaA | Heavy metal resistance | tattcccactcgatcgt/tattcccactcgatcgt | P4 integrase | ParB (OCAR_6463), Relaxases (OCAR_6443)
ympGIguaA | Amino acid metabolism related | catcattcccactcgat/catcactcccactcgat | P4 integrase | ParA (Pmen_3433)
C91GIguaA | – | accatcgagtgggaatgat/acaatcgagtgggaatgat | P4 integrase | Plasmid stability protein (Neut_2256), Resolvase (Neut_2265)
2457TGIguaA | Arsenate biodegradation | aatcattcccactctat/aatcattcccactcaat | P4 integrase | Plasmid stability protein (S2719), Resolvase (S2707)
Ech1591GIguaA | – | atcgagtgggaat/atcgagtgggaat | P4 integrase | Plasmid stability protein (Dd1591_1149)
HLK1GIguaA | Arsenate degradation, Heavy metal resistance, Antibiotic-resistance | atcgagtgggaatgatc/atcgagtgggaatgatc | P4 integrase | Relaxase (PHZ_c1508)
PA14GIguaA | Hg2+ uptake | atcgagtgggagtgat/atcgagtgggagtgat | P4 integrase | Resolvase (PA14_15430)
APECO1GIguaA | Prophage related | aatcattcccactcaat/aatcattcccactcaat | P4 integrase | –
IAI39GIguaA | Prophage related | aatcattcccactcaat/aatcattcccactcaat | P4 integrase | –
Ex25GIguaA | – | aattattcccactcga/aattattcccactcga | P4 integrase | –
ATCC11170GIguaA | Clustered regularly interspaced short palindromic repeats (CRISPR) | attattcccactcgat/atcattcccactcgat | P4 integrase | –
Py2GIguaA | Type III restriction-modification system | tcattcccactcgat/tcactcccactcgat | P4 integrase | –
HakamGIguaA | – | attgagtgggaatagta/attgagtgggaatagta | phiLC3 integrase | FtsK/SpoIIIE family protein (BALH_0259)
DSM2243GIguaA | Type I restriction-modification system | actattcccactcgat/actattcccactcgat | phiLC3 integrase | Soj (ParA) (Elen_1524), ParB (Elen_1523), Relaxase (Elen_1521), Cox protein (Elen_1496)
DSM20476GIguaA | Type I restriction-modification system, Multidrug resistance | attattcccactcgat/attactcccactcgat | phiLC3 integrase | Excisionase (Shel_18850), Relaxase (Shel_19030), Resolvase (Shel_19120)
ATCC33656GIguaA | – | attgagtttgaata/attgagtttgaata | phiLC3 integrase | SMC (EUBREC_0642)
RF122GIguaA | Pathogenicity island | attgagtgggaataattatatatagcaaatgataggctggagttaccgtaattacgcggtttccagcctttttt/attgagtgggaataattatatatagcaaatgataggctggagttaccgtaattacgcggtttccagcctttttt | phiLC3 integrase | –
ATCC334GIguaA | Prophage related | taccggtcattcccactcaatcgttgc/taccggtcattcccactcaatcgttgc | DNA breaking–rejoining enzyme | Soj (ParA) (LSEI_1975), Resolvase (LSEI_1951)
RKS4594GIguaA | – | attgagtgggaat/attgagtgggaat | DNA breaking–rejoining enzyme | SecA (SPC_1147)
SC-B67GIguaA | – | attcccactcaat/attcccactcaat | DNA breaking–rejoining enzyme | SecA (SC2506)
novyiNTGIguaA | – | aaactaatattctatttttattcccactcaa/aaactaatattctatttttattcccactcaa | DNA breaking–rejoining enzyme | –
DCB-2GIguaA | Prophage related | cctgggaccatcgagtgggagtaag/cctgggaccattgagtgggagtaag | DNA breaking–rejoining enzyme | –
JCSC1435GIguaA | – | ttctattcccactcaat/ttctattcccactctat | DNA breaking–rejoining enzyme | –

(a) The 3′-end fragments of GMP synthase genes are shown in bold. The variable nucleotides between attL and attR sequences are highlighted with a grey background. The underlined nucleotides denote the 8-bp consensus sequence (see text for more details).
3.2. The consensus sequences in the direct repeats targeted by integrases encoded in the guaA-associated islands

More than 50% of the integration sites of known GIs are tRNA/tmRNA genes (Mantri and Williams, 2004). The action sites of integrases are often in the TΨC loop, the anticodon loop and the asymmetric 3′-ends of tRNA sequences (Williams, 2002). The 3′-end segments of guaA genes are conserved among the 34 islands identified in this study, suggesting that the action sites of these island-encoded integrases are similar to those targeting the asymmetric 3′-ends of tRNAs. Sequence alignment of the DRs of the 34 GIs indicates that an 8-bp consensus sequence (5′-GAGTGGGA-3′) is present in all the flanking sequences except those of Eubacterium rectale ATCC 33656 (Table 2). The action sites of P4 integrases are often in the downstream portion of the TΨC loop of tRNALeu (Campbell, 2003). Interestingly, this 8-bp consensus sequence (5′-GAGTGGGA-3′) is the downstream part of the TΨC loop of some tRNALeu genes. It was reported that bacteriophage φ3626 integrates into the 3′-end of guaA in Clostridium perfringens, and the 8-bp consensus sequence is located within its DRs (GAGTGGGAATAA) (Zimmer et al., 2002). The 8-bp fragment 5′-CCGCCAGC-3′ was identified as the active site of the CP4-57 integrase (Wang et al., 2009). The E. coli 536 island PAI V536, in the absence of intPAI V, could be excised by 'cross-talk' action of the functional integrase encoded by intPAI II in the other island, PAI II536 (Hochhut et al., 2006). The identical sequence (5′-CGAGTCCGG-3′) targeted by intPAI II is present in the DRs of both islands (Wilde et al., 2008). This shows that the cutting sites of P4 integrases can be 8–9 bp sequences within the DRs. Thus, the conserved sequence 5′-GAGTGGGA-3′ may be the cutting site of P4 integrases in GIs associated with guaA. Of the 987 guaA genes under study, 57% (563/987) have this 8-bp consensus sequence at their 3′-ends.
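Checking a direct repeat for the consensus is a simple string test once both strands are considered, since Table 2 lists some repeats as the reverse complement (…tcccactc…). The sketch below is illustrative only; the function names are hypothetical, and the three example repeats are the attL sequences listed in Table 2 for JS42GIguaA, TPSYGIguaA and ATCC33656GIguaA.

```python
CONSENSUS = "GAGTGGGA"
_COMP = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    """Reverse complement of an A/C/G/T sequence."""
    return seq.upper().translate(_COMP)[::-1]


def has_consensus(direct_repeat: str, motif: str = CONSENSUS) -> bool:
    """True if the motif occurs on either strand of the direct repeat."""
    dr = direct_repeat.upper()
    return motif in dr or revcomp(motif) in dr


examples = {
    "JS42GIguaA": "tcactcccactcgat",      # consensus on the complementary strand
    "TPSYGIguaA": "atcgagtgggagtga",      # consensus on the listed strand
    "ATCC33656GIguaA": "attgagtttgaata",  # the exception noted in the text
}
for island, dr in examples.items():
    print(island, has_consensus(dr))
```

The same test applied to the 15-bp 3′-end of every guaA gene would give a tally like the 563/987 reported above, assuming the gene sequences are available.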
Furthermore, 5.9% (33/563) of the guaA genes carrying the 8-bp consensus sequence were disrupted by alien insertions. Interestingly, among the other 424 guaA genes, 43% (183/424) had the near-identical fragment 5′-GAATGGGA-3′, which differs from the 8-bp consensus sequence at only one nucleotide (the third position, A instead of G). If the probability of island insertion were 5.9%, the same as for the 563 genomes carrying the 8-bp consensus sequence (5′-GAGTGGGA-3′), more than 10 islands (183 × 5.9%) would be expected in these 183 genomes. However, none was found, indicating that the third nucleotide (G) of the 8-bp consensus sequence (5′-GAGTGGGA-3′) is the key action site of the integrases encoded by guaA-associated islands.

The integrases in the 34 identified guaA-associated islands can be divided into three categories (Table 2): 23 P4 integrases, 5 phiLC3 integrases and 6 DNA breaking–rejoining enzymes. The 23 P4 integrases encoded by guaA-associated islands shared more than 61% identity with each other. Fig. 2 shows the phylogenetic tree of all 99 P4 integrases, retrieved from GenBank, that exist in the 23 strains containing a GI flanked by the 3′-end of guaA and coding for a P4 integrase. The 23 P4 integrases encoded in the guaA-associated GIs clustered into the same lineage. This suggests that the recognition site of these P4 integrases is the same 8-bp consensus sequence (5′-GAGTGGGA-3′). In addition, an organizational map of the guaA gene integration sites of the 34 islands is shown in Fig. 3. In 73.5% (25/34) of the islands, the guaB (IMP dehydrogenase) gene was close to the guaA gene; these comprise 20 islands coding for P4 integrases, one island coding for a phiLC3 integrase, and 4 islands coding for DNA breaking–rejoining enzymes.

3.3. AlpA is the positive regulatory factor of P4 integrases

Among the 23 guaA-associated islands coding for P4 integrases, 16 encoded at least one AlpA-type transcriptional regulatory factor. The AlpA genes were located near the P4 integrase genes, in most cases less than 4 kb away (Fig. 3A). Five islands (AAC00-1GIguaA, TPSYGIguaA, CH34GIguaA, Ech1591GIguaA, and ympGIguaA) encoded two AlpA-type transcriptional regulatory factors with opposite transcription directions downstream of the P4 integrase gene. Another six islands (JS42GIguaA, APECO1GIguaA, E2348/69GIguaA, IAI39GIguaA, Py2GIguaA and B510GIguaA) harbored one AlpA gene that is transcribed in the opposite direction to the P4 integrase gene and is located downstream of it. The island 306GIguaA contained one AlpA gene that is transcribed in the opposite direction to the P4 integrase gene and is located at the opposite end of the island. Four islands (C91GIguaA, 2457TGIguaA, OM5GIguaA and ATCC11170GIguaA) coded for one such factor that is transcribed in the same direction as the P4 integrase gene and is located downstream of it. It has been reported that AlpA-type regulators are positive regulatory factors of the integrase encoded by slpA in the P4-like prophage of E. coli K-12, promoting excision of the prophage as a circle (Kirby et al., 1994). Overexpression of AlpA, a transcriptional regulatory factor of the integrase IntA of the CP4-57 prophage, promoted CP4-57 excision from the chromosome in E. coli (Wang et al., 2009). It can therefore be inferred that AlpA is the positive regulatory factor of the P4 integrases encoded in the guaA-associated islands.

Among the five GIs coding for phiLC3 integrases, RF122GIguaA harbored two XRE family transcriptional regulatory factor genes with opposite transcription directions downstream of the integrase gene; another three islands (DSM2243GIguaA, DSM20476GIguaA and ATCC33656GIguaA) carried one XRE family transcriptional regulatory factor gene, transcribed in the opposite direction to the integrase gene and located between the boundary guaA gene and the integrase gene (see Fig. 3B).
The XRE family protein contains a Cro/cI-type DNA-binding domain and acts as a negative transcriptional regulator (Kulinska et al., 2008). Thus, the XRE family protein was inferred to be a negative regulator of phiLC3 integrases. Of the six GIs encoding a DNA breaking–rejoining enzyme, one contains two XRE transcriptional regulatory factor genes with opposite transcription directions, and another contains one XRE protein gene transcribed in the same direction as the integrase gene (see Fig. 3C). It is therefore difficult to predict the transcriptional factors of these integrases.

3.4. Analysis of putative island-coding synergic proteins with integrases

An integrase encoded in an alien island usually has two functions: excision of the island from the chromosome at the DRs to form a circular intermediate, and re-integration of the excised island back into the chromosome. Analysis of internal gene functions showed that, of the 34 GIs, 25 encoded one or more proteins related to genome segregation and stability, such as the chromosome partitioning protein SOJ (a ParA homologue), the plasmid partitioning proteins ParA-ParB, plasmid stability protein, SMC (structural maintenance of chromosomes), Cox protein, relaxase, resolvase, excisionase, SecA, and the FtsK/SpoIIIE protein family, among others (see Table 2). It was reported that ParB regulates the activity of ParA (Abeles et al., 1985). SOJ maintains the excised circular GIs and protects them from degradation, and it also contributes to the re-integration of the excised GIs back into the genome before DNA replication, thus bringing about stable inheritance of the GIs (Qiu et al., 2006). As such, SOJ works synergistically with integrases to complete the dynamic process of island excision and re-integration into the genome. This type of protein also includes plasmid stability proteins, SMC, and the FtsK/SpoIIIE protein family.
The Cox protein and excisionase (Xis), as directionality factors for site-specific recombination, can promote the excision of GIs from the genome and the formation of circular intermediates (Lewis
Fig. 3. Organizational map of the GMP synthase integration site and the integrase genes harbored in the 34 identified GIs. A: P4 integrase homologues; B: PhiLC3 integrase homologues; C: homologues of DNA breaking–rejoining enzymes.
and Hatfull, 2001). It was reported that relaxases can maintain the stability of excised circular GIs (Ramsay et al., 2006), so they also function synergistically with integrases.

3.5. Analysis of the stem-loop regions downstream of the direct repeats (attL/attR)

It has been reported that there are two stem-loop sequences in the downstream portion of the DRs of the clc element of Pseudomonas sp. strain B13, which encodes a P4 integrase and uses tRNAGlyV as its integration site. When the clc element is transferred and integrated into other chromosomes, the essential integration site is the 3′-end of tRNAGlyV, and a stem-loop sequence exists in the downstream portion of this site, indicating that the stem-loop region is an essential auxiliary sequence acted upon by the integrase (van der Meer et al., 2001). It was also reported that there is a stem-loop sequence in the downstream portion of the guaA gene in E. coli K12, which is the transcription termination site of the factor (Tiedeman et al., 1985). Therefore, the NUCLEIC REPEATS program in the EMBO package was used to identify stem-loop sequences in the downstream portion of the DRs of the 34 GIs (Supplemental Fig. S2). Supplemental Figs. S2A, S2B and S2C show that, in 20 of the GIs, there is at least one stem-loop sequence that includes an inverted repeat sequence (5′-AAGCCCGC-3′, 5′-TACCGTCA-3′ or 5′-AACCCTCGT-3′) in the downstream portion of the DRs. If they can form a circular intermediate and transfer, GIs that share a homologous stem-loop sequence could be mutually integrated into chromosomes that have a homologous 3′-end of the guaA gene and a homologous stem-loop sequence downstream of it.

3.6. Distribution of guaA-associated islands with important biological functions

The 34 identified guaA-associated islands were distributed across three bacterial phyla: Actinobacteria, Firmicutes, and Proteobacteria. Eggerthella lenta DSM 2243 and Slackia heliotrinireducens DSM 20476 belong to Actinobacteria. Eubacterium rectale ATCC 33656, Clostridium novyi NT, D. hafniense DCB-2, L. casei ATCC 334, B. thuringiensis str. Al Hakam, S. haemolyticus JCSC1435 and S. aureus RF122 belong to Firmicutes. Of the remaining 25 strains, all Proteobacteria, 11 belong to Gammaproteobacteria, 8 to Betaproteobacteria, and 6 to Alphaproteobacteria. Among the 34 strains carrying guaA-associated islands, 15 are bacterial pathogens of plants, insects, birds, mammals and/or humans, including Acidovorax avenae subsp. citrulli AAC00-1, B. thuringiensis str. Al Hakam, E. coli APEC O1, E. coli IAI39, E. coli O127:H6 str. E2348/69, Granulibacter bethesdensis CGDNIH1, P. aeruginosa UCBPP-PA14, S. flexneri 2a str. 2457T, S. aureus RF122, S. haemolyticus JCSC1435, Xanthomonas axonopodis pv. citri str. 306, Dickeya
zeae Ech1591, Eggerthella lenta DSM 2243, S. enterica serovar Paratyphi C strain RKS4594 and S. enterica serovar Choleraesuis str. SC-B67. Another 13 strains play a role in bioremediation or adaptation by degrading foreign (xenobiotic) substances, including Acidovorax sp. JS42, Aromatoleum aromaticum EbN1, Burkholderia xenovorans LB400, Dechloromonas aromatica RCB, D. hafniense DCB-2, Acidovorax ebreus TPSY, Nitrosomonas eutropha C91, Oligotropha carboxidovorans OM5, Pseudomonas mendocina ymp, Xanthobacter autotrophicus Py2, Rhodospirillum rubrum ATCC 11170, Azospirillum sp. B510 and Slackia heliotrinireducens DSM 20476. Lactobacillus casei ATCC 334 is an intestinal bacterium that affects immunity. We found that some of the 34 GIs have important biological functions (Table 2). For example, HLK1GIguaA and 2457TGIguaA were involved in the biodegradation of arsenate. The island CH34GIguaA encoded a series of carbon fixation enzymes. The islands DSM2243GIguaA, DSM20476GIguaA and Py2GIguaA each coded for a restriction-modification system.

4. Conclusions

Thirty-four GIs integrated into guaA genes were identified in 987 completely sequenced archaeal and bacterial genomes. The guaA gene was shown to be an important integration hotspot for foreign DNA fragments. The 8-bp consensus sequence (5′-GAGTGGGA-3′) was identified as the action site of the P4 integrases, with the third base (G) as the key site. The AlpA-type transcriptional regulatory factors were encoded close to the P4 integrases. Most of the GIs coded for proteins related to genome segregation and stability, which are involved in the dynamic balance between GI excision and re-integration into chromosomes. Some stem loop sequences exist downstream of the DRs in the GIs and are essential auxiliary sequences of the internal integrase. The structural characteristic analysis of islands with integration sites at guaA genes may help to determine the biological functions, mobility and transference of such GIs.
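
The stem loop sequences discussed above are, in sequence terms, inverted repeats separated by a short loop. The following minimal Python sketch illustrates how such candidates could be located in the region downstream of a DR. It is not the EMBO-package program used in this study: it searches only for exact, gap-free inverted repeats, and the function names, the parameters (stem_len, min_loop, max_loop) and the toy sequence with its TTTT loop and GG flanks are assumptions made for illustration. Only the arm 5′-AAGCCCGC-3′ is taken from the text.

```python
from typing import List, Tuple

_COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA string."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def find_stem_loops(
    seq: str, stem_len: int = 8, min_loop: int = 3, max_loop: int = 10
) -> List[Tuple[int, str, int, int]]:
    """Find exact inverted repeats that could pair into a stem loop.

    Returns tuples (left_start, left_arm, loop_length, right_start),
    using 0-based coordinates on the input sequence.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for i in range(n - 2 * stem_len - min_loop + 1):
        left = seq[i:i + stem_len]
        # Skip windows containing ambiguous bases such as N.
        if any(base not in _COMPLEMENT for base in left):
            continue
        target = reverse_complement(left)
        for loop in range(min_loop, max_loop + 1):
            j = i + stem_len + loop
            if j + stem_len > n:
                break
            if seq[j:j + stem_len] == target:
                hits.append((i, left, loop, j))
    return hits


# Hypothetical toy region: the arm reported in the text, an assumed
# 4-nt loop, and the reverse complement of the arm, flanked by GG.
toy_region = "GG" + "AAGCCCGC" + "TTTT" + reverse_complement("AAGCCCGC") + "GG"
print(find_stem_loops(toy_region))
```

An exact-match search is chosen here only for brevity. Stems in real sequences often contain mismatches or bulges, which a scoring-based inverted-repeat finder handles and this sketch does not.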

Acknowledgements

We thank Dr. Hong-Yu Ou for valuable comments and critical revisions on this manuscript. This research was supported by NSFC (no. 30821005 and 30870075), 973 Programs of China (no. 2009CB118906), and Shanghai Leading Academic Discipline Project (B203).

Appendix A. Supplementary data

Supplementary data associated with this article can be found, in the online version, at doi:10.1016/j.compbiolchem.2012.01.001.

References

Abeles, A.L., Friedman, S.A., Austin, S.J., 1985. Partition of unit-copy miniplasmids to daughter cells III. The DNA sequence and functional organization of the P1 partition region. J. Mol. Biol. 185 (2), 261–272.
Abbott, J.C., Aanensen, D.M., Bentley, S.D., 2007. WebACT: an online genome comparison suite. Methods Mol. Biol. 395, 57–74.
Campbell, A., 2003. Prophage insertion sites. Res. Microbiol. 154 (4), 277–282.
Fitzgerald, J.R., Monday, S.R., Foster, T.J., Bohach, G.A., Hartigan, P.J., Meaney, W.J., Smyth, C.J., 2001. Characterization of a putative pathogenicity island from bovine Staphylococcus aureus encoding multiple superantigens. J. Bacteriol. 183 (1), 63–70.
Hochhut, B., Wilde, C., Balling, G., Middendorf, B., Dobrindt, U., Brzuszkiewicz, E., Gottschalk, G., Carniel, E., Hacker, J., 2006. Role of pathogenicity island-associated integrases in the genome plasticity of uropathogenic Escherichia coli strain 536. Mol. Microbiol. 61 (3), 584–595.
Johnson, T.J., Kariyawasam, S., Wannemuehler, Y., Mangiamele, P., Johnson, S.J., Doetkott, C., Skyberg, J.A., Lynne, A.M., Johnson, J.R., Nolan, L.K., 2007. The genome sequence of avian pathogenic Escherichia coli strain O1:K1:H7 shares strong similarities with human extraintestinal pathogenic E. coli genomes. J. Bacteriol. 189 (8), 3228–3236.
Juhas, M., van der Meer, J.R., Gaillard, M., Harding, R.M., Hood, D.W., Crook, D.W., 2009. Genomic islands: tools of bacterial horizontal gene transfer and evolution. FEMS Microbiol. Rev. 33 (2), 376–393.
Kirby, J.E., Trempy, J.E., Gottesman, S., 1994. Excision of a P4-like cryptic prophage leads to Alp protease expression in Escherichia coli. J. Bacteriol. 176 (7), 2068–2081.
Kropinski, A.M., Kovalyova, I.V., Billington, S.J., Patrick, A.N., Butts, B.D., Guichard, J.A., Pitcher, T.J., Guthrie, C.C., Sydlaske, A.D., Barnhill, L.M., Havens, K.A., Day, K.R., Falk, D.R., McConnell, M.R., 2007. The genome of epsilon15, a serotype-converting, Group E1 Salmonella enterica-specific bacteriophage. Virology 369 (2), 234–244.
Kulinska, A., Czeredys, M., Hayes, F., Jagura-Burdzy, G., 2008. Genomic and functional characterization of the modular broad-host-range RA3 plasmid, the archetype of the IncU group. Appl. Environ. Microbiol. 74 (13), 4119–4132.
Kumar, S., Nei, M., Dudley, J., Tamura, K., 2008. MEGA: a biologist-centric software for evolutionary analysis of DNA and protein sequences. Brief. Bioinform. 9 (4), 299–306.
Langille, M.G., Hsiao, W.W., Brinkman, F.S., 2010. Detecting genomic islands using bioinformatics approaches. Nat. Rev. Microbiol. 8 (5), 373–382, Review.
Lavigne, J.P., Vergunst, A.C., Bourg, G., O’Callaghan, D., 2005. The IncP island in the genome of Brucella suis 1330 was acquired by site-specific integration. Infect. Immun. 73 (11), 7779–7783.
Lewis, J.A., Hatfull, G.F., 2001. Control of directionality in integrase-mediated recombination: examination of recombination directionality factors (RDFs) including Xis and Cox proteins. Nucleic Acids Res. 29 (11), 2205–2216.
Mantri, Y., Williams, K.P., 2004. Islander: a database of integrative islands in prokaryotic genomes, the associated integrases and their DNA site specificities. Nucleic Acids Res. 32, D55–D58.
Perry, L.L., SanMiguel, P., Minocha, U., Terekhov, A.I., Shroyer, M.L., Farris, L.A., Bright, N., Reuhs, B.L., Applegate, B.M., 2009. Sequence analysis of Escherichia coli O157:H7 bacteriophage PhiV10 and identification of a phage-encoded immunity protein that modifies the O157 antigen. FEMS Microbiol. Lett. 292 (2), 182–186.
Qiu, X., Gurkar, A.U., Lory, S., 2006. Interstrain transfer of the large pathogenicity island (PAPI-1) of Pseudomonas aeruginosa. Proc. Natl. Acad. Sci. U. S. A. 103 (52), 19830–19835.
Ramsay, J.P., Sullivan, J.T., Stuart, G.S., Lamont, I.L., Ronson, C.W., 2006. Excision and transfer of the Mesorhizobium loti R7A symbiosis island requires an integrase IntS, a novel recombination directionality factor RdfS, and a putative relaxase RlxS. Mol. Microbiol. 62 (3), 723–734.
Reiter, W.D., Palm, P., Yeats, S., 1989. Transfer RNA genes frequently serve as integration sites for prokaryotic genetic elements. Nucleic Acids Res. 17, 1907–1914.
Song, L., Zhang, X.H., 2009. Innovation for ascertaining genomic islands in PAO1 and PA14 of Pseudomonas aeruginosa. Chin. Sci. Bull. 54 (21), 3991–3999.
Sridhar, J., Rafi, Z.A., 2007. Identification of novel genomic islands associated with small RNAs. In Silico Biol. 7 (6), 601–611.
Tiedeman, A.A., Smith, J.M., Zalkin, H., 1985. Nucleotide sequence of the guaA gene encoding GMP synthetase of Escherichia coli K12. J. Biol. Chem. 260 (15), 8676–8679.
Ubeda, C., Tormo, M.A., Cucarella, C., Trotonda, P., Foster, T.J., Lasa, I., Penadés, J.R., 2003. Sip, an integrase protein with excision, circularization and integration activities, defines a new family of mobile Staphylococcus aureus pathogenicity islands. Mol. Microbiol. 49 (1), 193–210.
van der Meer, J.R., Ravatn, R., Sentchilo, V., 2001. The clc element of Pseudomonas sp. strain B13 and other mobile degradative elements employing phage-like integrases. Arch. Microbiol. 175 (2), 79–85, Review.
van Passel, M.W., Luyf, A.C., van Kampen, A.H., Bart, A., van der Ende, A., 2005. δρ-Web, an online tool to assess composition similarity of individual nucleic acid sequences. Bioinformatics 21 (13), 3053–3055.
Wang, X., Kim, Y., Wood, T.K., 2009. Control and benefits of CP4-57 prophage excision in Escherichia coli biofilms. ISME J. 3 (10), 1164–1179.
Wilde, C., Mazel, D., Hochhut, B., Middendorf, B., Le Roux, F., Carniel, E., Dobrindt, U., Hacker, J., 2008. Delineation of the recombination sites necessary for integration of pathogenicity islands II and III into the Escherichia coli 536 chromosome. Mol. Microbiol. 68 (1), 139–151.
Williams, K.P., 2002. Integration sites for genetic elements in prokaryotic tRNA and tmRNA genes: sublocation preference of integrase subfamilies. Nucleic Acids Res. 30 (4), 866–875.
Zimmer, M., Scherer, S., Loessner, M.J., 2002. Genomic analysis of Clostridium perfringens bacteriophage phi3626, which integrates into guaA and possibly affects sporulation. J. Bacteriol. 184 (16), 4359–4368.