Polyphyletic evolution of type II restriction enzymes revisited: two independent sources of second-hand folds revealed


Research Update

are known to undergo apoptosis resulting in the termination of hair growth13. JmjC-domain-containing proteins often also contain DNA- or chromatin-binding domains, or both (Fig. 2). Recent studies have focused on enzymes that regulate chromatin structure via phosphorylation, methylation and acetylation14,15. The JmjC domain represents an excellent candidate for an additional enzyme that regulates the integrity of chromatin structure.


TRENDS in Biochemical Sciences Vol.26 No.1 January 2001


8 Cleasby, A. et al. (1996) The X-ray crystal structure of phosphomannose isomerase from Candida albicans at 1.7 Å resolution. Nat. Struct. Biol. 3, 470–479
9 Dunwell, J.M. and Gane, P.J. (1998) Microbial relatives of seed storage proteins: conservation of motifs in a functionally diverse superfamily of enzymes. J. Mol. Evol. 46, 147–154
10 Ahmad, W. et al. (1998) Alopecia universalis associated with a mutation in the human hairless gene. Science 279, 720–724
11 Cichon, S. et al. (1998) Cloning, genomic organization, alternative transcripts and mutational analysis of the gene responsible for autosomal recessive universal congenital alopecia. Hum. Mol. Genet. 7, 1671–1679
12 Crowley, C.L. et al. (2000) The NAD+ precursors, nicotinic acid and nicotinamide protect cells against apoptosis induced by a multiple stress inducer, deoxycholate. Cell Death Differ. 7, 314–326
13 Panteleyev, A.A. et al. (1999) The role of the hairless (hr) gene in the regulation of hair follicle catagen transformation. Am. J. Pathol. 155, 159–171
14 Strahl, B.D. and Allis, D. (2000) The language of covalent histone modifications. Nature 403, 41–45
15 Rea, S. et al. (2000) Regulation of chromatin structure by site-specific histone H3 methyltransferases. Nature 406, 593–599
16 Rost, B. and Sander, C. (1993) Prediction of protein secondary structure at better than 70% accuracy. J. Mol. Biol. 232, 584–599
17 Kortschak, R.D. et al. (2000) ARID proteins come in from the desert. Trends Biochem. Sci. 25, 294–299
18 Fadok, V.A. et al. (2000) A receptor for phosphatidylserine-specific clearance of apoptotic cells. Nature 405, 85–90

Acknowledgements

Fig. 3. Rasmol representation of the catalytic domain of Candida albicans phosphomannose isomerase (PMI)8 (PDB: 1PMI); β strands are in yellow and α helices are in red. The zinc atom in PMI is shown in red, and three of its four coordinating residues are shown in blue (His113 and His285) or purple (Gln111). The PMI β strands are labelled B–L as in the original publication8. The position of the large α-helical insert domain between β strands G and H in PMI is indicated. Note that this corresponds to a 12-46-amino-acid region of low similarity in JmjC domains. The approximately 200-amino-acid insertion in spring vetch early-nodulin-promoter-binding protein 1 (ENBP1) corresponds to a region of the PMI structure that is well removed from its active site.

homologue, 3-hydroxyanthranilate-3,4-dioxygenase (3-HAO) generates an intermediate (2-amino-3-carboxymuconate semialdehyde) in the synthesis of NAD+ (see http://www.genome.ad.jp/dbgetbin/show_pathway?MAP00760+CO3722). This is consistent with the previously proposed apoptotic function of hairless, a JmjC-domain-containing protein. NAD+ precursors are known to protect against apoptosis12, and the hair matrix cells of mice with mutations in the hairless gene

The authors are funded by the UK Medical Research Council. We would like to acknowledge the helpful comments of an anonymous reviewer.

References
1 Balciunas, D. and Ronne, H. (2000) Evidence of domain swapping within the jumonji family of transcription factors. Trends Biochem. Sci. 25, 274–276
2 Altschul, S.F. et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402
3 Pickard, R.T. et al. (1999) Molecular cloning of two new human paralogs of 85-kDa cytosolic phospholipase A2. J. Biol. Chem. 274, 8823–8831
4 Song, C. et al. (1999) Molecular characterization of cytosolic phospholipase A2-β. J. Biol. Chem. 274, 17063–17067
5 Tatusov, R.L. et al. (1994) Detection of conserved segments in proteins: iterative scanning of sequence databases with alignment blocks. Proc. Natl. Acad. Sci. U. S. A. 91, 12091–12095
6 Dunwell, J.M. et al. (2000) Microbial relatives of the seed storage proteins of higher plants: Conservation of structure and diversification of function during evolution of the cupin superfamily. Microbiol. Mol. Biol. Rev. 64, 153–179
7 Gane, P.J. et al. (1998) Modeling based on the structure of vicilins predicts a histidine cluster in the active site of oxalate oxidase. J. Mol. Evol. 46, 488–493

P.M. Clissold C.P. Ponting* MRC Functional Genetics Unit, Dept of Human Anatomy and Genetics, University of Oxford, South Parks Road, Oxford, UK OX1 3QX. *e-mail: [email protected]

Polyphyletic evolution of type II restriction enzymes revisited: two independent sources of second-hand folds revealed

Janusz M. Bujnicki, Monika Radlinska and Leszek Rychlewski

Using algorithms for protein sequence analysis we predict that some of the canonical type II and type IIS restriction enzymes have an active site with a substantially different architecture and fold from the "typical" PD-(D/E)xK superfamily. These results suggest that they are related to nucleases from the HNH and GIY-YIG superfamilies.

Type II restriction endonucleases (ENases) are enzymes that recognize and cut certain short, usually palindromic, DNA sequences, using Mg2+ as a cofactor1. In general, they function as homodimers, combining all basic functions in a single domain. Subtype IIS ENases are exceptional, as they recognize asymmetric targets and cleave DNA at a fixed distance nearby, with binding and DNA cleavage conferred independently by two distinct domains. To date, >3000 type II ENases with >200 different specificities have been isolated and characterized2. Strikingly, except for several homologous isoschizomers (i.e. enzymes that recognize identical sequences and cleave them at the same position), their sequences do not

http://tibs.trends.com 0968-0004/01/$ – see front matter © 2001 Elsevier Science Ltd. All rights reserved. PII: S0968-0004(00)01690-X


share significant similarity and are not amenable to standard alignment procedures3. Initially, this was interpreted as indicating multiple independent origins4. However, this hypothesis has been largely abandoned, because all the enzymes that have been structurally characterized so far, together with DNA-repair proteins Vsr and MutH and phage λ exonuclease, were shown to share a common three-dimensional fold harboring a bipartite PD-(D/E)xK pattern of moderately conserved catalytic residues5–7. Here, using algorithms for sequence comparison and structure prediction, we demonstrate that many bona fide type II restriction ENases of unknown structure share homology with the so-called ‘homing’ ENases. Previously, it was thought that these two classes of site-specific enzymes were unrelated8.

To facilitate comparative analysis and knowledge-based engineering of type II ENases we recently initiated a large-scale sequence comparison and structure prediction experiment (to be published elsewhere in more detail). In the course of PSI-BLAST (Ref. 9) and FFAS (Ref. 10) database searches using the sequences of the NlaIII enzyme and its homolog SphI (Ref. 11) as queries, we observed an intriguing conservation of multiple Cys and charged residues (Fig. 1a) in a group of weakly similar type II ENases. In addition to NlaIII (target sequence ↑CATG↓, with arrows indicating symmetrical cleavage in both strands), our search retrieved NspI, NspHI (both targeting [G/A]↑CATG↓[C/T]), SphI (G↑CATG↓C), SapI (GCTCTTCN↓NNN↑)12 and HpyI (↑CATG↓)13.

Because we observed statistically insignificant, but intriguing, similarity between HpyI and the HNH superfamily of homing ENases14,15, the above-mentioned finding prompted us to re-run the iterative search. In the iterative profile-building process we also included those sequences that, despite only marginal overall sequence similarity, retained the residues known to be important for the catalytic activity of HNH and related ENases. These are the putative phosphate-binding His residue, the hallmark triad of Mg2+-binding residues and the Cys-box believed to coordinate zinc16. Using very relaxed criteria, we were able to detect similarity of all the above-mentioned enzymes with numerous HNH superfamily members, including colicins, intron maturases,

(a) HNH superfamily

Sequence       Start  Aligned region                                                             Database ID
R.NlaIII          92  KDALKTRNCVMLGVNGKSE-----NTKIEIDHKDGRKNNH-----RVSDIKTQKLEDFQPLCKAANDVKRQIC  2500150
R.HpyI            92  KNHYKQQCCAMCGVRGNSE-----NTQIEVDHKDGRKDDS-----RVSDLSTQAFDDFQALCKACNDKKRQIC  2314369
R.SphI           106  LKRAHGGKCAVCYGDFSE-------RELQCDHRVPFAIA--------GDKPKLVQEDFMPLCASDNRAKSWSC  1773270
R.SapI           276  VWERFHRKCFNCRKDLKL-------SEVQLDHTRPLAY------------LWPIDEHATCLCAQCNNTKKDRF  2865604
R.NspHI          109  ILQVYSYTDVIEQRQREK-------HELVIDHRFPMER-----------WGASEPPHLTSMSDDEIKQKFQLL  4033741
R.NspI           109  ILQVYSYTDVIEQRQREK-------HELVIDHRFPMER-----------WGASEPPHLTSMNDNEIKRKFQLL  4033738

Dra_o131         726  AIRKRDRVCLCCGKRT----------QLQVDHIQSRYA-----------GGTHDLDNLQLLCQVCNNLKGTRE  6460521
SC8E4A.25c        75  LFARDGGRCMYCGAVA-----------TSVDHVIPRSR-----------GGLHAWDNVVASCRRCNHVKADRH  6900954
Phi41_oL3         26  VRHRDKMTCVRCGAFGA--------KKYEVDHIIELTWEN-----LDDWKIALNPDNLQLLCKSCHNKKTGEY  1294760
R.MboII          299  SQQAKNDYFKHHKVNKI--------KGYELDHIIPLLEAE----SVDEYRYLDNWLNLLYIDGKTHAIKSQSG  135240
R.HpyAo1366      306  SIKEKALYFEKHGVKKE--------KGFELHHIVPLCLAR----SIEEFDLLDKWENLIYIDAFNHAKISQTQ  2314537
R.Hpy99o1442     209  SIKEKALYFEKHGVKKE--------KGFELHHIVPLCLAR----SIEEFDLLDKWENLIYIDAFNHAKISQTQ  4156064

Cox1I1a          715  TKSKLNECCVICKSTE----------NVQMHHIKALKSEYNKKVSGFTKVMIAMNRKQIPVCQNCHIKIHKGL  6468460
Eco_McrA         199  ILQQSKGICENCGKNAPFYLN-DGNPYLEVHHVIPLSS-----------GGADTTDNCVALCPNCHRELHYSK  2507296
Phi31_o4         150  WLEIKSFFCCSCAYCGMPEKKSLEIYGEHLHHEHVVPLI---------DGGAYSYGNVVPACRSCNSSKRNDD  2947224
Eco_YajD          19  ALKIYPWVCGRCSREFVYSN----LRELTVHHIDHDHT-----------NNPEDGSNWELLCLYCHDHEHSKY  140198
T4 EndoVII        15  FYDAQNGKCLICQRELN--------PDVQANHLDHDHE----------LNGPKAGKVRGLLCNLCNAAEGQMK  1en7A

ColE7            515  RNNNDRMKVGKAPKTRTQDVS-GKRTSFELHHEKPISQ----------NGGVYDMDNISVVTPKRHIDIHRGK  7ceiB

(b) GIY-YIG superfamily

Sequence       Start  Aligned region                                                             Database ID
R.Eco29kI         43  FQGAGVYALYYTGHYSLYDEYSRINRLAYNLPIYVGKAVPAGWRQ(53)VEAALIKIY---KPLWNTVVDGFG  1237267
R.NgoMIII         42  FKGAGVYAIYYIGNNPLYKQYADWNRLSYNAPIYVGKAVPKGWRQ(53)IEAALIKLH---KPLWNSCVDGFG  AF297971
R.MraI             6  GQLLGAYILFYKGPHELYIPVTAANQQNFTQPIYIGKAVPKGDRT(49)VESVLISKF---VPAWNRHIDGFG  6460962

T4_EndoII         39  NKYNVIYAIAIN-----------------DELVYIGKTKNLRKRI(45)NELGTMTIA(5)EPLFIKLFNPPW  729416
P.ans_o1B         42  KGKAGIYRWINN----------------NNGKCYVGSCVDLSKRL(45)REQYYLDLL---KPEYNILKIAGS  578862
A.aeg_COi4        94  KDKSGVYCLINK----------------INGNAYVGSSINLASRM(46)RETYYITYV---MPYYNVLKQGYS  2738528
U.ure_UvrC        13  PHKPGCYLWKDQ----------------FNQIIYIGKAKDLYNRT(35)LENNLIKTH---LPKYNILLKDGS  6899365
I-PcI            105  KGKSGIYLWTNK----------------INGKRYVGSRLDLRKRL(45)REQYYIDLL---NPEYNILKIAGS  6967038
I-TevI            37  FMKSGIYQIKNT----------------LNNKVYVGSAKDFEKRW(44)RENFWIKELNSKINGYNIADATFG  5354314

Fig. 1. Multiple alignments of the conserved regions in (a) HNH (Refs 14,15) and (b) GIY-YIG (Refs 15,19) superfamilies, including newly classified restriction enzymes (in bold) and selected well-known representatives. The consensus has been based on many individual alignments and edited by hand. The boundaries of the regions shown correspond to the sequences spanned by the FFAS profiles. Sequences are listed with their database identifiers placed rightmost. Residue number of start of the domain is shown on the left, numbers in parentheses indicate number of amino acid residues elided for clarity. Color shading indicates conserved residues: aromatic (F, Y, H, W) in magenta; acidic (D, E) in red; amides (Q, N) in brown; basic (K, R, H) in blue, small (G, A, C), P, S and T in green; generally hydrophobic or aliphatic (V, I, L, F, M, C, Y, W), A, K and C in yellow; Cys residues potentially forming a Zn-finger in the HNH family members are shown in dark yellow and indicated by asterisks. The putative phosphate-binding His is indicated by a triangle, and the Mg2+-binding triad by open circles. Secondary-structure elements (cylinders for helices, arrows for strands) predicted for type II ENases using JPRED2 (Ref. 20) and experimentally determined for the colicin E7 and T4 ENase VII, representing the HNH superfamily, and for I-TevI ENase representing the GIY-YIG superfamily are shown below the respective sequences. The alignments in both cases indicate conservation of both functionally important residues and structural elements crucial for integrity of the catalytic domain.
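The HNH alignment of Fig. 1a lends itself to a simple programmatic cross-check. The Python sketch below is an illustration only, not the authors' procedure: it takes the nine rows for the restriction enzymes as transcribed from the figure, lists the alignment columns that hold the same residue in every row, and reports which rows carry Cys at four chosen columns. The function names are hypothetical, and the column numbers 9, 12, 62 and 65 are an assumption read off the T4 Endo VII row of the figure, not values stated in the article.

```python
# Rows for the restriction enzymes, transcribed from Fig. 1a (HNH superfamily).
HNH_RESTRICTION_ENZYMES = {
    "R.NlaIII": "KDALKTRNCVMLGVNGKSE-----NTKIEIDHKDGRKNNH-----RVSDIKTQKLEDFQPLCKAANDVKRQIC",
    "R.HpyI": "KNHYKQQCCAMCGVRGNSE-----NTQIEVDHKDGRKDDS-----RVSDLSTQAFDDFQALCKACNDKKRQIC",
    "R.SphI": "LKRAHGGKCAVCYGDFSE-------RELQCDHRVPFAIA--------GDKPKLVQEDFMPLCASDNRAKSWSC",
    "R.SapI": "VWERFHRKCFNCRKDLKL-------SEVQLDHTRPLAY------------LWPIDEHATCLCAQCNNTKKDRF",
    "R.NspHI": "ILQVYSYTDVIEQRQREK-------HELVIDHRFPMER-----------WGASEPPHLTSMSDDEIKQKFQLL",
    "R.NspI": "ILQVYSYTDVIEQRQREK-------HELVIDHRFPMER-----------WGASEPPHLTSMNDNEIKRKFQLL",
    "R.MboII": "SQQAKNDYFKHHKVNKI--------KGYELDHIIPLLEAE----SVDEYRYLDNWLNLLYIDGKTHAIKSQSG",
    "R.HpyAo1366": "SIKEKALYFEKHGVKKE--------KGFELHHIVPLCLAR----SIEEFDLLDKWENLIYIDAFNHAKISQTQ",
    "R.Hpy99o1442": "SIKEKALYFEKHGVKKE--------KGFELHHIVPLCLAR----SIEEFDLLDKWENLIYIDAFNHAKISQTQ",
}

# Assumption: 1-based alignment columns of the four Cys residues,
# inferred from the T4 Endo VII row of Fig. 1a (not given in the article).
CYS_COLUMNS = (9, 12, 62, 65)


def invariant_columns(rows):
    """Return (1-based column, residue) pairs identical in every row, ignoring gaps."""
    seqs = list(rows.values())
    width = len(seqs[0])
    if any(len(seq) != width for seq in seqs):
        raise ValueError("aligned rows must all have the same length")
    hits = []
    for i in range(width):
        residues = {seq[i] for seq in seqs}
        if len(residues) == 1 and "-" not in residues:
            hits.append((i + 1, seqs[0][i]))
    return hits


def rows_with_cys_box(rows, columns=CYS_COLUMNS):
    """Return the names of rows that have Cys at every one of the given columns."""
    return [
        name
        for name, seq in rows.items()
        if all(seq[col - 1] == "C" for col in columns)
    ]


if __name__ == "__main__":
    print(invariant_columns(HNH_RESTRICTION_ENZYMES))
    print(rows_with_cys_box(HNH_RESTRICTION_ENZYMES))
```

On the rows as transcribed here, column 32 (the His of the conserved D-H pair) is reported as invariant, and only R.HpyI and R.SapI carry Cys at all four assumed columns.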

McrA enzyme, and with type IIS ENase MboII (GAAGANNNNNNN↑N↓) and its two homologs from the fully sequenced Helicobacter pylori strains J99 and 26695 (Fig. 1a). Importantly, analysis of the refined alignment indicated that all the genuine HNH superfamily members, as well as the majority of restriction enzymes detected by our method, could be connected by an uninterrupted network of ‘intermediate sequences’17 with statistically significant pairwise similarities [PSI-BLAST (Ref. 9) expectation (E) values from the iteration before a hit was included in the profile <10−4] and profile-to-profile compatibility [FFAS (Ref. 10) Z scores >9] when homologs of structurally characterized proteins were used as a database of targets (data not shown). However, this required a large number of independent PSI-BLAST runs queried with the full-length sequences of known HNH superfamily members and manual editing of alignments to query FFAS, because no highly significant hits were reported when the profile-building process was initiated using the sequences of restriction enzymes.

For NspI and NspHI we were not able to confirm the statistical significance of the alignment, either from our initial observations or from that reported earlier by Xu et al.12 Nevertheless, the predicted secondary structures of NspI and NspHI agreed very well with those of other HNH ENases. Their specific targets also share the CATG core with SphI, HpyI and NlaIII. Therefore, we decided to include their sequences in the alignment, although we are aware that they could turn out to be ‘false positives’.

In restriction enzymes from the HNH family only the putative phosphate-binding His residue was found to be invariant. The residues of the putative metal-binding triad are conserved or substituted with side chains containing similar functional groups, except in NspI and NspHI, which have Ile at the third position, suggesting that this region might be misaligned. However, we were not able to find a more convincing alignment; it is possible that the ‘missing’ metal-binding side chains are anchored in a different location, as in the case of phage T4 ENase (Endo) VII (Ref. 21). Among the restriction enzymes only SapI and HpyI retained all four Cys residues, which make up a single Zn-finger-like structure in Endo VII. It is believed that multiple Zn-binding sites present at different positions in many HNH enzymes increase the stability of individual proteins rather than directly participate in catalysis14,18. In NspI and NspHI, which lack all four conserved Cys residues, we identified two alternative Zn-finger-like clusters at the N and C terminus (data not shown). The C-terminal Cys-cluster is also present in NlaIII and SphI, whereas their Endo VII-like cluster is only partially conserved (Fig. 1a). In the full-length sequences of the MboII enzyme and of its homologs, we could not detect any potential Zn-binding sites; this suggests that these elements emerged and vanished multiple times in the evolution of HNH ENases.

A comparable strategy was used to determine homology of three isoschizomers: Eco29kI, MraI and NgoMIII (all targeting CC↑GC↓GG) and the GIY-YIG superfamily of homing ENases19 (Fig. 1b). In this case the similarities calculated from the refined alignment were only marginally significant [PSI-BLAST (Ref. 9) E values >10−2], because of the bias due to the smaller size of the type II ENase cluster and the paucity of ‘evolutionary intermediates’. The match between these restriction enzymes and the genuine ‘GIY-YIG’ domain has been confirmed by the BLOCKS/LAMA server (http://blocks.fhcrc.org/blocks/) with a Z score of 5.7. In the widely studied restriction enzymes from the PD-(D/E)xK superfamily sequence similarities also eroded to the extreme; there are no truly invariant catalytic residues, which makes their remote homology virtually undetectable at the amino acid sequence level1,3,7.
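The distinction drawn in the text between palindromic type II targets and asymmetric type IIS targets can be made concrete with a reverse-complement test. The following Python sketch is purely illustrative: the function names are hypothetical, the recognition sites are those quoted in the article, and the degenerate NspI/NspHI site [G/A]CATG[C/T] is written in IUPAC code as RCATGY (an encoding choice made here, not in the article).

```python
# Complements for the IUPAC symbols that occur in the sites below
# (R = A or G, Y = C or T, N = any base).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G", "R": "Y", "Y": "R", "N": "N"}


def reverse_complement(site):
    """Return the reverse complement of a recognition site."""
    return "".join(COMPLEMENT[base] for base in reversed(site.upper()))


def is_palindromic(site):
    """A site is palindromic if it reads the same on both strands."""
    return site.upper() == reverse_complement(site)


# Recognition sites quoted in the article (cleavage positions omitted).
SITES = {
    "NlaIII / HpyI": "CATG",
    "SphI": "GCATGC",
    "NspI / NspHI": "RCATGY",  # [G/A]CATG[C/T] in IUPAC code
    "Eco29kI / MraI / NgoMIII": "CCGCGG",
    "SapI (type IIS)": "GCTCTTC",
    "MboII (type IIS)": "GAAGA",
}

if __name__ == "__main__":
    for enzyme, site in SITES.items():
        label = "palindromic" if is_palindromic(site) else "asymmetric"
        print(f"{enzyme}: {site} is {label}")
```

With these inputs the four type II sites are classified as palindromic and the two type IIS sites as asymmetric, matching the description of the subtypes given in the introduction.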
However, all catalytically important residues and secondary structure elements identified experimentally in the GIY-YIG ENase I-TevI (Ref. 19) are strictly conserved in Eco29kI, MraI, and NgoMIII, which strongly supports our prediction. Interestingly, all these ENases lack the C-terminal domain, which in I-TevI confers target-DNA specificity and allows this enzyme to cleave at a distance from the target. We hypothesize that the loops in the single catalytic domain can adopt some of the ‘local’ specificity determinants characteristic of type II ENases. The relationship between the domain architecture and the resulting mode of cleavage of the GIY-YIG ENases Eco29kI, MraI, and NgoMIII and the I-TevI enzyme would therefore be analogous to that between single-domain type II and two-domain type IIS ENases from the PD-(D/E)xK superfamily. It is tempting to speculate that the type II and IIS-like architectures allowing different modes of cleavage arose independently multiple times in evolution from various combinations of domains with alternative folds. Experimental evidence supporting our conjectures would be crucial for the understanding of complicated evolutionary pathways of the main structural and functional classes of sequence-specific nucleases.

It is striking that restriction enzymes and other ENases from the HNH and GIY-YIG superfamilies, which display similar, but distinct, specificities, are amenable to sequence comparisons and structure prediction, even in the absence of high-resolution X-ray structures. This suggests that they probably represent much better targets for directed evolution or knowledge-based specificity engineering than the PD-(D/E)xK ENases studied so far.

Note added in proof

While this paper was being processed for publication, similar predictions regarding the ‘atypical’ restriction enzymes related to the HNH superfamily have been reported by another group22.

References
1 Pingoud, A. and Jeltsch, A. (1997) Recognition and cleavage of DNA by type-II restriction endonucleases. Eur. J. Biochem. 246, 1–22
2 Roberts, R.J. and Macelis, D. (2000) REBASE - restriction enzymes and methylases. Nucleic Acids Res. 28, 306–307
3 Jeltsch, A. et al. (1995) Evidence for an evolutionary relationship among type-II restriction endonucleases. Gene 160, 7–16
4 Heitman, J. (1993) On the origins, structures and functions of restriction-modification enzymes. Genet. Eng. News 15, 57–108
5 Aggarwal, A.K. (1995) Structure and function of restriction endonucleases. Curr. Opin. Struct. Biol. 5, 11–19
6 Kovall, R.A. and Matthews, B.W. (1999) Type II restriction endonucleases: structural, functional and evolutionary relationships. Curr. Opin. Chem. Biol. 3, 578–583
7 Bujnicki, J.M. (2000) Phylogeny of the restriction endonuclease-like superfamily inferred from comparison of protein structures. J. Mol. Evol. 50, 39–44
8 Belfort, M. and Roberts, R.J. (1997) Homing endonucleases: keeping the house in order. Nucleic Acids Res. 25, 3379–3388
9 Altschul, S.F. et al. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402
10 Rychlewski, L. et al. (2000) Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci. 9, 232–241
11 Morgan, R.D. et al. (1996) Molecular cloning and expression of NlaIII restriction-modification system in E. coli. Gene 183, 215–218
12 Xu, S.Y. et al. (1998) Cloning and expression of the ApaLI, NspI, NspHI, SacI, ScaI, and SapI restriction-modification systems in Escherichia coli. Mol. Gen. Genet. 260, 226–231
13 Xu, Q. et al. (1997) The Helicobacter pylori genome is modified at CATG by the product of hpyIM. J. Bacteriol. 179, 6807–6815
14 Gorbalenya, A.E. (1994) Self-splicing group I and group II introns encode homologous (putative) DNA endonucleases of a new family. Protein Sci. 3, 1117–1120
15 Dalgaard, J.Z. et al. (1997) Statistical modeling and analysis of the LAGLIDADG family of site-specific endonucleases and identification of an intein that encodes a site-specific endonuclease of the HNH family. Nucleic Acids Res. 25, 4626–4638
16 Kuhlmann, U.C. et al. (1999) Structural parsimony in endonuclease active sites: should the number of homing endonuclease families be redefined? FEBS Lett. 463, 1–2
17 Park, J. et al. (1997) Intermediate sequences increase the detection of homology between sequences. J. Mol. Biol. 273, 349–354
18 Jurica, M.S. and Stoddard, B.L. (1999) Homing endonucleases: structure, function and evolution. Cell. Mol. Life Sci. 55, 1304–1326
19 Kowalski, J.C. et al. (1999) Configuration of the catalytic GIY-YIG domain of intron endonuclease I-TevI: coincidence of computational and molecular findings. Nucleic Acids Res. 27, 2115–2125
20 Cuff, J.A. et al. (1999) JPred: a consensus secondary structure prediction server. Bioinformatics 14, 892–893
21 Raaijmakers, H. et al. (1999) X-ray structure of T4 endonuclease VII: a DNA junction resolvase with a novel fold and unusual domain-swapped dimer architecture. EMBO J. 18, 1447–1458
22 Aravind, L. et al. (2000) Holliday junction resolvases and related nucleases: identification of new families, phyletic distribution and evolutionary trajectories. Nucleic Acids Res. 28, 3417–3432

J.M. Bujnicki* L. Rychlewski Bioinformatics Laboratory, International Institute of Molecular and Cell Biology, ul. ks. Trojdena 4, 02-109 Warsaw, Poland. *e-mail: [email protected] M. Radlinska Institute of Microbiology, Warsaw University, ul. Miecznikowa 1, 02-093 Warsaw, Poland.