NUCLEIC ACID MODIFYING ENZYMES
[Figure 4: thin-layer chromatogram, lanes 1-5; spot positions labeled AMP, ADP, and ATP.]
FIG. 4. Reaction product of ATPase activity of Tk-REC. Separation of the reaction product by thin-layer chromatography. Lane 1, negative control with no Tk-REC protein. Lane 2, Tk-REC from fraction 8 was added. Lane 3, Tk-REC from fraction 8 was added along with 1 μg pUC19. Lane 4, Tk-REC from fraction 4 was added. Lane 5, Tk-REC from fraction 4 was added along with 1 μg pUC19.

Fractions without DNase activity lead to ADP as the product. A representative result is shown in Fig. 4. At present, the reason for this correlation between DNase and ATPase activities is unknown. However, unlike other RecA family proteins, all fractions exhibit ATPase activity independent of both ssDNA and dsDNA.

All of these results demonstrate that Tk-REC is highly functional both in vivo and in vitro. Archaea in general, and hyperthermophilic archaea in particular, are thought to have evolved from, and to still live in, environments very similar to that of primitive earth. Cells at that time might have been exposed to harsh conditions including fairly strong UV light. Under this stress it would be essential to maintain effective DNA repair systems.
[23] Hyperthermophilic Inteins

By FRANCINE B. PERLER

Introduction

Inteins 1 are intervening sequences that are posttranslationally excised from protein precursors. They are the protein equivalent of introns, which are intervening sequences that splice from precursor RNAs. The sequences flanking both sides of
1 F. B. Perler, E. O. Davis, G. E. Dean, F. S. Gimble, W. E. Jack, N. Neff, C. J. Noren, J. Thorner, and M. Belfort, Nucleic Acids Res. 22, 1125 (1994).
METHODS IN ENZYMOLOGY, VOL. 334
Copyright © 2001 by Academic Press. All rights of reproduction in any form reserved. 0076-6879/00 $35.00
the intein are called exteins. 1 During protein splicing, the intein is excised from a precursor protein and the flanking exteins are joined by a peptide bond. This ligation of exteins differentiates protein splicing from other forms of proteolytic processing. The self-catalytic protein splicing reaction is mediated by the intein plus the first carboxy-extein amino acid (aa), which are capable of splicing in heterologous exteins. However, each intein has its own "substrate" specificity that dictates allowable proximal extein residues.

As of December 31, 1999, there were 100 putative inteins listed in the Intein Registry, 1 representing all three domains of life (see InBase 2 at ); 74% of these inteins are found in thermophilic organisms, mainly Archaea. Thermophilic inteins were among the first inteins discovered 3 and played a key role in establishing protein splicing as a fundamental method of protein biosynthesis. The proof that inteins were spliced from precursor proteins rather than from precursor RNAs 4 and the mechanism of protein splicing were initially demonstrated using archaeal inteins. 4-10 Since their discovery in 1990, inteins have been harnessed to perform numerous protein engineering processes.

Identifying Inteins

A combination of criteria should be used to identify inteins, since one must differentiate true inteins from in-frame insertions present because of sequence variability or other types of insertion elements. Experimentally, an intein may be indicated when the observed size of the protein is smaller than the predicted size of the gene product and a second protein is also produced; since there may be other explanations, such as aberrant electrophoretic mobility or protease processing, the gene should be examined for the presence of intein motifs. Most putative inteins have been identified by sequence comparison, rather than experimentally. 2,11-14 A large (> 100 aa) in-frame insertion in a sequenced gene that is absent in other

2 F. B. Perler, Nucleic Acids Res. 28, 344 (2000).
3 F. B. Perler, D. G. Comb, W. E. Jack, L. S. Moran, B. Qiang, R. B. Kucera, J. Benner, B. E. Slatko, D. O. Nwankwo, S. K. Hempstead, C. K. S. Carlow, and H. Jannasch, Proc. Natl. Acad. Sci. U.S.A. 89, 5577 (1992).
4 M. Xu, M. W. Southworth, F. B. Mersha, L. J. Hornstra, and F. B. Perler, Cell 75, 1371 (1993).
5 M. Xu and F. B. Perler, EMBO J. 15, 5146 (1996).
6 M. Xu, D. G. Comb, H. Paulus, C. J. Noren, Y. Shao, and F. B. Perler, EMBO J. 13, 5517 (1994).
7 Y. Shao, M. Q. Xu, and H. Paulus, Biochemistry 34, 10844 (1995).
8 Y. Shao, M.-Q. Xu, and H. Paulus, Biochemistry 35, 3810 (1996).
9 Y. Shao and H. Paulus, J. Pept. Res. 50, 193 (1997).
10 R. A. Hodges, F. B. Perler, C. J. Noren, and W. E. Jack, Nucleic Acids Res. 20, 6153 (1992).
11 F. B. Perler, G. J. Olsen, and E. Adam, Nucleic Acids Res. 25, 1087 (1997).
12 S. Pietrokovski, Protein Sci. 3, 2340 (1994).
13 S. Pietrokovski, Protein Sci. 7, 64 (1998).
14 J. Z. Dalgaard, M. J. Moser, R. Hughey, and I. S. Mian, J. Comput. Biol. 4, 193 (1997).
[Figure 1 schematic: N-terminal splicing region, DOD endonuclease domain, and C-terminal splicing region of the precursor, with intein motifs (Blocks A through H) marked above and conserved residues marked below.]
FIG. 1. The organization and conserved sequences of a typical protein splicing precursor with a DOD homing endonuclease. The amino-terminal and carboxy-terminal splicing regions form a single protein domain that carries out the splicing reaction. The splicing regions are separated by a homing endonuclease domain as depicted, or by a shorter linker region in mini-inteins. Intein Blocks A through G are shown above the precursor, and conserved residues in selected motifs are shown below the precursor using the single-letter amino acid code. Block G contains the (His)/(Asn or Gln)/(Ser, Thr or Cys) residues and includes the first residue of the carboxy-extein. Nucleophiles are boxed.
sequenced homologs and that contains one or more intein motifs suggests that this gene may contain an intein.

Inteins have several signature motifs (Fig. 1 and Table I). Blocks A, B, F, and G, 11,12 also known as Blocks N1, N3, C2, and C1, 13 respectively, are present in all inteins. Block A begins with the first residue of the intein and Block B is usually 60-100 amino acids from the start of the intein. Block F closely precedes Block G, which includes the end of the intein and the first residue of the carboxy-extein. Blocks C, D, E, and H (also known as Blocks EN1-4 13) are sometimes present between Blocks B and F and include the signature motifs of one class of endonuclease, termed the dodecapeptide (DOD) or LAGLIDADG family of homing endonucleases 15,16 (see below). Mini-inteins (<200 aa) have a small linker in place of the homing endonuclease domain.

The consensus sequence for each conserved intein motif is listed in Table I. No single position is invariant in all sequenced inteins. Any member of the amino acid group defined in Table I may be present at a given position, even when a specific predominant residue is indicated. 2,11-13 Most inteins begin with Cys or Ser and end in the dipeptide His-Asn. The carboxy-extein usually begins with Ser, Thr, or Cys. All but one intein have a conserved His in Block B. The Ser, Thr, Cys, and Asn residues are the nucleophiles that perform the chemical steps, whereas the conserved His residues facilitate these nucleophilic displacements. As more inteins have been identified and characterized, polymorphisms at these positions have become evident, including inteins beginning with Ala, ending with Gln, and with penultimate residues other than His. 2

15 M. Belfort and R. J. Roberts, Nucleic Acids Res. 25, 3379 (1997).
16 M. S. Jurica and B. L. Stoddard, Cell. Mol. Life Sci. 55, 1304 (1999).
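The terminal-residue rules described above (most inteins begin with Cys or Ser and end in His-Asn, and the carboxy-extein usually begins with Ser, Thr, or Cys) can be turned into a quick screening check. The following Python sketch is illustrative only: the function name and the toy sequences are hypothetical, and because no position is invariant, a failed check does not rule out an intein.

```python
def check_intein_termini(intein: str, c_extein_first: str) -> dict:
    """Report which of the common terminal-residue features a candidate shows.

    `intein` is the candidate intein protein sequence (one-letter code) and
    `c_extein_first` is the first residue of the carboxy-extein. These are
    heuristics for typical inteins, not strict requirements.
    """
    intein = intein.upper()
    return {
        "starts_with_Cys_or_Ser": intein[:1] in {"C", "S"},
        "ends_with_His_Asn": intein.endswith("HN"),
        "c_extein_starts_with_Ser_Thr_Cys": c_extein_first.upper() in {"S", "T", "C"},
    }


# Toy sequences (hypothetical, far shorter than any real intein).
typical = check_intein_termini("CGAKLVHN", "S")
atypical = check_intein_termini("AGAKLVHQ", "G")
print(typical)
print(atypical)
```

A candidate that fails one of these checks may still be a real intein, since variants beginning with Ala or ending with Gln are known; the check is best used together with the motif and size criteria.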
TABLE I
CONSERVED INTEIN MOTIFS a

Motif             Conserved sequence b
Block A (N1):     ChxxDpxhhhxxG
Block B (N3):     GxxhxhTxxHxhhh
Block C (EN1):    LhGxhhaG
Block D (EN2):    KxIPxxh
Block E (EN3):    LxGhFahDG
Block H (EN4):    pxSxxhhxxhxxLLxxhGI
Block F (C2):     rVYDLpV [1-3 residues]axx[H/E]NFh
Block G (C1):     NGhhhHNp

a Blocks C, D, E, and H are only present in DOD family homing endonuclease domains.
b An uppercase letter indicates that the amino acid is present in >50% of inteins; no single position is invariant in all inteins. Lowercase letters represent amino acid groups: x, any residue; h, hydrophobic residue (G, A, V, L, I, M); p, polar residue (S, C, T); a, acidic residue (D or E); r, aromatic residue (F, Y, W).
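As a rough illustration of how the Table I notation can be used in a motif search, the Python sketch below converts a consensus string into a regular expression using the amino acid groups defined in footnote b. The function name and the toy sequences are hypothetical. Because an uppercase letter only marks a residue present in more than half of inteins, an exact regular-expression match is stricter than the consensus really is; a practical search would need to tolerate mismatches.

```python
import re

# Amino acid groups from the Table I legend, written as regex character classes.
GROUPS = {
    "x": ".",         # any residue
    "h": "[GAVLIM]",  # hydrophobic
    "p": "[SCT]",     # polar
    "a": "[DE]",      # acidic
    "r": "[FYW]",     # aromatic
}


def consensus_to_regex(consensus: str) -> str:
    """Translate a plain Table I consensus string into a regex pattern.

    Lowercase group letters become character classes; uppercase letters are
    kept as literal residues. Bracketed forms such as the variable spacer in
    Block F are not handled by this sketch.
    """
    parts = []
    for ch in consensus:
        if ch in GROUPS:
            parts.append(GROUPS[ch])
        elif ch.isupper():
            parts.append(ch)
        else:
            raise ValueError(f"unsupported consensus symbol: {ch!r}")
    return "".join(parts)


block_g = consensus_to_regex("NGhhhHNp")
block_a = consensus_to_regex("ChxxDpxhhhxxG")

# Toy sequences (hypothetical) that happen to satisfy the strict patterns.
hit_g = re.search(block_g, "MKNGVIVHNSAA")
hit_a = re.search(block_a, "CLAEDSKVIAKQG")
print(block_g, hit_g.start())
print(block_a, hit_a.start())
```

Treating uppercase positions as literals keeps the sketch short; relaxing them to "preferred" residues with a mismatch budget would follow the legend more closely.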
How one identifies new inteins depends on whether one is analyzing the sequence of a favorite gene or searching databases for new inteins. Although intein sequences are very divergent, even in the conserved motifs, inteins can usually be detected by running BLAST or PSI-BLAST searches 17 using a few individual intein sequences, searching for intein motifs, 11-13 or by using an intein-trained hidden Markov model. 18

Protein Splicing Reaction

Protein splicing requires proteolytic cleavage of the precursor protein at both splice junctions and formation of a native peptide bond between the exteins. This complex pathway is mediated by a highly coordinated series of simple chemical reactions (four nucleophilic displacements). The elements directing these chemical reactions are contained within the intein plus the first downstream extein residue. However, splicing in foreign proteins is generally not as efficient as splicing in the native context, suggesting that the exteins may play a role in the splicing reaction by helping to align the splice junctions or by enabling proper folding of the intein catalytic core. The extein amino acid preceding the intein is the target of 3/4 of

17 S. F. Altschul, T. L. Madden, A. A. Schaffer, J. Zhang, Z. Zhang, W. Miller, and D. J. Lipman, Nucleic Acids Res. 25, 3389 (1997).
18 J. Z. Dalgaard, Trends Genet. 10, 306 (1994).
the nucleophilic displacements. Some inteins are more tolerant of proximal extein sequences than are other inteins. 19-22 A review of the mechanism can be found in Noren et al. 23

Protein splicing is initiated by an acyl rearrangement of the Ser or Cys at the beginning of the intein to form a (thio)ester bond between the intein and the amino-extein (Fig. 2). The upstream splice junction is cleaved when the side chain of the Ser, Thr, or Cys at the beginning of the carboxy-extein attacks this (thio)ester bond, resulting in extein ligation and formation of a branched intermediate. The intein is released from the branched intermediate when the intein carboxy-terminal Asn or Gln cyclizes, cleaving the downstream splice junction. In the absence of the intein, the ligated exteins undergo a spontaneous acyl rearrangement to form a peptide bond. Structural and mutagenesis data indicate that the His in Block B assists in the nucleophilic displacements at the upstream splice junction and the intein penultimate His facilitates Asn cyclization. 5,24-26

Intein Expression in Heterologous Proteins and Organisms

In all cases examined, protein splicing is rapid and efficient in the native organism, suggesting that inteins may have little effect on extein expression. However, that is not always the case when intein-containing genes are expressed in heterologous organisms, such as Escherichia coli, or when inteins are cloned into model exteins. Most often, the rate of splicing is slower, some precursors fail to splice, dead-end single splice junction cleavage products accumulate, and precursors are proteolyzed, suggesting that the precursors are not optimally folded. If precursors are degraded by cellular proteases, yields may be improved by inducing expression at lower temperatures (12-20°) 4 or by using protease-deficient strains.
Insertion of an intein into a foreign extein sometimes results in splicing at one temperature, but not at another temperature. For example, the Pyrococcus sp. GB-D DNA polymerase intein only spliced in a chimeric extein at temperatures above 25-30°, 4 whereas the Mycobacterium xenopi gyrase subunit A intein only spliced in a chimeric extein at temperatures below 20°. 21 Both inteins spliced at
19 S. Chong, K. S. Williams, C. Wotkowicz, and M. Q. Xu, J. Biol. Chem. 273, 10567 (1998).
20 M. W. Southworth, K. Amaya, T. C. Evans, M. Q. Xu, and F. B. Perler, BioTechniques 27, 110 (1999).
21 A. Telenti, M. Southworth, F. Alcaide, S. Daugelat, W. R. Jacobs Jr., and F. B. Perler, J. Bacteriol. 179, 6378 (1997).
22 S. Nogami, Y. Satow, Y. Ohya, and Y. Anraku, Genetics 147, 73 (1997).
23 C. J. Noren, J. Wang, and F. B. Perler, Angew. Chem. Int. Ed. Engl. 39, 450 (2000).
24 M. Kawasaki, S. Nogami, Y. Satow, Y. Ohya, and Y. Anraku, J. Biol. Chem. 272, 15668 (1997).
25 T. Klabunde, S. Sharma, A. Telenti, W. R. Jacobs, Jr., and J. C. Sacchettini, Nat. Struct. Biol. 5, 31 (1998).
26 X. Duan, F. S. Gimble, and F. A. Quiocho, Cell 89, 555 (1997).
[Figure 2 schematic: precursor (N-extein, intein, C-extein) passing through the steps labeled acyl shift, transesterification, Asn cyclization, and X-N acyl shift to give the ligated exteins.]
FIG. 2. The protein splicing mechanism is depicted with X representing either the oxygen or the sulfur present in the side chain of Ser, Thr, or Cys. In some inteins, Asn is replaced by Gln, which also can cyclize. All tetrahedral intermediates and proton transfer steps are omitted for clarity. The shaded rectangles represent the exteins.
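At the sequence level, the net outcome of the mechanism in Fig. 2 is simple: the intein is excised and the two exteins are joined by a peptide bond. The Python sketch below models only that outcome, not the chemistry of the individual steps. The function name, the coordinate convention (0-based, end-exclusive), and the toy precursor are assumptions made for illustration.

```python
def splice(precursor: str, intein_start: int, intein_end: int) -> tuple:
    """Return (ligated_exteins, excised_intein) for a precursor sequence.

    `intein_start` and `intein_end` are 0-based, end-exclusive coordinates of
    the intein within the precursor (a convention chosen for this sketch).
    """
    if not 0 <= intein_start < intein_end <= len(precursor):
        raise ValueError("intein coordinates fall outside the precursor")
    n_extein = precursor[:intein_start]
    intein = precursor[intein_start:intein_end]
    c_extein = precursor[intein_end:]
    # Extein ligation: the flanking sequences are joined directly.
    return n_extein + c_extein, intein


# Toy precursor (hypothetical): N-extein "MKT", intein "CAAHN", C-extein "SGG".
ligated, intein = splice("MKTCAAHNSGG", 3, 8)
print(ligated, intein)
```

The two products together account for the full length of the precursor, which is why a spliced protein that runs smaller than the size predicted from the gene, accompanied by a second product, is one experimental hint that an intein is present.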
the nonpermissive temperatures when in their native extein context. Although the reason for this temperature-dependent splicing is unknown, it probably reflects a suboptimal extein context. In attempting to study splicing of any intein, one may have to examine constructs that have partial extein sequences to ensure proper active site architecture. However, many inteins splice in a variety of contexts, while others do not, suggesting that each intein has its own degree of flexibility with respect to "substrate" (extein) specificity. When cloning an intein into a foreign extein, it is often better to choose an insertion site whose sequence is similar or identical to the native extein for 1-5 amino acids on each side of the intein. Analysis of native intein insertion sites failed to indicate a preferred type of secondary structure or sequence. 2,11,14 Inteins tend to be found in conserved extein motifs or active sites, which are likely to be surface accessible since they need to interact with substrates and cofactors. Therefore, inserting an intein into a surface location might improve splicing and result in less proteolysis due to precursor misfolding.

Inteins may also be cytotoxic in heterologous organisms, because of homing endonuclease activity. 15,16 The Thermococcus litoralis DNA polymerase gene (Vent pol) contains two inteins that are active endonucleases (PI-TliI and PI-TliII). 3,10 Each of these enzymes cleaves the E. coli genome several times (T. Davis and F. B. Perler, New England Biolabs, 1993, unpublished data), rendering the T. litoralis DNA polymerase unclonable on a multicopy plasmid in E. coli unless one of the two inteins is deleted. 3 Expression of intein-containing genes may thus be improved by inactivating the endonuclease, which can be accomplished by mutating the conserved Asp residues in Blocks C and E. 10

Intein Distribution

Are inteins an archaeal phenomenon, since 70% of inteins are found in archaea?
2 Six fully sequenced eubacterial genomes have no inteins, 2 whereas many fully sequenced archaeal genomes contain multiple inteins (Table II), including Methanococcus jannaschii (19 inteins), Pyrococcus horikoshii OT3 (14 inteins), Pyrococcus abyssi (14 inteins), Pyrococcus furiosus (10 inteins), Methanobacterium thermoautotrophicum (delta H strain) (1 intein), Aeropyrum pernix (1 intein), and Archaeoglobus fulgidus (no identified inteins). Inteins have also been identified in three thermophilic eubacteria, Deinococcus radiodurans (2 inteins), Rhodothermus marinus (1 intein), and Aquifex aeolicus (1 intein). Archaeal genes often contain two or three inteins, whereas no eubacterial or eukaryal gene identified to date has more than one intein. In several cases, inteins represent greater than 50% of the coding region of the precursor. Thirty-four different intein insertion sites have been found in 27 genes from thermophilic organisms. Inteins present at the same insertion site in homologous genes from different organisms are considered intein homologs or alleles. However, the presence of an intein in a gene
TABLE II
INTEINS IN SEQUENCED ARCHAEAL GENOMES a

Intein b or (insert site)   Pab (14) c   Pho (14)   Pfu (10)   Mja (19)   Mth (1)   Ape (1)
(CDC21-a)                   164          168        --         --         --        --
(CDC21-b)                   268          260        367        --         --        --
GF-6P                       --           --         --         499        --        --
Helicase                    --           --         --         501        --        --
Hyp-1                       --           --         --         392        --        --
Hyp                         --           --         --         --         --        468
IF2                         394          444        387        546        --        --
KlbA                        196          520        522        168        --        --
LHR                         --           475        --         --         --        --
Lon                         333          474        401        --         --        --
Moaa                        455          --         --         --         --        --
PEP                         --           --         --         412        --        --
(pol-a)                     --           --         --         369        --        --
(pol-b)                     --           460        --         476        --        --
(pol-c)                     --           --         --         --         --        --
Pol II (DP2)                185          166        --         --         --        --
RadA                        --           172        --         --         --        --
(RFC-a)                     499          525        525        548        --        --
(RFC-b)                     --           --         --         436        --        --
(RFC-c)                     608          --         --         543        --        --
(r-Gyr-a)                   --           410        373        494        --        --
(RIR1-a)                    399          --         454        --         --        --
(RIR1-b)                    382          385        382        --         134       --
(RIR1-c)                    438          --         --         --         --        --
RNR-1                       --           --         --         453        --        --
RNR-2                       --           --         --         533        --        --
RpolA''                     --           --         --         471        --        --
RpolA'                      --           --         --         452        --        --
RtcB-2                      436          390        380        488        --        --
TFIIB                       --           --         --         335        --        --
UDP GD                      --           --         --         454        --        --
VMA (VMA-b)                 429          376        425        --         --        --

a The number of amino acids in each intein is indicated, while the absence of an intein is indicated by a dash (--). Pab, Pyrococcus abyssi; Pho, Pyrococcus horikoshii OT3; Pfu, Pyrococcus furiosus; Mja, Methanococcus jannaschii; Mth, Methanobacterium thermoautotrophicum (delta H strain); Ape, Aeropyrum pernix.
b The intein name is listed unless more than one insertion site is present in a gene, in which case the intein insertion site is listed in parentheses. CDC21, cell division control protein 21; GF-6P, glutamine-fructose-6-phosphate transaminase; Hyp, hypothetical protein; IF2, translation initiation factor IF2; KlbA, kilB operon orfA; LHR, large helicase-related protein; Lon, ATP-dependent protease LA; Moaa, molybdenum cofactor biosynthesis homolog; PEP, phosphoenolpyruvate synthase; pol, DNA polymerase; Pol II, DNA polymerase II, DP2 subunit; RadA, RadA DNA repair protein; RFC, replication factor C; r-gyr, reverse gyrase; RIR1, ribonucleoside-diphosphate reductase, α subunit; RpolA'', RNA polymerase subunit A''; RpolA', RNA polymerase subunit A'; RtcB, RNA terminal phosphate cyclase operon orfB; TFIIB, transcription factor IIB; UDP GD, UDP-glucose dehydrogenase; VMA, vacuolar ATPase, subunit A.
c The number in parentheses indicates the number of inteins per sequenced genome.
from one species does not ensure that a homolog from a closely related organism will also contain an intein (Table II). Some inteins are more widely dispersed than others. For example, RIR1 (ribonucleoside-diphosphate reductase α subunit) intein alleles are found in all three domains of life. 2

Several reports have commented upon the observation that inteins tend to be found in proteins involved in transcription, translation, replication, and DNA repair. In thermophilic organisms, 51% of inteins are present in these types of proteins. It is possible that this distribution of inteins reflects how they may be acquired.

Homing Endonuclease Domain and Intein Mobility

There are several classes of homing endonucleases, which are named for their signature motifs. 15,16 Homing endonucleases are a type of site-specific endonuclease that makes a double-stranded break, usually leaving a four-base overhang. They tend to have large, degenerate recognition sites (18-40 base pairs) and fail to cleave DNA isolated from their native organism. Most inteins contain a centrally located sequence that has similarities to the DOD class of homing endonucleases (Fig. 1). One intein, the Synechocystis sp. PCC6803 DNA gyrase B intein, contains a second class of homing endonuclease termed an HNH homing endonuclease. Endonuclease activity has only been demonstrated experimentally with a few inteins, including the T. litoralis and Pyrococcus sp. GB-D DNA polymerase inteins, 3,4,10 the two Pyrococcus furiosus ribonucleoside-diphosphate reductase α subunit inteins, 27,28 and the Pyrococcus kodakaraensis KOD DNA polymerase inteins. 29 Many inteins lack essential residues in their endonuclease active sites, 2,10,15,16 such as the conserved Asp in Blocks C and E, and may thus be inactive as endonucleases.
Some inteins contain intermediate-sized inserts between Blocks B and F that appear to lack one or more homing endonuclease motifs 2; these may represent inteins in the process of becoming mini-inteins, losing their endonuclease domains by gradual mutation and deletion.

Homing endonucleases are generally found within mobile intron or intein genes, although some, such as the yeast HO endonuclease involved in mating type switching, are present as free-standing genes. 15,16 Homing endonucleases cut at or near the site in the extein where the intein gene is usually inserted. This double-stranded break in the intein-minus gene initiates an extremely efficient, unidirectional gene conversion event, resulting in transfer of the intein gene into the homologous extein gene that previously lacked the intein. 15,16,30 Once the

27 K. Komori, K. Ichiyanagi, K. Morikawa, and Y. Ishino, Nucleic Acids Res. 27, 4175 (1999).
28 K. Komori, N. Fujita, K. Ichiyanagi, H. Shinagawa, K. Morikawa, and Y. Ishino, Nucleic Acids Res. 27, 4167 (1999).
29 M. Nishioka, S. Fujiwara, M. Takagi, and T. Imanaka, Nucleic Acids Res. 26, 4409 (1998).
30 M. Belfort and P. S. Perlman, J. Biol. Chem. 270, 30237 (1995).
intein insertion site is cut, the only gene present to repair this DNA break is the intein-containing gene. Lateral transfer of an intein gene may occur when a piece of DNA that encodes an intein-containing gene enters the cell by any means (conjugation, transformation, infection, mating, etc.). Intein homing by homologous recombination is very efficient, 15,16,30,31 whereas insertion of the intein into a nonhomologous site would be very rare because of the inefficiency of illegitimate recombination. Lateral transmission of inteins is borne out in phylogenetic studies demonstrating that inteins present at the same insertion site in genes from vastly different organisms are more closely related to each other than multiple inteins present in the same gene. 11,14,32

Although gain of intein genes with homing endonuclease activity is simple and efficient, loss of inteins is not. Introns can be lost by retrotransposition of spliced mRNA back into the genome, but inteins cannot be lost in this way, since the RNA of an intein-containing gene is never spliced. Instead, there must either be recombination with intein-minus genes (which would only occur efficiently if the endonuclease were inactive) or a precise deletion of the intein coding sequence that would maintain the extein reading frame and not leave extra residues that would inhibit extein activity. The latter constraint may also explain why inteins are found in essential extein sequences.

Given the sporadic distribution of inteins and their ability to invade genes, it seems likely that present-day inteins are relatively recent acquisitions that arrived by lateral transfer. There are several instances when GC content or codon usage of the intein is different from the rest of the genome. Inteins may have accumulated in archaea because archaea are naturally competent to take up free DNA, have efficient conjugation systems, or have broad host range viruses, all of which would enhance the spread of inteins. The possibility that inteins are spread by viruses or plasmids may account for their increased presence in genes involved in transcription, translation, replication, and repair, since these types of genes are more likely to be present on episomal elements than genes for other types of biochemical pathways. Hyperthermophiles might also have very efficient recombination systems, since life at high temperatures may lead to increased DNA damage, requiring efficient DNA repair. Studies in mesophiles indicate that cellular repair and recombination systems are required for endonuclease-mediated gene conversion.

The question of whether inteins are recent or ancient elements has yet to be answered. In ancient times, inteins may have been involved in evolution of better enzymes by domain shuffling, since splicing in trans can be very efficient. 23 The inteins that we see today may be remnants of these ancient inteins that spread to different sites as the splicing domains picked up different homing endonucleases.

31 F. S. Gimble and J. Thorner, Nature 357, 301 (1992).
32 J. Z. Dalgaard, A. J. Klar, M. J. Moser, W. R. Holley, A. Chatterjee, and I. S. Mian, Nucleic Acids Res. 25, 4626 (1997).
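The observation above that intein GC content sometimes differs from the rest of the genome suggests a simple comparison. The Python sketch below computes the GC fraction of an intein coding sequence and of a reference sequence (for example, the flanking extein gene) and reports the difference. The function names and toy DNA sequences are hypothetical, and a difference in GC content is only suggestive of lateral transfer, not proof.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_difference(intein_dna: str, reference_dna: str) -> float:
    """GC fraction of the intein minus that of the reference sequence."""
    return gc_content(intein_dna) - gc_content(reference_dna)


# Toy sequences (hypothetical), chosen so the arithmetic is easy to follow.
intein_dna = "GGCCGGCCAT"     # 8 of 10 bases are G or C
reference_dna = "ATATGCATAT"  # 2 of 10 bases are G or C
print(gc_content(intein_dna), gc_content(reference_dna))
print(gc_difference(intein_dna, reference_dna))
```

In practice the comparison would be made against genome-wide or gene-family averages rather than a single short reference, and codon usage could be compared in the same spirit.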
Intein genes are selfish DNA, since they can invade extein genes, but this event is not lethal because inteins splice efficiently to generate active extein protein. The unanswered question is whether inteins have any effect on their hosts. Many have suggested that inteins may have a role in biological regulation. As people begin to harness the genome and its products, we may be able to begin asking if there are any advantages or disadvantages of inteins under various physiological stresses, remembering that the rapid self-catalytic protein splicing mechanism leaves little room for environmental influences other than at the level of message stability, protein folding, extremes of pH, or trans-acting inhibitors.

Inteins in Protein Engineering

Understanding intein function has led to their use in a variety of applications. 23 Temperature-dependent inteins 5 can be inserted into proteins to control expression of the active extein in vivo or in vitro, generating conditional knockouts or allowing expression of inactive cytotoxic proteins that can be activated in vitro by protein splicing. Hyperthermophilic inteins have been split in their endonuclease domains and precursor fragments have been shown to reassemble and splice. 33,34 Protein splicing in trans allows segmental modification of a protein, which has been used to improve resolution for NMR spectroscopy, 33 but can also be used to express extremely cytotoxic proteins. Protein purification vectors have been developed, based on modified inteins that only cleave at a single splice junction. When the target protein is cloned at the amino-terminal splice junction, it is recovered with a carboxy-terminal α-thioester; this reactive group has been used to perform several different types of chemoselective condensations, including protein semisynthesis and incorporation of modified amino acids, biosensors, or tags. The list of applications involving inteins is expanding rapidly as more uses of autocleavage elements and reactive carboxy-terminal α-thioesters are being devised.

Acknowledgments

I thank Don Comb for support and encouragement and Maurice Southworth, Eric Adam, and other members of the New England Biolabs protein splicing groups.
33 T. Yamazaki, T. Otomo, N. Oda, Y. Kyogoku, K. Uegaki, N. Ito, Y. Ishino, and H. Nakamura, J. Am. Chem. Soc. 120, 5591 (1998).
34 M. W. Southworth, E. Adam, D. Panne, R. Byer, R. Kautz, and F. B. Perler, EMBO J. 17, 918 (1998).