Evolution in vitro: analysis of a lineage of ribozymes
Niles Lehman* and Gerald F. Joyce
Departments of Chemistry and Molecular Biology, The Scripps Research Institute, 10666 North Torrey Pines Road, La Jolla, California 92037, USA.
Background: Catalytic RNAs, or ribozymes, possess both a genotype and a phenotype, which makes them ideal molecules for evolution experiments in vitro. A large, heterogeneous pool of RNAs can be subjected to multiple rounds of selection, amplification and mutation, leading to the development of variants that have some desired phenotype. Such experiments allow the investigator to correlate specific genetic changes with quantifiable alterations of the catalytic properties of the RNA. In addition, patterns of evolutionary change can be discerned through a detailed examination of the genotypic composition of the evolving RNA population.
Results: Beginning with a pool of 10^13 variants of the Tetrahymena ribozyme, we carried out in vitro evolution experiments that led to the generation of ribozymes with the ability to cleave an RNA substrate in the presence of Ca2+ ions, an activity that does not exist for the wild-type molecule. Over the course of 12 generations, a seven-error variant emerged that has substantial Ca2+-dependent RNA-cleavage activity. Advantageous mutations increased in frequency in the population according to three distinct dynamics: logarithmic, linear and transient. Through a comparative analysis of 31 individual variants, we infer how certain mutations influence the catalytic properties of the ribozyme.
Conclusions: In vitro evolution experiments make it possible to elucidate important aspects of both evolutionary biology and structural biochemistry on a reasonably short time scale.
Current Biology 1993, 3:723-734
Background
Biological structures are the consequence of evolution by natural selection, a process that can be described as irrational design [1]. In contrast, most human creations, even bio-functional molecules, are the result of rational thought. The chief drawback to the rational design of enzyme catalysts, as many scientists are acutely aware, is that we often lack sufficient knowledge of the physico-chemical constraints that determine the functional properties of these complex molecules. Recently, efforts have begun to let evolution do the work of macromolecular adaptation for us, by selecting desirable molecules out of a large population of variants (reviewed in [2,3]). Our previous experiments with the Tetrahymena ribozyme have demonstrated that RNA molecules with desired catalytic properties can be derived from an existing enzyme via evolution in vitro [4,5]. Auspiciously, detailed monitoring of the changes that take place in evolving populations of RNA molecules reveals similarities to the genetic modifications that occur during the evolution of natural populations of organisms. These changes highlight structural features of the RNA that play an important role in its catalytic function.
The group I intron from Tetrahymena thermophila large subunit pre-ribosomal RNA has a well-defined catalytic center and is perhaps the best characterized ribozyme (reviewed in [6,7]). A shortened form of this molecule, containing 393 nucleotides and a nucleophilic guanosine residue at its 3’ end, catalyzes a sequence-specific phosphoester cleavage reaction involving an external oligonucleotide substrate. The reaction proceeds by a transesterification mechanism that results in attachment of the 3’ portion of the substrate to the 3’ end of the ribozyme (Fig. 1). The wild-type ribozyme behaves similarly to a protein metalloenzyme: it relies on Mg2+ or Mn2+ divalent cations at its active site to carry out its catalytic function [8,9]. Moreover, divalent cations are required to stabilize the active conformation of the ribozyme. For structural stabilization, but not for catalysis, Ca2+ or Sr2+ can substitute for Mg2+ or Mn2+ [10,11]. In the presence of a Mg2+ cofactor, the wild-type ribozyme is an efficient enzyme, exhibiting multiple-turnover kinetics, as measured by kcat/Km, in the cleavage of a specific RNA substrate [12].
Fig. 1. Catalytic activity of a group I ribozyme. (a) The ribozyme (red) binds a complementary oligonucleotide substrate (blue) by Watson-Crick base pairing, and catalyzes nucleophilic attack by guanosine 3’-OH at a phosphodiester linkage within the substrate that lies immediately downstream from the region of base pairing. If the guanosine nucleophile is located at the 3’ end of the ribozyme itself, then the products of the reaction are the released 5’ portion of the substrate and the 3’ portion of the substrate covalently attached to the 3’ end of the ribozyme. (b) The ribozyme is a metalloenzyme, requiring either Mg2+ or Mn2+ for catalytic activity. One role for the metal ion is stabilization of the developing negative charge on the uridine 3’-O leaving group [9].
Correspondence to: Gerald F. Joyce. *Present address: Department of Biology, University of Oregon, Eugene, Oregon 97403, USA.
© Current Biology 1993, Vol 3 No 11
Fig. 2. Outline of the in vitro evolution system. Catalytic activity of the ribozyme (red) results in attachment of the 3’ portion of the substrate (blue) to the 3’ end of the ribozyme. Individual ribozyme molecules that undergo this reaction can bind an oligodeoxynucleotide primer (primer 1) that initiates selective cDNA synthesis. A second oligodeoxynucleotide primer (primer 2) binds to the 3’ end of the cDNA and introduces the T7 RNA polymerase (T7 pol) promoter sequence (P). The DNA-dependent DNA polymerase activity of reverse transcriptase (RT) results in synthesis of the second strand of the promoter. T7 RNA polymerase then generates hundreds of copies of RNA per copy of DNA template. Each copy retains a 3’-terminal sequence that corresponds to the 3’ portion of the substrate, allowing it to bind primer 1 and undergo additional amplification. Isothermal amplification continues until an RNA concentration of 1-10 µM is reached. The RNA is then converted to cDNA using the selective primer 1. The resulting DNA is further amplified in a non-selective manner by the PCR, which allows new mutations to be introduced at each generation [47]. The PCR makes use of the same primer 2, but a different primer 1 that restores the 3’ terminus of the ribozyme to its original form. The progeny population is obtained by in vitro transcription of the PCR DNA.
Fig. 3. Improvement of Ca2+-dependent RNA-catalyzed cleavage activity over 12 successive generations of evolution in vitro. The composite population at each generation was assayed as described in Materials and methods. In this standard plot, each data point represents the average of at least three independent measurements.
Fig. 4. Secondary structure of the Tetrahymena ribozyme [48] showing those nucleotide positions that were never found to vary among the 300 sequenced clones. Nucleotides shown in red never varied, yet lie within the region that was randomized in the initial pool of variants. Nucleotides shown in blue never varied and lie outside the region of initial randomization. Phylogenetically conserved sequence elements are shown in bold. Labeled positions are the sites of the most significant mutations. Arrows demarcate the extent to which the amplification primers bind, precluding mutations over 20 nucleotides at each end of the ribozyme. Conserved, paired regions P1 to P9 are indicated. RNA substrate is shown in yellow tint.
The simultaneous possession of replicatable genetic information and sequence-dependent catalytic function makes ribozymes particularly suitable for evolutionary manipulation in vitro. A pool of variant ribozymes can be challenged with catalytic tasks that cannot be executed by the wild-type ribozyme, and those variants that succeed can be amplified by powerful RNA amplification techniques (Fig. 2). Multiple rounds of selective enrichment (generations) can lead to the isolation of highly differentiated ribozymes with desired functional properties. If significant levels of mutation are applied in concert with the amplification process, then the system becomes truly evolutionary, producing variants of the selected variants. By sampling individuals from the population and determining both their complete nucleotide sequence and their degree of functional alteration relative to the wild type, the investigator can draw correlations between genotype and phenotype. Such studies potentially allow RNA structure-function relationships to be inferred, and may provide a detailed picture of molecular evolution in a simple system that is devoid of epigenetic interactions.
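The selection, amplification and mutation cycle described above can be caricatured in a few lines of Python. This is a toy sketch, not the authors' protocol: the sequence length, mutation rates, population size and the activity function (with its hypothetical "beneficial" positions) are all illustrative assumptions, and selection is modeled simply as sampling parents in proportion to activity.

```python
import random

BASES = "ACGU"

def mutate(seq, rate, rng):
    """Substitute each position with probability `rate` (point substitutions only)."""
    out = []
    for base in seq:
        if rng.random() < rate:
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def one_generation(pop, activity, pop_size, rate, rng):
    """Selection (weighted by activity), amplification to pop_size, then mutation."""
    weights = [activity(s) for s in pop]
    parents = rng.choices(pop, weights=weights, k=pop_size)
    return [mutate(s, rate, rng) for s in parents]

def evolve(start, activity, generations=12, pop_size=200,
           pool_rate=0.05, rate=0.01, seed=0):
    """Randomize a starting pool around `start`, then run successive generations."""
    rng = random.Random(seed)
    pop = [mutate(start, pool_rate, rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop = one_generation(pop, activity, pop_size, rate, rng)
    return pop

# Hypothetical example: three positions at which a specific base raises activity.
WILD_TYPE = "A" * 20
BENEFICIAL = {3: "G", 10: "C", 15: "U"}

def activity(seq):
    # Small baseline so every molecule keeps a nonzero chance of being sampled.
    return 0.01 + sum(seq[i] == b for i, b in BENEFICIAL.items())

final = evolve(WILD_TYPE, activity)
mean_hits = sum(activity(s) - 0.01 for s in final) / len(final)
```

Because the starting pool is randomized around the wild type, a few molecules carry a beneficial base by chance; proportional sampling enriches them quickly, while the ongoing mutation step keeps producing "variants of the selected variants".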
In the study described here, we have extended a previously successful evolutionary lineage [5] to produce variant Tetrahymena ribozymes that are adept at cleaving an RNA substrate when only Ca2+ divalent cations are provided in the reaction mixture. Sequencing 50 individual variants from every even-numbered generation, over 12 successive generations, indicates that a seven-error variant of the wild-type ribozyme is the putative ‘winning’ ribozyme, but that its ascent to prominence was not direct. Comparative assays of many variants suggest that there are several classes of adaptive mutations, which can be categorized both in terms of their changing frequency of occurrence in the population, and the way in which they affect the catalytic properties of the ribozyme.
Results
Composite phenotype
Ribozymes were selected over 12 successive generations for their ability to catalyze the cleavage of a target RNA substrate in the presence of 10 mM CaCl2. The pool of ribozymes obtained after each generation was assayed for its composite activity (Fig. 3). The starting pool (G0) has no detectable activity, although the wild-type RNA exhibits very slight activity that can be attributed to trace Mg2+ contamination of the reaction buffer (0.4 µM, measured by plasma emission spectrometry [5]; a more stringent analysis of the metal-dependence of the wild-type ribozyme showed no detectable activity in 12 mM CaCl2 [9]). The twelfth-generation (G12) ribozymes have acquired Ca2+-dependent RNA cleavage activity that approaches the Mg2+-dependent activity of the wild type. With the exception of the slight decline between G,, and G,,, each population is more active than the one preceding.
Our assay makes use of a substrate that has been radioactively labeled at nucleotide positions that lie downstream (3’ direction) from the cleavage site. This would allow us to observe both the phosphoester transfer reaction that results in attachment of the 3’ portion of the substrate to the 3’ end of the ribozyme, and the subsequent site-specific hydrolysis reaction that releases the 3’ portion of the substrate (Fig. 1). However, unlike reactions in which Mg2+ or Mn2+ is provided as the metal cofactor, the hydrolysis step is not detected in the Ca2+-dependent reaction. The 3’ portion of the cleaved substrate remains covalently bound to the 3’ end of the ribozyme, thus preventing the ribozyme from undergoing catalytic turnover. This attachment is removed during the PCR amplification step that follows selective amplification (see Materials and methods), so that the original 3’ terminus is restored in the subsequent progeny population of ribozymes.
The continuously improving phenotype of the composite population strongly suggests that there are underlying genetic changes responsible for the altered behavior. The 50 randomly selected individuals chosen at each time point for sequence analysis represent a minute fraction of the total number of molecules that are present, but nonetheless give an indication of the genotypic composition of the evolving population. By compiling the sequence data from all 300 variants, one can focus either on the nucleotide positions that were never found to vary (Fig. 4), or on those positions that varied with high frequency over the course of in vitro evolution (Fig. 5).
Fig. 5. Sites at which mutations occur over the course of evolution, superimposed on the secondary structure of the Tetrahymena ribozyme. Box height corresponds to frequency of mutation (shown as % of 50 clones at each generation). Immutable primer binding sites are shaded grey and substrate is shown in black. Labeled positions in G,, are sites of the most significant mutations. (Color coding of these positions corresponds to colors employed in Figure 6.)
Table 1. Genetic variability within the evolving population. Columns: generation; k (average, range, mode and standard deviation); H (average over all positions, and at positions 260 and 270).
Summary statistics of the changing genotypic composition of the evolving ribozyme population. The variable k specifies the number of mutations relative to the wild type; the variable H is the Shannon diversity (see Materials and methods). Values for generation 0 are expectations based on a binomial distribution of mutant production in the randomized pool. Data from generations 2, 4, 6, 8, 10 and 12 are based on 50 individual sequences obtained from the population.
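Table 1 notes that the generation-0 values are expectations from a binomial distribution. A minimal sketch of that calculation follows, assuming each of n randomized positions is mutated independently with the same probability p. The length n = 140 is the randomized region stated in the text; p = 0.05 is a hypothetical placeholder, since the actual per-site rate belongs to Materials and methods.

```python
from math import comb, sqrt

def error_class_probabilities(n_sites, p):
    """P(k) for k = 0..n_sites under a binomial model of independent mutation."""
    return [comb(n_sites, k) * p**k * (1 - p) ** (n_sites - k)
            for k in range(n_sites + 1)]

def binomial_summary(n_sites, p):
    """Expected mean, standard deviation and mode of the error class k."""
    probs = error_class_probabilities(n_sites, p)
    mode = max(range(len(probs)), key=probs.__getitem__)
    return n_sites * p, sqrt(n_sites * p * (1 - p)), mode

N_RANDOMIZED = 140   # length of the randomized region given in the text
P_MUTATION = 0.05    # hypothetical per-site mutation probability
mean_k, sd_k, mode_k = binomial_summary(N_RANDOMIZED, P_MUTATION)
```

With these placeholder values the expected mean is n·p = 7 errors and the modal class is also 7; the same two functions give the generation-0 expectations for any other choice of p.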
Many of the positions throughout the catalytic core of the ribozyme were found to be invariant (Fig. 4). Within the 140-nucleotide region that was extensively randomized in the initial population, 40 sites (29 %) were invariant. These 40 sites include all 8 nucleotide positions (114, 207, 261, 264, 301, 303, 310 and 311) that have been previously noted for their near-universal degree of phylogenetic conservation among known group I introns [13-15]. For example, functional constraints prevent alteration of the 264G-311C base pair that binds the guanosine nucleophile as a G-C-G base triple within the active site of the ribozyme [16]. Similarly, the majority of nucleotides in the P7 stem and in the single-stranded region between P8 and P7 were unchanged. When the entire 353-nucleotide-long portion of the ribozyme that is subject to ongoing mutation is considered, 42 % of all sites are invariant, and when mutations that preserve base pairings are disregarded, approximately 50 % of the sites fail to exhibit any disruptive mutations. Consequently, the variants that resulted from the in vitro evolution process appear to be subject to many of the same sequence constraints that apply to the wild-type RNA, for example in the context of the standard Mg2+-dependent reaction. The frequency of the most commonly occurring mutations was tracked across generational time (Fig. 5).
Considered as independent events, mutations at 11 nucleotide positions are noteworthy because they occur in more than 15 % of the 50 individuals sequenced from at least one of the studied generations. It is likely that these positions represent the constellation of genetic changes that evolved in response to the imposed selection constraint. Mutations at the following nucleotide positions, 103:A→G, 270:A→G, 271:U→C and 312:G→A, are the most striking. All four of these changes rose from very low frequency in G, (< 5 %) to fixation (approximately 100 %) in G10 and G12. Mutations at four other nucleotide positions, 87:A→deleted, 94:A→Y (Y = C or U), 187:A→not A, and 189:C→not C, also show a general increase in frequency over the generations. The final three mutations of interest, 193:C→not C, 258:U→A, and 260:C→A, peak in frequency at G, or G6 and then drop off to near extinction by G,,. These patterns of evolutionary change are summarized in Figure 6.
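The criterion used above to single out noteworthy positions (mutated in more than 15 % of the 50 clones in at least one sequenced generation) is simple to state as code. The counts in this Python sketch are hypothetical, chosen only to exercise the threshold; with 50 clones, the cutoff means at least 8 mutated clones in some generation.

```python
def noteworthy_positions(counts_by_generation, n_clones=50, threshold=0.15):
    """Positions whose mutation frequency exceeds `threshold` in any generation.

    counts_by_generation maps generation -> {position: clones mutated there}.
    """
    flagged = set()
    for counts in counts_by_generation.values():
        for position, count in counts.items():
            if count / n_clones > threshold:
                flagged.add(position)
    return sorted(flagged)

# Hypothetical counts, not data from this study.
example = {
    2: {94: 5, 260: 9, 271: 3},
    6: {94: 7, 260: 20, 271: 30},
    12: {94: 25, 150: 4, 260: 0, 271: 50},
}
print(noteworthy_positions(example))  # [94, 260, 271]
```

Position 260 qualifies here even though it later disappears, which is why a "peak in any generation" rule also captures transient mutations.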
Fig. 6. Frequencies of nine of the most common mutations at two-generation intervals over 12 generations of evolution in vitro. Sequence data were obtained from 50 clones at every even-numbered generation. Spline curves based on third-degree polynomials were fitted to the intervals between successive data points. Mutations at positions 103, 270, 271 and 312 (green) increase logarithmically; mutations at positions 187 and 189 (blue) increase linearly; mutations at positions 258 and 260 (red) occur transiently; and mutations at position 94 (yellow) increase chaotically.
Table 2. Sequence and catalytic activity of the wild-type ribozyme and 31 variants. Columns: source; number of errors per individual (k); base at positions 94, 103, 187, 189, 258, 260, 270, 271 and 312; other mutated positions; activity (% substrate cleaved).
The source of individual variants was either clones obtained from the evolving population (for example, 4-23, clone 23 from generation 4) or constructs prepared by site-directed mutagenesis (mut.; see Materials and methods). The variable k indicates the number of mutations relative to the wild type. For individual variants, only those nucleotide positions that differ from the wild-type sequence are shown. Mutations at position 87 were A→deleted; mutations at position 193 were C-SC. Catalytic activity was measured in a side-by-side assay, as described in Materials and methods.
Mutational co-occurrences
Throughout the evolutionary lineage, most variants possessed multiple mutations. We use the term ‘error’ to refer to mutations that distinguish an individual variant from the wild type. Summary statistics for all error classes are given in Table 1. The wild-type sequence was observed only six times: five times in G, and once in G,. After the initial culling of nonfunctional variants that occurred during the first generation, the evolving population steadily accumulated mutations as the experiment progressed. The average number of errors per individual, average k, increased monotonically from 2.3 in G, to 6.6 in G,,. The modal error class increased linearly, rising by one mutation for every two generations. While the population was moving further away from the wild-type sequence in multidimensional sequence space (which can be quantified as an increasing Hamming distance [17]), the genetic variability expanded early and then diminished in later generations. Such variability can be measured in several ways. Most crudely, the range of error classes present in the population was maximal in G, (0 ≤ k ≤ 8) and dropped to its lowest value by G,, (5 ≤ k ≤ 9). Similarly, the standard deviation of the error class value peaked at G, and subsequently fell steadily. A more relevant measure of genetic variability, one that weighs the contributions of each nucleotide position, including those of the eleven positions that are thought to be responding to selection, is the Shannon diversity of the population (Table 1). Shannon diversity, H, is an information theory parameter that provides an indication of the heterogeneity of a population (see Materials and methods). Positional H values, whether averaged over the entire ribozyme or restricted to critical nucleotide positions 260 and 270 (see below), are greatest for G, and G,, and fall sharply for G,, and G,,. Contributing to this drop is an attenuation of the observed base changes at positions 94, 187 and 189, which by G,, become narrowed primarily to A→U at position 94, A→U at position 187 and C→A at position 189. Taken together, these measurements corroborate the view that the population at later generations is converging on a single, highly adaptive genetic sequence, one that contains specific mutations at seven nucleotide positions: 87, 94, 103, 187, 270, 271 and 312.
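The per-individual error count k (the Hamming distance from the wild type) and the positional Shannon diversity H summarized in Table 1 can be sketched as follows. This assumes the clones are already aligned to the wild type at equal length, with deletions written as "-" and counted as a distinct state; the natural logarithm is also an assumption, since the base used in Materials and methods is not given in this section.

```python
from collections import Counter
from math import log
from statistics import mean, pstdev

def error_count(seq, wild_type):
    """k: number of aligned positions at which seq differs from the wild type."""
    if len(seq) != len(wild_type):
        raise ValueError("sequences must be aligned to the wild-type length")
    return sum(a != b for a, b in zip(seq, wild_type))

def positional_shannon(seqs, i):
    """H at position i: -sum(p_b * ln p_b) over the states b observed there."""
    n = len(seqs)
    counts = Counter(s[i] for s in seqs)
    return -sum((c / n) * log(c / n) for c in counts.values())

def summarize(seqs, wild_type):
    """Table 1-style summary for one generation's sample of clones."""
    ks = [error_count(s, wild_type) for s in seqs]
    return {
        "k_average": mean(ks),
        "k_range": (min(ks), max(ks)),
        "k_mode": Counter(ks).most_common(1)[0][0],  # ties: first value encountered
        "k_sd": pstdev(ks),
        "H_average": mean(positional_shannon(seqs, i) for i in range(len(wild_type))),
    }

# Toy data: a four-nucleotide "wild type" and four aligned clones ("-" = deletion).
stats = summarize(["ACGU", "AGGU", "AGGU", "UG-A"], "ACGU")
```

A position where every clone carries the same base contributes H = 0, so the average falls as the population converges on one sequence, which is the behavior described above for the later generations.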
As seen in Figure 6, and based on an examination of the individual sequences, it is clear that not all of the eleven most frequent mutations occur independently of each other. The most obvious covariation is a strict negative correlation between the mutations at position 260 (C→A) and at position 270 (A→G), which were never observed in the same individual despite the simultaneous high frequency of both mutations, in G,, for example. A less dramatic example of mutual exclusivity involves nucleotides at positions 94 and 258. Only one individual out of the 300 that were sequenced was found to contain both of these mutations, although a chi-squared test failed to reject the null hypothesis that this frequency could be expected, based on the frequencies of the individual mutations (P > 0.1).
On the other hand, some mutations tend to occur together and not alone. For example, mutations at nucleotide positions 103 and 270 were both infrequent in the absence of a mutation at position 271. This can be seen from the lag between the time of proliferation of the mutation at 271 and the co-emergence of the mutations at positions 103 and 270 (Fig. 6). The frequency dynamics of mutations at positions 187, 189 and 312 appear to be unaffected by the presence or absence of other mutations. Finally, the deletion at position 87, which was either very rare or non-existent in the G, population, rose steadily in frequency once it first appeared in Gg.
Individual phenotypes
The functional impact of the eleven major mutations on the Ca2+-dependent RNA-cleavage activity of the ribozyme was assessed by a comparative assay of the wild-type RNA and 31 individual variants (Table 2). Many of these variants were discovered as a result of sequencing individuals from the even-numbered generations, although some were produced by site-directed mutagenesis. The reaction conditions employed in this assay closely mimicked those of the selection step in the in vitro evolution experiment, so that the selective value of particular constellations of mutations can be inferred.
In general, the surviving, higher-error variants are more active than the lower-error ones (Fig. 7). The wild-type RNA is less active than any variant assayed, whereas a seven-error variant (clone 37 from G12) has the highest activity. This variant contains mutations at nucleotide positions 87, 94(U), 103, 187(U), 270, 271 and 312; it is unique among the 300 clones that were sequenced. The second most active variant (clone 31 from G12) contains six errors, the same mutations as the seven-error variant except that it lacks the deletion at position 87 (Table 2). There are seven individuals with this exact sequence among the 50 sequenced G,, clones, three among the 50 sequenced G,, clones, and none among the 200 sequenced clones obtained from earlier generations. The activity of the six-error variant is greater than that of a similar five-error variant that lacks the mutation at position 94, and a four-error variant that lacks both the mutations at positions 94 and 187 is less active still. The three-, two-, and one-error variants containing subsets of these mutations are, in general, progressively less active. With few exceptions, the addition of any of the major mutations (at nucleotide positions 94, 103, 187, 270, 271 and 312) to a variant that lacks that mutation results in increased activity. The same can be said for other, less frequent mutations, including the deletion at position 87. Therefore, the effects of many mutations appear to be additive with respect to their selective value, a fact that partially explains the increase in average k value with increasing generation. Nevertheless, the major mutations do not show strict additivity, but rather contribute to the overall selective value in a manner that is dependent on the context of co-occurring mutations. A clear example is the poor performance of a three-error variant containing mutations at positions 103, 189 and 271 (Table 2).
While the 103 and 271 mutations both seem to be advantageous, the mutation at position 103 was found to increase in frequency concomitantly with the mutation at position 270, as noted above. If nucleotide positions 103 and 271 are both mutated, then a G-C base pair could occur at the end of the P3 helix of the ribozyme (Fig. 4). However, the enhanced activity that results from this putative base pair apparently cannot be realized unless the mutation at position 270 is also present.
Fig. 7. Relationship between phenotype and number of mutations relative to the wild type for 32 ribozymes, as listed in Table 2. There is a strong positive correlation (correlation coefficient = +0.84) between these two variables. Phenotype is the percentage of substrate cleaved in the Ca2+-dependent reaction under our standard assay conditions (see Materials and methods).
Another example of contingent mutations is seen with the mutation at position 189, which improves the activity of a single-error variant with a mutation at position 258, but lowers the activity of a variant containing the four most frequent mutations (at positions 103, 270, 271 and 312). An A→G mutation at position 314, observed in four individuals from G,,, does not appear to augment catalytic activity. The same is true for other mutations that occur at frequencies of no more than 10 % at any time during the evolutionary lineage (Fig. 5). In order to assess directly the mutual exclusivity of the mutations at nucleotide positions 260 and 270, we constructed two variants by site-directed mutagenesis, one containing only these two mutations and one containing these two mutations plus the U→C mutation at position 271. The 260/270 double mutant was virtually inactive, exhibiting lower activity than the single-error mutants with mutations at 260 or 270. The 260/270/271 triple mutant was active, but less so than either the 260/271 or 270/271 double mutants. We conclude, therefore, that a structural constraint precludes the co-occurrence of mutations at positions 260 and 270; they represent alternative solutions to the problem of Ca2+-dependent activity, as previously suggested [5]. Individuals that contain the mutation at position 270 are apparently superior, given that the variants containing the mutation at position 260 are driven to extinction.
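The chi-squared comparison mentioned above for positions 94 and 258 (observed co-occurrence against the co-occurrence expected from the separate mutation frequencies) is a test of independence on a 2 × 2 contingency table. A minimal Python sketch follows, assuming the uncorrected Pearson statistic with one degree of freedom; the authors' exact procedure, including any continuity correction, is not stated here, and the counts are hypothetical.

```python
from math import erfc, sqrt

def chi_squared_2x2(table):
    """Pearson chi-squared test of independence for [[a, b], [c, d]].

    Rows: mutation X present / absent; columns: mutation Y present / absent.
    Returns (statistic, p-value) with one degree of freedom, no Yates correction.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                raise ValueError("an empty row or column makes the test undefined")
            chi2 += (observed - expected) ** 2 / expected
    # For 1 df, P(chi2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return chi2, erfc(sqrt(chi2 / 2))

# Hypothetical counts among 300 clones: [[both, X only], [Y only, neither]].
stat, p = chi_squared_2x2([[1, 29], [19, 251]])
```

With these invented frequencies, 2 double mutants would be expected by chance, so observing only 1 is not a significant departure (p is well above 0.1), which mirrors the situation described for positions 94 and 258.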
Discussion
Evolutionary patterns
Laboratory experiments with evolving populations of RNA molecules provide a rare opportunity to follow genetic changes over several generations and to match these changes with concomitant changes in phenotype. Early studies with Qβ replicase and the RNA from bacteriophage Qβ demonstrated that certain RNA sequences can come to dominate the population after many rounds of selection [18-20]. Experiments exposing bacterial strains in a chemostat to strong directional selection [21], and analyses of natural populations under selection for gross phenotypic traits [22,23], have also enhanced our understanding of evolutionary responses. However, artificial lineages such as the one described here allow a detailed investigation of the genotypic and phenotypic covariation that occurs in a precisely defined environment. Our results show that RNA populations display some of the tendencies of other evolving systems, including the occurrence of functional traits that are synergistic or mutually exclusive. The overall change in our test-tube population of evolving molecules is analogous to the response of a natural population undergoing directional selection: the wild-type sequence evolved into a distinct, seven-error variant with a higher ‘fitness’ under the chosen reaction conditions. There is a clear transition toward this solution from the diversity plateau reached during the middle generations. Closer inspection of the evolutionary lineage reveals a transient alternative based on variants that contain mutations at either position 258 or 260, which arose early but eventually faded to extinction. Such a complex response indicates that the evolving population of ribozymes is behaving as a ‘quasispecies’, as defined by Eigen [24], with an adaptive response that can fluctuate as variants of variants are generated over the course of evolution. The dominant seven-error variant in G12 may itself reflect only a temporary occupation of a local maximum on the adaptive landscape, and propagation of additional mutations could further enhance the population’s phenotype.
Our in vitro evolution system iteratively selects subpopulations, beginning from a large, diverse, initial pool. This initial pool contains a large number of variants, distributed about the wild type in such a way that there is a progressively sparser representation of the increasingly higher-error sequences. Advantageous higher-error variants, which are either absent or present at very low frequency in the initial population, require many generations to become predominant. The increase in average k with generational time reflects the emergence of higher-error variants that, because of the general additivity of the beneficial effect of individual mutations, are more active than the lower-error variants. We suggest that the plateau in overall activity that is reached by G,, reflects the fact that most of the genetic diversity has been selected out of the population, and that subsequent fluctuations about this plateau are likely to be due to stochastic effects. We invoke a second population-genetic parameter, heritability, to help explain why this should be so. If the evolving population responds to selection to the same extent that the selected variants are superior to the average of all variants in the population, then the heritability of the catalytic trait is close to one, its maximal value. High heritability values lead to a rapid response to selection on a particular trait. However, when most variation in traits that affect fitness has vanished from a population as a consequence of continued selection, the heritability of these traits is near zero.
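The heritability argument corresponds to the breeder's equation of quantitative genetics, R = h^2 S, where S is the selection differential (mean of the selected variants minus the population mean) and R is the response (mean of the progeny minus the population mean); rearranged, h^2 = R / S is the realized heritability. The sketch below uses hypothetical activity values and is a reading of the text in these standard terms, not a calculation reported by the authors.

```python
def realized_heritability(population_mean, selected_mean, progeny_mean):
    """h^2 = R / S (breeder's equation rearranged)."""
    s = selected_mean - population_mean   # selection differential
    r = progeny_mean - population_mean    # response to selection
    if s == 0:
        raise ValueError("no selection differential: heritability is undefined")
    return r / s

# Hypothetical activities (% substrate cleaved), for illustration only.
early = realized_heritability(1.0, 2.0, 1.9)   # progeny nearly match the selected set
late = realized_heritability(3.0, 3.05, 3.0)   # selection no longer shifts the mean
```

The early case gives h^2 close to one, the late case gives zero: selected variants are still slightly better than average, but the progeny mean does not move.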
Further selection on these traits will not be effective, and any variation that does exist will be due primarily to stochastic processes, including experimental variability. The G,, population, possessing little variation in the constellation of the seven major mutations, has reached such a fitness plateau. Subsequent generations are expected to exhibit phenotypes that oscillate about the current level until new, more advantageous mutations arise in the population.
Interdependent mutations
We have demonstrated that there are at least eleven nucleotide positions in the Tetrahymena ribozyme at which mutations can enhance the ability of the ribozymes to cleave a target RNA substrate in the presence of CaCl2. In general, the effects of these mutations are additive, but this additivity can be highly context-dependent. Furthermore, two of the eleven most frequent mutations, at positions 260 and 270, are mutually exclusive.
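One conventional way to quantify a departure from additivity is to compare a double mutant with the sum of the single-mutant effects: epsilon = A_xy - A_x - A_y + A_wt, where A is activity. A value near zero indicates additivity, while a strongly negative value, as for the nearly inactive 260/270 double mutant, indicates antagonism. The measure, the tolerance and the activities below are illustrative assumptions, not values from Table 2, and activity is treated on a linear scale.

```python
def epistasis(a_wt, a_x, a_y, a_xy):
    """Deviation of the double mutant from strict additivity of single effects."""
    return a_xy - (a_x + a_y - a_wt)

def classify(eps, tolerance=0.05):
    """Label an interaction; `tolerance` is an arbitrary illustrative cutoff."""
    if abs(eps) <= tolerance:
        return "additive"
    return "synergistic" if eps > 0 else "antagonistic"

# Hypothetical activities (% substrate cleaved).
eps_exclusive = epistasis(0.01, 0.20, 0.15, 0.05)   # like two mutually exclusive changes
eps_additive = epistasis(0.01, 0.20, 0.15, 0.35)
```

Here the first pair falls far below the additive expectation of 0.34 and is labeled antagonistic, while the second sits on it.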
The C→A mutation at position 260 first rose in frequency in the population, but then was displaced by the A→G mutation at position 270. The mutation at position 270 arose concomitantly with the A→G mutation at position 103, both occurring only after the U→C mutation at position 271 had been established. Tested either alone or in concert with the mutation at position 271, the mutation at position 260 generates a more efficient catalyst than the mutation at position 270. However, the combination of mutations at positions 103, 270 and 271 is far superior to the pair of mutations at 260 and 271 (Table 2). Thus, establishment of the mutation at position 271 enabled the mutations at positions 103 and 270 to increase in frequency together and to displace the mutation at position 260. It is difficult to see why the mutations at positions 260 and 270 are mutually exclusive with respect to Ca2+-dependent RNA cleavage activity of the ribozyme. In the wild-type ribozyme, the cytosine residue at position 260 is believed to form a base triple with the C109-G212 base pair [15,25]. The C→A mutation at position 260 is expected to disrupt this interaction, although it is conceivable that a C109-G212-A260 base triple of somewhat altered geometry could form instead. Nucleotides A103 and U271 lie within opposing strands at the end of the P3 helix. Based on phylogenetic analysis and site-directed mutagenesis data, these two nucleotides are believed not to form a Watson-Crick pair, but rather a reverse Hoogsteen pair [26-28]. The reverse Hoogsteen geometry could not be accommodated by the 103:A→G and 271:U→C double mutant. The mutations at positions 103 and 271 might instead result in an additional Watson-Crick base pair at the end of the P3 helix, presumably enabled by the presence of the mutation at position 270.
It is striking that, among 100 published group I intron sequences [15,29-32], there is a strong correlation between the occurrence of a G at position 270 and the co-occurrence of a G at position 103 and a C at position 271 (χ2 = 48.3, p << 0.001).
Mutation categories
Considering the patterns by which mutation frequencies change (Fig. 6), we delineate three distinct dynamics. Logarithmic behavior is seen for the four mutations (at positions 103, 270, 271 and 312) that rose quickly to fixation. These mutations appear to be unambiguously beneficial to any variant that possesses them. Linear behavior is seen for two mutations, at positions 187 and 189. This pattern may reflect an effect on the ribozyme that is not strictly on its catalytic efficiency (for example, an effect on folding of the ribozyme into the proper three-dimensional conformation). A constant fraction of all variants having these mutations would then survive each generation, without exponential enrichment through amplification. Thus, in vitro evolutionary fitness involves the same two components as organismal fitness, fecundity and survival, with the mutations at positions 187 and 189 affecting primarily the latter. Transient behavior is seen for mutations at positions 193, 258 and 260. These may represent advantageous changes that were outcompeted by, and/or incompatible with, later, more beneficial mutations. The behavior of the mutations at positions 87 and 94 over the course of evolution cannot readily be assigned to any of the three categories above. The way in which these mutations contribute to the catalytic activity of the ribozyme appears to be highly dependent on the existence of other mutations in the molecule.
Superimposed on these three dynamics are four classes of mutations, distinguished by their presumed manner of imparting a selective value to the variants in which they occur. Ca2+-specific mutations (at positions 103, 258, 260 and 270) putatively improve the ribozyme's ability to use Ca2+ as the divalent metal cation for catalysis. These mutations have not been seen at significant frequency in other phylogenies carried out in our laboratory. They also tend to improve a variant's activity in 10 mM CaCl2 more than in 10 mM MgCl2 (unpublished observations). Generally favorable mutations (at positions 87, 94, 271 and 312) have been observed at high frequencies in one or more of the other phylogenies that we have followed, which involved the imposition of different selection constraints. These mutations are hypothesized to improve the catalytic activity of the ribozyme in the in vitro reaction in some general sense. They have been shown, for example, to enhance the ribozyme's ability to cleave a DNA substrate in 10 mM MgCl2 [4]. Putative folding mutations (at positions 187, 189 and 193) occur in a region that is necessary for the proper three-dimensional folding of the ribozyme [11,33,34]. This region is highly mutated in other phylogenies, although not at the same nucleotide positions. Nucleotides at positions 187 and 189 are especially notable because they are protected from cleavage by Fe(II)-EDTA in the intact ribozyme, but are susceptible to cleavage in the isolated P4-P6 folding domain [34].
Finally, neutral mutations are those that appear over the course of evolution in vitro but do not show any of the above trends. They tend to occur at low frequency and presumably have little effect on the ribozyme's ability to meet the imposed selection constraint. Mutations that fall in the three non-neutral classes could potentially alter the active site of the ribozyme directly, or could have an indirect effect that is propagated to the active site from more distal regions. Analogous distinctions have been made for other macromolecules. One example is the tRNA molecule, in which certain positions have been directly implicated in aminoacyl-tRNA synthetase binding, whereas others seem to influence binding without directly contacting the synthetase [35]. Another example is the antigen-binding domain of the immunoglobulin molecule, which includes both the hypervariable and framework regions, with only the former being directly involved in antigen contact [36,37].
Conclusions
We demonstrate that RNA molecules with desired properties can be generated by a laboratory technique that closely mimics the biological evolutionary process. In the lineage described here, the route by which the ultimately superior variant arose was strongly influenced by the variation present in the initial population. Had the initial pool been populated by a greater proportion of higher-error mutants, the 'winning' seven-error variant might have dominated sooner. Alternative, high-error solutions might also have emerged, a result that could be made more easily attainable by increasing the level of ongoing mutation. Both of these pathways are currently being explored in our laboratory. Functionally significant portions of the ribozyme can be inferred from these results, and we are able to discern the effect that specific mutations will have on the evolutionary future of a particular variant. The putative G-C base pair involving nucleotides at positions 103 and 271, together with the coordinate alteration of position 270, appears to be important for the ribozyme's newfound ability to use Ca2+ ions to promote catalysis. Mutations at positions 258 and 260, which disrupt a base pair and a base triple, respectively [15,25,28], also seem to affect the ribozyme's use of Ca2+ ions. All other mutations that we have observed are likely to be either neutral or able to improve the catalytic efficiency of the ribozyme in a general sense. Incorporating the information gained in this study into models of ribozyme structure and function will contribute to our understanding of the chemical mechanism of RNA-mediated catalysis.
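The Conclusions refer to the proportion of higher-error mutants in the initial pool. The sketch below estimates that proportion. It assumes the four mutagenic oligonucleotides described under Materials and methods cover 140 non-overlapping positions (4 × 35), and that each position independently carries a non-wild-type base with probability 0.05. Under those assumptions, the number of errors per molecule is binomial with mean 140 × 0.05 = 7.

The code also converts the 40 pmol used to start the procedure into a molecule count of about 2.4 × 10^13. This is the same order of magnitude as the 10^13 variants cited in the abstract. Mutations outside the randomized positions and any synthesis bias are ignored, and the constant and function names are illustrative.

```python
import math

AVOGADRO = 6.02214076e23        # molecules per mole
RANDOMIZED_POSITIONS = 4 * 35   # assumed non-overlapping: 4 oligos x 35 positions
DEGENERACY = 0.05               # assumed per-position probability of a non-wild-type base
POOL_PMOL = 40                  # pmol of RNA used to start the evolution procedure


def error_distribution(n: int, p: float) -> list[float]:
    """Binomial P(k errors) for k = 0..n, treating positions as independent."""
    return [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


molecules = POOL_PMOL * 1e-12 * AVOGADRO
probs = error_distribution(RANDOMIZED_POSITIONS, DEGENERACY)
mean_errors = sum(k * pk for k, pk in enumerate(probs))
at_least_7 = sum(probs[7:])

print(f"molecules in starting pool: {molecules:.2e}")
print(f"mean errors per molecule: {mean_errors:.2f}")
for k in range(0, 13, 2):
    print(f"{k:2d} errors: fraction {probs[k]:.4f}, about {probs[k] * molecules:.1e} molecules")
print(f"fraction with 7 or more errors: {at_least_7:.3f}")
```

Raising DEGENERACY in this model shifts the distribution toward higher-error variants. That is one simple way to picture a starting pool richer in higher-error mutants.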
Materials and methods
Preparation of the starting pool of variants
Plasmid pT7-21 [38], containing the gene for the wild-type Tetrahymena ribozyme, was linearized with HindIII and then digested with T7 gene 6 exonuclease to remove the template (coding) strand of the ribozyme-encoding portion of the plasmid [39]. Four mutagenic oligodeoxynucleotides, each randomizing 35 nucleotide positions at an expected degeneracy of 5% per position, were prepared on an automated DNA synthesizer, gel purified, and 5'-phosphorylated using T4 polynucleotide kinase [4]. A five-fold molar excess of each mutagenic oligonucleotide, together with a five-fold excess of an oligonucleotide complementary to the last 20 nucleotides of the ribozyme, was hybridized with 10 pmol of the digested plasmid DNA. Hybridization was by incubation at 70 °C for 5 mins, followed by slow cooling to 25 °C over 45 mins. The annealing buffer contained 2 mM MgCl2, 50 mM NaCl and 20 mM Tris (pH 7.5). Subsequent extension and ligation of the annealed oligonucleotides was carried out at 37 °C for 90 mins in the presence of 0.125 U/µl T4 DNA polymerase, 0.25 U/µl T4 DNA ligase, 0.2 mM dNTPs, 0.4 mM ATP, 5 mM MgCl2, 10 mM Tris (pH 7.5) and 2 mM dithiothreitol. Approximately 6 pmol of the resulting double-stranded DNA was used in an in vitro transcription reaction to produce a heterogeneous pool of RNA [4,40]. The RNA was purified by polyacrylamide gel electrophoresis and Sephadex chromatography, yielding 183 pmol of material, of which 40 pmol was used to initiate the in vitro evolution procedure.
Selective amplification procedure
In each generation, 40 pmol (1 µM) of the ribozyme population was incubated with 100 pmol (2.5 µM) of RNA substrate (5'-CCCUCUA3UA3UA3UA3-3', prepared synthetically). The reaction contained 10 mM CaCl2 and 30 mM EPPS (pH 7.5) and ran at 37 °C for 3.25 hrs. Full-length ribozyme-containing products were separated from excess substrate by polyacrylamide gel electrophoresis and affinity chromatography on DuPont Nensorb. The resulting material was subjected to selective isothermal amplification [41,42], followed by selective cDNA synthesis and non-selective PCR amplification [5]. The subsequent ribozyme population was generated by in vitro transcription of 2 pmol of the PCR products in the presence of [3H]UTP.
Cloning and sequencing
Products of selective cDNA synthesis from each of the even-numbered generations were amplified by PCR using primers that carry unique EcoRI and HindIII sites, suitable for cloning into the pUC18 plasmid. Recombinant plasmid DNA was used to transform competent DH5αF' Escherichia coli cells [4,43], which were grown on carbenicillin-containing plates. Colonies of successful transformants were chosen at random and grown overnight in liquid media. Plasmid DNA was prepared by the boiling lysis method [44] and screened for the presence of insert by restriction digestion. The DNA was sequenced throughout the ribozyme gene by dideoxy chain termination [45,46]. (Nucleotide sequences of individual clones are available upon request.) Individual RNAs were prepared by PCR amplification of the plasmid DNA, followed by in vitro transcription of 2 pmol of the PCR products in the presence of [3H]UTP. The RNA was purified by polyacrylamide gel electrophoresis and affinity chromatography on DuPont Nensorb.
Diversity estimates
Genetic diversity of ribozyme sequences was calculated as $H = -\sum_i p_i \ln p_i$, where $p_i$ is the frequency of a particular subtype in the population. Diversity, $H$, is maximal when nucleotide positions are equally likely to be A, C, G or U. Ribozyme populations were partitioned in two different ways, addressing either the diversity at positions 260 and 270, or the diversity averaged over all 353 variable nucleotide positions.
Site-directed mutagenesis
The non-coding strand of plasmid DNA was obtained by T7 gene 6 exonuclease digestion. It was then hybridized with the appropriate mutagenic oligonucleotide and an oligonucleotide complementary to the last 20 nucleotides of the ribozyme (see Preparation of the starting pool of variants, above). To produce various combinations of mutations at positions 260, 270 and 271, a single mutagenic oligonucleotide was prepared, having the sequence 5'-CCCCGACCGACRCTTAGTCTGTKAACTGCATCCATATCAAC-3' (mutated positions shown in bold; R = G or A; K = G or T). This oligonucleotide was used to construct partially mismatched plasmid DNA, which was amplified by PCR and cloned into pUC18 (see Cloning and sequencing, above). Plasmid DNA, prepared from individually selected colonies, was screened for the presence of the mutation at position 260 by digestion with HpaI (which recognizes the sequence GTTAAC) and sequenced throughout the ribozyme gene. Clones were identified that carry the desired combination of mutations, and corresponding RNAs were prepared as described above.
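The degenerate mutagenic oligonucleotide above encodes several concrete sequences, and the HpaI screen distinguishes among them. The sketch below expands the IUPAC codes R (G or A) and K (G or T) and reports which expanded sequences contain the HpaI site GTTAAC. It assumes the oligonucleotide reads 5'-CCCCGACCGACRCTTAGTCTGTKAACTGCATCCATATCAAC-3', which is our reading of the printed sequence. The lookup table covers only the codes that this sequence uses. Because GTTAAC is palindromic, finding it on one strand means it is present in the duplex.

```python
from itertools import product

# Only the IUPAC codes that appear in this oligonucleotide.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "R": "AG", "K": "GT"}
HPAI_SITE = "GTTAAC"  # HpaI recognition sequence; palindromic

# Mutagenic oligonucleotide, 5' -> 3' (reading of the printed sequence assumed).
OLIGO = "CCCCGACCGACRCTTAGTCTGTKAACTGCATCCATATCAAC"


def expand(degenerate: str) -> list[str]:
    """Enumerate every concrete sequence encoded by a degenerate oligonucleotide."""
    choices = [IUPAC[base] for base in degenerate.upper()]
    return ["".join(bases) for bases in product(*choices)]


variants = expand(OLIGO)
for seq in variants:
    status = "HpaI site" if HPAI_SITE in seq else "no HpaI site"
    print(seq, status)
```

Two of the four expanded sequences, those with K = T, contain GTTAAC. According to the text, HpaI digestion was used to screen for the mutation at position 260.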
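The diversity index defined under Diversity estimates, H = -Σ p_i ln p_i, is the Shannon index computed with natural logarithms. The minimal sketch below provides two helpers. The first computes H for any list of subtype labels, for example genotype classes at positions 260 and 270. The second averages the per-position H over a set of aligned sequences, in the spirit of the second partition described. It assumes the sequences are already aligned to equal length. The clone sequences and function names are hypothetical.

```python
import math
from collections import Counter


def shannon_diversity(subtypes: list[str]) -> float:
    """H = -sum(p_i * ln p_i) over the distinct subtypes observed in a sample."""
    counts = Counter(subtypes)
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())


def mean_positional_diversity(sequences: list[str]) -> float:
    """Average H over aligned positions, treating each position's base as its subtype."""
    length = len(sequences[0])
    if any(len(s) != length for s in sequences):
        raise ValueError("sequences must be aligned to equal length")
    return sum(shannon_diversity([s[i] for s in sequences]) for i in range(length)) / length


# Hypothetical data, for illustration only.
genotype_classes = ["wt", "260", "260", "270", "270", "270"]
clones = ["ACGU", "ACGU", "AAGU", "ACGC"]
print(shannon_diversity(genotype_classes))
print(mean_positional_diversity(clones))
print(shannon_diversity(list("ACGU")), math.log(4))  # four equally frequent bases
```

With four equally frequent subtypes, H reaches ln 4 ≈ 1.386, the maximum for a single position.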
Assay of ribozyme activity
Composite populations and individual variants were assayed by incubating 1 µM ribozyme and 2.5 µM RNA substrate (5'-GGCCCUCUA3UA3UA3UA3-3') in the presence of 10 mM CaCl2 and 30 mM EPPS (pH 7.5) at 37 °C for 3.25 hrs. The substrate was prepared by in vitro transcription of a synthetic DNA template in the presence of [α-32P]ATP [40]. Reaction products were separated by electrophoresis in a 20% polyacrylamide/8 M urea gel and quantitated by Cerenkov counting.
Acknowledgements: We thank D Decker, X-C Dai and R Breaker for helpful discussions, T Macke for expertise with computer graphics, and WA Reum for technical assistance. This work was supported by grants from NASA and the NIH.
References
1. BRENNER S, LERNER RA: Encoded combinatorial chemistry. Proc Natl Acad Sci USA 1992, 89:5381-5383.
2. SZOSTAK JW: In vitro genetics. Trends Biochem Sci 1992, 17:89-93.
3. JOYCE GF: Directed molecular evolution. Sci Am 1992, 267:90-97.
4. BEAUDRY AA, JOYCE GF: Directed evolution of an RNA enzyme. Science 1992, 257:635-641.
5. LEHMAN N, JOYCE GF: Evolution in vitro of an RNA enzyme. Nature 1993, 361:182-185.
6. BURKE JM: Molecular genetics of group I introns: RNA structures and protein factors required for splicing - a review. Gene 1988, 73:273-294.
7. CECH TR: Self-splicing of group I introns. Annu Rev Biochem 1990, 59:543-568.
8. PYLE AM, McSWIGGEN JA, CECH TR: Direct measurement of oligonucleotide substrate binding to wild-type and mutant ribozymes from Tetrahymena. Proc Natl Acad Sci USA 1990, 87:8187-8191.
9. PICCIRILLI JA, VYLE JS, CARUTHERS MH, CECH TR: Metal ion catalysis in the Tetrahymena ribozyme reaction. Nature 1993, 361:85-88.
10. GROSSHANS CA, CECH TR: Metal ion requirements for sequence-specific endoribonuclease activity of the Tetrahymena ribozyme. Biochemistry 1989, 28:6888-6894.
11. CELANDER DW, CECH TR: Visualizing the higher order folding of a catalytic RNA molecule. Science 1991, 251:401-407.
12. HERSCHLAG D, CECH TR: Catalysis of RNA cleavage by the Tetrahymena thermophila ribozyme. 1. Kinetic description of the reaction of an RNA substrate complementary to the active site. Biochemistry 1990, 29:10159-10171.
13. MICHEL F, JACQUIER A, DUJON B: Comparison of fungal mitochondrial introns reveals extensive homologies in RNA secondary structure. Biochimie 1982, 64:867-881.
14. DAVIES RW, WARING RB, RAY JA, BROWN TA, SCAZZOCCHIO C: Making ends meet: a model for RNA splicing in fungal mitochondria. Nature 1982, 300:719-724.
15. MICHEL F, WESTHOF E: Modelling of the three-dimensional architecture of group I catalytic introns based on comparative sequence analysis. J Mol Biol 1990, 216:585-610.
16. MICHEL F, HANNA M, GREEN R, BARTEL DP, SZOSTAK JW: The guanosine binding site of the Tetrahymena ribozyme. Nature 1989, 342:391-395.
17. HAMMING RW: Coding and Information Theory. Englewood Cliffs: Prentice-Hall; 1980.
18. MILLS DR, PETERSON RL, SPIEGELMAN S: An extracellular Darwinian experiment with a self-duplicating nucleic acid molecule. Proc Natl Acad Sci USA 1967, 58:217-224.
19. LEVISOHN R, SPIEGELMAN S: Further extracellular Darwinian experiments with replicating RNA molecules: diverse variants isolated under different selection conditions. Proc Natl Acad Sci USA 1969, 63:805-811.
20. SAFFHILL R, SCHNEIDER-BERNLOEHR H, ORGEL LE, SPIEGELMAN S: In vitro selection of bacteriophage Qβ ribonucleic acid variants resistant to ethidium bromide. J Mol Biol 1970, 51:531-539.
21. DYKHUIZEN DE, HARTL DL: Selection in chemostats. Microbiol Rev 1983, 47:150-168.
22. ENDLER JA: Natural selection in color patterns in Poecilia reticulata. Evolution 1980, 34:76-91.
23. MALHOTRA A, THORPE RS: Experimental detection of rapid evolutionary response in natural lizard populations. Nature 1991, 353:347-348.
24. EIGEN M, SCHUSTER P: The hypercycle: a principle of natural self-organization. Part A: emergence of the hypercycle. Naturwissenschaften 1977, 64:541-565.
25. MICHEL F, ELLINGTON AD, COUTURE S, SZOSTAK JW: Phylogenetic and genetic evidence for base-triples in the catalytic domain of group I introns. Nature 1990, 347:578-580.
26. KIM S-H, CECH TR: Three-dimensional model of the active site of the self-splicing rRNA precursor of Tetrahymena. Proc Natl Acad Sci USA 1987, 84:8788-8792.
27. COUTURE S, ELLINGTON AD, GERBER AS, CHERRY JM, DOUDNA JA, GREEN R ET AL.: Mutational analysis of conserved nucleotides in a self-splicing group I intron. J Mol Biol 1990, 215:345-358.
28. GREEN R, ELLINGTON AD, SZOSTAK JW: In vitro genetic analysis of the Tetrahymena self-splicing intron. Nature 1990, 347:406-408.
29. DAVILA-APONTE JA, HUSS VAR, SOGIN ML, CECH TR: A self-splicing group I intron in the nuclear pre-rRNA of the green alga, Ankistrodesmus stipitatus. Nucleic Acids Res 1991, 19:4429-4436.
30. DePRIEST PT, BEEN MD: Numerous group I introns with variable distribution in the ribosomal DNA of a lichen fungus. J Mol Biol 1992, 228:315-321.
31. OHTA E, ODA K, YAMATO K, NAKAMURA Y, TAKEMURA M, NOZATO N ET AL.: Group I introns in the liverwort mitochondrial genome: the gene coding for subunit 1 of cytochrome oxidase shares five intron positions with its fungal counterparts. Nucleic Acids Res 1993, 21:1297-1305.
32. BAO Y, HERRIN DL: Nucleotide sequence and secondary structure of the chloroplast group I intron Cr.psbA-2: novel features of this self-splicing ribozyme. Nucleic Acids Res 1993, 21:1667.
33. FLOR PJ, FLANEGAN JB, CECH TR: A conserved base pair within helix P4 of the Tetrahymena ribozyme helps to form the tertiary structure required for self-splicing. EMBO J 1989, 8:3391-3399.
34. MURPHY FL, CECH TR: An independently folding domain of RNA tertiary structure within the Tetrahymena ribozyme. Biochemistry 1993, 32:5291-5300.
35. CAVARELLI J, REES B, RUFF M, THIERRY J-C, MORAS D: Yeast tRNA-Asp recognition by its cognate class II aminoacyl-tRNA synthetase. Nature 1993, 362:181-184.
36. SEGAL DM, PADLAN EA, COHEN GH, RUDIKOFF S, POTTER M, DAVIES DR: The three-dimensional structure of a phosphorylcholine-binding mouse immunoglobulin Fab and the nature of the antigen binding site. Proc Natl Acad Sci USA 1974, 71:4298-4302.
37. RINI JM, SCHULZE-GAHMEN U, WILSON IA: Structural evidence for induced fit as a mechanism for antibody-antigen recognition. Science 1992, 255:959-965.
38. ZAUG AJ, GROSSHANS CA, CECH TR: Sequence-specific endoribonuclease activity of the Tetrahymena ribozyme: enhanced cleavage of certain oligonucleotide substrates that form mismatched ribozyme-substrate complexes. Biochemistry 1988, 27:8924-8931.
39. JOYCE GF, INOUE T: A novel technique for the rapid preparation of mutant RNAs. Nucleic Acids Res 1989, 17:711-722.
40. MILLIGAN JF, GROEBE DR, WITHERELL GW, UHLENBECK OC: Oligoribonucleotide synthesis using T7 RNA polymerase and synthetic DNA templates. Nucleic Acids Res 1987, 15:8783-8798.
41. GUATELLI JC, WHITFIELD KM, KWOH DY, BARRINGER KJ, RICHMAN DD, GINGERAS TR: Isothermal in vitro amplification of nucleic acids by a multienzyme reaction modeled after retroviral replication. Proc Natl Acad Sci USA 1990, 87:1874-1878.
42. JOYCE GF: Selective amplification techniques for optimization of ribozyme function. In Antisense RNA and DNA. Edited by Murray JAH. New York: Wiley-Liss; 1992:353-372.
43. HANAHAN D: Techniques for transformation of E. coli. In DNA Cloning: A Practical Approach. Edited by Glover DM. Oxford: IRL Press; 1985:109-135.
44. HOLMES DS, QUIGLEY M: A rapid boiling method for the preparation of bacterial plasmids. Anal Biochem 1981, 114:193-197.
45. SANGER F, NICKLEN S, COULSON AR: DNA sequencing with chain-terminating inhibitors. Proc Natl Acad Sci USA 1977, 74:5463-5467.
46. ZAGURSKY R, BAUMEISTER K, LOMAX N, BERMAN ML: Rapid and easy sequencing of large linear double-stranded DNA and supercoiled plasmid DNA. Genet Anal Tech 1985, 2:89-94.
47. CADWELL RC, JOYCE GF: Randomization of genes by PCR mutagenesis. PCR Methods Appl 1992, 2:28-33.
48. BURKE JM, ET AL.: Structural conventions for group I introns. Nucleic Acids Res 1987, 15:7217-7221.
Received: 23 August 1993; revised: 1 October 1993. Accepted: 7 October 1993.