Nucleic acids
From sequence to structure to function
Editorial overview
Paul J Hagerman* and Ignacio Tinoco Jr†

Addresses
*University of Colorado Health Sciences Center, Denver, CO 80262, USA
†Chemistry Department, University of California, Berkeley, CA 94720-1460, USA

Current Opinion in Structural Biology 1996, 6:277-280
© Current Biology Ltd ISSN 0959-440X

Abbreviations
3D three-dimensional
FRET fluorescence resonance energy transfer

One of the principal goals of the study of nucleic acid structure is the prediction of three-dimensional (3D) structure from primary base sequence. Our understanding of the sequence-structure relationship is far from perfect; however, as is outlined in the following reviews, significant progress is being made on several fronts: better understanding of the relationship between sequence and secondary structure, significant advances in the determination of 3D structures through better instrumentation and computational methods, and the further development of our understanding of the basic forces that govern higher order nucleic acid structures.

In vitro selection from a pool of randomly synthesized nucleic acids can identify molecules with nearly any desired function. For example, a random 25-mer corresponds to about 10^15 (4^25) different sequences. Surely one of these, or maybe even a few hundred of them, should be able to bind a target molecule or catalyze a desired reaction. The specificity and functionality of the selected molecules depend purely on the originality of the experimenter. The number of publications and the improvements in the selection techniques in this field have recently increased exponentially, and Uphoff, Bell and Ellington (pp 281-288) review the most recent progress. The methods seem embarrassingly easy, although their implementation requires great care to avoid artifacts. The DNA is synthesized on a commercial machine with all four phosphoramidites introduced at each position in the sequence. RNA can be synthesized similarly, or it can be transcribed from the random DNA. Nucleic acids in the random pool are selected by binding them to a target molecule, or by having them perform some function such as catalyzing a reaction. The selected molecules are amplified (by PCR for DNA, and by reverse transcription followed by PCR for RNA) and another round of selection is carried out. Amplification and selection are continued until a few molecules are obtained with the desired characteristics. Most of the studies to date have involved RNA. Target molecules have included amino acids, nucleotides, vitamins, antibiotics, proteins, viruses, ribosomes, and even some metal ions. New or improved catalytic activities for ribozymes and deoxyribozymes have been the main goal of the functional selections.
Pre- and post-modification of the nucleic acids have recently improved the specificity of the selection procedure. A substrate or inhibitor can be covalently attached to the random oligonucleotides to target them to a particular enzyme. Use of unusual nucleotides, such as 2'-O-methyl or 2'-fluoro derivatives, in the random synthesis can increase the stability of the molecule in vivo. This is crucial for the clinical use of the selected molecules. To produce nucleic acids that can bind irreversibly to their target, groups that can be photocrosslinked, such as 5-iodouracil-containing nucleotides, can be added to the pool of molecules. The many improvements and new applications that are appearing make it clear that the power of this 'random' method is just beginning to be exploited.

The ultimate goal of computational biologists is to be able to calculate the 3D structure of a nucleic acid or protein from its sequence. Of course, the conformation will depend on the solvent and the temperature, so the environment must also be considered. Dynamics are important because the conformations are not rigid, and the biological functions may require changes in the shape of the molecules, such as in RNA-protein interactions. Although knowledge of sequence alone is not yet sufficient to lead to a nucleic acid structure prediction, the tremendous advances that have been made in recent years in calculating conformations are reviewed by Louise-May, Auffinger and Westhof (pp 289-298). Direct quantum mechanical calculations of electronic wavefunctions and electron densities can be used to learn about local conformational perturbations, but for global conformations, semi-empirical methods must be used. Experimental bond lengths and bond angles are combined with potential-energy functions for stretching and bending bonds, for changing torsion angles, and for non-bonded interactions. A global energy minimization provides the lowest energy structure.
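The idea of combining reference bond lengths and angles with potential-energy functions, and then minimizing the energy, can be illustrated with a deliberately small sketch. It assumes a hypothetical three-atom chain in two dimensions with harmonic bond-stretch and angle-bend terms only; torsion and non-bonded terms are omitted, and the force constants and reference values are arbitrary illustrative numbers, not parameters from any real force field. Steepest descent is used as the minimizer, which finds only the local minimum nearest the starting geometry.

```python
import math

# Hypothetical toy parameters in arbitrary units (not from any real force field).
BOND_K, BOND_R0 = 100.0, 1.0                   # stretch force constant, reference length
ANGLE_K, ANGLE_T0 = 20.0, math.radians(109.5)  # bend force constant, reference angle


def energy(coords):
    """Potential energy of a chain a-b-c given as [xa, ya, xb, yb, xc, yc]."""
    a, b, c = coords[0:2], coords[2:4], coords[4:6]
    r_ab = math.dist(a, b)
    r_bc = math.dist(b, c)
    # Angle at the central atom b between bonds b-a and b-c.
    dot = (a[0] - b[0]) * (c[0] - b[0]) + (a[1] - b[1]) * (c[1] - b[1])
    theta = math.acos(max(-1.0, min(1.0, dot / (r_ab * r_bc))))
    stretch = 0.5 * BOND_K * ((r_ab - BOND_R0) ** 2 + (r_bc - BOND_R0) ** 2)
    bend = 0.5 * ANGLE_K * (theta - ANGLE_T0) ** 2
    return stretch + bend


def gradient(coords, h=1e-6):
    """Central-difference numerical gradient of the energy."""
    grad = []
    for i in range(len(coords)):
        plus = list(coords)
        minus = list(coords)
        plus[i] += h
        minus[i] -= h
        grad.append((energy(plus) - energy(minus)) / (2.0 * h))
    return grad


def minimize(coords, step=1e-3, n_steps=20000):
    """Steepest descent: repeatedly move a small step along the negative gradient."""
    coords = list(coords)
    for _ in range(n_steps):
        g = gradient(coords)
        coords = [x - step * gi for x, gi in zip(coords, g)]
    return coords


if __name__ == "__main__":
    start = [0.0, 0.0, 1.2, 0.0, 2.0, 0.9]  # distorted starting geometry
    final = minimize(start)
    print("energy before:", energy(start))
    print("energy after: ", energy(final))
```

For a system this small the local minimum is also the global one; for a macromolecule with thousands of coordinates the same procedure would stop in whichever minimum lies closest to the starting structure.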
It is so difficult to find a global minimum in a space of thousands of dimensions that one is usually forced to find the local minimum in a region chosen by some experimental criterion. To obtain a free-energy minimum, and to learn about the fluctuations in structure, molecular dynamics simulations are most often used. The potential-energy functions are used to calculate the forces exerted by the macromolecule and solvent on each atom, and Newton's Second Law is used to calculate the resulting accelerations. The sample is started at a high temperature (1000 K, for example) so that the atoms have high kinetic energy. Then the system is cooled to the desired temperature. This is repeated with different starting geometries in order to sample as much of conformational space as possible. In order to obtain a reasonably correct conformation, experimental data such as NMR-derived restraints must be added. For studying dynamics, a conformation is chosen, and fluctuations around this conformation are explored. Simulations of macromolecules are limited at present to less than a nanosecond by practical computer considerations. Of course, the larger the system, the longer it takes to calculate one step (~1 femtosecond) in the simulation. However, including sufficient solvent molecules in the simulation is particularly important for nucleic acids because of the long-range interactions produced by the phosphate charges. There have been recent advances in using crystal-like periodicities to calculate all non-bonded interactions with solvent without having to make an arbitrary truncation of distant molecules.

Empirical methods that are not dependent on potential-energy functions have also been very useful. The folding of nucleic acids can be considered as two separate problems: the formation of secondary structure (base pairing to form double helices, loops and bulges), and the interactions of the secondary structure elements to form the tertiary structure. Comparative sequence analysis and experimental thermodynamic parameters measured for each secondary structure element have been applied most often to the problem of determining secondary structure. Genetic algorithms and constraint-satisfaction programming methods that incorporate many different types of information can be applied to solving the general folding problem.
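The molecular dynamics procedure described above (forces from a potential-energy function, accelerations from Newton's Second Law, a hot start followed by cooling, and repetition from different starting geometries) can be sketched in miniature. The example below is a toy under stated assumptions, not a nucleic acid simulation: a single particle moves in a hypothetical one-dimensional double-well potential, the velocity Verlet integrator is one common choice among several, and 'cooling' is approximated by rescaling the velocity slightly at every step. All function names and numerical values are illustrative.

```python
def potential(x):
    """Hypothetical asymmetric double well with minima near x = -1 and x = +1."""
    return (x * x - 1.0) ** 2 + 0.3 * x


def force(x):
    """Force is the negative derivative of the potential."""
    return -(4.0 * x * (x * x - 1.0) + 0.3)


def anneal(x, v, dt=0.01, n_steps=40000, cooling=0.999, mass=1.0):
    """Velocity Verlet integration with gradual removal of kinetic energy."""
    a = force(x) / mass                  # Newton's Second Law: a = F / m
    for _ in range(n_steps):
        x += v * dt + 0.5 * a * dt * dt  # advance the position
        a_new = force(x) / mass          # force at the new position
        v += 0.5 * (a + a_new) * dt      # advance the velocity
        v *= cooling                     # crude cooling: scale the velocity down
        a = a_new
    return x, v


def search(starts, v0=3.0):
    """Repeat the annealing from several starting points; keep the lowest-energy result."""
    results = [anneal(x0, v0) for x0 in starts]
    return min(results, key=lambda xv: potential(xv[0]))


if __name__ == "__main__":
    best_x, best_v = search([-1.5, 0.0, 1.0])
    print("final position:", best_x, "energy:", potential(best_x))
```

Each run starts with enough kinetic energy to cross the barrier between the two wells and settles into one of them as the velocity is scaled down; keeping the best of several runs mirrors the repeated starts described in the text.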
Realization of the goal of predicting 3D nucleic acid structure from sequence rests with a detailed understanding of the thermodynamics of base stacking and helix stability. In his review on this subject, Turner (pp 299-304) outlines recent progress in the determination of experimental (thermodynamic) stability parameters for nearest-neighbor interactions among various base-pair combinations, including data on unnatural nucleic acids and tandem mismatches. The nearest-neighbor approximation, which holds that adjacent base-pair interactions are the principal determinants of sequence-dependent stabilities, is found to provide good agreement for standard Watson-Crick pairing arrangements in DNA and RNA helices. However, that approximation breaks down for non-Watson-Crick pairing arrangements, including tandem mismatches, and for mixed RNA-DNA hybrid helices. Also noteworthy is the observation that initiation free energies for oligodeoxynucleotide formation derived from optical melting studies often differ substantially from those derived from calorimetric measurements. Finally, studies of changes in molar volume during melting transitions, coupled with studies of hydrophobic isosteres of thymine and adenine, are providing additional detail regarding the role of solvation in helix stability.

Altona, Pikkemaat and Overmars (pp 305-316) review the conformations of multibranched DNA junctions, and discuss the advantages and limitations of the methods used to study them. The formation of multibranched junctions is a necessary step in the homologous recombination of DNA, and the four-way Holliday junction is the best known example. Cruciform structures formed by extruding two hairpin loops at DNA sequences with inverted repeats are other examples of four-helix junctions. Three-helix junctions may occur during replication and repair in DNA. Junctions can occur as part of the folding of any RNA molecule, and are found in transfer RNAs, ribosomal RNAs, messenger RNAs and ribozymes. The conformations of the junctions affect their biological functions, so it is important to learn the rules for stacking and folding the helices at junctions.

Oligonucleotides can be synthesized to form junctions with three, four or more intersecting helices, and their conformations have been systematically investigated. There are two types of measurements: those that reveal the global arrangements of the helices, and those that provide the detailed conformation at the junction. Electrophoretic mobility in native gels can identify the arrangements of the helices. The effect of lengthening pairs of arms can be very revealing. For example, if the helices form an X, lengthening the two arms at the bottom of the X will change the mobility less than lengthening the upper-right and lower-left arms. In a tetrahedral arrangement of helices, lengthening any two will produce the same change in mobility. Transient electric birefringence, as reviewed by Hagerman and Amiri (pp 317-321), can also establish the relative arrangement of the helices at a junction.
Fluorescence resonance energy transfer (FRET) can measure distances between a donor and an acceptor in the range 20-70 Å. Therefore, carefully calibrated experiments can lead to the determination of the angles between the arms of the junction. Time-resolved fluorescence energy transfer gives the spectrum of distances between donor and acceptor; thus, the range of angles between each pair of arms can be obtained. NMR is good for local structure determination. Nuclear Overhauser effects reveal proton-proton distances of less than 5 Å, and spin-spin splitting depends on torsion angles. Assignment of the NMR spectra is difficult because even the smallest junctions tumble slowly enough in solution to produce broad NMR resonances. However, isotope labeling, together with 3D and four-dimensional experiments, has allowed progress to be made.

The stacking of the helices at the junctions is very dependent on sequence. This is not surprising because of the very different stacking propensities of nearest-neighbor base pairs. The arrangement of the helices is very dependent on the concentration and charge of added cations. At low ionic strength, the arms of a four-helix junction are extended or unfolded, but as multivalent ions are added, stacking of two helices occurs and a side-by-side arrangement of stacked helices forms. More detailed conformations, along with thermodynamic and kinetic data on their formation, are expected in the near future.

Hagerman and Amiri (pp 317-321) concentrate on the global structure of RNA junctions as found in the hammerhead ribozyme and in transfer RNA molecules. Crystal structures give very detailed information about the atomic positions in the molecules, but it is important to know how much the overall folding (the global conformation) depends on crystal-packing forces. RNAs are much less compact than proteins, so crystal effects are much more important. B-form DNA molecules may crystallize in A-form, and RNA hairpins tend to crystallize as helical duplexes containing non-Watson-Crick base pairs. Two X-ray diffraction crystal structures of the hammerhead ribozyme give very similar conformations, but the conformations place some of the nucleotides required for catalysis far away from the cleavage site. It may be that these essential groups do not interact directly at the catalytic site, but it is more likely that the crystal conformation is not the active one. Solution studies of the hammerhead ribozyme by fluorescence energy transfer, chemical cross-linking and transient electric birefringence find global conformations that agree with the X-ray diffraction results qualitatively, but not quantitatively. In transient electric birefringence, the molecules are oriented by an electric field pulse, and then the field-free relaxation of the molecules to random orientation is observed. This rotational relaxation can be modeled accurately for macromolecules of any shape.
It is very sensitive to end-to-end distance in a molecule, and therefore to the angle between stems at a junction. As there are differences in sequences, and in the ionic composition of solvents among the crystal and solution studies, it is not possible to compare the results directly. NMR studies of hammerhead ribozymes in solution can provide local conformational results, but NMR work has progressed slowly because intermediate dynamics in the molecules broaden the resonance lines and make assignments and analysis difficult.

Fluorescence spectroscopy is rapidly emerging as a method for defining both the local environment and the global conformation of nucleic acids; Millar (pp 322-326) reviews recent advances in both these areas. Because the spectral properties of fluorophores are sensitive to their local environment, they can be used as probes of their surroundings within the helix. The adenine analog 2-aminopurine has proven quite useful for this purpose, as it forms stable Watson-Crick base pairs and displays strong, environment-sensitive fluorescence. The adenine analog has been used to monitor the extent of base stacking during melting transitions, and to report the presence of DNA binding proteins. Other fluorophores, attached to the C-5 position of pyrimidines and thus positioned in the major groove of the helix, can act as reporters of local groove geometry, detecting structural alterations in the major groove that arise through the binding of ligands to the minor groove.

A second major application of fluorescence measurements involves the measurement of the efficiency of non-radiative energy transfer between donor and acceptor probes. Since the transfer efficiency depends on the sixth power of the donor-acceptor distance, transfer efficiencies are sensitive functions of global structure. Steady-state FRET measurements have been applied to a wide range of structures in both DNA and RNA. The conclusions are subject to the caveat that the positions and mobilities of the fluorescent probes must be known with some precision; this issue remains an important challenge. As pointed out by Millar, most FRET studies have employed steady-state measurements of donor or acceptor fluorescence. However, time-resolved FRET measurements provide additional information about the dynamics of the transfer process, and hence the dynamics of the nucleic acid structure being probed. Such measurements have provided direct evidence for dynamic rearrangements in both three- and four-way DNA junctions, as well as in other simple non-helix elements.

One class of nucleic acid structure receiving much attention of late, and reviewed by Sun, Garestier and Hélène (pp 327-333), is the triple helix formed by the association of a Watson-Crick duplex with an additional single strand. Although homopolynucleotides have long been known to form triple helices under certain conditions (e.g. poly[rU].2-poly[rA]), triple helix formation has gained importance more recently as a possible mode of modulation of gene expression.
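The sixth-power distance dependence of FRET efficiency mentioned above is usually written in the Förster form E = 1 / (1 + (r/R0)^6), where R0 is the distance at which transfer is 50% efficient. The short sketch below evaluates that relation, inverts it to recover a distance, and converts a distance into an inter-arm angle with the law of cosines. The angle calculation assumes rigid, straight arms with probes fixed exactly at their ends, which is precisely the kind of assumption about probe position and mobility that the text flags as a caveat. The function names and the R0 value of 50 Å are illustrative assumptions, not values from the reviews.

```python
import math


def fret_efficiency(r, r0):
    """Forster relation: transfer efficiency at donor-acceptor distance r."""
    return 1.0 / (1.0 + (r / r0) ** 6)


def distance_from_efficiency(e, r0):
    """Invert the Forster relation to recover the distance from an efficiency 0 < e < 1."""
    return r0 * (1.0 / e - 1.0) ** (1.0 / 6.0)


def inter_arm_angle(d, arm1, arm2):
    """Angle in degrees between two rigid arms whose ends are a distance d apart.

    Law of cosines; assumes straight arms of length arm1 and arm2 meeting at a
    point, with the probes located exactly at the arm ends.
    """
    cos_theta = (arm1 ** 2 + arm2 ** 2 - d ** 2) / (2.0 * arm1 * arm2)
    return math.degrees(math.acos(max(-1.0, min(1.0, cos_theta))))


if __name__ == "__main__":
    R0 = 50.0  # hypothetical Forster distance in angstroms
    for r in (20.0, 35.0, 50.0, 70.0):
        print(f"r = {r:5.1f} A  ->  E = {fret_efficiency(r, R0):.3f}")
```

Because the efficiency changes steeply only when r is comparable to R0, the relation is most informative over roughly the 20-70 Å window quoted earlier in this overview.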
In a typical triplex, the third strand is positioned in the major groove of the helix formed by the Watson-Crick duplex, interacting via Hoogsteen or reverse-Hoogsteen interactions with the purine-rich strand of the duplex. Although only a restricted set of sequences lend themselves to triple helix formation, the number of arrangements forming stable complexes is growing, and rapid progress is being made in understanding permissible variations in sequence and even base and backbone modification. Although no crystal structures have been obtained thus far for any DNA or RNA triple helix, recent NMR and Fourier transform infrared studies have shed light on the local structure within the triplex. One important observation regarding triplex stability is that single strands capable of forming stable intramolecular and/or intermolecular structure will effectively destabilize triple helix formation. The authors point out that the choice of optimal oligonucleotides for triple helix formation must take into account both the intrinsic ability to form the desired structure and the propensity for forming competing structures. The authors cite an additional application of FRET for the study of triple helix formation: namely, the assignment of the polarity of the third strand. By attaching a probe to one end of the duplex and to one end of the incoming single strand, the polarity is easily assigned on the basis of the efficiency of transfer. Finally, the authors contrast the incoming strand of a RecA nucleoprotein complex with the corresponding strand in normal triple-helix formation; in RecA-mediated triple-helix formation (R-form DNA), the third strand is apparently located in the minor groove of the extended Watson-Crick duplex.

In his review on DNA condensation (pp 334-341), Bloomfield discusses recent observations pertaining both to the process of compaction and to the nature of the compact state. It has been known for some time that multivalent cations, either alone or in conjunction with alcohols or 'crowding' polymers, cause large DNA molecules to undergo a profound collapse to compact structures resembling toroids or rods. The partitioning into various morphologies (e.g. toroids, rods and branched structures) depends on the conditions used to initiate collapse as well as the lengths and even sequences of the DNA molecules being used. In fact, one of the important challenges in the study of DNA condensation is coming up with a precise definition for the process. For example, transition metals can promote the aggregation of linear DNA, but Bloomfield distinguishes this process from one that leads to ordered condensates. One tool that has been recently applied to the study of DNA condensation is fluorescence microscopy, in which individual molecules can be monitored during the process of compaction. Alcohols such as methanol or ethanol appear to act in part by reducing the dielectric constant of the solution; however, their use appears to alter both the gross morphology of the condensed molecules (rods or fibrous arborizations) and the local helix conformation. In particular, in the presence of the alcohols, there appears to be a B→A transition, which itself may promote aggregate formation. Finally, there has been a recent emergence of theoretical models for both the condensation process and the forces of interaction which stabilize the compact state. For example, correlated cation fluctuations have been proposed as a force of stabilization for closely apposed helical segments. Both the new theoretical models and the wealth of recent experimental observations provide a fertile arena for additional work.