TECHNICAL FOCUS

Finding the hairpin in the haystack: searching for RNA motifs

THOMAS DANDEKAR AND MATTHIAS W. HENTZE

A growing list of examples underscores the roles that regulatory RNA motifs play in controlling the genetic repertoire of cells and developing organisms. Once either an RNA-processing signal, a ribozyme, an element that controls translational or mRNA stability or an RNA localization signal has been identified, it is important to search for other RNA sequences that bear similar regulatory signals. While DNA regulatory elements can often be described by a consensus sequence, RNA signals are frequently composed of a combination of sequence and structure motifs. Here, we discuss the approaches that can be used to identify RNA motifs by searching databases.

Regulatory RNA motifs include such elements as an RNA-processing signal 1, a ribozyme 1, a translational or mRNA stability control element 2 or an RNA localization signal 1. Motifs often contain a combination of sequence and structure: an example of a sequence motif is the polyadenylation signal AAUAAA, while structural motifs include complex secondary and tertiary structures; for example, the clover-leaf pattern found in RNA secondary structure. Herein, we discuss how databases can be used to search sequences for the presence of such regulatory motifs. A general outline of the approach is shown in Box 1. Throughout this article, we use the example of a search for the iron-responsive element (IRE) to illustrate the principles of the approach. The IRE is a regulatory RNA element that is characterized by a phylogenetically defined sequence-structure motif (Fig. 1). Its biological function is to provide a specific binding site for the iron regulatory protein (IRP, formerly known as IRE-BP, IRF, FRP or p90). Iron starvation of cells induces high-affinity binding of IRP to IREs, leading to repression of the translation of ferritin mRNA and stabilization of the transcript that specifies the transferrin receptor 2.

To conduct a database search, obligatory, compatible and incompatible features in the description of the motif must be identified and defined; Fig. 1 outlines this process for the IRE. For example, the C residue that forms a bulge in the secondary structure of the IRE was defined as an obligatory feature, whereas base-pairing between the first nucleotides of the top helix, which follow the bulged C, was considered to be optional. Furthermore, it is helpful to estimate relative weights for the presence of each feature. The relative importance of these weights can have a profound impact on the outcome of the search, as some features are judged more important than others.

In the IRE search, basepairs within the bottom helix were scored differently to reflect the different stabilities of pairing involved. Negative statements can be introduced to exclude the presence of certain features, such as a G residue in position six of the IRE loop, and so reduce the number of candidates. The points for meeting each requirement are scored, and a minimum score required for the identification of an RNA motif is defined. Energies for helical structures are also calculated and scored. Including positive and penalty scores (see Fig. 1), a minimum score of eight for the bottom IRE helix was an obligatory feature: this allowed only stable helices to be selected as candidates. Similarly, a combined search for tRNA motifs 3 included the following list of defined features. (1) The T-Ψ-C signal (two of four invariant bases). (2) The ability to form the T-Ψ-C arm (a loop of seven bases and a stem of 4-5 basepairs). (3) Similar motifs for the D-signal and D-arm, aminoacyl arm and anticodon arm. The search also required the definition of a general score for weighting and comparing these features to accept or reject putative tRNAs. When a feature list is established according to the flow chart in Box 1, the conversion of this description into a program can generally be delegated to a programmer. Alternatively, several general and specific RNA search programs are available (Boxes 2, 3).

To evaluate and improve the search program, the implemented description must be tested for its search performance: for example, all mammalian ferritin and transferrin receptor IREs had to be identified. For the
[Figure 1 graphic: consensus IRE stem-loop drawn 5' to 3', with the six-nucleotide loop, the top helix, the bulged C and the bottom helix labelled.]
FIGURE 1. Defining the consensus pattern for an RNA family. The consensus (defined on the basis of phylogenetic comparisons and mutagenesis data) for the iron-responsive element (IRE) is shown as an example. Several nucleotides of the primary sequence are important (shown in single-letter code). N indicates that any nucleotide is allowed at this position, with the exception of G (excluded as a negative criterion), as it has the potential to basepair with the C of the loop. Similarly, determinants of secondary structure are conserved: the six-nucleotide loop, the top helix, the bulged C, and the bottom helix. Five basepairs form the top helix. For the nucleotides at the bottom of the top helix, basepairing is not mandatory. In contrast, the four basepairs at the top must always be present (obligatory feature). Another obligatory feature is the bulged C. The bottom helix must have a minimal basepairing stability of 8 (counting GC pairs as 3, AU pairs as 2 and GU pairs as 1). One bulged nucleotide is allowed in the bottom helix, but leads to a penalty score of -2.
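The bottom-helix rule in the Fig. 1 caption can be sketched in a few lines of Python. This is only an illustration of the stated arithmetic (GC = 3, AU = 2, GU = 1, one bulge at -2, minimum of 8), not the program used in the original search. The function names, the representation of a helix as a list of base tuples, the treatment of T as U, and the decision to reject any non-pairing combination outright are all assumptions made for the example.

```python
# Pair weights taken from the Fig. 1 caption: GC = 3, AU = 2, GU = 1.
PAIR_SCORE = {
    frozenset("GC"): 3,
    frozenset("AU"): 2,
    frozenset("GU"): 1,
}
BULGE_PENALTY = -2   # one bulged nucleotide is tolerated but costs 2 points
MIN_STABILITY = 8    # minimal basepairing stability of the bottom helix


def _norm(base):
    # Assumption: database entries may be DNA, so T is read as U.
    return base.upper().replace("T", "U")


def helix_stability(pairs, bulges=0):
    """Score a helix given as a list of (5' base, 3' base) tuples.

    Returns None if more than one bulge is present or if any tuple is
    not a GC, AU or GU pair (rejecting mismatches is an assumption).
    """
    if bulges > 1:
        return None
    total = 0
    for five, three in pairs:
        weight = PAIR_SCORE.get(frozenset((_norm(five), _norm(three))))
        if weight is None:
            return None
        total += weight
    return total + bulges * BULGE_PENALTY


def passes_bottom_helix(pairs, bulges=0):
    """True when the helix reaches the minimum stability score."""
    score = helix_stability(pairs, bulges)
    return score is not None and score >= MIN_STABILITY
```

For instance, a helix of two GC pairs and one AU pair scores exactly 8 and passes, whereas the same helix with one bulged nucleotide scores 6 and is rejected.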
TIG FEBRUARY 1995 VOL. 11 NO. 2 © 1995 Elsevier Science Ltd 0168-9525/95/$09.50
Box 1. Outline for RNA motif searches

(A) Description of the motifs
(1) Compile a list of known family members (helpful tools: sequence-retrieval systems 9,13, in particular SRS 13).
(2) Identify common features: primary sequence, secondary structure, position within the RNA (e.g. untranslated region, open reading frame, intron), predicted stability of secondary structure.
(3) Identify features known to be incompatible with the biological function of the RNA family.
(4) Derive a consensus pattern of the RNA family. Keep the examined examples as a first test-set for the program.
Factors that may limit the usefulness of the search include the following considerations:
• Are the family members well defined? Tables of features that have not been updated may impede retrieval of known examples.
• Do common RNAs such as snRNA, tRNA or repetitive sequences match the family consensus despite having completely unrelated biological functions?

(B) Conversion of the description into a program
(1) Compile the available information, features and consensus pattern in a list of clearly defined features.
(2) Decide whether a certain feature must always be present (obligatory), or whether variation is allowed.
(3) Introduce weights for the presence of each feature. The importance of meeting each requirement is scored.
(4) Introduction of negative statements or weights so as to exclude certain features helps reduce the number of false-positive candidates obtained.
(5) Define a minimum score required for the identification of each motif and of the complete structure.
(6) Translate the exact description and list of rules into an appropriate program for screening the database. This task can be delegated to a programmer; alternatively, one of the programs currently available (see Boxes 2, 3) can be used.

(C) Testing the program
(1) Can the first test-set used to describe the motif be retrieved from the database?
(2) Which of the known RNAs in the family that were not included in the test-set used to define the consensus pattern were missed by the test search? (These are false negatives.)
Possible problems with this approach that should be borne in mind include the following:
• Motifs may be overlooked if divided between two sequence entries.
• The test-set may not be recognized, or the search may yield a high number of false negatives. If this is the case, an important feature of the RNA consensus pattern has been overlooked or misrepresented in the initial description of the motif. This must be corrected before proceeding.

(D) Critical analysis of the program output
(1) Evaluate the program output in light of the biological information available for each candidate RNA.
(2) Adjust the weightings used to modify the quantity of output.
(3) Adjust the list of features used so as to reduce the number of overlooked structures (false negatives).
(4) After these refinements have been made, the output is further evaluated using all available biological knowledge and criteria: for example, is the candidate motif conserved between species?

(E) Comparison of strong candidates to known examples
(1) Sequence alignment of any new candidate RNAs and known members of the family to assess conserved function.
(2) Structural alignment of the new RNAs and known members of the family as above.
Helpful tools: consider alternative foldings using the MFOLD program 9. Also useful are Searls' printout programs 14 and alignment programs 9, in particular CLUSTAL 15.

(F) Evaluation of candidate RNAs
(1) Calculate the folding energy of the RNA (compare it to those of family members and experiment). Helpful tools: the program MFOLD 9; a sophisticated calculation, using standard conformations, of a model of three-dimensional structure for smaller RNAs is now also available 16.
(2) Look for the presence of functional features 1, e.g. protein-recognition sites 17, splicing signals 18 and polyadenylation sites 19.
(3) Test remaining candidates experimentally, using simple biochemical assays.
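Section (B) of Box 1 amounts to a small data structure: a list of features, each flagged as obligatory, weighted or forbidden, plus a minimum score. The Python sketch below is one hypothetical way to write that down; it is not taken from any of the programs discussed here. The `Feature` class, `score_candidate` function, the example weights (5 and 2) and the threshold used in the usage note are all invented for illustration. Only the loop motif CAGUGN (N not G) and the C five nucleotides upstream of it come from the IRE description.

```python
import re
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Feature:
    name: str
    test: Callable[[str], bool]   # returns True when the feature is present
    weight: int = 0               # points awarded when the feature is present
    obligatory: bool = False      # candidate is rejected if the feature is absent
    forbidden: bool = False       # negative statement: rejected if present


def score_candidate(seq: str, features: List[Feature], min_score: int) -> Optional[int]:
    """Return the total score, or None if the candidate is rejected."""
    total = 0
    for feature in features:
        present = feature.test(seq)
        if feature.forbidden and present:
            return None
        if feature.obligatory and not present:
            return None
        if present:
            total += feature.weight
    return total if total >= min_score else None


# Hypothetical feature list loosely modelled on the IRE loop; weights are made up.
EXAMPLE_FEATURES = [
    Feature("loop CAGUG", lambda s: "CAGUG" in s, weight=5, obligatory=True),
    Feature("G after loop motif", lambda s: "CAGUGG" in s, forbidden=True),
    Feature("C five nt upstream of loop",
            lambda s: re.search(r"C.{5}CAGUG", s) is not None, weight=2),
]
```

With a minimum score of 6, `score_candidate("GGCAAAAACAGUGAGG", EXAMPLE_FEATURES, 6)` returns 7, while a sequence containing CAGUGG is rejected by the negative statement regardless of its other features.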
automatic identification of group I intron cores 4, an initial test-set of 93 intron sequences was used. If the first test-set is recognized and the number of false negatives is acceptably low, the main database search for new members of an RNA motif family can be performed. Thus in the combined motif search for tRNAs3, 97.5% of the 744 known tRNAs were correctly identified, allowing the identification of 42 novel putative tRNA sequences after critical analysis of the output.
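The test step described here (how many known family members are retrieved, and which are missed) is a simple set comparison. A minimal Python sketch follows; the function name and the use of entry identifiers as strings are assumptions for the example.

```python
def evaluate_test_set(known_ids, retrieved_ids):
    """Compare search output with the known family members.

    Returns (sensitivity, missed), where sensitivity is the fraction of
    known members that were retrieved and missed lists the false negatives.
    """
    known = set(known_ids)
    if not known:
        raise ValueError("the test-set of known members is empty")
    found = known & set(retrieved_ids)
    missed = sorted(known - found)
    return len(found) / len(known), missed
```

A figure such as the 97.5% quoted for the tRNA search is a sensitivity of this kind; the list of missed entries is what would be inspected to decide whether a feature of the consensus has been misrepresented.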
Specific and general RNA motif search programs

Below, we discuss suitable programs and tools that are currently available and that greatly ease the task of carrying out independent RNA motif searches.

Programs that search specifically for several RNA families and RNA structures are listed and discussed in Box 2. Another option is to use general-purpose search programs (Box 3), which allow interactive translation of the description of a given RNA family into a program. However, general-purpose search programs are, by definition, not so well tailored as the finely tuned specific programs. In particular, if structural motifs must be scored and evaluated, a specific search program is usually more suitable.

Critical analysis of program output

Program output is evaluated in light of the biological information available for each potential new candidate
Box 2. Examples of searches for RNA motifs

Each of the cited programs is available from their respective authors, and some are also directly available via the Internet by file transfer protocol (FTP). Combined motif searches consider both sequence and structural motifs to identify members of an RNA family. Additional details of related algorithms and approaches for each application are cited in the original publications.

Searching for tRNA genes in genomic sequences
Fichant and Burks 3 predicted 42 previously unidentified tRNAs. Their program is a good illustration of a carefully defined combined motif search. It has a very low false-positive rate (0.003%) and was developed for genome analysis. Intriguingly, a new approach using covariance models performs even better, especially when diverged tRNAs are screened 20. The covariance model constructs a tree of connected transition probabilities to generate the known sequences of a family, and then searches for previously unrecognized members using this covariance model. This approach also appears to be suitable for a general search method. Its main limitation at present is that it is slow: only 10-20 basepairs can be searched per second.

Identification of catalytic introns
The various structure and sequence signals contained in catalytic group I introns were translated into a combined motif search. Nearly all (132/143) known group I introns were correctly identified by the algorithm. Although no novel group I introns were described 4, with its very low false-positive rate of one false positive in one million basepairs, the program could be useful for genomic analysis of new sequences.

Candidates for trans-splicing RNAs
A combined motif search revealed previously unidentified RNAs that were either capable of trans-splicing or had the potential for similar catalytic activity. It successfully identified motifs in all genera known to have trans-splicing, together with most known trans-splicing sites. Negative controls were rRNA, snRNA and tRNA, which were erroneously recognized only in a few instances 7. Later experiments verified some of the candidates, with experimental confirmation of trans-splicing activity in HeLa cells 21.

Iron-responsive elements in mRNA
A combined motif search for iron-responsive elements is described in detail in the main text, and in Ref. 5.

Identification of RNA pseudoknots
Pseudoknots are three-dimensional structural motifs. The search for these motifs 8 included a Monte Carlo step to test the stability of putative pseudoknots. Four of five predicted pseudoknots from tobacco mosaic virus (TMV) RNA were confirmed experimentally. The program also successfully identified pseudoknot structures from other viral sequences.

Common secondary structure motifs of homologous RNAs
The program used predicts common secondary structure motifs, and is potentially very versatile; this approach complements the STADEN program (see Box 3). Starting from a phylogenetic alignment of several sequences, a covariation matrix is generated. Non-overlapping, conserved stable helices eliminate alternative, less stable helices to calculate the common most stable fold. The performance of the algorithm was evaluated with tRNA, 5S rRNA and 16S rRNA. Structures that may act as packaging sequences in human immunodeficiency virus 1 (HIV1) were identified by this search 22.

5' splice sites in pre-mRNA
A sequence motif search was used to classify specific common nucleotides around the 5' splice site into 33 subclasses. These were used to predict 5' splice sites in new sequences with 90% certainty 23.
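To give a feel for what a covariation matrix derived from a phylogenetic alignment can look like, the Python sketch below tabulates, for every pair of alignment columns, how often the two positions could basepair and how many distinct pair types occur (more than one type hints at compensatory changes). This is a deliberately simplified stand-in and should not be read as the algorithm of Ref. 22. The function name, the tuple it returns and the handling of T as U are assumptions.

```python
from itertools import combinations

# Watson-Crick pairs plus the GU wobble pair, in both orientations.
CAN_PAIR = {"GC", "CG", "AU", "UA", "GU", "UG"}


def covariation_matrix(alignment):
    """Map each column pair (i, j), i < j, to (paired_fraction, n_pair_types).

    paired_fraction: share of sequences whose bases at i and j can pair.
    n_pair_types: number of distinct pairing combinations observed.
    """
    seqs = [s.upper().replace("T", "U") for s in alignment]
    if not seqs:
        return {}
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    result = {}
    for i, j in combinations(range(length), 2):
        observed = [s[i] + s[j] for s in seqs]
        pairing = [p for p in observed if p in CAN_PAIR]
        result[(i, j)] = (len(pairing) / len(seqs), len(set(pairing)))
    return result
```

Column pairs with a paired fraction close to 1 and several pair types are the natural starting points for candidate helices; choosing among overlapping helices by stability is a separate step not shown here.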
RNA. Since good search programs can be fine-tuned, the relative weights attributed to each feature can be adjusted in response to the output obtained. The threshold score controls the quantity of the output. It must be decided how many candidates can be evaluated further, and additional criteria can be applied to rank new candidates according to how they correspond to a putative biological role for the newly identified RNA motifs; this ranking includes a certain degree of subjectivity. In the search for IREs, the involvement of erythroid 5-aminolevulinate synthase (eALAS) in heme synthesis and thus in iron metabolism made this candidate mRNA the top priority for further evaluation, followed by mitochondrial aconitase, an iron sulfur protein that has significant amino acid similarity with the IRE-binding protein IRP. When a very high threshold score is used, only the original test-set of RNAs may be identified, yielding no new information. However, it is important that a low number of false positives be obtained if large amounts of sequences are to be screened, e.g. tRNAs in genome sequencing 3. A low threshold score may mask the presence of interesting candidates against a background of false positives. A good guide for setting the threshold is to use the maximum score that still permits retrieval of nearly all known members of the family in the databank that were not included in the first test-set. In the IRE search 5 (Fig. 1), the threshold score for the bottom helix was set at eight, because higher scores no longer permitted the retrieval of all known IREs. When the score was lowered by two points, the output more than doubled without yielding additional candidates that we (subjectively) considered potentially relevant. With good positive and negative controls, identification of many new candidates points to the existence of a large RNA family, as is evident for trans-splicing RNAs 6,7. The list of features included may be refined further and the screen repeated after changing the weightings used to account for less-specific and more-specific features that can be identified from known family members not
[Figure 2 graphic: (a) sequence alignment and (b) secondary-structure drawings; species labels in the graphic: Human, Chicken, Fly, Bean, Yeast, S. pombe, Rat.]

TABLE 1. Further screening of the IRE motif search output

Step              Searching for                        No. sequences identified a
IRE search        CAGUGN (N not G)                     44 975
                  CXXXXXCAGUGN (N not G)               11 347
                  Complete IRE motif b                 86 *
First filter      RNA? mRNA?
                  Non-transcribed DNA                  12
                  Transcribed, non-mRNA c              10
                  mRNA                                 64 *
Second filter     Location?
                  Translated region                    30
                  Untranslated regions                 34 *
Third filter d    New?
                  No: ferritin/transferrin             25
                  Yes: 3' UTR e IREs                   6
                  Yes: new candidate 5' UTR IREs       3

a The database searched was the EMBL database, release 25. Counts marked * (bold typeface in the printed table) indicate the sequences that were entered into the next filter.
b The complete IRE motif is shown in Fig. 1.
c For example, introns.
d Note that the third filter demonstrates that the known ferritin members of the IRE family contained in the database are successfully (re)identified by the program (see text).
e These new IRE motifs were not considered to represent likely candidates for elements that control mRNA stability because they occur as single IREs in six different mRNAs, whereas control of stability of the transferrin receptor mRNA appears to require the presence of multiple IREs 2,5.
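The successive filters of Table 1 can be read as a cascade of list filters over annotated hits. The Python sketch below mirrors that cascade under stated assumptions: the `Hit` record and its field values ("mRNA", "5'UTR", and so on) are hypothetical annotations that would have to be derived from the database entries, and in the original study this step was carried out by eye.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    entry: str        # database entry name (hypothetical)
    molecule: str     # "DNA", "non-mRNA" or "mRNA"
    region: str       # "5'UTR", "3'UTR" or "CDS"
    known_ire: bool   # True for already known ferritin/transferrin receptor IREs


def apply_filters(hits):
    """Apply the three filters of Table 1 in order and keep each stage."""
    mrna = [h for h in hits if h.molecule == "mRNA"]                 # first filter
    untranslated = [h for h in mrna if h.region in ("5'UTR", "3'UTR")]  # second filter
    new = [h for h in untranslated if not h.known_ire]               # third filter
    new_5utr = [h for h in new if h.region == "5'UTR"]
    return {
        "mRNA": mrna,
        "untranslated": untranslated,
        "new": new,
        "new 5' UTR": new_5utr,
    }
```

Keeping every intermediate stage, rather than only the final list, makes it easy to report counts per filter in the same way as the table does.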
included in the test-set. After preliminary searches for IREs, the crude description of the top and bottom IRE helices used was thus adjusted 5 to yield a more detailed description (see Fig. 1). If the program can identify the majority of known structures but does not identify new candidates near the threshold score, it is possible that no further family members are stored in the database. A simple illustration of how output can be evaluated further by eye and on the basis of the biological criteria available is shown in Table 1: because of their defined roles in regulating mRNA translation 2, IREs were only considered further if: (1) they were found in RNA that is mRNA; (2) they lay within the 5' untranslated region of the mRNA; and (3) they had not previously been identified. Chen et al. 8 have also described a detailed analysis of search output; data from a search for potential RNA pseudoknots were assessed on the basis of statistical, thermodynamic and biological criteria.
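The rule of thumb given earlier in this section for setting the threshold (the maximum score that still retrieves nearly all known family members outside the first test-set) can be expressed directly. In the Python sketch below, `known_scores` is assumed to hold the scores of those known members, and `min_recall` is a hypothetical parameter standing in for "nearly all"; neither name comes from the original work.

```python
import math


def choose_threshold(known_scores, min_recall=1.0):
    """Highest threshold at which at least min_recall of known members score >= it."""
    if not known_scores:
        raise ValueError("need at least one known family member")
    if not 0.0 < min_recall <= 1.0:
        raise ValueError("min_recall must be in (0, 1]")
    ranked = sorted(known_scores, reverse=True)
    needed = max(1, math.ceil(min_recall * len(ranked)))
    # The needed-th best score is the largest cut-off that keeps `needed` members.
    return ranked[needed - 1]
```

With `min_recall=1.0` the threshold is simply the lowest score among the known members, which matches the reasoning given for the IRE bottom helix: any higher value would lose a known IRE.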
FIGURE 2. Conservation of (a) primary sequence and (b) secondary structure suggests conserved function. (a) The sequence alignment shows a high degree of conservation of primary sequence across species. Conserved nucleotides are indicated by a dot, gaps introduced for alignment are indicated by a dash. (b) Secondary structures of the 5' ends of two small nuclear RNAs 1,18: Schizosaccharomyces pombe snu4 (top) and human U4 (bottom). Similarities in their secondary structure, including compensatory changes, support the idea that these RNAs have a conserved function.

Evaluation of suitable candidate RNAs

Figure 2 illustrates the comparison of newly identified candidate motifs with known RNA family members using as an example small nuclear RNAs (snRNAs) from evolutionarily distant species. It is important to consider alternative secondary structure folding of the candidate RNAs; this can be achieved using a program such as MFOLD 9.

In our example, the new iron-responsive elements from eALAS and mitochondrial aconitase closely resemble the control, the human ferritin chain IRE (Fig. 3). Moreover, the identified murine eALAS IRE motif is conserved in the human gene 5,10, further suggesting that it has a biological role. In the search of tRNAs (Box 2), the comparison yields a similarly clear picture, but new candidates for pseudoknot structures are not so easy to compare 8. Non-obvious similarities between known and potential new members of an RNA family may be better recognized by dot-plot analysis 9 or by correlation images 11. Correlation imaging can
Box 3. General-purpose search programs

Various programs are listed according to their level of complexity. Each can be obtained from the original authors, or is available directly via the Internet by file transfer protocol (FTP).

FINDPATTERNS
This program is part of the GCG program suite 9, and is easy to use, even for the novice searcher. Sequence motifs, including wobble positions and mismatches, can be extracted from databases. The pattern can be surprisingly complex, e.g.

GRYAC(AG,G){1,3}AW~(A,T)G(N){33,37}C

This pattern represents G, followed by a purine and a pyrimidine, followed by AC, followed by either AG or G, repeated 1, 2 or 3 times, followed by A, followed by A or T (W represents a wobble codon), followed by a nucleotide other than (~) A or T, followed by G, followed by a stretch of 33-37 nucleotides, followed by a C. The output of the program identifies the wobble positions for each sequence found. The main limitation of this program is that it only allows searching for sequence motifs and not definition of threshold scores.

OVERSEER
This program searches for RNA motifs composed of strings, whether with or without mismatches, repeats, palindromes, single positions and interacting positions 24. Palindromes can form RNA helices, hence this program can identify secondary structures; the length of both loops and stems can be specified. More complex RNA structures can be assembled interactively (a model dialog is described in Ref. 24), defining one building block after another. When using this program to search for motifs, the steps outlined in Box 1 can prove useful. A special feature of the program is that a search of sequence titles is possible. Its limitations are that specific weights or scores for the building blocks cannot be defined. Mismatches in the stem structures are allowed, but bulges are not, and the energy of stems or of other structures is not considered. All building blocks are obligatory, and complex building blocks like pseudoknots cannot be incorporated.

ANREP
This program searches for patterns composed of spacers and approximate matches to motifs that are recursively defined and finally composed of 'atomic' symbols 25. The user first has to learn the user language, A. However, complex sequence motifs can then be searched with motifs separated by different spacer lengths. The major advantage of this program is that it can include an elaborate scoring scheme, thresholds and even scoring matrices of allowed nucleotides in certain positions. It also permits insertions and deletions with respective weightings. When ANREP is used to search for motifs, the steps outlined in Box 1 can prove useful. A special feature of this program is that it searches DNA and protein sequences. Its limitation is that it does not identify structural motifs.

STADEN
The philosophy behind this program 26 is to create a dictionary of 'nucleotide words' (e.g. the occurrence of TTTTTT), and to compare closely related 'fuzzy' sequences (e.g. STADEN defines two mismatches as a fuzziness of 2). The program can be used to find common nucleotide motifs hidden in an RNA family and thus help reveal a consensus motif. When using STADEN to search for motifs, the steps outlined in Box 1 can prove useful. The major limitation of the program is that common structures may be missed in the search. Use of this program can complement that of FINDPATTERNS, OVERSEER and ANREP; moreover, this is just one of a repertoire of motif search programs produced by Staden 27.

Others
Other resources that may prove helpful include:
• Covariance methods and the common secondary structure search program (see Box 2) can also be applied to search for other RNA motifs.
• Comparative sequence analysis protocols have been described by Stormo's group 28, and a general overview of sequence analysis has been given by Doolittle 29. SRS (Ref. 6) can be helpful for retrieving lists of known functional RNAs. For additional related algorithms and approaches, see the relevant original publications.
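For readers without access to the GCG suite, the example pattern shown under FINDPATTERNS can be approximated with an ordinary regular expression. The Python sketch below is a hand translation of the verbal description given in the box and is not produced by, or guaranteed to behave identically to, FINDPATTERNS itself. The names `FINDPATTERNS_EXAMPLE` and `find_motif` are invented, the pattern is applied to the DNA alphabet (U is read as T), and overlapping matches are not reported.

```python
import re

# Piece-by-piece translation of GRYAC(AG,G){1,3}AW~(A,T)G(N){33,37}C
FINDPATTERNS_EXAMPLE = re.compile(
    r"G"               # G
    r"[AG]"            # R: a purine
    r"[CT]"            # Y: a pyrimidine
    r"AC"              # AC
    r"(?:AG|G){1,3}"   # AG or G, repeated 1 to 3 times
    r"A"               # A
    r"[AT]"            # W: A or T
    r"[CG]"            # ~(A,T): a nucleotide other than A or T
    r"G"               # G
    r"[ACGT]{33,37}"   # N: any nucleotide, 33 to 37 times
    r"C"               # C
)


def find_motif(seq):
    """Return (start, matched_text) for each non-overlapping match in seq."""
    dna = seq.upper().replace("U", "T")
    return [(m.start(), m.group()) for m in FINDPATTERNS_EXAMPLE.finditer(dna)]
```

As the box notes for FINDPATTERNS, a pattern of this kind captures sequence only; it carries no score, so every match is reported with equal standing.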
be illustrated by comparing a newly identified snRNA with a known one: a horizontal line indicates that the newly identified RNA is probably related, small bars indicate further regions of similarity (program available 11). Structural motifs, such as hidden repeats within the sequence, may also be revealed by this analysis 12.

Direct experimental testing of candidate RNAs is exemplified by the analysis of branched intermediates of trans-splicing RNAs, which can be carried out in vitro using debranching enzyme and involves their anomalous migration in two-dimensional gels 6. Likewise, band-shift experiments can be used to assess binding of a protein to a putative RNA family member. This suggested that the newly identified IREs associated with eALAS and aconitase, but not the Toll IRE, were likely to function in vivo 5 (Fig. 3). In addition, it has been calculated that the fold in the Toll IRE is much less stable than that in the other motifs 5. Similarly, four of the five pseudoknots found by the RNA search with the 3' terminal non-coding region of tobacco mosaic virus (TMV) RNA 8 were experimentally confirmed, together with several other new examples. When preliminary experiments support the computer-aided identification of a new RNA family member with a suggested biological function, the phase of searching is completed and detailed tests of the biological relevance of the findings must follow.

[Figure 3 graphic: predicted stem-loop structures, drawn 5' to 3', for murine eALAS mRNA, porcine mitochondrial aconitase mRNA, Drosophila toll mRNA and human ferritin H-chain mRNA.]

FIGURE 3. Comparison of secondary-structure predictions for the three most likely candidate IRE motifs identified by the database search described in Table 1, using a ferritin IRE as a control. Note that the stem for the Toll mRNA motif is GU-rich in the top helix, suggesting impaired stability.

Conclusion

Several biologically important RNA families have been successfully extended by the identification of new family members from databases. Clearly, it is now feasible to apply computational approaches as powerful tools for identifying RNA motifs.

Acknowledgements

We thank B. Seraphin, N.K. Gray, P. Stoehr and D. Higgins for discussion and critical reading of the article.

References

1 The RNA World (1993) (Gesteland, R.F. and Atkins, J.F., eds), Cold Spring Harbor Laboratory Press
2 Melefors, O. and Hentze, M.W. (1993) BioEssays 15, 85-90
3 Fichant, G.A. and Burks, C. (1991) J. Mol. Biol. 220, 659-671
4 Lisacek, F., Diaz, Y. and Michel, F. (1994) J. Mol. Biol. 235, 1206-1217
5 Dandekar, T. et al. (1991) EMBO J. 10, 1903-1909
6 Laird, P.W. (1989) Trends Genet. 5, 204-208
7 Dandekar, T. and Sibbald, P.R. (1990) Nucleic Acids Res. 18, 4719-4725
8 Chen, J-H., Le, S-Y. and Maizel, J. (1992) Comp. Appl. Biosci. 8, 243-248
9 Devereux, J., Haeberli, P. and Smithies, O. (1984) Nucleic Acids Res. 12, 387-395
10 Cox, C.T., Bawden, M.J., Martin, A. and May, B.K. (1991) EMBO J. 10, 1891-1902
11 Nedde, D.N. and Ward, M.O. (1993) Comp. Appl. Biosci. 9, 331-335
12 Milosavljevic, A. and Jurka, J. (1993) Comp. Appl. Biosci. 9, 407-411
13 Etzold, T. and Argos, P. (1993) Comp. Appl. Biosci. 9, 49-57
14 Searls, D.B. (1993) Comp. Appl. Biosci. 9, 421-426
15 Higgins, D.G., Bleasby, A.J. and Fuchs, R. (1992) Comp. Appl. Biosci. 8, 189-191
16 Gautheret, D. and Cedergren, R. (1993) J. Mol. Biol. 229, 1049-1064
17 Wittop-Koning, T.H. and Schümperli, D. (1994) Eur. J. Biochem. 219, 25-42
18 Rio, D.C. (1993) Curr. Opin. Genet. Dev. 3, 574-584
19 Manley, J.L. and Proudfoot, N.J. (1994) Genes Dev. 8, 259-264
20 Eddy, S.R. and Durbin, R. (1994) Nucleic Acids Res. 22, 2079-2088
21 Bruzik, J.P. and Maniatis, T. (1992) Nature 360, 692-695
22 Han, K. and Kim, H.J. (1993) Nucleic Acids Res. 21, 1251-1257
23 Kudo, M., Kitamura-Abe, S., Shimbo, M. and Iida, Y. (1992) Comp. Appl. Biosci. 8, 367-376
24 Sibbald, P.R., Sommerfeld, H. and Argos, P. (1992) Comp. Appl. Biosci. 8, 45-48
25 Mehldau, G. and Myers, G. (1993) Comp. Appl. Biosci. 9, 299-314
26 Staden, R. (1989) Comp. Appl. Biosci. 5, 293-298
27 Staden, R. (1991) DNA Seq. 1, 369-374
28 Gutell, R.R. et al. (1992) Nucleic Acids Res. 20, 5785-5795
29 Doolittle, R.F., ed. (1990) Molecular Evolution: Computer Analysis of Protein and Nucleic Acid Sequences, Academic Press

T. DANDEKAR AND M.W. HENTZE ARE IN THE BIOCOMPUTING, STRUCTURES AND GENE EXPRESSION PROGRAMMES, EUROPEAN MOLECULAR BIOLOGY LABORATORY, MEYERHOFSTRASSE 1, D-69117 HEIDELBERG, GERMANY.