Cell, Vol. 68, 827-828, March 6, 1992, Copyright © 1992 by Cell Press
Book Review
The A, B, C of Molecular Grammar
Sequence Analysis Primer. Edited by M. Gribskov and J. Devereux. New York: Stockton Press. (1991). 279 pp. $39.95.
The analysis of DNA and protein sequence is quickly becoming a national pastime. While a variety of analyses may be applied to a DNA or protein sequence, often the most pressing question for an investigator is, “Does my sequence look like anyone else’s sequence?” More and more, the answer to this question is yes. The rapid sequencing of large DNA pieces, coupled with the now trivial conversion of these DNA sequences into predicted proteins, has led to logarithmic increases in the amount of sequence information available to most investigators. The latest release of the GenBank database (Release 70) contains over 80 million nucleotides amassed from 60,000 entries, with no end in sight. This enormous amount of sequence data increases the likelihood that one’s sequence will be related in some way to a previously identified gene or protein. Yet the sheer volume of this information can complicate attempts to extract useful information. Sequence similarities are often clearly meaningful, but many biologically significant relationships are subtle and will not be readily discovered without some understanding of the strengths and weaknesses of various DNA and protein sequence analyses. A variety of sequence analysis software packages are available, all of which consist of a set of routines that will allow an investigator to perform, with varying degrees of success, basic DNA and protein sequence manipulations: translation of DNA, identification of restriction sites, protein hydropathy and hydrophilicity plots, sequence similarity searches, etc. While these programs are relatively easy to use and will allow an investigator to arrive at an answer of some kind, the accompanying manuals rarely provide any practical advice when it comes to interpreting the analysis.
On the other hand, detailed volumes have been published on the utilization and interpretation of sequence analysis data (e.g., the entire Volume 30 of Methods in Enzymology, Academic Press, San Diego, 1974, is devoted to sequence analysis); but rarely do these authors have the true novice in mind. Sequence Analysis Primer is a successful attempt to provide a manual designed specifically for individuals with little past exposure to computerized sequence analysis. This book is the first in a set of four texts to be published as the University of Wisconsin Biotechnology Center (UWBC) Biotechnical Resource Series; however, only this volume will pertain to sequence analysis. Those new to these techniques will benefit from this very useful introduction to sequence analysis, as will more experienced investigators with only an empirical understanding of the subject.
This relatively short book has been divided into four sections contributed by different authors. First, Rice, Elliston, and Gribskov discuss a step-by-step approach for analyzing DNA using a variety of techniques. Emphasis is placed on understanding the practical details of each technique. Potential pitfalls are pointed out, and useful advice for identifying and avoiding errors in analysis is provided. The reader is given helpful tips for adjusting the parameters associated with many common sequencing programs in order to arrive at a meaningful answer. Luthy and Eisenberg then discuss the analysis of protein sequence, again focusing on the practical aspects, especially as these pertain to understanding what the primary sequence of a protein can tell an investigator about its putative structure. States and Boguski describe the various techniques that can be employed to compare one protein sequence to another. In addition to a discussion of the proper uses of the terms “similarity and homology” (the title of this chapter), the authors offer some practical aids to identifying and interpreting sequence relationships between different proteins as well as within the same protein (internally repetitive domains). As these authors point out, internal repetitions are a relatively common feature of many proteins; yet their existence is often overlooked. Finally, Caballero uses the Drosophila Notch sequence as an example to describe the sequence analysis of a gene, from entry of the DNA sequence and its assembly into a gene-size piece to a search for sequence similarities to the predicted gene product. The Notch gene is an excellent choice for this chapter because its inherent complexity allows the reader to see how the presence of a variety of protein structural elements can complicate protein sequence analysis.
The book is not without its faults. While the text is easy to read, and concepts are usually explained clearly, the various authors occasionally digress into detailed discussions that are beyond the needs of the beginner (or most of us, for that matter). Some of the figures are difficult to follow because they are not fully described in the text or figure legends. The pros and cons of the various sequence analysis packages are discussed only sparingly, and I did not feel that these comparisons were unbiased or complete. These difficulties were most noticeable in the first chapter. In spite of these minor flaws, the practical, commonsense approach to sequence analysis provided by this book makes it a valuable tool, not only for the novice but even for the more experienced. I was personally enlightened after no more than a simple perusal of the short glossary of terms commonly used in discussions of sequence analysis. While the emphasis is placed on the simpler analysis techniques, more sophisticated analyses are also described, and the thorough documentation accompanying each chapter will allow the reader to locate techniques mentioned in passing. Relevant addresses and phone numbers are included throughout the text and appendices and will greatly facilitate the acquisition of software and computer network access by interested individuals.
While the most critical aspect of a DNA sequencing project is arguably the analysis of the determined sequence, very few molecular biologists are trained to analyze and interpret the conclusions that such analysis provides. In sequence analysis, avoiding the temptation to overinterpret a result of marginal significance is often as great a problem as the possibility of overlooking a subtle result of clear biological significance. This Sequence Analysis Primer provides guidelines that will make both of these pitfalls less likely.
Mark G. Goebl
Department of Biochemistry and Molecular Biology and the Walther Oncology Center
Indiana University School of Medicine
Indianapolis, Indiana 46202
Books Received
Duncan, C. J. (1991). Calcium, Oxygen Radicals and Cellular Damage. Cambridge University Press, New York. 224 pp. $79.95.
Hawes, C. R., Coleman, J. O. D., and Evans, D. E. (1991). Endocytosis, Exocytosis and Vesicle Traffic in Plants. Cambridge University Press, New York. 252 pp. $85.00.
Lesk, A. M. (1991). Protein Architecture: A Practical Approach. Oxford University Press, New York. 287 pp. $45.00.
Maclean, N. (1991). Oxford Surveys on Eukaryotic Genes. Oxford University Press, New York. 166 pp. $72.00.
Miflin, B. J. (1991). Oxford Surveys of Plant Molecular and Cell Biology. Oxford University Press, New York. 334 pp. $50.00.
Newland, A., Burnett, A., Keating, A., and Armitage, J. (1991). Haematological Oncology. Cambridge University Press, New York. 247 pp. $84.95.
Simonetta, A. M., and Morris, S. C. (1991). The Early Evolution of Metazoa and the Significance of Problematic Taxa. Cambridge University Press, New York. 296 pp. $69.95.