BOOK REVIEW
Gunnar von Heijne, Sequence Analysis in Molecular Biology: Treasure Trove or Trivial Pursuit, Academic Press, San Diego, CA, 1987, 188 pp., $27.95.
This book attempts to introduce the many and varied methods of genetic sequence analysis to the nonmathematical biologist, without skimping on essential details. It is a very good-humored work and, unlike most efforts in the field, entertaining as well as informative. In large part the author succeeds in his goal. The book is also comprehensive, covering almost all nucleic acid and protein sequence analysis algorithms.
Most biologists treat the analysis of sequence information as a “black box”: they obtain a computer program and blindly run the data through it, hoping to see something of biological or statistical interest. Few understand the underlying details. It does not help, of course, that most biologists’ eyes glaze over when they encounter a summation sign. This book attempts to explain the underlying concepts in moderate detail so that the user of a standard algorithm will be able to apply it with some intelligence. Little mathematical knowledge is required of the reader; for details, comprehensive literature citations are included. Where needed to nail down a concept, appropriate equations with discussion are included; in general these are well chosen and well explained.
The book is divided into eight chapters. Chapter 3 discusses the programs available for designing oligonucleotide probes and finding restriction sites, among other topics. Chapter 4, titled “Nucleotide Sequences: What You Can Do Once You Have It,” is an excellent review of basic molecular biology and general site recognition (promoter and splice site recognition, for example). It also covers coding region detection and RNA and DNA structure analysis programs. Chapter 5 is the protein complement to chapter 4 and discusses secondary and supersecondary structure prediction, prediction of functional regions of a protein, and general prediction methods based on hydrophobicity and other properties. Chapter 6 covers sequence similarity searching programs.
MATHEMATICAL BIOSCIENCES 93:147-148 (1989)
Published by Elsevier Science Publishing Co., Inc., 1989
655 Avenue of the Americas, New York, NY 10010
0025-5564/89
Chapter 7 contains some discussion of conclusions based on programs discussed in the book that turned out to be incorrect. The last chapter is noteworthy in that it mentions that the emerging field of “computer molecular biology” (I prefer the term computational molecular biology) and theoretical molecular biology cannot be based on the same foundation as theoretical physics because the first principles have not been, and are not likely to be, established. [This point has been the subject of a report by the National Academy of Sciences, and a workshop was held on the subject (now called Matrix biology) in the summer of 1987.] There
are two useful appendixes, the first listing molecular biology databases, the second commercial sequence analysis programs. There are several things worth noting about the author’s discussion in general. He occasionally calls a program more advanced when it is actually just more complicated, and the complications do not improve accuracy. More editorial guidance would have been helpful: for instance, the index omits the z-score, which is discussed in the text, and the spelling could have used some help. Appendix 2, a list of commercial sequence analysis software, omits several well-known vendors: Amersham, Hitachi, and IBI.
The discussion of protein sequence analysis is extensive and thorough. Dr. von Heijne introduces many important concepts and methods of analysis. In particular there is an excellent discussion of hydropathicity, including the first general discussion (that I am aware of) of cluster analysis applied to amino acid properties.
The chapter on sequence similarity algorithms follows my own review on the subject while usefully extending it. The author presents an especially clear description of the hash-coding-based algorithms. There are two troubling aspects to this chapter. First, the algorithms are described rather than critiqued, unlike those in the rest of the book. Second, there is a startling omission of most of the work of Sellers and collaborators. Furthermore, the author apparently failed to appreciate that the Goad and Kanehisa algorithm introduced not only a useful statistical measure but also the concept of match density as a weighting function. In general, this chapter is weaker than the rest of the book. However, references are provided to appropriate reviews. The discussion of statistical significance of nucleic acid similarities omits the work of Waterman and collaborators; this may be forgivable as it is a rather advanced topic, but some mention would seem to be in order.
Overall, this book is an essential work for anyone remotely concerned with sequence analysis. It is also a fine complement to Russell Doolittle’s Of URFs and ORFs. It would be ideal for a one- or two-semester course in the subject, especially if supplemented with appropriate reviews and the Doolittle monograph. Its coverage of the analysis literature alone would make it worthy; the experience in the techniques communicated by the author makes it even more so.
DAN DAVISON
Theoretical Biology and Biophysics Group
Los Alamos National Laboratory
Los Alamos, New Mexico 87545