Computers and Chemistry 25 (2001) 425–426 www.elsevier.com/locate/compchem
Book Review

Post-genome Informatics
Minoru Kanehisa, Oxford University Press, Oxford, 2000, 148 pages

In times of intensified public interest in science, owing largely to the sequencing efforts that have resulted in the publication of the draft human genome and the genomes of about 70 other species to date, writing a general-purpose book on post-genomic research is a significant task. On the one hand, such a book or book series should be comprehensive and elaborate in order to present a complete picture of the efforts invested by the scientific community to squeeze biological sense out of the bewildering magnitude and diversity of biomolecular data. On the other hand, to make the book accessible to a larger readership, it should provide an overview rather than be full of potentially off-putting details. Kanehisa has clearly chosen the latter approach for his recent book ‘Post-genome Informatics’.

The book is small, with just 148 pages, and is divided into four chapters: a general introduction, biomolecular databases, sequence analysis methods and a final chapter dealing with formalisms applicable to molecular metabolism and other pathway interactions.

Chapter 1, entitled ‘Blueprint of life’, introduces DNA, RNA and proteins in relation to the ‘central dogma’ of molecular biology. It also discusses codons and provides a useful table of the codon variations found in different organisms (most of them confined to the mitochondria). Retroviruses, which reverse the information flow of the central dogma, are mentioned, as is splicing. Although some of the arguments for a primordial RNA world are covered, the chapter does not deal with the controversy between the intron-early and intron-late hypotheses, which is pertinent to splicing. After discussing some crucial technological developments in elucidating biomolecular sequence and structure, the chapter concludes by listing the grand challenges of the post-genomic era.
According to the author there are two: (i) the protein folding problem and (ii) the problem of reconstructing a biological organism given its complete genome. The latter, in the view of the author, would involve the prediction of all interactions between genes and molecules. It could be argued that this is a minimal requirement for accomplishing this reconstruction.

In Chapter 2, some longstanding molecular databases, such as PIR and SWISS-PROT, are first discussed briefly. Then, some very brief computer science theory is given concerning relational and object-oriented databases. The chapter concludes with some details of what the author calls a new generation of molecular biology databases. In fact, the databases mentioned have all been available for quite a number of years, and post-genomic databases, such as those holding gene expression data, structural genomics data and the like, are not covered.

Chapter 3 is divided into two parts. The first provides a brief introduction to dynamic programming (DP), still the most widely used technique for sequence alignment, and to ways of parallelising the DP algorithm to gain speed. The sequence database searching programs FASTA and BLAST are briefly discussed, as well as multiple alignment and phylogenetic analysis. The second part deals with methods to predict RNA and protein 3D structure, and also covers gene finding and protein sorting. A selection of general optimisation and machine learning algorithms is given here as well.

Chapter 4 is an attempt to relate the problems to the genomic level and beyond. Although most of the chapter introduces basic graph theory, the discussion then moves on to biochemical pathways, with examples taken from the author’s KEGG database. The chapter ends with a short discussion of gene regulatory networks and their principles, concluding with some thoughts on complex systems. The discussion here remains very general, and no attention is given to large-scale efforts in gathering and integrating genomic data, such as gene expression data, proteomics, etc.

Kanehisa states in the Preface of his book that his motivation has been “to provide conceptual links between different disciplines, which often share common ideas and principles”.
However, he does not help the reader much in making these conceptual links between the various disciplines. The small selection of methods and formalisms is mainly given as stand-alone descriptions, which are often insufficiently explained and not placed in relation to their application. For example, simulated
0097-8485/01/$ - see front matter © 2001 Elsevier Science Ltd. All rights reserved. PII: S0097-8485(01)00078-X
annealing is described as a technique for multiple sequence alignment, whereas it would be much better placed under conformational searches and 3D-structure prediction. The choice of organising the material within the above four chapters sometimes leads to inconsistencies as well, such as the description of protein folding and structure prediction under sequence analysis. Furthermore, there are hardly any pointers in the text to the primary literature, and only a small number of references appear in an appendix.

Perhaps the book would have been improved by emphasising how the various algorithms can be reused in dealing with the different questions and problems given in the book. Surely, a lot of the author’s theory about molecular pathways and complex systems could be applied to the associations between data, databases and algorithms for tackling the challenges of post-genome informatics.

A strength of the book is that it mostly approaches the problems from a biological perspective. This perspective is often lacking, but is crucial for bioinformatics research. Interestingly, at a number of points in the book, Kanehisa seems to take a holistic approach to life. For example, on p. 104, he writes: “It is unlikely that the genome contains information about all necessary molecular interactions needed to make up life. The analysis inevitably involves space- and time-dependent behaviours of molecular interactions and reactions, both in terms of the physico-chemical principles and the biological constraints”. Assuming the author does not refer here to facts such as the necessity for humans to eat vitamin C, for example, what does he mean? For all other processes, although spread out in time and place, there is an intimate interaction between the genome and the cellular environment, to an extent that we are only beginning to understand as a result of recent gene expression analyses. The fundamental descriptor of a species, and of each individual within it, is the genome.
Earlier in Chapter 1 (on p. 20), Kanehisa argues that a gene coding for a protein in many instances would not contain all necessary information for the protein’s 3D shape, and he mentions the existence of molecular
chaperones, which assist in folding specific proteins, to back up this claim. Apart from the prevailing view that chaperones catalyse rather than direct the folding process, one could argue that, at a higher level, such chaperones are generated by the very same genome that produces the folding protein considered. Kanehisa stretches his argument even further on p. 21, where he expresses the view that: “… the genome is only part of the whole network of interacting molecules in the cell. The genome is not the headquarters of instructions; rather it is simply a warehouse of parts. The headquarters reside in the network itself which has an intrinsic capacity, or a programme, to undergo development and to reproduce gamete cells”. Although, in biological organisms, much is to be learned from the heterarchical (or networked) associations of molecular entities, stating that the networks themselves are the templates of evolution seems too much of a stretch. Perhaps Kanehisa resolves the issue in the concluding statements of Chapter 1, where he writes that: “The genome is only a part of what makes up an individual organism, but it probably contains the most information within the network of interacting molecules”. Researchers in cloning would probably agree with this statement.

All the above issues and idiosyncrasies aside, Kanehisa has written a useful introduction, giving a good glimpse of what post-genome informatics is all about. People working in biomedical industries might regret that issues in handling recent large-scale genomic data remain largely untouched. Computer scientists and other non-biologists entering the field might find it a pity that neither the algorithmic aspects nor the biological issues have been elaborated further. Nonetheless, ‘Post-genome Informatics’ can serve as a quick reference for anyone interested in genome bioinformatics.

Jaap Heringa
National Institute for Medical Research, London, UK
E-mail:
[email protected]