Metabolic Engineering 2, 157-158 (2000) doi:10.1006/mben.2000.0157, available online at http://www.idealibrary.com
FOREWORD
Bioinformatics and Metabolic Engineering
Published online July 28, 2000
Advances in sequencing and DNA replication technologies have made it possible to sequence the entire genomes of many organisms. By the end of 1999, the genomes of more than 30 species had been completely sequenced, and this number is estimated to exceed 100 in the next 2 years. The information contained in these genomic data will eventually allow researchers to synthesize roadmaps of cellular function, with enormous implications for medicine, pharmacology, biotechnology, and biology in general.
Concurrently with the various genome-sequencing projects, powerful technologies have recently been developed for the simultaneous measurement of the degree of expression of each individual gene in a genome under a particular set of environmental conditions. These technologies utilize DNA hybridization reactions between the complementary DNA (cDNA) of a sample and thousands of DNA probes, each specific to an individual gene, immobilized at high resolution in precise locations on a glass or membrane substrate. Each cDNA strand in the sample uniquely recognizes, through base pairing, the corresponding probe on the substrate and binds to it. Bound DNA strands fluoresce and can be assayed by several fluorescence-scanning technologies. These methods form the cornerstone of DNA microarrays (gene chips), which have now made possible the parallel monitoring of the differential expression of large numbers of genes.
Similarly, in the area of protein analysis and quantification, current research is developing methods for the parallel characterization of protein identity and amount in cells. Besides two-dimensional gel electrophoresis, methods under development include specific antigen-antibody reactions on a chip and integrated peptide analysis by mass spectrometry following separation by liquid chromatography and trypsin digestion, among others.
Although not yet at quite the same level of development as microarrays, the intensity of activity and the importance of proteomic data suggest that developments in this area will accelerate in the near future.
The set of technologies probing intracellular make-up and function is completed by new isotopic tracer methods that accurately determine intracellular metabolic fluxes as measures of cell physiology and function. Flux determination is carried out by enumerating complete metabolite isotopomer balances and solving for isotopomer content as a function of metabolic fluxes. Using various measurements of isotopomers, such as those reflected in the fine structure of NMR or GC-MS spectra, the intracellular metabolic fluxes can be determined so as to maximize the agreement of these measurements with the corresponding values predicted through metabolic reconstruction. Fluxes so determined are robust in the sense that they satisfy a great degree of redundancy and are sensitive to variations of the intracellular state.
The above technologies probe important biological processes in a parallel manner and are already generating an unprecedented volume of data about the intracellular state. As such, they challenge the traditional paradigm of biological research, which has proceeded over the years by investigating only a small number of processes (genes or enzymes) at a time. These advances create enormous opportunities for accelerating the pace toward the main goal of biological research, namely understanding global gene regulation. It is expected that, in most situations, a host of genes (as opposed to a single gene) will be responsible for cellular activities associated with, for example, high productivity in a fermentor or a diseased state. Identifying the characteristic gene expression fingerprint underlying a cellular physiological state is a critical step in realizing the full potential of the above technologies in biological research. Furthermore, reconstructing genetic regulatory networks, linking the expression and metabolic phenotypes, and elucidating interactions in signal transduction pathways are a small sample of the problems that it will be possible to address using the data from these technologies. These questions, along with the spectrum of problems related to sequence annotation and analysis, are the subject of the emerging field of bioinformatics. In general, we define bioinformatics as the methods and framework aiming at the extraction of biological knowledge from sequence, expression, proteomic, and isotopic tracer distribution data.
The upgrade of information content is the main theme of bioinformatics research.
There is a strong bidirectional relationship between bioinformatics and metabolic engineering. First, metabolic engineering provides an integrated, systems-theoretic framework for analyzing the data generated by the above technologies. At the same time, metabolic engineering can benefit immensely from the information that will be extracted from such data. Think, for a moment, of identifying the expression profiles associated with high-productivity periods in the course of a fermentation run or, similarly, isolating a set of differentiating genes and their characteristic expression pattern that are associated with the onset of a particular disease, especially in a dynamic sense that shows the movie of expression profiles as the disease evolves with time. These, and many similar examples, fuel the growing excitement about genomics research and derivative technologies.
There are, however, several problems to overcome in realizing the above potential. In contrast to the impressive progress in the development of methods and instrumentation for probing the intracellular state and function, systematic methods for the effective analysis of such data have received rather scant attention. Data evaluation is usually limited to cursory inspections by the user or, at best, to automated spot comparison (spot-oriented analysis) and rudimentary statistical analysis. Furthermore, faced with information overload, there is a natural tendency to focus subjectively on what is viewed a priori as relevant or important and to relegate everything else to the background. Most importantly, besides methods and algorithms, there is a scarcity of experienced personnel who have the computational skills to develop such technologies and use them for extracting important information from the above data sets. These limitations are currently receiving broad attention and call for innovative approaches to provide much-needed solutions.
The present issue of Metabolic Engineering is focused on the subject of bioinformatics. Nine high-quality papers have been selected to illustrate some of the methods and problems in this area. The goal is to bring the importance of the field to the attention of our readers and to encourage more activity in this area. We look forward to covering topics in bioinformatics on a regular basis in the future.
Gregory Stephanopoulos
Massachusetts Institute of Technology
1096-7176/00 $35.00 Copyright 2000 by Academic Press. All rights of reproduction in any form reserved.