Impact of Genomics and Genetics on the Elucidation of Bacterial Metabolism


METHODS 20, 47–54 (2000) Article ID meth.1999.0904, available online at http://www.idealibrary.com on

Diana M. Downs¹ and Jorge C. Escalante-Semerena
Department of Bacteriology, University of Wisconsin-Madison, Madison, Wisconsin 53706-1567

In the last few years, the emergence of complete genome sequences has had profound effects on all fields of biology. While the existence of these genome sequences has served to facilitate experimental work, it has also highlighted the gaps in our knowledge of bacterial metabolism. Our current knowledge of metabolism is primarily the result of data accumulated from decades of study by biochemists and geneticists. In general these studies focused on discrete pathways and their regulation. The technical innovations of the last decade, culminating with the sequencing of complete genomes, provide us with the ability to address the next frontier in physiology, metabolic integration. Herein we describe current approaches that can be used to complement classic genetic approaches and further our understanding of both novel metabolic functions and metabolic integration in microorganisms. © 2000 Academic Press

As we enter the age of genomics, renewed attention is being focused on the study of bacterial physiology and metabolism. This focus is the result of several factors including: (i) genome sequences that have identified a large number of open reading frames (ORFs) with no ascribed functions; (ii) increased emphasis on metabolic diversity of microorganisms; (iii) increased reports of metabolic phenotypes caused by mutations that affect central cellular processes; and (iv) increased evidence for key roles of metabolic processes in pathogenesis and symbiosis. All of these factors have resulted in the growing realization that a better understanding of basic bacterial metabolism and physiology is critical.

Our intent in this article is not to describe all the efforts being put forth under the flagship of “genomics.” Although this is an area of intense study and rapid progress, it is not our desire, nor the area of our expertise, to review the technical or theoretical aspects of work in this area. Rather the purpose of this article is to emphasize the need to reinvigorate the approach to metabolism that has been historically successful. Our goal with this article is to provoke thought and discussion among workers interested in metabolism and functional genomics that will result in increased integration between modern techniques, genomic applications, and classical genetic approaches.

Metabolic pathways have been studied over the years by biochemists and geneticists, yielding an impressive knowledge base of the catabolic and anabolic pathways of prokaryotic cells. The technical innovations of the last decade have provided us with powerful new tools to probe metabolism at levels previously unimagined. Herein we describe genetic and genomic means to address two major areas of bacterial metabolism: (i) discrete, individual biochemical processes; and (ii) metabolic integration, what is likely to be the next frontier in cell physiology.

¹ To whom correspondence should be addressed. Fax: (608) 262-9865. E-mail: [email protected].

1046-2023/00 $35.00 Copyright © 2000 by Academic Press. All rights of reproduction in any form reserved.

1. PREDICTING FUNCTION FROM SEQUENCE HOMOLOGY

Before complete genome sequencing efforts became formalized, studies carried out in various biological systems had identified the sequences for a large number of genes as well as the functions associated with the gene products. These sequence–function databases have become indispensable in biology since they can be used for computational analyses to define “signature sequences” that ultimately predict functional motifs. As with any comparative analysis, the accuracy of the theoretical predictions increases with more known components. This fact alone emphasizes the importance of continued sequence annotation by experimental work in any organism, regardless of the status of its genome sequence. Comparison of query sequences with these sequence databases has become the prevalent tool to predict function for new ORFs. Although this approach is seen as a good first step toward elucidation of function, it should be kept in mind that experimental work is needed to validate computer-generated predictions of function.

1.1. Experimental Approaches to Elucidate Function

To benefit from the power of current sequence comparisons, one must identify a sequence of interest. Traditionally, the starting point for such analyses has been isolation of a mutation that results in a phenotype of interest. (The reverse approach, that of generating a mutation in an ORF and looking for a resulting phenotype, has become feasible with genome sequences and is discussed in Section 3.) There are a variety of techniques available to generate and isolate mutants in different organisms and the reader is referred to several review articles (1–4). In a phenotype-driven approach, once the mutant of interest has been isolated, the disrupted gene must be identified by sequence to allow potentially informative sequence comparisons. We discuss below several general approaches that can be used to identify the sequence of a disrupted gene.

1.1.1. Complementation of Function and Sequence Identification

Classically, DNA–function correlations have been determined by direct complementation of mutant phenotypes with a fragment of DNA. This strategy, in the case of a recessive, negative phenotype, benefits from the power of a positive selection. In the simplest scenario, a library of plasmids containing sized chromosomal DNA fragments is introduced into the mutant under investigation by selecting for a plasmid-encoded marker (usually drug resistance). The resulting transformants, transductants, or exconjugants are then tested for complementation by demanding growth under conditions nonpermissive for the parental mutant strain.
With this approach, it is important to retain selection for the plasmid throughout the manipulations to increase the chances that restoration of function is due to a plasmid-contained gene or genes. Even with this precaution one must eliminate reversion, marker rescue, and other “artifacts” that could have been obtained in the initial selection, by demanding that growth results every time the plasmid is inherited. Once a complementing plasmid is obtained, the DNA insert can be subcloned, to define the minimal amount of DNA required for complementation, or completely sequenced. In organisms that are poorly understood at the genome level, knowledge of the sequence flanking the gene of interest can provide valuable information about context, gene organization, etc. Various related metabolic genes and/or regulators have been identified by sequence walking [e.g., see (5, 6)].

Finally, one must demonstrate that the final plasmid contains the actual gene disrupted in the mutant strain. This can be accomplished in a number of ways, including marker rescue analysis, sequencing of the entire mutant allele, and genetic mapping. The above precautions are necessary since growth scored as complementation can also be due to “multicopy suppression,” essentially a property of overproducing a gene product. In fact, multicopy suppression is a phenomenon that can be very informative when probing metabolism, as elaborated on in Section 5.2 of this article. While in principle the approach described above allows researchers to analyze the effects of insertion or point mutations, there are several circumstances that could prevent identification of the desired plasmid; these include dominant mutations, lethality of overproduced gene products, polarity of a mutation, and inability to easily introduce DNA into the organism.

1.1.2. Direct Sequencing of Insertion Mutations

One of the most powerful techniques made possible by current sequencing technologies is the ability to rapidly sequence DNA flanking insertion elements. This approach is feasible in any organism that has a transposon delivery system, and it involves cycle sequencing using two DNA primers. Although many variations of this technique are being used, in general, primers complementary to transposon sequences are used in combination with “degenerate” primers that will hybridize with little sequence preference, resulting in amplification of a fragment that can be sequenced using the transposon-specific primer (7, 8). The chief advantages of the direct sequencing approach are its universal application (i.e., any known insertion in any organism) and the rapid results.
This approach identifies the site of insertion, from which the sequence of the complete gene can be readily obtained by sequence walking.

1.1.3. Computer Analysis

Once a stretch of sequence and/or an individual ORF is defined, the data can be processed by various (and ever-increasing) computer programs. Search programs are readily accessed via the Internet and information is provided to help perform these analyses (www.emgm.stanford.edu/classes/genefind/; www.ncbi.nlm.nih.gov/BLAST). In the context of function, these comparisons should be performed with the predicted amino acid sequence, not the nucleotide sequence, thus facilitating identification of protein homologs that may have diverged significantly at the nucleotide level. Once a query sequence is submitted, one of three general outcomes can be expected: (i) clear sequence homology across the predicted protein sequence with proteins of known function; (ii) blocks of homology to defined “functional motifs”; or (iii) no significant homology, or homology only to additional proteins of unknown function.

If a function, general or specific, is predicted from sequence analysis, efforts should be made to demonstrate activity in vitro. A variety of overexpression systems are commercially available to facilitate the purification and analysis of proteins and the functions associated with them. If a general functional class is predicted, in vitro activity assays (e.g., reductase activity, ATPase activity, dehydrogenase activity, DNA- or RNA-binding activity) can most likely be found in the literature. Researchers working with structural proteins face a more difficult task, since in many cases the lack of a given structure may be lethal, and an in vitro assay is simply not feasible. In this case, and in the case where no homology is found between the query ORF and sequences in the databases, additional manipulation and/or genetic analysis is required. The extent to which these analyses can proceed depends on the respective organism.

In the last few years genome sequences have led to the definition of protein families based exclusively on conserved sequence motifs. Because of their broad conservation, functional information about these families gained in genetically tractable organisms will have broad implications for workers in diverse organisms, including higher eukaryotes. The comparative genomic approach has been extremely useful, and will continue to be as more and more sequences encoding known functions are deposited into databases. Because sequence homology is only predictive of function, we must continue to test these predictions experimentally and not fall into the habit of assuming function based on computer analyses.
Each time a sequence is identified, an associated function determined, and an explanation for the way the organism behaves in the absence of this function offered, our knowledge of metabolism is significantly enhanced.
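The recommendation in Section 1.1.3 to search with the predicted amino acid sequence rather than the nucleotide sequence can be illustrated with a minimal Python sketch. The two short coding sequences below are invented for illustration (they come from no genome), the standard genetic code is assumed, and the position-by-position identity score is a deliberately crude stand-in for what a real search program such as BLAST computes.

```python
from itertools import product

# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def translate(dna):
    """Translate a coding sequence in frame 1, stopping at the first stop codon."""
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        aa = CODON_TABLE[dna[i:i + 3]]
        if aa == "*":
            break
        protein.append(aa)
    return "".join(protein)


def identity(a, b):
    """Fraction of identical positions over the shorter of two ungapped sequences."""
    n = min(len(a), len(b))
    return sum(x == y for x, y in zip(a, b)) / n


# Hypothetical sequences: gene_b uses different synonymous codons than gene_a.
gene_a = "ATGCTGAGCCGTGCGTAA"
gene_b = "ATGTTATCAAGAGCATAA"

print(translate(gene_a), translate(gene_b))       # MLSRA MLSRA
print(round(identity(gene_a, gene_b), 2))         # nucleotide identity, well below 1
print(identity(translate(gene_a), translate(gene_b)))  # protein identity
```

In this toy case the encoded proteins are identical even though nearly half of the nucleotide positions differ, which is the situation in which a nucleotide-level comparison would understate the relationship.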

2. GENOME ANALYSIS TO IDENTIFY NOVEL METABOLISMS

The complete genome sequences of a variety of “model” organisms have resulted in insights into the evolution of metabolisms as well as the organization and distribution of functions (9) (see www.cme.msu.edu/WIT; ecocyc.PangeaSystems.com/ecocyc/ecocyc.html; www.genome.ad.jp/kegg/kegg3.html; www-c.mcs.anl.gov/home/compbio/PUMA). Complete genomes are now being annotated based on predicted, if not known, sequence-to-function relationships. One feature of these annotated genomes has been the ability to specify the approximate distribution of genes involved in various cellular functions, i.e., structure, biosynthesis, transcriptional regulation. This analysis has in general resulted in the distribution of genes expected when considering the known lifestyle of the organism [e.g., the bulk of genes in Bacillus subtilis are devoted to signal transduction and secretion of secondary metabolites (10), Helicobacter pylori devotes significant genes to surviving acidic environments (11)].

In addition to expected findings, analysis of complete genomes has identified novel genes that would not have been predicted by the environmental habitat of the organism. For instance, recent work has identified a Rubisco (ribulose-1,5-bisphosphate carboxylase/oxygenase) homolog in the obligate anaerobic archaeon Methanococcus jannaschii. Interestingly, the archaeal enzyme is sensitive to oxygen inactivation, a finding that raises important questions about the origins of the oxygenase activity associated with Rubisco found in bacteria and plants. As such, this finding now directs functional work that is informative with respect to the divergence of enzymes for novel, altered function (R. Tabita, personal communication). Analysis of genome sequences aimed at identifying such metabolic outliers is a particularly valuable approach in organisms where genetic selections are not easy. Again, however, emphasis should be placed on obtaining the experimental evidence to support predicted functions.

Workers should be cautioned against drawing conclusions when the orthologue of a particular enzyme is not present in the reported genome sequences. For instance, the cobU gene and its gene product have been structurally and functionally characterized in Salmonella typhimurium (12, 13). The CobU enzyme is involved in the late steps of the biosynthesis of adenosylcobalamin (a.k.a. coenzyme B12) in this bacterium. Structural cobU homologs can be identified in Synechocystis (14), Pseudomonas denitrificans (15), Escherichia coli (16), and R. capsulatus (17). However, the cobU orthologue is absent in the methanogenic archaea Methanococcus jannaschii (18) and Methanobacterium thermoautotrophicum strain ΔH (19), despite the fact that these organisms synthesize significant amounts of cobamides. Results like these offer an opportunity to identify different enzymes with similar functions, and analysis of such enzymes will advance our understanding of the metabolic differences among prokaryotic cells, and shed light on how metabolic activities may have evolved.

Perhaps the most revealing result from the annotation of complete genome sequences is the presence of a large percentage of genes [sometimes reaching 40% (H. influenzae)] whose function is completely unknown (20). Strikingly, this is true even in E. coli, arguably the best understood organism and the subject of decades of biochemical and genetic analysis.
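The cobU case in this section amounts to a simple comparison of two kinds of evidence: whether a recognizable homolog is present, and whether the organism is known to carry out the function. A minimal Python sketch of that comparison follows. The observations are only those stated in the text; the record layout and the function name `missing_ortholog_candidates` are assumptions made for illustration, and `None` marks cases where the text makes no statement.

```python
from typing import NamedTuple, Optional


class Observation(NamedTuple):
    organism: str
    has_cobU_homolog: bool
    makes_cobamides: Optional[bool]  # None = not stated in the text


# Transcribed from the paragraph above; nothing here is new data.
OBSERVATIONS = [
    Observation("Salmonella typhimurium", True, True),
    Observation("Synechocystis", True, None),
    Observation("Pseudomonas denitrificans", True, None),
    Observation("Escherichia coli", True, None),
    Observation("R. capsulatus", True, None),
    Observation("Methanococcus jannaschii", False, True),
    Observation("Methanobacterium thermoautotrophicum", False, True),
]


def missing_ortholog_candidates(observations):
    """Organisms that perform the function but lack a recognizable homolog.

    These are the cases where a different enzyme with a similar function
    may exist, so they are leads for experiments, not conclusions.
    """
    return [o.organism for o in observations
            if o.makes_cobamides and not o.has_cobU_homolog]


print(missing_ortholog_candidates(OBSERVATIONS))
```

The sketch deliberately ignores organisms with unknown phenotype rather than guessing, which mirrors the caution urged above against reading too much into an absent orthologue.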

3. USING GENOME SEQUENCE TO PREDICT METABOLIC PROPERTIES OF AN ORGANISM

Although more and more genome sequences are being reported, it is neither practical nor economical to presume a complete annotated genome sequence will be available for all metabolically interesting microbes. This fact demands that innovative methods to pursue a genomic approach to metabolism be employed in these organisms. One particularly effective idea that has been put forward for obtaining sufficient sequence to proceed involves a strategy to generate a “metabolic fingerprint” of an organism without a complete annotated genome sequence (M. Lidstrom, personal communication). The idea is partially based on work by those whose goal is to “reconstruct metabolism” from complete annotated genome sequences (9). It has been suggested that with current technologies obtaining partial sequence of 95% of the genes could be completed at a fraction of the cost of a complete, annotated genome. The resulting sequences, obtained from overlapping clones, can be used to screen databases and thus provide information about the metabolic capabilities of the organism. Most significantly for our understanding of physiology, this work will provide raw material to direct experimental analysis of metabolism.

This approach is attractive because it not only identifies potential metabolic capabilities by sequence homology but also facilitates functional studies since the gene(s) has been cloned in the process. This immediate access to genes of interest facilitates directed mutant generation and, thus, allows the phenotypic analysis that is critical to building a complete picture of an organism’s metabolism. In many organisms, studying metabolism/physiology still requires a targeted approach where the effect that the absence of a specific enzyme has on growth under different conditions must be tested to confirm conclusions about metabolism.

In addition, partial genome sequences can identify one or more genes indicating the presence of an unexpected metabolic pathway. An example illustrating this point was recently reported (21). The authors identified a cluster of genes in the aerobic methylotrophic bacterium Methylobacterium extorquens AM1 whose sequence analysis predicted they encoded enzymes previously thought to be unique to the strictly anaerobic methanogenic archaea. In these prokaryotes, the identified functions are needed for the generation of methane from C1 compounds. Inactivation of these genes by insertional mutagenesis rendered M. extorquens AM1 unable to grow on C1 compounds, indicating that these gene products function in aerobic C1 metabolism as well. This important discovery was accomplished in the absence of the complete, annotated genome sequence of M. extorquens AM1, and emphasizes the value of analyzing sequences adjacent to those of immediate interest.
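One way to picture the “metabolic fingerprint” idea is as a tally of partial-sequence clones by the pathway of their best database match. The Python sketch below is only an illustration under stated assumptions: the clone identifiers, the best-hit assignments, and the function-to-pathway table are all hypothetical, and the source does not describe how such a fingerprint would actually be computed.

```python
from collections import defaultdict


def metabolic_fingerprint(clone_hits, pathway_of):
    """Group clones by the pathway of their best-hit function.

    clone_hits: clone id -> best-hit function name, or None if no hit.
    pathway_of: function name -> pathway name (hypothetical lookup table).
    Returns (pathway -> list of clone ids, list of unassigned clone ids).
    """
    fingerprint = defaultdict(list)
    unassigned = []
    for clone, function in clone_hits.items():
        pathway = pathway_of.get(function)
        if pathway is None:
            unassigned.append(clone)  # no hit, or hit of unknown function
        else:
            fingerprint[pathway].append(clone)
    return dict(fingerprint), unassigned


# Hypothetical inputs for illustration only.
pathway_of = {
    "methanol dehydrogenase": "C1 metabolism",
    "transketolase": "pentose phosphate pathway",
    "citrate synthase": "TCA cycle",
}
clone_hits = {
    "clone_001": "methanol dehydrogenase",
    "clone_002": "citrate synthase",
    "clone_003": "transketolase",
    "clone_004": None,
}

fingerprint, unassigned = metabolic_fingerprint(clone_hits, pathway_of)
print(fingerprint)
print(unassigned)
```

Because each entry keeps its clone identifier, an unexpected pathway in the tally leads directly back to cloned DNA that can be used for directed mutagenesis, which is the practical advantage emphasized above.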

4. IDENTIFICATION OF GENES ENCODING TARGET FUNCTIONS FROM GENOME SEQUENCE

In contrast to the sections above, we now focus on approaches that at this point are available only to workers in model organisms for which the complete genome sequence (or that of a close relative) is known and a genetic system is available. In this context, the complete genome sequences of model organisms can be used to fill gaps left by decades of standard genetic and biochemical approaches to metabolism in these organisms. Because of the expertise of the authors, examples from E. coli and S. typhimurium will be emphasized, although in principle these approaches are applicable to many organisms.

In several metabolic pathways that have been biochemically defined, the genes encoding some of the predicted enzymes have not been identified. There are a number of reasons why such genes may have been missed by genetic screens alone: (i) functional redundancy; (ii) technical difficulty in the specific screen; (iii) inability to provide or transport the missing nutrient; or (iv) inability to correctly predict the mutant phenotype. The availability of a complete genome of the respective organism (or its close relative) allows researchers to perform a variation of the previously coined “reverse genetics.”

In what is a direct illustration of this approach, the gene for 1-deoxy-D-xylulose 5-phosphate synthase was recently identified (22). These authors based their work on the premise that 1-deoxy-D-xylulose 5-phosphate (DXP), a precursor for thiamine, pyridoxol, and isoprenoids, was generated by an acyloin condensation reaction involving pyruvate and D-glyceraldehyde 3-phosphate. It was well documented in the literature that catalysis of this type could be performed by the E1 component of the pyruvate dehydrogenase complex and pyruvate decarboxylase. Further, these authors proposed that transketolase could also perform this reaction. With this idea in mind, the authors hypothesized that the gene for DXP synthase (dxs) would share sequence motifs with genes for the proposed redundant activities. In scanning the complete E. coli genome sequence, one ORF with the desired homology to E1, pyruvate decarboxylase, and transketolase was identified. The respective ORF was amplified and cloned, and the product overexpressed, followed by biochemical analysis that confirmed the predicted activity. The appropriate mutant is being constructed to confirm in vivo function of this protein (M. Winkler, personal communication).

The above approach requires that a specific enzymatic activity can be predicted in the context of functionally characterized gene products. With a complete genome sequence, the polymerase chain reaction (PCR) amplification of even multiple putative ORFs is reasonable. Thus genome scanning is analogous to a genetic selection that results in putative mutants which must be further analyzed to identify the correct one. In this case, the resulting product is a clone or clones (in the desired vector) of potential genes of interest. To confirm function these gene products can then be characterized biochemically, followed by mutant analysis.

An extension of the above approach can be used to identify redundant and/or overlapping metabolic functions. In several cases, eliminating an enzyme with a demonstrated in vitro function fails to result in the predicted phenotype in vivo. Such a result can be indicative of redundancy that can sometimes be dissected genetically (23, 24). With complete genome sequences available, homologs of the respective protein can be identified and characterized as a potential source of functional redundancy. This approach has been successfully illustrated in two recent cases involving pyridoxine biosynthesis (25, 26). As always, one should keep in mind that redundant enzymes do not always have structural similarity.
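The genome-scanning logic used for dxs, in which a candidate ORF is expected to resemble several enzymes that catalyze related chemistry, can be sketched as a set intersection over per-query hit lists. In the Python sketch below the ORF identifiers and hit lists are hypothetical placeholders, not results from the E. coli genome, and the homology searches that would produce such lists are assumed to have been run separately.

```python
def orfs_matching_all(hits_by_query):
    """Return the ORFs that appear in the hit list of every query family."""
    hit_sets = [set(hits) for hits in hits_by_query.values()]
    if not hit_sets:
        return set()
    return set.intersection(*hit_sets)


# Hypothetical hit lists, keyed by the enzyme family used as the query.
hits_by_query = {
    "E1 component of pyruvate dehydrogenase": ["orf_0114", "orf_0420", "orf_2210"],
    "pyruvate decarboxylase": ["orf_0420", "orf_1877"],
    "transketolase": ["orf_0420", "orf_2210", "orf_2935"],
}

print(sorted(orfs_matching_all(hits_by_query)))  # ['orf_0420']
```

As the text notes, the output of such a scan is analogous to a pool of putative mutants: each candidate still has to be cloned, assayed biochemically, and tested by mutant analysis.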

5. USE OF GENOMICS TO DEFINE FUNCTION AND FACILITATE ANALYSIS OF METABOLIC INTEGRATION

Two current challenges in microbial metabolism and physiology are: (i) identification of function for novel ORFs, and (ii) definition of the regulatory and biochemical integration of metabolic pathways that result in productive physiology. A number of new technologies have been developed to address these challenges. In general these approaches are of two types. First are those designed to detect, on a genomic level, the transcripts or proteins that are differentially expressed. These strategies are based on the valid assumption that patterns of expression yield information about function. Second, efforts are underway in several organisms to systematically knock out all genes, resulting in a defined library of mutants. Such a library can then be subjected to phenotypic analysis with the goal of elucidating function of the respective ORFs. Enthusiasm for the power of such approaches should be tempered by the problems that can be anticipated. These include (i) redundant functions preventing phenotypic display, (ii) essential nature of some genes, (iii) accumulation of suppressors masking respective phenotypes, and (iv) subtle or highly specific phenotypes that may not be readily apparent. It is likely (particularly in well-studied, genetically tractable organisms) that knockout mutations resulting in a dramatic phenotype have been previously identified.

While there is no question that functional genomic technologies will yield new information, and the existence of knockout libraries will speed work on metabolism, we feel it is unlikely that these technologies alone will close the gap in our knowledge of metabolic integration. The approach we feel has the most potential for this purpose is in principle no different from the classic approaches used by bacterial geneticists and biochemists. We therefore seek to reemphasize classic genetic approaches, employing genomics and modern molecular techniques to facilitate the work. If researchers in this field pursue questions raised by phenotypic analysis, the function of many ORFs can be defined within the context of concrete metabolic questions or predictions. Such a phenotypic context is particularly beneficial for ORFs with no functionally characterized homologs. Often, approaches that pursue phenotypic characterization are misperceived as lacking the defined, focused goals and molecular detail characteristic of in vitro work. It should be emphasized that this broad approach is not a substitute for molecular work; rather, its value is in defining questions for molecular analysis within a metabolic framework.

5.1. Analysis of Complex Phenotypes

A great deal of the work performed by bacterial geneticists has used rich or glucose-based minimal medium under aerobic growth conditions. In several recent cases, additional phenotypes of a characterized mutant(s) have been identified under nontraditional growth conditions. These observations have led investigators to describe novel pathways in addition to unexpected pathway integration (27–33). For this reason, investigators interested in metabolism are urged to analyze mutant phenotypes under a variety of conditions, including poor carbon sources, limiting nutrients, conditions of respiration versus fermentation, anaerobiosis, pH, cell density, temperature, and osmolarity. Analysis of these additional phenotypes often provides a springboard into aspects of metabolism that could not be predicted by biochemical or computational methods. Unexplained (or complex) phenotypes, both those observed in the context of current metabolic work and those noted in the literature but not pursued, provide fertile ground for investigators to begin unraveling metabolic integration.

Thorough analysis of phenotypes caused by a single mutation in various genetic backgrounds can be just as informative as altering growth conditions. This approach can be particularly rewarding if researchers are interested in understanding the strategies behind functional redundancy and/or alternative metabolic pathways. In addition, this work will result in insights about how the cell can compensate for a metabolic defect, possibly by altering distinct pathways. In many cases, the phenotype caused by a single mutation will depend on the genetic background, thus suggesting integration of various pathways. We caution investigators performing these analyses to rigorously generate and work with isogenic strains, that is, strains differing by a single mutation. Because of the complexity of metabolism and our incomplete knowledge of its more subtle features, it is not unusual to observe differences in phenotypic behavior between two “wild-type” strains. This is likely due to fortuitous “selections” for various background mutations under conditions of storage and/or propagation.

5.2. Suppressor Analysis

Mutations such as those alluded to above often appear to be dead ends in terms of learning more about metabolism. Many times the identity of the mutant locus (see below) is noninformative with respect to the observed phenotype, either because it is an unknown ORF or because its role is not immediately obvious. In this case, the powerful tool of functional suppressor analysis should be pursued. Functional suppression analyses are able to detect metabolic connections that are not always predictable by theoretical means and can be revealed only by the organism. Pursuing this avenue of research demands an appreciation for the power of phenotypic analysis and the knowledge that suppressor mutations will increase our understanding of the parental mutant phenotypes in the context of cell physiology. In general, the mutations of interest must result in a “loss-of-function” phenotype for suppressors to be identified by positive selection. We can divide the outcomes of suppression analyses into two general classes: (i) functional suppression, and (ii) metabolic suppression. Both of these classes identify connections that further our understanding of cell physiology.
In the former case, suppression is achieved by providing one or more functions that are defective in the parental mutant. This can be accomplished by overproduction of a similar function (multicopy suppression) or by altering a chromosomally encoded function to carry out a function sufficiently similar to the one lacking in the parental mutant to allow growth. The feature that distinguishes this class from the second is that a function is being restored. The second class of suppression includes compensatory mutations that result in an alteration in metabolism such that the missing parental function is no longer needed. In other words, metabolic suppression highlights the ability of the cell to compensate by altering integrated pathways (27, 31, 34, 35).

5.2.1. Avoiding Biased Analysis To prevent biasing the outcome of screens for suppressors, it is important to allow the identification of all mutant types. To achieve this, mutant searches for suppressors can be done by screening plasmid pools and/or mutagenizing the respective strains in a way that will produce both missense and null mutants. In screening plasmid pools, one is simply relying on multicopy suppression of the defect. Thus restored growth by a plasmid could result from expression of a similar enzyme or increased levels of an enzyme that then changes the metabolic environment such that the defective function is unnecessary (34, 36). It should go without saying that to benefit from the suppressor analysis described here, both precise reversion and “true” complementation need to be eliminated in the screening process. 5.2.2. Identification of Chromosomal Mutations A productive method to generate chromosomal suppressor mutations involves mutagenesis of a large pool of strains, each of which carries a defective transposon randomly inserted in the chromosome (37, 38). By mutagenizing such a pool of cells or a resulting phage lysate (38, 39), one can select inheritance of the insertion element (drug resistance) and screen for the appropriate phenotype. The first advantage to this method is that either point or insertion mutations can be readily identified. This is important since it is not always possible to predict which type of mutation can cause the suppressed phenotype. Second, if a point mutation is required, an insertion linked to the mutation has been isolated in the screening process. A linked selectable marker greatly facilitates future work reconstructing the mutants and identifying the causative mutation. 5.3. Use of Genomics in Gene Identification A combination of genomics and modern molecular biological techniques has facilitated the rapid and efficient analyses of mutations resulting in relevant phenotypes. 
For researchers working with either S. typhimurium or E. coli (or with any other closely related genera in which the complete genome sequence of at least one is known), once a mutation of interest has been identified, rapid sequence analysis allows one to determine its map location and/or directly identify the affected ORF. In the case of insertions linked to point mutations, map location is particularly valuable because it provides a physical location from which to scan the genome for putative ORFs that may contain the mutation responsible for the identified phenotype. In the latter case, a particular ORF may have an obvious connection to the respective work, or it may require some imaginative thinking and/or educated guesses to determine which ORF is affected. This analysis is distinct from, but complementary to, the one described in Section 1.1.3, which dealt with protein homology searches. In the case here, nucleotide sequences are compared to determine the precise physical location in addition to putative function. This approach demands that one is working with DNA from the sequenced organism or that of a very close relative. For instance, sequence similarity and map conservation have allowed us to position the vast majority of sequences identified by our work in S. typhimurium by comparison with the annotated genome of E. coli. However, in some notable cases, the physical map is sufficiently different between these two organisms that it prevents such identification (40) (W. Metcalf, personal communication). Because of the differences between organisms and the fallibility of sequence analysis and comparison, it is imperative that the location of the respective ORFs be confirmed. The easiest way to do this is to use the genetic map to identify linkage of the mutation of interest to markers predicted by sequence mapping to be close.
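The two uses of map location described above can be illustrated with a minimal sketch. It is not the procedure used in the work cited here. It assumes a toy genome string and a hypothetical ORF annotation, and it substitutes exact string matching for a real nucleotide similarity search. All names (`Orf`, `locate`, `orfs_containing`, `orfs_within`, `GENOME`, `ORFS`) are invented for illustration. The first step places a sequenced insertion junction on the reference sequence; the second lists the ORFs that lie within a chosen distance of that position, which are the candidates to examine when the insertion is only linked to the causative point mutation.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Orf:
    """Hypothetical annotation record; coordinates are 0-based, end-exclusive."""
    name: str
    start: int
    end: int


_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def locate(genome: str, read: str) -> list[int]:
    """Return sorted start positions of exact matches of `read` on either strand.

    Assumption: exact matching stands in for a real similarity search,
    which would tolerate mismatches between closely related organisms.
    """
    hits: list[int] = []
    read = read.upper()
    # A set avoids double counting when the read equals its own reverse complement.
    for query in {read, reverse_complement(read)}:
        pos = genome.find(query)
        while pos != -1:
            hits.append(pos)
            pos = genome.find(query, pos + 1)
    return sorted(hits)


def orfs_containing(orfs: list[Orf], pos: int) -> list[Orf]:
    """ORFs directly disrupted by an insertion at `pos`."""
    return [o for o in orfs if o.start <= pos < o.end]


def orfs_within(orfs: list[Orf], pos: int, window: int) -> list[Orf]:
    """ORFs overlapping the interval [pos - window, pos + window)."""
    lo, hi = pos - window, pos + window
    return [o for o in orfs if o.start < hi and o.end > lo]


# Toy data, purely illustrative: two short ORFs separated by spacer sequence.
GENOME = (
    "TTGACC"
    "ATGGCTAAAGTTCTGTAA"   # orfA, positions 6..24
    "CCGGAATT"
    "ATGCATCATCGTTGA"      # orfB, positions 32..47
    "GGCC"
)
ORFS = [Orf("orfA", 6, 24), Orf("orfB", 32, 47)]

if __name__ == "__main__":
    junction = "GCTAAAGTTC"  # hypothetical sequence read from an insertion junction
    for position in locate(GENOME, junction):
        print(position, [o.name for o in orfs_containing(ORFS, position)])
        print("candidates:", [o.name for o in orfs_within(ORFS, position, 30)])
```

A match on the sequence alone does not confirm the location. As noted above, the predicted position still needs to be checked by genetic linkage to nearby markers, particularly when the reference genome comes from a related organism rather than the strain under study.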

6. CONCLUDING REMARKS

6.1. Need for a Phenotype/ORF Database

The microbiology literature is peppered with descriptions of phenotypes that have been ascribed to particular lesions. In many cases, the map position of the causative mutation has been determined, but in the context of the complete genome sequence no ORF has been assigned. In these mutant phenotypes there is significant information regarding the function of ORFs. Of particular interest are those ORFs for which no function can be attributed by sequence analyses. It is important that workers in the field correlate these phenotypes with specific ORFs, since such work has the potential to tie a large amount of past data to the genome sequences and thus avoid redundant efforts that would “rediscover” such genes. Only when all phenotypes associated with a specific gene are functionally explained can we hope to understand the role of the specific gene product in metabolism.

6.2. Value of Studies in Bacterial Metabolism to Workers in Higher Eukaryotes

With the complete annotated genomes of many organisms becoming available, it is clear that there are many families of proteins that are conserved throughout biology. Several of these families are defined solely on sequence conservation and no functional role has been elucidated for them. Functional elucidation of members of these highly conserved families is likely to occur in prokaryotes due to the relative ease of manipulation of these organisms. It is important that the power of research in bacterial metabolism and physiology be recognized in this capacity. Further, as prokaryotic physiologists, we need to recognize the significance of these homologies and put any functional information gained about these gene products with prokaryotic systems in the context of work being performed in higher eukaryotes.

6.3. Impact of Genomics on Metabolic Research

The impact of genomics on work in bacterial metabolism can be emphasized if we focus on the rate at which gene/metabolic phenotype correlations can be established when the complete genome sequence is available. In the not too distant past, it would have taken 6–8 months to identify an affected gene and determine if the function of its gene product was known, particularly if dominant mutations were involved. Now, researchers can identify the respective gene in less than a month. With this kind of efficiency, the investigator can spend more time and energy designing experiments to address models for the role of the gene product that would explain, metabolically, the phenotype involved. Thus, genomics has allowed the field of metabolism to move at a speed that greatly reduces intellectual downtime, which in the past resulted from the need for tedious technical work to identify the defective gene. As with any approach, there are cautions to be issued when following a genetic approach such as that expounded here. To produce reliable, consistent results it is important to be rigorous in experimental conditions, nutrient concentrations, and strain constructions. Particularly when dealing with subtleties of metabolism, slight alterations in any of these things could change phenotypic outcomes and thus result in apparent problems in reproducibility and/or phenotypic interpretation. In addition, standard parameters of genetic analysis, such as dominance, reversion, and polarity effects of mutations must be kept in mind.
Perhaps it is most important to emphasize that genetics, sequence analysis, molecular biology, and biochemistry each have limitations. To significantly increase our understanding of metabolic integration and cellular physiology will require appreciation for and utilization of broad, multidisciplinary approaches.

ACKNOWLEDGMENTS Work in the laboratory of D.M.D. is supported by Grant GM47296 from the National Institutes of Health and Grant MCB9723830 from the National Science Foundation. Work in the laboratory of J.C.E.-S. is supported by Grant GM40313 from the National Institutes of Health and Grant MCB9724924 from the National Science Foundation.

REFERENCES

1. Miller, J. H. (1972) Experiments in Molecular Genetics, Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.
2. Metcalf, W. W., Zhang, J. K., Apolinario, E., Sowers, K. R., and Wolfe, R. S. (1997) Proc. Natl. Acad. Sci. USA 94, 2626–2631.
3. Robb, F. T., and Place, R. A. (Eds.) (1998) Archaea: A Laboratory Manual, Vol. 1, 3 vols., Cold Spring Harbor Laboratory, Cold Spring Harbor, NY.
4. Harwood, C. R., and Cutting, S. M. (Eds.) (1990) Molecular Biological Methods for Bacillus, Wiley, New York.
5. Horswill, A. R., and Escalante-Semerena, J. C. (1997) J. Bacteriol. 179, 928–940.
6. Shelver, D., Kerby, R. L., He, Y., and Roberts, G. P. (1995) J. Bacteriol. 177, 2157–2163.
7. O’Toole, G. A., and Kolter, R. (1998) Mol. Microbiol. 28, 449–461.
8. Caetano-Anollés, G. (1993) PCR Methods Appl. 3, 85–92.
9. Karp, P. D., Riley, M., Paley, S. M., Pellegrini-Toole, A., and Krummenacker, M. (1998) Nucl. Acids Res. 26, 50–53.
10. Kunst, F., et al. (1997) Nature 390, 249–256.
11. Tomb, J.-F., et al. (1997) Nature 388, 539–547.
12. O’Toole, G. A., and Escalante-Semerena, J. C. (1995) J. Biol. Chem. 270, 23560–23569.
13. Thompson, T. B., Thomas, M. G., Escalante-Semerena, J. C., and Rayment, I. (1998) Biochemistry 37, 7686–7695.
14. Kaneko, T., et al. (1995) DNA Res. 2, 153–166.
15. Blanche, F., Debussche, L., Famechon, A., Thibaut, D., Cameron, B., and Crouzet, J. (1991) J. Bacteriol. 173, 6052–6057.
16. Lawrence, J., and Roth, J. R. (1995) J. Bacteriol. 177, 6371–6380.
17. Pollich, M., and Klug, G. (1995) J. Bacteriol. 177, 4481–4487.
18. Bult, C. J., et al. (1996) Science 273, 1058–1072.
19. Smith, D. R., et al. (1997) J. Bacteriol. 179, 7135–7155.
20. Fleischmann, R. D., et al. (1995) Science 269, 496–512.
21. Chistoserdova, L., Vorholt, J. A., Thauer, R. K., and Lidstrom, M. E. (1998) Science 281, 99–102.
22. Sprenger, G. A., et al. (1997) Proc. Natl. Acad. Sci. USA 94, 12857–12862.
23. Trzebiatowski, J. R., O’Toole, G. A., and Escalante-Semerena, J. C. (1994) J. Bacteriol. 176, 3568–3575.
24. Primerano, D. A., and Burns, R. O. (1983) J. Bacteriol. 153, 259–269.
25. Zhao, G., Pease, A. J., Bharani, N., and Winkler, M. E. (1995) J. Bacteriol. 177, 2804–2812.
26. Yang, Y., Tsui, T. H.-C., Man, T.-K., and Winkler, M. E. (1998) J. Bacteriol. 180, 1814–1821.
27. Tsang, A. W., and Escalante-Semerena, J. C. (1996) J. Bacteriol. 178, 7016–7019.
28. Downs, D. M., and Roth, J. R. (1991) J. Bacteriol. 173, 6597–6604.
29. Downs, D. M. (1992) J. Bacteriol. 174, 1515–1521.
30. Enos-Berlage, J. L., and Downs, D. M. (1996) J. Bacteriol. 178, 1476–1479.
31. Petersen, L., Enos-Berlage, J. L., and Downs, D. M. (1996) Genetics 143, 37–44.
32. Zhao, G., and Winkler, M. E. (1996) FEMS Microbiol. Lett. 135, 275–280.
33. Man, T.-K., Zhao, G., and Winkler, M. E. (1996) J. Bacteriol. 178, 2445–2449.
34. Enos-Berlage, J. L., and Downs, D. M. (1997) J. Bacteriol. 179, 3989–3996.
35. Frodyma, M. E., and Downs, D. M. (1998) J. Bacteriol. 180, 4757–4759.
36. Claas, K., and Downs, D. M., unpublished results.
37. Kleckner, N., Roth, J. R., and Botstein, D. (1977) J. Mol. Biol. 116, 125–159.
38. Way, J. C., Davis, M. A., Morisato, D., Roberts, D. E., and Kleckner, N. (1984) Gene 32, 369–379.
39. Hong, J.-S., and Ames, B. N. (1971) Proc. Natl. Acad. Sci. USA 68, 3158–3162.
40. Elliott, T., Avissar, Y. J., Rhie, G.-E., and Beale, S. I. (1990) J. Bacteriol. 172, 7071–7084.