REVIEWS
Transposon-based approaches to identify essential bacterial genes Nicholas Judson and John J. Mekalanos
Transposons are a powerful tool for identifying genes essential for bacterial viability. The availability of many bacterial genome sequences and the large number of genes of unknown function therein have inspired the generation of a variety of different approaches. These methods are described and their advantages and disadvantages are discussed.

N. Judson and J.J. Mekalanos* are in the Dept of Microbiology and Molecular Genetics, Harvard Medical School, Boston, MA 02115, USA. *tel: +1 617 432 1935, fax: +1 617 738 7664, e-mail: [email protected]

TRENDS IN MICROBIOLOGY, VOL. 8, NO. 11, NOVEMBER 2000, p. 521. PII: S0966-842X(00)01865-5. 0966-842X/00/$ - see front matter © 2000 Elsevier Science Ltd. All rights reserved.

The completion of many bacterial and eukaryotic genome sequencing projects has led to the generation of a large amount of sequence information. Various approaches have been taken to determine which open-reading frames (ORFs) are required for growth or survival of an organism, ranging from bioinformatic analysis [1] to directed knockouts of all the ORFs in yeast [2]. High-throughput methods to identify which ORFs are essential for an organism's survival, especially with the large number of ORFs of unknown function that are present in sequenced organisms [3], are of great interest to the research community.

What is essential in a bacterial genome?

The largest region of a bacterial genome that is essential for viability contains protein-coding ORFs. Other essential sequences include regions of regulatory DNA, RNA species and the origin of replication. Approximately 40% of the ORFs defined by computer analysis of completed genomes are of unknown function; many of these have no similarity to any other ORFs in the available databases.

Essentiality is typically defined operationally. It is usually assayed by testing for colony formation on an agar plate containing 'rich media', a rich and complex mixture of nutrients. The results from this type of experiment tell us about the requirements of an organism for growth in a controlled environment and will not directly correlate with conditions that the organism naturally encounters in vivo. However, there are many possible conditions under which one can assay for essentiality. One can assay for genes essential for growth on a defined minimal medium to identify genes involved in basic metabolism. One can also assay for growth in liquid culture (rich or minimal), for growth in an animal model or for growth in the bacterium's natural host or environment. Specific assays can be used to identify genes essential for particular processes, such as cell division, spore formation, motility and drug resistance. Thus, when describing a study undertaken to identify 'essential' genes, it is important to define how this 'essentiality' was assayed, as the assay determines which genes will be found and affects subsequent interpretation of the results.

Not all genes that are essential for viability will be easily identifiable. Many biological processes, especially essential ones, are accomplished by multiple redundant or partially overlapping pathways. Gene duplication events also occur, which complicate further analysis as a knockout of one copy of the gene might not have a lethal phenotype. The identification of essential genes of this type is difficult and as yet no systematic experiments have addressed this particular gene class.

Methods to identify essential genes

Non-transposon-based methods

There are several methods to identify essential genes that do not rely on transposons, including comparative genomics [1] and the directed knockout of genes [2]. More traditional analysis uses the generation of conditional mutations that affect growth, such as temperature-sensitive (TS) mutations. TS mutations are generated by chemical mutagenesis and result in unmarked point mutations that allow the identification of essential genes by growth at a permissive temperature but not at a non-permissive temperature. The location of the mutation is determined by a mapping and/or complementation cloning strategy [4]. The drawbacks of this approach are that mapping is not possible (without great effort) in many bacterial species and that random screens for TS mutants can result in 'jackpots' created by repeated isolation of the same mutant classes, presumably because such gene products are particularly easy to mutate to TS alleles [5]. A more recent approach that builds on comparative genomics [1] involves the targeted knockout of specific genes after bioinformatic analysis [6]. This approach uses the premise that genes of unknown function that are conserved between organisms are more likely to be essential than non-conserved genes. Gene-by-gene PCR-product design, cloning and attempted knockout of the gene in question are used to determine if the gene is essential.

Transposon-based methods

Transposons provide an alternative method for defining essential genes. There are a number of different approaches available. The utility of each
approach varies, depending on the type of information one is interested in obtaining and what genetic systems are available in the organism of interest.

Transposons are segments of DNA that can move (transpose) from one location in a genome to another. Extensive reviews on the many types of transposons and the mechanisms by which they transpose are available (e.g. see Ref. 7). The simplest transposon is a segment of DNA flanked by sequences (often these are inverted repeats) that are recognized by a protein, transposase, which enables the transposon to transpose. The locations to which a transposon can move depend on the sequence that the transposase recognizes and cleaves, although the recognition sequence for some transposons is unclear or has not yet been determined.

Transposon mutagenesis results in disruption of the region of the genome where the transposon has inserted. If an insertion within a predicted ORF allows the resulting strain to form a colony, it is unlikely that ORF is essential for viability under those conditions. (This is not the case if there are multiple copies of the same gene; some genes can be disrupted near the 5′ or 3′ end and can still produce functional protein [8].) Conceptually, there are two ways to identify essential genes or regions of the bacterial chromosome: (1) the 'negative' approach, which identifies many regions that are not essential and presumes that everything else is essential (Fig. 1); and (2) the 'positive' approach, which identifies genes that are essential by generating a conditional mutation and showing that it has a lethal phenotype (Fig. 2).

Why two approaches?

The obvious problem with trying to identify essential genes is that a knockout of an essential gene is lethal. To get around this limitation, one can use the negative approach and define the location of enough transposon insertions to enable one to say with some certainty that regions in which
Fig. 1. Negative approaches to identify essential bacterial genes: identify essential genes by identification of regions that are not essential. Regions of the chromosome that cannot be disrupted are presumed essential. (a) PCR-mapping approaches. A specific region of the bacterial chromosome is amplified by PCR and is subjected to in vitro mutagenesis with a transposon containing a drug-resistance marker flanked by inverted repeats. The small horizontal arrows signify some of the possible locations of insertions in the PCR product. Genes are designated by vertical black (non-essential) or red (essential) arrows. Insertions that allow viable colonies to form are pooled and analysed by PCR with one transposon primer and one chromosomal primer. Upon analysis, those regions that do not allow insertions (crossed-out arrows) show up as empty regions on the agarose gel (a window corresponding to a putative essential gene) and are presumed to define essential genes. (b) Global transposon mutagenesis. The whole bacterial chromosome is the target for transposon mutagenesis. A large number of viable insertions are analysed by sequencing the insertion junctions, which defines the disrupted genes as non-essential and allows estimation of the total number of essential genes in an organism. This method requires many insertions to be sequenced before statistically significant conclusions can be drawn.
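The window analysis of Fig. 1a can be illustrated with a minimal sketch. It assumes the insertion positions in an amplified region have already been read off the gel as integer coordinates; the function names, the coordinates and the `min_gap` threshold are hypothetical and are not taken from the studies cited. An ORF is flagged only when it lies entirely inside an insertion-free window, so it remains a putative essential gene that needs confirmation.

```python
from typing import List, Tuple


def insertion_free_windows(insertions: List[int], region_start: int,
                           region_end: int, min_gap: int) -> List[Tuple[int, int]]:
    """Return (start, end) stretches of the region with no insertion
    that are at least min_gap long (hypothetical threshold)."""
    points = sorted({p for p in insertions if region_start <= p <= region_end})
    bounds = [region_start] + points + [region_end]
    windows = []
    # Walk over consecutive pairs of boundaries and keep the large gaps.
    for left, right in zip(bounds, bounds[1:]):
        if right - left >= min_gap:
            windows.append((left, right))
    return windows


def orfs_in_windows(orfs: List[Tuple[str, int, int]],
                    windows: List[Tuple[int, int]]) -> List[str]:
    """Names of ORFs (name, start, end) lying entirely inside a window."""
    hits = []
    for name, start, end in orfs:
        if any(w_start <= start and end <= w_end for w_start, w_end in windows):
            hits.append(name)
    return hits


# Illustrative data only: a 10-kb region with nine mapped insertions.
insertions = [150, 400, 900, 1200, 4100, 4300, 5000, 9000, 9500]
orfs = [("orfA", 200, 1100), ("orfB", 1500, 3900),
        ("orfC", 5200, 6000), ("orfD", 4150, 4900)]

windows = insertion_free_windows(insertions, 0, 10000, min_gap=1000)
candidates = orfs_in_windows(orfs, windows)
print(windows)     # [(1200, 4100), (5000, 9000)]
print(candidates)  # ['orfB', 'orfC']
```

Requiring full containment is a deliberately conservative choice: an ORF with an insertion only near its 5′ or 3′ end is not flagged here, although the text notes that such a gene could still be essential.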
Fig. 2. The positive approach to identify essential genes: identify directly genes that are essential. Transposition with a transposon containing an outward-facing inducible promoter at one edge (together with a drug-resistance marker, flanked by inverted repeats) in the presence of the inducer results in many possible transposon insertions. The horizontal arrows signify possible insertion locations on the bacterial chromosome. Screening with and without inducer identifies insertions that disrupt the promoter region of an essential gene (red arrow). The strain generated by such an insertion is dependent on the inducer for viability. The insertional junction is sequenced, allowing the identification of the downstream essential gene.
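The screen in Fig. 2 amounts to comparing growth of each insertion strain with and without inducer. The sketch below shows one way such results might be tabulated; the colony-size measurements, the `reduced_ratio` cut-off and all names are hypothetical assumptions for illustration and do not come from the cited studies.

```python
from dataclasses import dataclass


@dataclass
class Insertion:
    name: str
    colony_mm_with_inducer: float     # colony diameter on plates with inducer
    colony_mm_without_inducer: float  # colony diameter on plates without inducer


def classify(ins: Insertion, reduced_ratio: float = 0.5) -> str:
    """Assign a phenotype class from growth +/- inducer.

    reduced_ratio is an arbitrary, assumed cut-off for calling a
    reduced-growth (intermediate) phenotype.
    """
    if ins.colony_mm_with_inducer <= 0:
        return "not viable with inducer"
    if ins.colony_mm_without_inducer <= 0:
        # No colony without inducer: candidate insertion upstream of an essential gene.
        return "inducer-dependent (putatively essential)"
    ratio = ins.colony_mm_without_inducer / ins.colony_mm_with_inducer
    if ratio < reduced_ratio:
        return "reduced growth without inducer"
    return "no conditional phenotype"


# Illustrative measurements only.
strains = [Insertion("ins1", 2.0, 0.0),
           Insertion("ins2", 2.0, 0.8),
           Insertion("ins3", 2.0, 1.9)]
for s in strains:
    print(s.name, "->", classify(s))
```

Keeping the intermediate class separate reflects the point made in the text that some insertions give smaller colonies rather than no growth, and that these genes need further analysis before being called essential or non-essential.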
transposon insertions are not observed are likely to be essential. Three recent reports have used this negative approach [8–10]. Akerley et al. [8] and Reich et al. [9] described similar PCR-mapping approaches that allow saturating in vitro transposon mutagenesis of PCR-amplified chromosomal segments of naturally competent organisms, uptake of these mutagenized fragments and homologous recombination into the chromosome. PCR analysis using one primer within the transposon and one in the chromosome is used to determine the location of the insertions (Fig. 1a). The large number of insertions obtained by these methods allows visualization of 'windows' in which no transposon insertions are found. These windows identify essential regions of the chromosome because transposon insertions in essential genes prevent growth of the cells, removing them from the pool that is analysed by PCR. By aligning the windows with predicted ORFs, one can identify essential ORFs. The two techniques differed in the choice of transposon: Akerley et al. used a transposon (mariner) with a 2-bp recognition sequence and Reich et al. used a transposon (Ty-1) with a 4-bp recognition sequence. For a list of advantages and disadvantages of these and other methods, see Table 1.

Another negative approach is to perform large-scale transposon mutagenesis (global transposon mutagenesis) and sequence many of the resulting insertions (Fig. 1b) [10]. This approach is not feasible for most researchers as it requires a large number of insertions before statistically significant conclusions can be drawn. Given a large number of insertions and with certain assumptions (e.g. the specificity of the transposon and whether insertions in the 5′ or 3′ region of an ORF qualify that ORF as non-essential), one can assign a probability that a specific ORF is essential. This technique has an advantage over the PCR-mapping approaches in that it does not require the organism to be naturally competent, but a drawback is that it does not identify any essential genes until saturating mutagenesis is approached. Even then, one can only assign a probability that an ORF is essential.

The second conceptual approach, the positive approach, defines genes that are essential instead of defining genes or regions of the chromosome that are non-essential. This approach defines essential genes by replacing the gene's natural promoter with an inducible one (Fig. 2). Transposition of a transposon that has an outward-facing inducible promoter at one end into the promoter region of a gene creates a gene in which the function of the natural promoter is replaced by an inducible promoter. If the gene is essential, the bacterial strain is now dependent on the inducer for growth or survival. This approach was first proposed more than a decade ago [11], and modified methods have subsequently been used to identify a number of genes with conditional growth phenotypes [12,13]. A recent report [14] used this approach to define genes essential for growth or survival of Vibrio cholerae on Luria-Bertani (LB) agar (rich medium) and in broth. Judson and Mekalanos [14] used an arabinose-inducible
Table 1. Advantages and disadvantages of transposon-based approaches to identify essential bacterial genes (a)

Type of approach: Negative

Name: All negative approaches
Description: Identify essential genes by defining non-essential regions and assuming what cannot be disrupted is essential.
Advantages: Can define sites within an otherwise essential ORF that are permissive for insertions.
Disadvantages: Does not identify essential genes. Further strain construction and testing of candidate ORFs is required to confirm essentiality of putative essential genes. Intermediate phenotypes especially require further analysis: 5′ or 3′ insertions in an ORF may or may not mean that the gene is essential. Operon structure can pose problems for analysis.

Name: PCR-mapping approaches
Description: Mapping non-essential regions by in vitro transposition and PCR on a short, defined segment of DNA.
Advantages: Analysis can be performed on a small or large scale: specific genome regions (5–10 kb) are analysed individually, which makes saturation easy (in a large-scale analysis, gives a defined endpoint). Small target sequence for both transposons [TA dinucleotide for mariner (this occurs, on average, 1 in 16 nucleotides), TGTT for Ty-1 (1 in 256)]. No sequencing required.
Disadvantages: Large-scale analysis is resource intensive and requires many oligonucleotide primers. Restricted to naturally competent organisms.
Refs: 8,9

Name: Global transposon mutagenesis
Description: Analysis of a large number of random chromosomal insertions to define regions that cannot be hit.
Advantages: Does not require a naturally competent organism.
Disadvantages: Cannot define essential chromosomal regions or ORFs as essential unless saturation is approached, therefore can only be used as a method to estimate the number of essential genes. Need to approach saturation before any conclusions can be drawn. Resource intensive.
Refs: 10

Type of approach: Positive

Name: All positive approaches
Description: Identify essential genes by substitution of an essential gene's natural promoter with an inducible one, generating a conditional mutation.
Advantages: Analysis can be performed on a small or large scale. Every gene identified is a gene of interest; genes that are not strictly essential can be easily tested for essentiality under other growth conditions. The conditional strain can be used for further biochemical analysis. Essential ORFs with non-essential 5′ regions that are permissive for transposon insertions will still generate a conditional phenotype.
Disadvantages: Insertions upstream of every essential gene might not be possible. Saturating mutagenesis of a genome is laborious to achieve. Expression levels of the inducible promoter will not be broad enough to identify every essential gene (basal expression of the promoter might be too high or maximal expression might not be enough). For operons, the essential gene might not be immediately downstream of the insertion; further analysis is required and operons with coupled translation might not be possible to hit.

Name: TnAraOut
Description: A transposon containing an outward-facing arabinose-inducible promoter.
Advantages: The arabinose promoter has a large induction ratio; small target sequence (TA dinucleotide); broad host range transposon.
Refs: 14

Name: Tn5tac1
Description: A transposon containing an outward-facing IPTG-inducible promoter.
Disadvantages: There is a higher level of basal expression and a smaller induction ratio for the tac promoter as compared with the arabinose PBAD promoter.
Refs: 11

Name: mini-Tn10 transposons
Description: Divergent transcription from the tetA and tetR genes in the presence of tetracycline results in tetracycline-inducible conditional phenotypes.
Disadvantages: Larger target sequence for Tn10 increases the difficulty of obtaining insertions upstream of essential genes.
Refs: 12,13

(a) Abbreviation: ORF, open-reading frame.
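The target-site frequencies quoted in Table 1 (1 in 16 for TA, 1 in 256 for TGTT) and the need to approach saturation in global mutagenesis can both be illustrated with a short calculation. This is a sketch under simplifying assumptions that are not made in the article: equal base frequencies, and insertions that land uniformly and independently across the genome. The function names and example numbers are hypothetical.

```python
def expected_site_spacing(site: str) -> int:
    """Average spacing of a target site, assuming the four bases are
    equally frequent (an assumption; real genomes are skewed)."""
    return 4 ** len(site)


def count_sites(sequence: str, site: str) -> int:
    """Count occurrences of a target site, overlaps included."""
    seq, target = sequence.upper(), site.upper()
    return sum(1 for i in range(len(seq) - len(target) + 1)
               if seq[i:i + len(target)] == target)


def prob_no_insertion(orf_length: int, genome_length: int, n_insertions: int) -> float:
    """Chance that a non-essential ORF receives no insertion purely by
    chance, assuming uniform, independent insertions (an assumption)."""
    return (1 - orf_length / genome_length) ** n_insertions


print(expected_site_spacing("TA"))     # 16
print(expected_site_spacing("TGTT"))   # 256
print(count_sites("ATATAGTGTTTA", "TA"))    # 3
print(count_sites("ATATAGTGTTTA", "TGTT"))  # 1

# Illustrative numbers: a 1-kb ORF in a 1-Mb genome with 2000 mapped insertions.
p = prob_no_insertion(1000, 1_000_000, 2000)
print(round(p, 3))  # 0.135
```

With these illustrative numbers, roughly one non-essential 1-kb ORF in seven would escape insertion by chance, which is why an undisrupted ORF in a non-saturating experiment can only be assigned a probability of being essential.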
promoter (PBAD) in a transposon termed TnAraOut to define 16 insertions that showed an arabinose-dependent growth phenotype. Five of these insertions defined genes essential for colony formation on LB, and 11 defined ORFs whose full expression allowed wild-type levels of growth, but which, in the absence of arabinose, showed a reduced growth rate, with the formation of smaller colonies. The latter insertions defined genes that are not essential but whose expression is required for optimal growth. However, the conditions under which essentiality is tested affect the outcome of the experiment. It was previously known that the gene encoding triose phosphate isomerase (tpi) was essential for growth on minimal medium with glycerol as the sole carbon source [15]. One of their insertions resulted in a strain with an arabinose-inducible tpi. This gene product is not essential for growth on LB plates but, as expected, is essential (i.e. requires the presence of arabinose in the growth medium) when colony formation is assayed on minimal plates. Therefore, this approach has the potential to identify genes of unknown function that might not be essential on LB, but which could prove to be essential under other conditions.

Conclusions

Whether using transposons or not, the assay conditions employed to identify the essential regions of a chromosome are crucial. Transposons provide a powerful and potentially rapid approach for identifying essential genes. There are currently two conceptual approaches to defining essential genes, each of which has advantages and disadvantages.

Analysis of essential genes on a genomic scale is a large undertaking and, depending on the method, requires extensive sequencing or PCR. A significant advantage of the PCR-mapping approach is that a large number of viable colonies in which transposon mutagenesis was targeted to a specific region of the genome are pooled. Analysis of the resulting PCR products on an agarose gel results in the visualization of 'windows' that correspond to putatively essential ORFs. The high degree of saturation allows significant conclusions to be drawn about the essentiality of multiple ORFs within that region. Global transposon mutagenesis, however, does not easily generate the number of insertions that are required to have the same degree of certainty about the absence of any transposon insertions in a given ORF. In the absence of a saturating experiment, further experimentation is required for definitive proof that a specific gene that has not been disrupted is essential.

A drawback to the negative approach is that it does not actually show that the gene in question is essential. As the number of insertions in surrounding genes increases, it becomes increasingly unlikely that an undisrupted gene is non-essential; however, until a biological experiment demonstrating essentiality is undertaken, there is still a formal possibility that the gene is non-essential. The converse is also true when these analyses are applied to essential genes that are permissive for insertions in particular sites. A gene that allows insertions in a few locations within the coding region could still be essential [8]. Consequently, the negative approach identifies genes that should be analysed further by functional studies.

The positive approach for identifying essential genes has several advantages: there is a built-in assay for biological function that does not require the construction of complementing plasmids to show that an ORF is essential; any conditional insertion that is generated can be used for biochemical analysis of the gene product in question without further strain construction; and essential ORFs with non-essential 5′ regions that are permissive for transposon insertions will still generate a conditional phenotype. It also has some drawbacks: an inducible promoter might not provide enough expression to overcome the defect created by knocking out the natural promoter or, conversely, basal expression of an inducible promoter might be too high to allow identification of essential genes for which only minute amounts of gene product are required. The size of the target in which an insertion can occur varies depending on the gene and will generally be small; it might be difficult or impossible to hit for some genes.

Operon structures pose problems for all approaches. Polar effects from the presence of an insertion can be severe, limiting insertions upstream of an essential gene. Although the positive approach might overcome this limitation for some operons, if translation of a downstream essential gene is coupled to a gene that is disrupted by the presence of the insertion, these insertions will not be found. If translation is not coupled, the essential gene could be several genes downstream of the inducible promoter, requiring confirmation of the proposed essential gene by complementation studies.

Although simple in concept, essentiality is a difficult question to address experimentally.
It is possible to perform experiments under defined conditions by using defined media or growth conditions; however, this does not simplify the genetic interplay of essential processes that are taking place within a bacterium. Many gene disruptions will result in an intermediate phenotype that makes it hard to classify a gene as 'essential' or 'non-essential'. This is compounded by the fact that many genes have some degree of redundancy, with other pathways able to compensate to a certain extent for the mutation in question. However, transposon-based approaches have identified genes that researchers should focus on to further our understanding of basic biological processes and the way these processes interact.

Acknowledgements

We thank Jonathan Blum for critical reading of the manuscript. This work was supported by grant AI-26289 from the National Institute of Allergy and Infectious Diseases.

References

1 Mushegian, A.R. and Koonin, E.V. (1996) A minimal gene set for cellular life derived by comparison of complete bacterial genomes. Proc. Natl. Acad. Sci. U. S. A. 93, 10268–10273
2 Winzeler, E.A. et al. (1999) Functional characterization of the S. cerevisiae genome by gene deletion and parallel analysis. Science 285, 901–906
3 Tang, C.M. et al. (1998) Microbial genome sequencing and pathogenesis. Curr. Opin. Microbiol. 1, 12–16
4 Schmid, M.B. et al. (1989) Genetic analysis of temperature-sensitive lethal mutants of Salmonella typhimurium. Genetics 123, 625–633
5 Kaback, D.B. et al. (1984) Temperature-sensitive lethal mutations on yeast chromosome I appear to define only a small number of genes. Genetics 108, 67–90
6 Arigoni, F. et al. (1998) A genome-based approach for the identification of essential bacterial genes. Nat. Biotechnol. 16, 851–856
7 Berg, D.E. and Howe, M.M., eds (1989) Mobile DNA, ASM Press
8 Akerley, B.J. et al. (1998) Systematic identification of essential genes by in vitro mariner mutagenesis. Proc. Natl. Acad. Sci. U. S. A. 95, 8927–8932
9 Reich, K.A. et al. (1999) Genome scanning in Haemophilus influenzae for identification of essential genes. J. Bacteriol. 181, 4961–4968
10 Hutchison, C.A. et al. (1999) Global transposon mutagenesis and a minimal mycoplasma genome. Science 286, 2165–2169
11 Chow, W.Y. and Berg, D.E. (1988) Tn5tac1, a derivative of transposon Tn5 that generates conditional mutations. Proc. Natl. Acad. Sci. U. S. A. 85, 6468–6472
12 Rappleye, C.A. and Roth, J.R. (1997) A Tn10 derivative (T-POP) for isolation of insertions with conditional (tetracycline-dependent) phenotypes. J. Bacteriol. 179, 5827–5834
13 Takiff, H.E. et al. (1992) Locating essential Escherichia coli genes by using mini-Tn10 transposons: the pdxJ operon. J. Bacteriol. 174, 1544–1553
14 Judson, N. and Mekalanos, J.J. (2000) TnAraOut, a transposon-based approach to identify and characterize essential bacterial genes. Nat. Biotechnol. 18, 740–745
15 Irani, M.H. and Maitra, P.K. (1977) Properties of Escherichia coli mutants deficient in enzymes of glycolysis. J. Bacteriol. 132, 398–410