Whole genome sequencing of pathogens: a new era in microbiology


NEWS AND COMMENT

E. Richard Moxon

Sequencing of DNA has brought about a revolution in biology by making available the immense fund of historical information contained in the genomes of cells. Unicellular organisms appeared some 2 billion years before the first primitive algae and over a billion years before the first animals and higher plants. Thus, a sensible and obvious starting point for sequencing entire genomes might be the prokaryotes, whose chromosomes are the smallest that contain all the information required for the free-living state. Fred Blattner (University of Wisconsin, Madison, WI, USA) emphasized at the recent Workshop on Bacterial Genome Sequencing (Wellcome Trust Frontiers in Biology Conference) that major funding agencies have been reluctant to support the sequencing of bacterial genomes. However, over the past several months, unheralded and unappreciated by most of the molecular biology world, the status of bacterial genome sequencing has changed dramatically. The workshop was the forum for a historic event in biology: the first public presentation, to an enthusiastic reception, of the first completed genome sequence of any free-living organism. This milestone was achieved by a team led by Craig Venter and Rob Fleischmann at the Institute for Genomic Research (TIGR) in Gaithersburg, MD, USA, in collaboration with Ham Smith of Johns Hopkins University, Baltimore, MD, USA.

Workshop on Bacterial Genome Sequencing, Wellcome Trust Frontiers in Biology Conference, Broadway, Worcestershire, UK, 23-26 April 1995. E.R. Moxon is in the Institute of Molecular Medicine and Dept of Paediatrics, University of Oxford, UK. tel: +44 1865 221074, fax: +44 1865 220479, e-mail: richard.moxon@paediatrics.ox.ac.uk

© 1995 Elsevier Science Ltd. TRENDS IN MICROBIOLOGY 335, VOL. 3, No. 9, SEPTEMBER 1995

The genome sequenced was that of Haemophilus influenzae, which was once mistakenly thought to cause the epidemics of serious respiratory infection that are now known to be caused by influenza virus. Haemophilus influenzae is a major cause of invasive childhood infections, including meningitis. The modest size and high AT content of the H. influenzae genome offered the perfect challenge in a bid to sequence the first complete genome. The project drew on TIGR's capacity for large-scale sequencing. In a remarkable feat of expertise and teamwork, the TIGR group assembled the sequence of 1.83 million nucleotides of circular double-stranded DNA in less than a year. A crucial factor in the success of the project was the application of a random sequencing approach and computational methods developed by TIGR for large-scale sequencing.

Predictably, most of the H. influenzae genome consists of sequences representing potentially functional genes; there are a total of 1749 open reading frames, of which about 1200 have similarity with database sequences. Among these are complete sets of ribosomal proteins with similarity to those of Escherichia coli, and of aminoacyl-tRNA synthetases. The encoded proteins have been grouped into functional categories to provide a broad picture of the proportion of the genome that is involved in various cell processes. The H. influenzae genome contains 1465 sequence motifs, each of about 29 nucleotides, distributed throughout the genome, which are the recognition sequences for the species-specific uptake of Haemophilus DNA in transformation (Ham Smith and Jean-François Tomb, Johns Hopkins University, Baltimore, MD, USA). Another finding was 11 loci consisting of highly repetitive, tandem tetranucleotide repeats; reiterated nucleotides have not been considered to be a common feature of prokaryotic sequences. For example, there are 24 copies of GCAA at the 5′ end of a gene involved in lipopolysaccharide (LPS) biosynthesis. Because each tetramer is 4 nucleotides long, which is not a multiple of the 3-nucleotide codon, loss or gain of one tetramer through polymerase slippage results in frame-shifting and variable gene expression; this is an adaptive mechanism for coping with the varying microenvironments of the host. Other tetrameric repeats also occur within host-interactive or virulence genes, whose expression is switched on or off. Thus, searching for repetitive sequences provides a neat short cut for identifying virulence genes (Richard Moxon and Derek Hood, University of Oxford, UK).

The availability of the complete genome sequence provides an unprecedented opportunity to examine the complexity of transcriptional signals, and may allow a global overview of gene expression in a given organism to be attempted (Steve Busby, University of Birmingham, UK). Examples include particular target DNA sequences for activator proteins in a protein-secretion regulon and the binding site for a transcriptional factor for genes involved in transformation (Jean-François Tomb). Guy Plunkett and Fred Blattner (University of Wisconsin, Madison, WI, USA) described their approach to monitoring the gene activities of E. coli under varying environmental conditions in vitro. In principle, this approach could also be used to analyse the activity of genes of microorganisms infecting cells in tissue culture or tissues in the animal, thus attacking central issues of gene activity in the pathogenesis of infection. John Mekalanos (Harvard University, Boston, MA, USA) and Brendan Wren (St Bartholomew's Hospital, London, UK) discussed the several hierarchical signalling systems of pathogenic bacteria that interact with the environment to control and coordinate genes involved in commensal and virulence behaviour.
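The repeat-scanning short cut and the frame-shifting described above can be illustrated with a short sketch. It assumes the genome is available as a plain DNA string; the function names and the default threshold of five adjacent copies are hypothetical choices for illustration, not the procedure used by the Oxford group. The second function captures the arithmetic behind phase variation: gaining or losing n units of a 4-nt repeat shifts the downstream reading frame by 4n mod 3 nucleotides, so only changes of a multiple of three units restore the original frame.

```python
def find_tandem_repeats(seq: str, unit_len: int = 4, min_copies: int = 5):
    """Return (start, unit, copies) for each run of at least min_copies
    adjacent, identical unit_len-nucleotide units (e.g. GCAAGCAAGCAA...).

    The reported unit is whichever rotation of the repeat is met first
    when scanning left to right. min_copies=5 is an arbitrary, hypothetical
    threshold chosen for illustration.
    """
    seq = seq.upper()
    hits = []
    i = 0
    while i + unit_len <= len(seq):
        unit = seq[i:i + unit_len]
        copies = 1
        j = i + unit_len
        # Extend the run while the next window repeats the same unit.
        while seq[j:j + unit_len] == unit:
            copies += 1
            j += unit_len
        if copies >= min_copies:
            hits.append((i, unit, copies))
            i = j  # resume scanning after the end of the run
        else:
            i += 1
    return hits


def frame_offset_after_change(units_changed: int, unit_len: int = 4) -> int:
    """Shift (0, 1 or 2 nt) of the downstream reading frame after gaining
    (positive) or losing (negative) units_changed repeat units.
    0 means the original reading frame is restored."""
    return (units_changed * unit_len) % 3


if __name__ == "__main__":
    # Hypothetical locus: 24 GCAA units flanked by arbitrary bases.
    locus = "TT" + "GCAA" * 24 + "TT"
    print(find_tandem_repeats(locus))      # [(2, 'GCAA', 24)]
    print(frame_offset_after_change(-1))   # 2: losing one unit shifts the frame
    print(frame_offset_after_change(3))    # 0: three units restore the frame
```

A real scan would also need to handle both strands and imperfect repeats; the exact-match scan is kept deliberately simple here.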
The conservation of the proteins making up these two-component signalling systems, such as the histidine protein kinases, makes the whole genome sequence a powerful resource for finding and inactivating these genes. An exciting possibility is the rapid progress that can now be made in elucidating biosynthetic pathways, especially for genes involved in the biosynthesis of complex macromolecules, which were previously difficult to identify. For example, the genes involved in LPS biosynthesis are scattered around the chromosome, and are elusive because there are few identifiable motifs or conserved DNA sequences to facilitate identification. However, searching the H. influenzae database with the deduced amino acid sequences of genes from the general database that are known to encode LPS-biosynthesis enzymes in other Gram-negative bacteria has been highly successful. Derek Hood described six novel LPS mutants that were obtained using this ‘top-down’ approach, that is, matching the DNA sequences or, better still, the deduced amino acid sequences of genes encoding LPS-biosynthesis enzymes already in the general database to the H. influenzae genome. This has given more progress in the LPS genetics of this organism in 6 months than had been achieved in as many years of the ‘bottom-up’ approach.

The concept of a minimal set of genes that is required for the free-living state was discussed at length. TIGR has also sequenced (in only 3 months) the genome of Mycoplasma genitalium (Claire Fraser of TIGR and Clyde Hutchison of the University of North Carolina at Chapel Hill, NC, USA). This cell-wall-deficient organism can be grown readily in the laboratory, and has a genome of only 0.6 Mb and approximately 500 genes. Questions of evolution, horizontal gene transfer, host interactions and the fundamental similarities and differences between organisms can begin to be answered by comparing whole genomes. Particularly exciting comparisons will be possible once the first complete sequences of archaebacteria are available later this year.
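The ‘top-down’ search described above, in which a protein already in the database is matched against the translated H. influenzae genome, can be sketched as follows. In practice such searches are run with BLAST-family tools (TBLASTN compares a protein query with a nucleotide sequence translated in all six reading frames); the shared-peptide score used here is a deliberately crude, hypothetical stand-in, and all function names are illustrative. The sketch also shows why the amino acid sequence is the better probe: two genes that encode the same protein with different synonymous codons differ at the DNA level but translate identically.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' = stop).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def reverse_complement(dna: str) -> str:
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    # Codons containing unknown or ambiguous bases become 'X'.
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna: str) -> list:
    """Frames 0-2 read the given strand, frames 3-5 its reverse complement."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(s[offset:]) for s in strands for offset in range(3)]


def peptide_kmers(protein: str, k: int) -> set:
    return {protein[i:i + k] for i in range(len(protein) - k + 1)}


def best_frame_match(query_protein: str, genome_dna: str, k: int = 3):
    """Hypothetical crude score: the fraction of the query's k-residue
    peptides that also occur in each translated frame.
    Returns (best_frame_index, fraction)."""
    query = peptide_kmers(query_protein, k)
    scores = [len(query & peptide_kmers(frame, k)) / len(query)
              for frame in six_frame_translations(genome_dna)]
    best = max(range(6), key=scores.__getitem__)
    return best, scores[best]


if __name__ == "__main__":
    # Hypothetical protein, encoded twice with different synonymous codons.
    protein = "MKTAYIAKQRQISFVKSHFSRQ"
    first_codon, last_codon = {}, {}
    for codon, aa in CODON_TABLE.items():
        first_codon.setdefault(aa, codon)
        last_codon[aa] = codon
    gene_a = "".join(first_codon[aa] for aa in protein)
    gene_b = "".join(last_codon[aa] for aa in protein)
    same = sum(x == y for x, y in zip(gene_a, gene_b))
    print(f"DNA identity: {same}/{len(gene_a)}")
    print(best_frame_match(protein, "GG" + gene_b))  # (2, 1.0)
```

Because the search runs on translated frames, the query is still found at full score even though the two encodings differ at many DNA positions.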
The Dept of Energy of the USA is sponsoring three institutions to sequence representative archaebacteria, and Christoph Sensen (Institute for Marine Biosciences, National Research Council, Halifax, Canada) presented work on Sulfolobus, sponsored by the Canadian government. Much can be learned from the study of bacteria that inhabit extreme environments, for example, the thermophiles. Pyrobaculum is an archaebacterium with an optimal growth temperature of 100°C. Unanswered questions include how cellular processes occur at high temperatures and whether these organisms could be a source of enzymes for industrial use. From an evolutionary perspective, Pyrobaculum represents a group of organisms that are closest to a common ancestor of bacteria and eukaryotes; its complete genome sequence is expected within a matter of months (Jeffrey Miller, UCLA, Los Angeles, CA, USA).

What of the future? Brian Spratt (University of Sussex, UK) argued that, although the feasibility of sequencing entire genomes and the opportunities that the results might give have been clear for some time, the potential contribution of the information available from microbial genomes has been underestimated. In the future, this ‘top-down’ approach will not only expand knowledge of fundamental biological issues, such as the evolutionary relationships among the bacteria, archaebacteria and eukaryotes, but will also be the most efficient route to improved treatment or prevention of major microbial diseases, such as tuberculosis, trachoma and peptic ulcer. Another major message is that it will increasingly be neither sensible nor cost effective for small, typically university-based research laboratories to be involved directly in large-scale genome sequencing. Fast, efficient sequencing in dedicated large-scale facilities using the latest technology is the most efficient way forward.
However, there was lively debate (Doug Berg, Washington University, St Louis, MO, USA) about the complexities of funding, which has often been provided by government and the commercial sector. As a consequence, constraints will arise from the need for companies and governments to obtain appropriate benefits. There is an important opportunity for public charities to be involved, not only to support the projects themselves, but also to orchestrate interactions between scientists (John Stephenson, Wellcome Trust, London, UK). Perhaps the major message of the meeting was best encapsulated by Stewart Cole (Institut Pasteur, Paris, France): ‘The sequencing of entire genomes is the most cost-effective way to acquire huge amounts of new knowledge.’ Participants left the meeting convinced of the truth of this statement, and with compelling evidence for the feasibility and utility of the sequencing approach.

Horizons: a selection from recent publications

Modelling mycobacteria
Although tuberculosis is an ancient disease, major epidemics of the disease in Europe and the USA only started in the 17th century. These epidemics peaked at the end of the 18th century or the beginning of the 19th century and, paradoxically, began to decline at the beginning of this century, long before effective drugs became available in the 1940s. To study this problem, Blower and colleagues have mathematically modelled the epidemiology of tuberculosis. They consider an immunocompetent population (without reinfection or treatment) in which an individual infected with Mycobacterium tuberculosis may (1) never develop disease (the majority), (2) develop disease soon after infection (primary tuberculosis), or (3) develop disease many years later (reactivation tuberculosis); individuals with the disease may recover and may later relapse. The model shows that tuberculosis epidemics can be broken down into a series of three linked subepidemics due to primary, reactivation and relapse disease; ‘young’ epidemics have a high proportion of primary cases in younger individuals, while ‘mature’ epidemics are shifted towards reactivation tuberculosis in older age groups. Interestingly, the model predicts that epidemics last on average between one hundred and a few hundred years, suggesting that the last epidemic simply died out naturally. Furthermore, the resurgence of cases over the past decade has all the characteristics of the start of a new young epidemic.
Blower, S.M. et al. (1995) The intrinsic transmission dynamics of tuberculosis epidemics. Nature Med. 1, 815-821

Framing the retrovirus
Translational frame-shifting in retroviruses is tightly controlled, as it is critical to the expression of the correct ratio of Gag to Pol for assembly. This control involves well-characterized viral sequence motifs and RNA tertiary structures, as well as cellular factors. Lee et al. have studied the cellular factors involved in a yeast model, using a construct containing the mouse mammary tumour virus gag-pro -1 frame-shift site linked to a reporter gene. They have identified two yeast genes, IFS1 (also termed UPF2 and NMD2) and IFS2, mutation of either of which doubles the frequency of -1 frame-shifting. Such cellular products are novel targets for antiretroviral drugs that might disrupt this tightly regulated stage in the retroviral life cycle.
Lee, S.I., Umen, J.G. and Varmus, H.E. (1995) A genetic screen identifies cellular factors involved in retroviral -1 frameshifting. Proc. Natl Acad. Sci. USA 92, 6587-6591
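The three pathways in the Blower et al. tuberculosis model summarized above (primary disease, reactivation and relapse) can be illustrated with a toy compartmental model integrated by Euler's method. This is not the published model: the S/L/A/R compartments, the constant-population assumption (deaths replaced by births) and every parameter value below are hypothetical, chosen only so that incidence from each pathway can be tracked separately.

```python
def simulate_tb(years=200.0, dt=0.1, N=1_000_000.0, A0=10.0,
                beta=8.0, p=0.1, v=0.001, c=0.2, r=0.01,
                mu=1 / 70, mu_t=0.2):
    """Euler-integrate a toy S/L/A/R tuberculosis model.

    Compartments: S uninfected, L latently infected, A active disease,
    R recovered. All rates are per year and are invented for illustration:
      beta  infections caused per active case per year (scaled by S/N)
      p     fraction of new infections progressing fast (primary TB)
      v     reactivation rate of latent infection
      c     recovery rate from active disease
      r     relapse rate of recovered individuals
      mu    natural death rate; mu_t extra death rate from active TB
    Deaths are replaced by births into S, so S+L+A+R stays equal to N.
    Returns rows of (t, primary, reactivation, relapse, population), where
    the middle three are incidence rates (new active cases per year).
    """
    S, L, A, R = N - A0, 0.0, A0, 0.0
    rows = []
    t = 0.0
    for _ in range(int(round(years / dt)) + 1):
        lam = beta * A / N                # force of infection
        primary = p * lam * S             # fast progression after infection
        reactivation = v * L              # slow progression from latency
        relapse = r * R                   # recurrence after recovery
        rows.append((t, primary, reactivation, relapse, S + L + A + R))
        births = mu * N + mu_t * A        # replace all deaths
        dS = births - lam * S - mu * S
        dL = (1 - p) * lam * S - v * L - mu * L
        dA = primary + reactivation + relapse - (c + mu + mu_t) * A
        dR = c * A - r * R - mu * R
        S, L, A, R = S + dS * dt, L + dL * dt, A + dA * dt, R + dR * dt
        t += dt
    return rows


if __name__ == "__main__":
    for t, prim, react, rel, _ in simulate_tb()[::250]:
        total = prim + react + rel
        print(f"year {t:5.0f}: primary share {prim / total:.2f}")
```

In this toy setting the primary share, primary / (primary + reactivation + relapse), starts at exactly 1 because no one is yet latently infected, and is below 1 thereafter as the latent pool fills and reactivation cases accumulate, which mirrors in miniature the shift from ‘young’ towards ‘mature’ epidemics.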

T cells too
Epstein-Barr virus (EBV), better known for its effects on B cells, has been shown to infect T cells transiently in culture and to be associated with several T cell neoplasias, including peripheral T cell lymphoma; however, little is known about the interaction between the virus and T cells. Yoshiyama and coworkers have infected MT-2, a human T cell line infected with human T cell leukaemia virus type 1, with a recombinant EBV carrying a selectable neoʳ marker; other T cell lines could not be infected, possibly because of differences in the level of CD21, the EBV receptor. Selection led to the isolation of persistently EBV-infected MT-2 cells, which carried episomal EBV DNA and expressed EBNA1 (from the BamHI F promoter) and LMP1, but not EBNA2. These features are characteristic of the poorly characterized latency II pattern of gene expression, previously seen only in nasopharyngeal carcinoma, Hodgkin's lymphoma and certain other non-B cell tumours. This first in vitro model of latency-II-type EBV infection should facilitate study of the role of EBV in T cell tumours and other non-B cell tumours.
Yoshiyama, H., Shimizu, N. and Takada, K. (1995) Persistent Epstein-Barr virus infection in a human T-cell line: unique program of latent virus expression. EMBO J. 14, 3706-3711