Inferences from whole-genome sequences of bacterial pathogens


Thomas S Whittam and Alyssa C Bumbaugh

Genomic sequencing of bacterial pathogens has recently moved from the study of distantly related organisms to within-species comparisons of multiple strains. Strains often differ in their ability to cause disease, and comparative genomics is uncovering novel virulence determinants, hidden aspects of pathogenesis, and new targets for vaccine development. DNA microarrays and other gene-survey techniques are being used to quantify variability in gene content within bacterial populations, and to reveal the strain-specific basis for diversity and severity of pathology.

Addresses
Microbial Evolution Laboratory, National Food Safety and Toxicology Center, Michigan State University, East Lansing, Michigan 48824, USA
Correspondence: Thomas S Whittam; e-mail: [email protected]

Current Opinion in Genetics & Development 2002, 12:719–725
0959-437X/02/$ - see front matter
© 2002 Elsevier Science Ltd. All rights reserved.

Abbreviations
DFR   difference region
GlpD  glycerol-3-phosphate dehydrogenase
ORFs  open reading frames
PFGE  pulsed-field gel electrophoresis
Sla   streptococcal phospholipase A
SpeA  streptococcal pyrogenic exotoxin A
SpeK  phage-associated streptococcal pyrogenic exotoxin K
SSA   streptococcal superantigen
SSH   suppressive subtractive hybridization
Stx   Shiga toxin

Introduction

In common with other branches of genetics, the study of pathogenic microbes is undergoing a paradigm shift. The enormous influx of information from genome sequencing projects is revolutionizing the science of bacterial pathogenesis, ranging from understanding the most basic aspects of gene content and pathogen genome organization, to elucidating the regulatory networks of virulence gene expression, to investigating the global patterns of host response to infection. The sequencing of >200 bacterial genomes has been completed and many more projects are on the horizon. The sequencing of bacterial pathogens has recently moved from the study of distantly related organisms to within-species comparisons of multiple strains.

The purpose of our review is to highlight some of the recent findings derived from the comparison of closely related genomes, usually strains from the same bacterial species. These comparisons are providing fundamental insights into the nature of virulence and the molecular basis of pathogenesis. The within-species variability in gene content, genomic organization, and gene expression may account for variation in the severity of disease and for the diverse clinical outcomes of infection. Finally, the comparative study of closely related genomes is improving our understanding of the evolutionary pressures on small genomes and of factors involved in the emergence of new infectious diseases.

Genomic basis of the variation in virulence

In many bacterial infections there can be a variety of disease manifestations and clinical outcomes, the basis of which is often not well understood. One factor that can contribute to differences in severity is the variation in virulence that exists among strains of a bacterial population. Recent applications of methods from comparative genomics have made headway in addressing this issue for specific human pathogens.

For example, Helicobacter pylori is a bacterium that colonizes the human stomach where it can cause a wide spectrum of disease ranging from asymptomatic gastritis to ulcers to gastric cancer [1]. The most severe disease is associated with strains harboring a specific DNA segment, called a pathogenicity island, that specifies a cytotoxin together with a bacterial type IV secretion system that translocates the toxin into host cells [1]. A comparison of the entire genomes of two H. pylori strains revealed a high level of similarity in the gene content and order. For example, of the 1495 genes in strain J99, 84.5% have adjacent genes on both sides which are homologous to those flanking the same gene in the other strain’s genome. However, 6%–7% of the genes in each genome are specific to that strain, and most of the strain-specific genes occur in a single hyper-variable region [2]. To characterize more fully the diversity of clinical H. pylori strains, Salama et al. [3] used a whole-genome DNA microarray and found that ~20% of the genes among 15 clinical isolates were strain-specific. Some of the strain-specific genes encode cell-surface proteins whose variability may contribute to persistence of bacteria during long-term infections [3].
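The kind of tally that underlies a statement such as "~20% of the genes were strain-specific" can be sketched in a few lines. This is a minimal illustration, assuming the microarray signals have already been thresholded into presence/absence calls; the function name `summarize_gene_content`, the strain labels, and the gene labels are all hypothetical, and "variable" here means absent from at least one strain, which is only one possible reading of "strain-specific".

```python
from typing import Dict, Set


def summarize_gene_content(presence: Dict[str, Set[str]]) -> dict:
    """Split genes into core, variable, and strain-unique sets.

    `presence` maps a strain name to the set of genes detected in it,
    for example from thresholded microarray hybridization signals.
    """
    strains = list(presence)
    all_genes = set().union(*presence.values())
    # Core genes are detected in every strain surveyed.
    core = set.intersection(*(presence[s] for s in strains))
    # Variable genes are missing from at least one strain.
    variable = all_genes - core
    # Unique genes are detected in exactly one strain.
    unique = {}
    for s in strains:
        others = set().union(*(presence[t] for t in strains if t != s))
        unique[s] = presence[s] - others
    return {
        "core": core,
        "variable": variable,
        "unique": unique,
        "variable_fraction": len(variable) / len(all_genes),
    }


# Hypothetical toy data: three strains and five genes (not H. pylori results).
toy_presence = {
    "strainA": {"g1", "g2", "g3", "g4"},
    "strainB": {"g1", "g2", "g5"},
    "strainC": {"g1", "g2", "g3"},
}
summary = summarize_gene_content(toy_presence)
print(sorted(summary["core"]), summary["variable_fraction"])
```

In a real survey the presence/absence calls themselves carry uncertainty (divergent genes can hybridize weakly), so the fractions depend on the threshold used.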

Genomic comparisons reveal pathogen-specific genes

Escherichia coli O157:H7 is a cause of food- and water-borne illness that was first linked to human infections in 1982 and is now a public health problem worldwide. The full extent to which this pathogen differs from the normally harmless E. coli of the gut was revealed through genome sequencing. Hayashi et al. [4••] completed the first pathogenic E. coli genome sequence for an O157:H7 strain that was originally isolated from a major food-borne disease outbreak in Sakai, Japan in 1996. At about the same time, a second, near-complete sequence was reported for a different isolate of E. coli O157:H7 [5•], a strain that was originally recovered from hamburger meat implicated in the first outbreaks of the novel intestinal disease in North America [6]. The two O157:H7 genomes are very similar to each other and dramatically different from the genome of E. coli K-12 [7], the widely used laboratory workhorse that was originally isolated as a commensal bacterium in the 1920s. The O157:H7 chromosome is 5.5 Mb in size, >1.4 Mb larger than the K-12 genome (which also contains ~0.5 Mb of DNA that is not found in the genome of the pathogen) [4••,5•]. The O157:H7-specific DNA has the potential to encode >1500 distinct proteins, ~10% of which are assumed to have virulence-related functions [4••].

The O157:H7 strain-specific DNA is organized into ~180 discrete regions around the genome that are referred to as O-islands [5•]. Several of these O-islands include virulence determinants that were well studied previously and that were known to be subject to lateral gene transfer. These include the Shiga toxin (Stx) genes, which are often associated with functional bacteriophages. The Stx-phages are a diverse family of bacteriophages that can transmit toxin genes from one E. coli strain to another [8]. A second O-island is a complex locus, called the LEE (locus of enterocyte effacement) pathogenicity island [9], that specifies ~40 gene products required for the close attachment of bacterial cells to the intestinal epithelium. It has been hypothesized that the acquisition of the LEE island and the Stx genes were two of the crucial steps in the evolution of E. coli O157:H7 from a commensal ancestor [10].
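One way to picture how strain-specific islands fall out of a pairwise genome comparison is as the gaps left once the shared backbone has been mapped onto one chromosome. The sketch below assumes the backbone is already available as coordinate intervals and treats the chromosome as linear for simplicity; `find_islands` and the toy coordinates are hypothetical and are not the procedure used in the cited genome papers.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) coordinates in base pairs


def find_islands(genome_length: int,
                 backbone: List[Interval],
                 min_size: int = 1) -> List[Interval]:
    """Return stretches of the genome not covered by any backbone interval.

    `min_size` (>= 1) discards gaps shorter than the given number of bases.
    """
    islands = []
    cursor = 0  # end of the backbone seen so far
    for start, end in sorted(backbone):
        if start - cursor >= min_size:
            islands.append((cursor, start))
        # Overlapping backbone intervals must not move the cursor backwards.
        cursor = max(cursor, end)
    if genome_length - cursor >= min_size:
        islands.append((cursor, genome_length))
    return islands


# Hypothetical toy example: a 1000 bp genome with three backbone segments.
toy_backbone = [(0, 200), (250, 600), (580, 900)]
islands = find_islands(1000, toy_backbone)
island_total = sum(end - start for start, end in islands)
print(islands, island_total)
```

Raising `min_size` is a crude way to ignore small gaps that reflect alignment noise instead of genuine insertions.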

Phage-mediated recombination and genomic evolution

Perhaps the most dramatic example of the contribution of phage-mediated gene transfer is seen in the emergence of highly virulent clones of group A streptococci. These bacteria infect humans only, and are a common cause of pharyngitis (‘strep throat’) and serious invasive diseases (e.g. necrotizing fasciitis or ‘flesh-eating’ disease). A central component of streptococcal virulence is the production of an antigenically variable surface protein, called the M protein, which protects the organism from nonspecific opsonization and phagocytosis. Of the 130 or so M protein serotypes, M3-type strains are most often associated with the most severe infections and with high mortality. This fact motivated Beres et al. [11••] to determine the entire genome sequence of a Streptococcus pyogenes M3 strain. In comparison to the genomes of M1 and M18 strains [12••,13], M3 contained six chimeric phage elements, two of which were integrated into sites identified in the two other genomes. The phage elements encode several superantigen-like extracellular proteins, which can elicit a massive nonspecific immunological response in the host and can contribute to pathogenesis. These proteins include streptococcal pyrogenic exotoxin A (SpeA) and streptococcal superantigen (SSA), and the newly described secreted proteins, SpeK (phage-associated streptococcal pyrogenic exotoxin K) and streptococcal phospholipase A (Sla).

The genomic comparisons support the hypothesis that the contemporary M3 clone evolved by a mutation in the speA gene and by the stepwise accumulation of phage-borne toxin genes: first, the acquisition of an SSA-encoding phage, and second, the acquisition of a phage encoding SpeK and Sla. A similar association of virulence elements and phages is seen in the genome of a streptococcal M1 strain in which at least six potential SSAs and other factors are encoded in four different prophages [12••]. Because phages are ubiquitous in group A streptococci [14], it is clear that the horizontal transfer of toxin genes and other virulence determinants by bacteriophages is one of the principal mechanisms generating variation in virulence among clones in natural streptococcal populations.

Phylogenetic accumulation of virulence genes

Pathogenic clones of bacteria have often evolved from nonpathogenic ancestors by acquiring new virulence genes via horizontal gene transfer mediated by plasmids and bacteriophages [10]. In the case of Salmonella, the causative agent of food-borne gastroenteritis and typhoid fever, the presence of virulence genes on numerous pathogenicity islands has conferred on these organisms the ability to invade and survive in eukaryotic cells, and to cause disease. The Salmonella genus is divided into two species, S. bongori and S. enterica. S. enterica is further subdivided into seven subspecies, of which subspecies I, which includes the serovars Typhi and Typhimurium, is responsible for 99% of human infections.

Hansen-Wester and Hensel [15] used the fact that known Salmonella pathogenicity islands are associated with tRNA genes, which are generally highly conserved among bacterial species, to detect novel elements acquired by horizontal gene transfer. They interrogated the tRNA genes and associated regions in the genomes of two S. enterica pathogens, serovar Typhi and serovar Typhimurium, and used for comparison two related E. coli genomes: the non-pathogenic K-12 strain [7] and the pathogenic O157:H7 strain [5•]. The survey identified four new tRNA gene-associated elements that were specific to the salmonellae [15]. Within Salmonella, the distribution of the tRNA-associated elements is variable. The tRNAProL-associated element has a heterogeneous distribution, whereas the tRNAArgU-associated element is present in all Salmonella apart from S. bongori and the tRNAArgW-associated element is present in all Salmonella apart from S. enterica IV. The authors speculate that the differences in the distribution of tRNA-associated elements may contribute to the unique pathogenic potentials of each subspecies, and specifically that they play a role in the restriction of host range or in the type of disease.
In a second study to identify Salmonella-specific gene acquisitions, Porwollik et al. [16••] assessed the presence of homologues of Typhimurium genes among a phylogenetic series of salmonellae. The genomes of 22 strains representing the seven subspecies of S. enterica and S. bongori were hybridized to a genomic microarray of PCR-amplified open reading frames (ORFs) that comprised 97% of the assigned coding regions of the completed Typhimurium LT2 genome [17•]. The microarray hybridizations identified 56 genes that were specific to Salmonella with 34 having no assigned function [16••]. A phylogeny inferred from the microarray data was largely in agreement with previous studies derived from sequence information from several housekeeping and invasion loci. On the basis of this phylogeny, Porwollik et al. inferred the number and timing of gene acquisition events in the lineage that was destined to evolve into S. enterica Typhimurium. They posit that 513 genes were gained as Salmonella split from its closest relative, and as the genus diversified, 111 genes were acquired at the S. bongori / S. enterica split [16••]. An additional 105 genes were acquired as the diphasic phenotype (a phase-variable system for the expression of either of two antigenically distinct flagella) arose in the most recent common ancestor of subspecies I, II, IIIb, and VI. Presumably it was the ability to shift flagellar antigens that gave a key fitness advantage for adaptation to warm-blooded vertebrates. The presence of the genes encoding the diphasic switch was variable and in Typhi there is a reversion to the monophasic phenotype. Finally, the last steps in the divergence of S. enterica subspecies I included an addition of 216 genes followed by the acquisition of 144 more genes that are unique to Typhimurium. Some gene loss was also observed, mostly of genes that were of plasmid or phage origin.

Figure 1. (a) Accumulated number of acquired genes during the radiation of Salmonella bongori and S. enterica subspecies (y-axis: total number of acquired genes; x-axis: synonymous divergence per site, 0.00 to 0.20). The numbers were estimated from phylogenetic inference of the presence or absence of genes detected on a DNA microarray of 97% of the 4596 annotated ORFs in the Typhimurium genome [16••]. (b) Linearized neighbor-joining tree based on dS for mdh, gapA, and icd (taxa shown: Typhimurium*, Salmonella enterica sv. Typhi, S. enterica VI*, S. enterica II*, S. enterica VII, S. enterica IV, S. enterica IIIb*, S. enterica sv. Arizonae (IIIa), and S. bongori; scale: synonymous divergence per site, 0.00 to 0.20). Diphasic Salmonella strains, which have a phase-variable system for expression of one of two flagella, are indicated by an asterisk.

Figure 2. Diagram of a minimal network that connects the genotypic classification of strains of Yersinia pestis, the causative agent of bubonic plague. The paired designations refer to the DFR profiles for six genomic regions detected by subtractive hybridization [25••] and the IS100 fingerprint designation from PCR-based genomic mapping [24••]. Genotypes are separated into three biovars: orientalis (glycerol negative, nitrate positive), antiqua (glycerol positive, nitrate positive), and medievalis (glycerol positive, nitrate negative). Red arrows designate deletions of DFRs and the green arrow indicates the gain of the DFR5 region in the pathway to orientalis.

To put the gene assimilation on an evolutionary time scale, we constructed a phylogeny (Figure 1) based on the synonymous rate of substitution (dS) calculated for three Salmonella genes, mdh [18], gapA [19], and icd [20]. These loci have been shown to be phylogenetically congruent [21••] and to retain a common evolutionary history that is indicative of a whole-genome phylogeny for Salmonella. The accumulated estimates of the number of gene acquisitions at the major branch points described above (Figure 1a) can be placed on a time scale from the dendrogram (Figure 1b). The comparison shows that the rate of gene acquisition within Salmonella has not been linear but instead has accelerated in recent times (Figure 1). Whether this reflects a true increase in the rate of gene acquisition or is an artefact of an underestimation of past gene flux is unclear. In either case, further investigation of the relationship between the rates of divergence by point mutations and in gene content is warranted.
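The comparison of gene gain with synonymous divergence amounts to a running total plotted against a divergence axis. The sketch below uses the gain counts quoted above from Porwollik et al. and starts the tally at the S. bongori / S. enterica split (a choice made here for illustration). The divergence positions in `elapsed_ds` are hypothetical placeholders, because the numerical dS values of the branch points are not given in the text; the printed rates therefore illustrate the arithmetic only and say nothing about Salmonella.

```python
from itertools import accumulate

# Gene gains along the lineage leading to Typhimurium, as quoted in the text.
gains = [
    ("S. bongori / S. enterica split", 111),
    ("diphasic ancestor of subspecies I, II, IIIb, VI", 105),
    ("subspecies I", 216),
    ("Typhimurium-specific", 144),
]

# HYPOTHETICAL positions of those events on a synonymous-divergence axis
# (divergence elapsed since the first event). Placeholders, not measurements.
elapsed_ds = [0.00, 0.10, 0.16, 0.19]

# Running total of acquired genes after each event.
cumulative = list(accumulate(count for _, count in gains))


def interval_rates(totals, positions):
    """Genes gained per unit of dS between successive events."""
    rates = []
    for i in range(1, len(totals)):
        gained = totals[i] - totals[i - 1]
        elapsed = positions[i] - positions[i - 1]
        rates.append(gained / elapsed)
    return rates


rates = interval_rates(cumulative, elapsed_ds)
for (label, _), total in zip(gains, cumulative):
    print(f"{label}: {total} genes accumulated")
print([round(r) for r in rates])
```

A constant rate would give roughly equal values in `rates`; an accelerating one gives increasing values, which is the pattern the review reports for the real data.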

Microevolution and the parallel loss of genomic regions

Host shifts or changes in transmission pathways can impose strong evolutionary pressures on pathogen populations to adapt to the new niche. One example of such a niche shift is seen in the evolution of the causative agent of bubonic plague, Yersinia pestis. This flea-transmitted pathogen of rodents and humans originated from the closely related pathogen Y. pseudotuberculosis, which is an enteric organism spread by the fecal–oral route. The genome sequence of a Y. pestis strain revealed that the chromosome is rich in insertion sequences, pseudogenes, and assimilated genes from other bacteria, suggesting that the pathogen has undergone an extensive genetic flux [22•].

Historically, strains of Y. pestis have been divided into three biovars (antiqua, medievalis and orientalis) based on their ability to ferment glycerol and to reduce nitrate, and each biovar has been linked to a historical pandemic of plague. Isolates of the biovar antiqua (glycerol positive and nitrate positive) are believed to be relics of the plague pandemic of the 6th century whereas strains of the biovar medievalis (glycerol positive and nitrate negative) are thought to have caused the second pandemic of Europe (the ‘Black Death’) that began in the 14th century. Strains of the biovar orientalis (glycerol negative and nitrate positive) are responsible for the present plague pandemic that began in the late 19th century. Restriction fragment length polymorphisms of the locations of the IS100 insertion sequence support the hypothesis that the three biovars form distinct branches of a phylogenetic tree [23]. In addition, sequence analysis of 36 strains representing the global diversity of Y. pestis revealed no nucleotide polymorphisms in any of five conserved housekeeping genes [23]. The Y. pestis sequences were virtually identical to those of Y. pseudotuberculosis, suggesting that Y. pestis originated as a clone of Y. pseudotuberculosis shortly before the first known pandemics of human plague, several thousand years ago.

The results of two recent surveys of genomic variability among plague isolates have enabled us to begin to elucidate details of the radiation and microevolution of the plague bacillus. With information from the complete genome sequence of Y. pestis strain CO92 [22•], Motin et al. [24••] designed 27 locus-specific primer pairs to amplify fragments between the end of an IS100 element and its neighboring gene. The method resolved 13 IS100-based genotypic fingerprints among ~90 isolates of the three biovars. Their findings indicate that glycerol-negative (orientalis) isolates from a variety of geographic sources are relatively homogeneous in their IS100 fingerprints, whereas strains of the glycerol-positive antiqua biovar are more diverse. In addition, glycerol-positive medievalis strains grouped with the antiqua isolates from Southeast Asia, suggesting a close phylogenetic relationship between strains of these biovars. Radnedge et al. used suppressive subtractive hybridization (SSH) to detect genomic differences between strains of the three Y. pestis biovars and to uncover strain-specific regions of genomic DNA [25••].
Briefly, SSH identifies DNA sequences that are specific to one genome (designated the ‘tester’) and absent in the other genome (designated the ‘driver’). In this method, genomic DNA from the tester and driver are cut with a frequent-cutting restriction enzyme, oligonucleotide adapters are ligated to the digested tester DNA, and the DNAs are hybridized [26]. SSH then uses PCR to amplify and enrich for unique segments of tester DNA and to simultaneously limit non-target amplification using suppression PCR. The population of enriched strain-specific fragments can be cloned and characterized. Radnedge et al. mapped fragments produced by SSH to six regions of the Y. pestis CO92 complete genome sequence. These difference regions (DFRs) range from 4.6 to 19 kb in size and their sequences show similarity to bacterial genes encoding proteins for flagellar synthesis, ABC transport, and bacteriophage functions. They then surveyed for the presence and absence of these DFRs among 78 geographically diverse strains of Y. pestis and used this information to examine the evolutionary relationships among the three biovars.


In Figure 2, we have combined and summarized the findings in a diagram of a minimal network that connects the Y. pestis strains common to the two genomic studies. The paired designations refer to the DFR profile [25••] and the IS100 fingerprint designation [24••]. For example, the orientalis strain CO92, for which the complete genome sequence is known, is designated as (A, O1). The combined genotypes are connected by the shortest path involving either a gain or loss of a DFR, or changes in the IS100 profile. The results reveal that certain regions have been lost repeatedly in the radiation of the Y. pestis subtypes. In particular, the DFR3 region may have been lost as many as five times in the radiation of the genotypes in each biovar. In three of these cases, the IS100 subtype remained the same when the DFR profile changed, indicating that loss of specific segments can occur rapidly relative to changes in the IS100 profiles. Similarly, DFR1 was lost in parallel from the (M, P2) and (D, A3) antiqua biotypes.

Interestingly, there are two examples where the historical biotypes appear to have evolved repeatedly. In the first case, the antiqua biotype of an (SA, O1) strain Nicholisk 51 is a consequence of the reversion of the glycerol-negative phenotype of an (A, O1) orientalis strain by lateral transfer and recombination. Motin et al. gathered evidence supporting this scenario by examining the genes involved in glycerol metabolism [24••]. They found that all glycerol-negative strains have a defective glycerol-3-phosphate dehydrogenase gene (glpD) with a 93 bp in-frame deletion. The glycerol-positive Nicholisk 51, however, has an intact glpD gene in an orientalis background, suggesting that recombination had restored the ability to ferment glycerol in this strain.
In the second case, it appears that the medievalis biotype has evolved in parallel from two branches of the antiqua genotype: one from the (D, A3) genotype, giving rise to the (D, P1) type represented by Pestoides A, and a second one from the (F, A1a) genotype, giving rise to the (F, M2) type represented by Harbin 35. Subsequent loss of DFR3 and DFR4 produced additional variants of the medievalis in this second pathway. Further study of the nitrate-reduction pathway can be used to test this hypothetical scenario for the parallel evolution of the medievalis biovar.
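The idea of connecting genotypes by the fewest gains or losses of difference regions can be illustrated with a minimum spanning tree over presence/absence profiles. This is a stand-in technique chosen for the sketch, not the procedure used to draw Figure 2, which also takes IS100 fingerprints into account. The profile names and bit patterns below are hypothetical and do not correspond to the published DFR profiles.

```python
def hamming(a, b):
    """Number of regions whose presence/absence differs between two profiles."""
    return sum(x != y for x, y in zip(a, b))


def minimal_network(profiles):
    """Prim's algorithm: connect all profiles with the fewest total changes.

    Returns a list of (from_name, to_name, number_of_changes) edges.
    """
    names = list(profiles)
    in_tree = {names[0]}
    edges = []
    while len(in_tree) < len(names):
        # Cheapest edge from the current tree to a profile outside it;
        # ties are broken alphabetically by name.
        changes, a, b = min(
            (hamming(profiles[a], profiles[b]), a, b)
            for a in in_tree
            for b in names
            if b not in in_tree
        )
        edges.append((a, b, changes))
        in_tree.add(b)
    return edges


# Hypothetical profiles over six regions (1 = present, 0 = absent).
toy_profiles = {
    "all_present": (1, 1, 1, 1, 1, 1),
    "lost_R3": (1, 1, 0, 1, 1, 1),
    "lost_R3_R4": (1, 1, 0, 0, 1, 1),
    "lost_R1": (0, 1, 1, 1, 1, 1),
}
network = minimal_network(toy_profiles)
total_changes = sum(changes for _, _, changes in network)
print(network, total_changes)
```

A spanning tree like this is undirected; deciding which events are losses and which are gains requires outside information, such as an outgroup or the known ancestry of a strain.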

Tracing genomic diversification of E. coli O157:H7

In addition to DNA microarrays and subtractive hybridization, there are a variety of methods for quantifying genomic diversity within a bacterial species. One method, called octamer-based genome scanning (OBGS), makes use of the fact that the E. coli K-12 genome contains >150 different octamers that are over-represented on the leading strand of each replichore. These octamers are located where the complementary sequence provides priming sites for discontinuous DNA replication [27]. Kim et al. devised fluorescently labeled PCR primers based upon specific octamers and used the size distribution of labeled amplicons (determined with an automated sequencer) to uncover genomic polymorphisms [27]. The data were scored as binary characters for the presence and absence of OBGS amplicons that were 200–1500 bp in length.

To investigate the historical dissemination of E. coli O157:H7, Kim et al. [28••] compared the genomic profiles of human isolates representing two main E. coli O157:H7 lineages from North America to the profiles of strains originally collected from humans and cattle in Australia. They were able to score a total of 1159 characters, of which 258 (23%) were polymorphic and 163 (14%) were phylogenetically informative. The inferred phylogeny indicates that divergence of the two E. coli O157:H7 lineages was an ancestral event that preceded the geographic spread and expansion of the distinct subpopulations onto different continents. In addition, there is evidence that genomic diversity accumulated rapidly in separate geographic regions through random drift and bacteriophage-mediated events.
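The distinction between polymorphic and phylogenetically informative characters can be made concrete with a small presence/absence matrix. The sketch uses the standard parsimony criterion for binary data (each state must occur in at least two isolates for a character to be informative); whether the cited study applied exactly this criterion is an assumption, and the matrix is a hypothetical toy example.

```python
def summarize_characters(matrix):
    """Count polymorphic and parsimony-informative binary characters.

    `matrix` is a list of rows, one per isolate, each a list of 0/1 scores
    for the same ordered set of amplicons.
    """
    n_isolates = len(matrix)
    polymorphic = informative = 0
    for column in zip(*matrix):
        present = sum(column)
        absent = n_isolates - present
        if present > 0 and absent > 0:
            polymorphic += 1  # both states observed
        if present >= 2 and absent >= 2:
            informative += 1  # each state shared by at least two isolates
    return {
        "total": len(matrix[0]),
        "polymorphic": polymorphic,
        "informative": informative,
    }


# Hypothetical scores: five isolates (rows) by six amplicons (columns).
toy_matrix = [
    [1, 1, 1, 0, 1, 0],
    [1, 1, 0, 0, 1, 0],
    [1, 0, 0, 0, 1, 1],
    [1, 0, 0, 0, 0, 1],
    [1, 0, 0, 0, 0, 0],
]
counts = summarize_characters(toy_matrix)
print(counts)
```

Characters present in only one isolate are polymorphic but carry no grouping information, which is why the informative count is always the smaller of the two.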

Conclusions

It has long been recognized that horizontal transfer of genetic material has played a significant role in the evolution of bacterial pathogens, particularly under conditions of adapting to new hosts or responding to strong selection pressures (such as those imposed by antibiotic treatment). However, the extent to which bacterial chromosomes rapidly gain and lose genes is coming to light from studying human pathogens at the genomic level. Recent findings reveal that differences in gene content often involve hundreds of genes and proteins, even between closely related strains and serovars of the same species. These differences may reflect a large class of dispensable genes that are not required for survival or they may comprise co-adapted gene complexes specialized to exploit specific pathogenic niches. In either case, it is clear that gene acquisition and loss contribute substantially to the variation in virulence and to the overall genetic variation harbored in pathogen populations.

Within-species genomic variability also has practical implications. For infectious disease surveillance, it forms the basis of PulseNet, the national molecular subtyping network for identifying outbreaks of food-borne disease [29]. PulseNet was established in 1996 by the Centers for Disease Control and Prevention and now comprises 46 state public health laboratories and the food safety laboratories of the US Food and Drug Administration and the US Department of Agriculture [29]. This network was built upon standardized DNA fingerprinting of E. coli O157:H7 and works because of the genetic variability among strains detected by pulsed-field gel electrophoresis (PFGE) of XbaI digests of whole genomic DNA. Indeed, Kudva et al. [30•] showed recently that the strain-to-strain differences in O157:H7 fingerprints result from insertions and deletions of DNA with polymorphic XbaI sites, not from single-nucleotide polymorphisms in the XbaI sites themselves.
These findings underscore the dynamic nature of genomic flux that underlies pathogen evolution.
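The effect of insertions on an XbaI fingerprint can be illustrated with an in silico digest. XbaI recognizes TCTAGA and cuts after the first T; everything else in the sketch is a simplifying assumption: the sequences are short hypothetical strings, the molecule is treated as linear although the E. coli chromosome is circular, and fragment lengths stand in for PFGE bands.

```python
def fragment_lengths(seq, site="TCTAGA", cut_offset=1):
    """Fragment sizes, largest first, from a complete digest of a linear sequence."""
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


def insert_at(seq, position, segment):
    """Return seq with `segment` inserted before index `position`."""
    return seq[:position] + segment + seq[position:]


# Hypothetical ancestral sequence with two XbaI sites.
ancestor = "A" * 30 + "TCTAGA" + "C" * 50 + "TCTAGA" + "G" * 20

# An insertion without a site enlarges one band.
plain_insertion = insert_at(ancestor, 60, "T" * 40)

# An insertion that carries its own XbaI site splits a band in two.
site_insertion = insert_at(ancestor, 60, "T" * 10 + "TCTAGA" + "T" * 10)

print(fragment_lengths(ancestor))
print(fragment_lengths(plain_insertion))
print(fragment_lengths(site_insertion))
```

On a gel, the second variant would differ from the ancestor by a band shift and the third by the number of bands, even though no existing restriction site has mutated.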

References and recommended reading

Papers of particular interest, published within the annual period of review, have been highlighted as:
• of special interest
•• of outstanding interest

1. Censini S, Lange C, Xiang Z, Crabtree JE, Ghiara P, Borodovsky M, Rappuoli R, Covacci A: cag, a pathogenicity island of Helicobacter pylori, encodes type I-specific and disease-associated virulence factors. Proc Natl Acad Sci USA 1996, 93:14648-14653.

2. Alm RA, Ling LS, Moir DT, King BL, Brown ED, Doig PC, Smith DR, Noonan B, Guild BC, deJonge BL et al.: Genomic-sequence comparison of two unrelated isolates of the human gastric pathogen Helicobacter pylori. Nature 1999, 397:176-180.

3. Salama N, Guillemin K, McDaniel TK, Sherlock G, Tompkins L, Falkow S: A whole-genome microarray reveals genetic diversity among Helicobacter pylori strains. Proc Natl Acad Sci USA 2000, 97:14668-14673.

4. •• Hayashi T, Makino K, Ohnishi M, Kurokawa K, Ishii K, Yokoyama K, Han CG, Ohtsubo E, Nakayama K, Murata T et al.: Complete genome sequence of enterohemorrhagic Escherichia coli O157:H7 and genomic comparison with a laboratory strain K-12. DNA Res 2001, 8:11-22.
In 1996, a large outbreak of hemorrhagic colitis caused by E. coli O157:H7 infection occurred in Sakai City, Japan. This study reports the entire genome sequence of an O157:H7 isolate from the Sakai outbreak. For the first time, the positions of the Stx phages encoding the Stx1 and Stx2 toxins were located in the complete O157:H7 genome. In total, 24 prophage and prophage-like elements were identified, accounting for more than half of the O157:H7-specific genomic sequence.

5. • Perna NT, Plunkett G III, Burland V, Mau B, Glasner JD, Rose DJ, Mayhew GF, Evans PS, Gregor J, Kirkpatrick HA et al.: Genome sequence of enterohaemorrhagic Escherichia coli O157:H7. Nature 2001, 409:529-533.
By examining the genome sequence of E. coli O157:H7 as compared to the genome sequence of K-12 (see [7]), the authors determine that lateral gene transfer is far more extensive than was previously anticipated. They identify 1387 strain-specific clusters of genes encoding candidate virulence factors, alternative metabolic capacities, and several prophages.

6. Riley LW, Remis RS, Helgerson SD, McGee HB, Wells JG, Davis BR, Hebert RJ, Olcott ES, Johnson LM, Hargrett NT et al.: Hemorrhagic colitis associated with a rare Escherichia coli serotype. N Engl J Med 1983, 308:681-685.

7. Blattner FR, Plunkett G III, Bloch CA, Perna NT, Burland V, Riley M, Collado-Vides J, Glasner JD, Rode CK, Mayhew GF et al.: The complete genome sequence of Escherichia coli K-12. Science 1997, 277:1453-1462.

8. O’Brien AD, Tesh VL, Donohue-Rolfe A, Jackson MP, Olsnes S, Sandvig K, Lindberg AA, Keusch GT: Shiga toxin: biochemistry, genetics, mode of action, and role in pathogenesis. Curr Top Microbiol Immunol 1992, 180:65-94.

9. McDaniel TK, Kaper JB: A cloned pathogenicity island from enteropathogenic Escherichia coli confers the attaching and effacing phenotype on E. coli K-12. Mol Microbiol 1997, 23:399-407.

10. Reid SD, Herbelin CJ, Bumbaugh AC, Selander RK, Whittam TS: Parallel evolution of virulence in pathogenic Escherichia coli. Nature 2000, 406:64-67.

11. •• Beres SB, Sylva GL, Barbian KD, Lei B, Hoff JS, Mammarella ND, Liu MY, Smoot JC, Porcella SF, Parkins LD et al.: Genome sequence of a serotype M3 strain of group A Streptococcus: phage-encoded toxins, the high-virulence phenotype, and clone emergence. Proc Natl Acad Sci USA 2002, 99:10078-10083.
The genome sequence was determined for a group A Streptococcus strain with serotype M3 (MGAS315), isolated from a patient with toxic shock syndrome. The 1.9 Mb genome shares ~1.7 Mb of related sequence with the M1 (see [12••]) and M18 (see [13]) genomes, with phage-like elements accounting for much of the variation. The authors devise a model for the stepwise acquisition of the specific virulence factors including SpeA, SpeK, SSA, and Sla contributing to the emergence of this highly virulent clone.

12. •• Ferretti JJ, McShan WM, Ajdic D, Savic DJ, Savic G, Lyon K, Primeaux C, Sezate S, Suvorov AN, Kenton S et al.: Complete genome sequence of an M1 strain of Streptococcus pyogenes. Proc Natl Acad Sci USA 2001, 98:4658-4663.
The genome of an M1 group A Streptococcus was sequenced using a whole-genome shotgun approach. The 1.8 Mb genome contained 1752 predicted ORFs with 83% having an assignable function or homolog. The sequence analysis identifies 46 virulence factors and complete or remnant sequences of four bacteriophage genomes encoding at least six new superantigen-like proteins in addition to the eight that have been previously identified. These superantigen-like proteins have homologs in other Gram-positive bacteria, which suggests dispersal via horizontal gene transfer.

13. Smoot JC, Barbian KD, Van Gompel JJ, Smoot LM, Chaussee MS, Sylva GL, Sturdevant DE, Ricklefs SM, Porcella SF, Parkins LD et al.: Genome sequence and comparative microarray analysis of serotype M18 group A Streptococcus strains associated with acute rheumatic fever outbreaks. Proc Natl Acad Sci USA 2002, 99:4668-4673.

14. Hynes WL, Hancock L, Ferretti JJ: Analysis of a second bacteriophage hyaluronidase gene from Streptococcus pyogenes: evidence for a third hyaluronidase involved in extracellular enzymatic activity. Infect Immun 1995, 63:3015-3020.

15. Hansen-Wester I, Hensel M: Genome-based identification of chromosomal regions specific for Salmonella spp. Infect Immun 2002, 70:2351-2360.

16. •• Porwollik S, Wong RM, McClelland M: Evolutionary genomics of Salmonella: gene acquisitions revealed by microarray analysis. Proc Natl Acad Sci USA 2002, 99:8956-8961.
PCR-amplified ORFs from the published Typhimurium LT2 genome (see [17•]) were used for microarray construction. Hybridization using 22 Salmonella isolates from S. enterica and S. bongori revealed that 56 genes of LT2 were found in all salmonellae, and surprisingly 34 of these have no known function. Data from the array analysis predict high levels of gene acquisition and loss among the lineages, and comparison to a subspecies phylogeny indicated the order and timing of the major acquisition events.

17. • McClelland M, Sanderson KE, Spieth J, Clifton SW, Latreille P, Courtney L, Porwollik S, Ali J, Dante M, Du F et al.: Complete genome sequence of Salmonella enterica serovar Typhimurium LT2. Nature 2001, 413:852-856.
The 4.8 Mb chromosome and 94 kb virulence plasmid (pSLT) were sequenced in Salmonella typhimurium LT2. Comparisons of sequence homologs were made to eight strains of various Enterobacteriaceae by sample sequencing, array hybridization or examination of published genome sequences. The data identified the absence of 11% of the LT2 genes from serovar Typhi and 29% from E. coli K-12, indicating frequent lateral gene transfer events among these organisms.

18. Boyd EF, Nelson K, Wang F-S, Whittam TS, Selander RK: Molecular genetic basis of allelic polymorphism in malate dehydrogenase (mdh) in natural populations of Escherichia coli and Salmonella enterica. Proc Natl Acad Sci USA 1994, 91:1280-1284.

19. Nelson K, Whittam TS, Selander RK: Nucleotide polymorphism and evolution in the glyceraldehyde-3-phosphate dehydrogenase gene (gapA) in natural populations of Salmonella and Escherichia coli. Proc Natl Acad Sci USA 1991, 88:6667-6671.

20. Wang FS, Whittam TS, Selander RK: Evolutionary genetics of the isocitrate dehydrogenase gene (icd) in Escherichia coli and Salmonella enterica. J Bacteriol 1997, 179:6551-6559.

21. •• Brown EW, Kotewicz ML, Cebula TA: Detection of recombination among Salmonella enterica strains using the incongruence length difference test. Mol Phylogenet Evol 2002, 24:102-120.
Brown et al. use six housekeeping loci including mdh, gapA, and icd [18–20] to examine incongruence among these genes. Their findings show that the nucleotide sequence data for mdh, gapA, and icd provide nearly identical phylogenies in Salmonella and thus retain a common evolutionary history.

22. • Parkhill J, Wren BW, Thomson NR, Titball RW, Holden MT, Prentice MB, Sebaihia M, James KD, Churcher C, Mungall KL et al.: Genome sequence of Yersinia pestis, the causative agent of plague. Nature 2001, 413:523-527.
This paper reports a wealth of new information about one of the most destructive infectious agents, the cause of bubonic plague. This bacterium’s genome contains genes that specify an arsenal of virulence factors and

725

surface molecules, many of which appear to have been acquired from other bacterial species. The genome sequence is also replete with insertion sequence elements and pseudogenes, whose presence may reflect the reshaping of the genome as this organism has adapted to insect transmission. 23. Achtman M, Zurth K, Morelli G, Torrea G, Guiyoule A, Carniel E: Yersinia pestis, the cause of plague, is a recently emerged clone of Yersinia pseudotuberculosis. Proc Natl Acad Sci USA 1999, 96:14043-14048. 24. Motin VL, Georgescu AM, Elliott JM, Hu P, Worsham PL, Ott LL, •• Slezak TR, Sokhansanj BA, Regala WM, Brubaker RR et al.: Genetic variability of Yersinia pestis isolates as predicted by PCR- based IS100 genotyping and analysis of structural genes encoding glycerol-3-phosphate dehydrogenase (glpD). J Bacteriol 2002, 184:1019-1027. The authors use a PCR-based genotyping method that examines the different locations of IS100 elements in the genomes of 116 Yersinia isolates. A major finding of this study is the identification of an antiqua strain (Nicholisk 51) that had an identical IS100 fingerprint to an orientalis isolate. The Nicholisk 51 isolate is suggested to have repaired the GlpD mutation characteristic of the other orientalis isolates. 25. Radnedge L, Agron PG, Worsham PL, Andersen GL: Genome •• plasticity in Yersinia pestis. Microbiology 2002, 148:1687-1698. These authors use subtractive hybridization to identify segments of the plague bacillus genome that differ between strains. Their analysis uncovered multiple so-called difference regions (DFRs) that range in size from 4.6–19 kb. By comparing either the presence or absence of the various DFRs, the authors infer the origin and evolutionary relationships among the historically important biovars (antiqua, medievalis, and orientalis) of the causative agent of bubonic plague. 26. 
Diatchenko L, Lau YF, Campbell AP, Chenchik A, Moqadam F, Huang B, Lukyanov S, Lukyanov K, Gurskaya N, Sverdlov ED et al.: Suppression subtractive hybridization: a method for generating differentially regulated or tissue-specific cDNA probes and libraries. Proc Natl Acad Sci USA 1996, 93:6025-6030. 27.

Kim J, Nietfeldt J, Benson AK: Octamer-based genome scanning distinguishes a unique subpopulation of Escherichia coli O157:H7 strains in cattle. Proc Natl Acad Sci USA 1999, 96:13288-13293.

28. •• Kim J, Nietfeldt J, Ju J, Wise J, Fegan N, Desmarchelier P, Benson AK: Ancestral divergence, genome diversification, and phylogeographic variation in subpopulations of sorbitol-negative, beta-glucuronidase-negative enterohemorrhagic Escherichia coli O157. J Bacteriol 2001, 183:6885-6897.
In this paper, the authors apply a highly sensitive, PCR-based system to address hypotheses about the emergence and geographic spread of E. coli O157:H7, a food- and water-borne pathogen that is a growing public health problem. This system, called octamer-based genome scanning, can uncover numerous polymorphisms and in this application revealed >150 informative markers. The results extend a stepwise model for the evolution of E. coli O157:H7 and indicate that the divergence of two genetically distinct lineages preceded intercontinental spread.

29. Swaminathan B, Barrett TJ, Hunter SB, Tauxe RV: PulseNet: the molecular subtyping network for foodborne bacterial disease surveillance, United States. Emerg Infect Dis 2001, 7:382-389.

30. • Kudva IT, Evans PS, Perna NT, Barrett TJ, Ausubel FM, Blattner FR, Calderwood SB: Strains of Escherichia coli O157:H7 differ primarily by insertions or deletions, not single-nucleotide polymorphisms. J Bacteriol 2002, 184:1873-1879.
PFGE fingerprinting of E. coli O157:H7 has uncovered extensive variation in the genome-wide XbaI restriction patterns between strains from different outbreaks. This study examines nucleotide changes in the regions adjacent to the restriction sites and demonstrates that differences in the profiles are attributable to insertions or deletions of DNA that contains XbaI sites, as opposed to single-nucleotide polymorphisms in the XbaI sites themselves. Of the 40 identified regions, 22 are unique to the O157:H7 chromosome and are designated O islands (see [5•]); it is the variability in the presence and absence of these O islands that accounts for most differences in DNA fingerprints among O157:H7 isolates.