Predictive modelling of fluorescent AFLP: a new approach to the molecular epidemiology of E. coli


Res. Microbiol. 150 (1999) 33−44 © Elsevier, Paris

Catherine Arnold*, Lou Metherell, Jonathan P. Clewley, John Stanley
Molecular Biology Unit, Central Public Health Laboratory, 61 Colindale Avenue, London NW9 5HT, UK
(Submitted 7 July 1998; accepted 20 October 1998)

Abstract: Amplified fragment length polymorphism (AFLP) permits simultaneous sampling of multiple loci distributed throughout a genome, using restriction site/adaptor-specific primers under stringent conditions. Fluorescent detection instrumentation further refines this methodology, permitting the use of internal size standards and accurate, reproducible sizing of amplified fragments. We have evaluated fluorescent AFLP (FAFLP) as a potentially definitive genotyping method for bacteria by comparing MseI/EcoRI fragments derived experimentally from the Escherichia coli K12 MG1655 genome with those predicted by analysis of its published sequence. In silico, MseI/EcoRI digestion of this sequence produced 1200 fragments ranging from 36 to 2151 base pairs (bp) in size. Fragment subsets which would be amplified by seven different selective primer combinations (1–2 bases added to the 3' end of the core primer sequence) were modelled. Depending on the primer pair, three to 54 fragments (range 70–400 bp) were predicted, while all seven primer pair combinations together generated 121 predicted fragments. When genomic DNA of strain MG1655 was subjected to experimental FAFLP with these seven primer pairs, 111 correctly sized fragments (± 1 bp) were observed out of the 121 predicted (92% accuracy). Twenty-five unpredicted fragments were obtained, an average of four per primer pair. The size and number of fragments in FAFLP, and their gel distribution, were dictated by the choice of restriction endonucleases and the degree of primer selectivity. Our data show that FAFLP is accurate, discriminatory, reproducible and capable of standardisation. Under agreed conditions, this method shows considerable promise as a generally applicable standardised bacterial genotyping method. The fragments predicted in silico to result from amplification of MseI/EcoRI-digested DNA with the seven primer pairs described are here used to define a prototypic FAFLP analysis of E. coli.
© Elsevier, Paris

Keywords: AFLP / genotyping / Escherichia coli / fluorescence / primer

* Correspondence and reprints. Tel.: 44 181 200 4400; fax: 44 181 200 1569; [email protected]
Abbreviations: AFLP, amplified fragment length polymorphism; AP-PCR, arbitrary primed PCR; bp, base pair; FAFLP, fluorescent AFLP; PCR, polymerase chain reaction; PFGE, pulsed-field gel electrophoresis; RAPD, random amplified polymorphic DNA; RFLP, restriction fragment length polymorphism.

1. Introduction

Strain identification in molecular clinical microbiology requires a precise and reproducible way of comparing the genomes of bacterial isolates. This is important for accurate epidemiological investigation of outbreaks of infection. Many methods have been employed for bacterial typing. They include ribotyping [7], pulsed-field gel electrophoresis (PFGE) [1] and more rapid PCR-based methods such as arbitrary primed PCR (AP-PCR or RAPD) [14, 19] and PCR-restriction fragment length polymorphism (RFLP). Although ribotyping and PFGE provide a valid basis for typing several pathogenic bacterial species [6, 17], these methods are time-consuming and may inherently lack discriminatory power. Arbitrary primed PCR is insufficiently robust or reproducible for interlaboratory comparison [13], while PCR-RFLP is limited to small regions of polymorphism such as single genes. Amplified fragment length polymorphism (AFLP) [18], by contrast, a technique based on the selective amplification of restriction fragments from a digest of genomic DNA, has the potential to produce reproducible, discriminatory profiles of bacterial genomes. Radioactively labelled AFLP has already been used to investigate genotyping of several bacterial genera [5, 8–12].

For AFLP analysis, two restriction enzymes, usually a rare cutter and a more frequent cutter, are used to digest the DNA. Cleavage with the frequent cutter increases the number of fragments beyond that generated at the rare cut sites alone and yields DNA fragments of a size suitable for resolution on polyacrylamide (sequencing) gels. Double-stranded linkers, specific to one or the other restriction site, are ligated to the cohesive ends of the DNA fragments, generating templates for amplification. The sequences of the linkers and restriction sites serve as primer binding sites. The primer specific for the 'rare cutter' enzyme site is labelled, so only fragments cut with that enzyme will be visible. The majority of fragments, those produced by digestion at both ends with the frequent cutter enzyme, will not be labelled. 'Selective' amplification of a subset of the total restriction enzyme fragments can be achieved by the use of primers containing extra bases at their 3′ end, beyond the sequence complementary to the restriction site. Thus, only those fragments are amplified for which the additional primer bases complement the target sequence adjacent to the restriction sites, and the complexity of the fragment pattern decreases as the number of selective bases increases [18]. Assuming random base distribution, there is a four-fold reduction for each additional selective base, and adding one extra base to one of the primers results in a simplified pattern in which all fragments are a subset of the original, less selective pattern.

Theoretically, AFLP has significant advantages compared with other bacterial typing methods. Firstly, AFLP analyses the whole genome, whereas ribotyping or RFLP analyse only genes or operons. Secondly, AFLP markers are more informative than RFLPs, providing 10–50 times more useful data points [2]. Practically, AFLP offers speed and high throughput.

We set out to test the accuracy and reproducibility of fluorescent AFLP (FAFLP) and to evaluate its potential as an exact typing method by comparing the observed fragments with those predicted by computer analysis of the whole genome sequence of Escherichia coli K12. Margins of experimental error due to extraneous fragments, missing fragments or fragments not of the predicted size, and problems with PCR reproducibility, could be examined by this approach. The expected number and sizes of fragments generated by AFLP with each of seven pairs of selective primers were calculated for E. coli K12 MG1655 [3], and corresponding experiments were done so that in silico predictions and experimental data could be directly compared.
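The four-fold reduction per selective base mentioned above can be illustrated with a short calculation. The sketch below is only an illustration under the stated assumption of random base distribution; the function name `expected_fragments` is hypothetical and not part of any AFLP software. The fragment counts are those listed in table I for MseI + TA/EcoRI + 0 and its four EcoRI + N subsets.

```python
def expected_fragments(n_fragments: float, extra_selective_bases: int) -> float:
    """Expected fragment count after adding selective bases.

    Assumes random base distribution, so each extra selective base
    keeps on average one quarter of the fragments.
    """
    if extra_selective_bases < 0:
        raise ValueError("extra_selective_bases must be non-negative")
    return n_fragments / 4 ** extra_selective_bases


# Predicted band counts from table I.
non_selective_eco = 62                                  # MseI + TA/EcoRI + 0
per_eco_base = {"A": 19, "G": 15, "C": 15, "T": 13}     # MseI + TA/EcoRI + N

# One extra base on the EcoRI primer: about a quarter of 62 per subset.
expected_per_subset = expected_fragments(non_selective_eco, 1)

# The four one-base subsets partition the less selective pattern.
total_of_subsets = sum(per_eco_base.values())

print(expected_per_subset, total_of_subsets)
```

The predicted counts of 13 to 19 bands per subset scatter around the expected value of 15.5, and their sum equals the 62 bands of the less selective primer pair, as the subset argument requires.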

2. Materials and methods

2.1. Computer methods

The complete genome sequence of E. coli MG1655 (accession nos. ECAE000111–ECAE000510) was analysed with Lasergene (DNAStar, Madison, WI, USA) and MacVector (Oxford Molecular, UK). The size and number of fragments predicted from an MseI/EcoRI digest of the genome were imported into a spreadsheet, and the MseI/MseI and EcoRI/EcoRI fragment data were deleted. The fragment size data were then adjusted to allow for the addition of primers during PCR, and those fragments predicted to be amplified with each of the chosen selective primers were identified.

2.2. FAFLP

Genomic DNA was extracted from E. coli K12 MG1655 plate cultures using the Boom method [4], and 500 ng was digested in a total volume of 22 µL containing 5 U of MseI (New England Biolabs), 2 µL of 10 × MseI buffer, 0.2 µL of 10 × BSA and 1.0 µL of DNase-free RNase A (10 µg/µL) for 1 h at 37 °C. To this digest was added 5 U (1.0 µL) of EcoRI (Life Technologies), 1.68 µL of 0.5 M Tris-HCl (pH 7.6) and 2.1 µL of 0.5 M NaCl (total volume 26 µL), and the reactions were incubated for a further hour at 37 °C. Endonucleases were inactivated at 65 °C for 10 min prior to ligation. To the double-digested DNA was added 25 µL of a solution containing 40 U T4 DNA ligase (New England Biolabs, NEB), 5 pmol EcoRI adaptor, 50 pmol MseI adaptor and 5 µL 10 × T4 ligase buffer. The reaction was incubated at 12 °C for 17 h, heated at 65 °C for 10 min to inactivate the ligase and stored at –20 °C.

The non-selective forward primer for the EcoRI adaptor site (EcoRI + 0) was labelled with the blue fluorescent dye, FAM (Genosys Biotechnologies). Other forward primers used were obtained from an AFLP kit (PE Applied Biosystems), as were the reverse primers for the MseI adaptor site, which contained two selective bases (see table I for the primer combinations used). PCR reactions were performed in 25 µL volumes containing 2.5 µL ligated DNA, 16.6 pmol labelled EcoRI primer, 100 pmol MseI primer, 2.5 µL 10 × Taq polymerase buffer, 10 mM of each of the four dNTPs, 1.0 µL 100 × BSA (NEB), 1.5 mM MgCl2 and 0.625 U Taq DNA polymerase.

Touchdown PCR cycling conditions were used for amplifying the fragments: a 2-min denaturation step at 94 °C (one cycle), followed by 30 cycles of denaturation at 94 °C for 20 s, a 30-s annealing step (see below) and a 2-min extension step at 72 °C. The annealing temperature for the first cycle was 66 °C; for the next nine cycles, the temperature was decreased by one degree at each cycle. The annealing temperature for the remaining 20 cycles was 56 °C. This was followed by a final extension at 60 °C for 30 min. PCR was performed in a PE-9600 thermocycler (Perkin-Elmer Corp., Norwalk, CT, USA). Reactions were stored at –20 °C. For multiplexed reactions, primers labelled with either blue ('FAM'), green ('JOE') or yellow ('NED') PE Biosystems fluorescein-based dyes were combined in the same tube at the same concentration as in singly labelled primer reactions.

Table I. Selective primer combinations used for E. coli MG1655 AFLP.

Primer set               No. of predicted bands   Total no. bp in predicted bands   Percentage of 4.7 Mbp genome represented
MseI + TA/EcoRI + A      19                       3241                              0.07
MseI + TA/EcoRI + G      15                       2742                              0.06
MseI + TA/EcoRI + C      15                       3281                              0.07
MseI + TA/EcoRI + T      13                       2134                              0.05
MseI + CG/EcoRI + AG     3                        486                               0.01
MseI + CA/EcoRI + AG     5                        879                               0.02
MseI + TA/EcoRI + 0      62                       11398                             0.25

2.3. Gel analysis

Amplification products were separated on a 5% denaturing (sequencing) polyacrylamide gel on an ABI Prism 377 DNA automated sequencer (Perkin-Elmer Corp., Norwalk, CT, USA). The gel was prepared using 5% acrylamide (Amresco and FMC LongRanger) and 6.0 M urea in 1 × TBE (89 mM Tris, 89 mM boric acid, 2 mM EDTA). To 50 mL of gel solution was added 250 µL of 10% ammonium persulphate and 35 µL of TEMED (Amresco). Spacers and shark's-tooth combs were 0.2 mm in thickness. Gels were poured using an Applied Biosystems 377 casting frame and gel pourer, and allowed to polymerise at room temperature for at least 2 h. The sample (1.5 µL) was added to 1.5 µL of loading dye, a mixture containing 1.25 µL formamide, 0.25 µL blue dextran/50 mM EDTA loading solution and 0.5 µL of the internal lane standard, Genescan 2500, labelled with the red rhodamine-based fluorescent dye, 'ROX' (PE Applied Biosystems). The sample mix was heated at 95 °C for 2 min, cooled on ice and immediately loaded onto the gel. Electrophoresis conditions were 2.5 kV, 51 °C, 7 h using 1 × TBE as buffer.
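For readers who wish to program a thermocycler from the description in section 2.2, the touchdown annealing schedule can be written out explicitly. The following is a minimal sketch, assuming the schedule is exactly as described there (66 °C in the first cycle, one degree lower in each of the next nine cycles, then 56 °C for the remaining 20 of the 30 cycles). The function name `touchdown_annealing_schedule` is hypothetical.

```python
from typing import List


def touchdown_annealing_schedule(
    start_c: int = 66, final_c: int = 56, total_cycles: int = 30
) -> List[int]:
    """Return the annealing temperature (deg C) for each PCR cycle.

    The temperature starts at start_c, drops by one degree per cycle
    until it reaches final_c, and then stays at final_c.
    """
    temperatures = []
    current = start_c
    for _ in range(total_cycles):
        temperatures.append(current)
        if current > final_c:
            current -= 1
    return temperatures


schedule = touchdown_annealing_schedule()
for cycle, temp in enumerate(schedule, start=1):
    print(f"cycle {cycle:2d}: anneal at {temp} C")
```

With the default values the schedule contains one cycle at 66 °C, nine cycles stepping from 65 °C down to 57 °C, and 20 cycles at 56 °C.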


2.4. Data capture and analysis

The Genescan collection software automatically sized and quantified individual fragments using the internal lane standards. The results were viewed in the form of a gel image, an electropherogram, tabular data or a combination of all three. The Genotyper software automatically interpreted the Genescan data after the analysis parameters were set to medium smoothing and the baseline fluorescence was set to 150 units. The category used to select peaks was 'highest 25 peaks' for three-base selective primer pairs and 'highest 70 peaks' for two-base selective primer pairs. The software 'filter' used to remove PCR and background noise was: 'remove labels from peaks preceded by higher (at least 5%), labelled peak within 0–2.5 bp, and remove labels from peaks followed by higher (at least 5%), labelled peak within 0–2.5 bp'. The results were transferred to spreadsheets for further analysis.
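The label filter quoted above can be approximated in a few lines. This is a sketch of one possible reading of the rule, not the Genotyper implementation: a peak loses its label when another peak within 2.5 bp on either side is at least 5% higher. The function name `filter_peak_labels`, the tuple layout and the example peak values are all hypothetical, and every peak is compared against the full initial peak list rather than against labels that survive earlier steps.

```python
from typing import List, Tuple

Peak = Tuple[float, float]  # (fragment size in bp, peak height in fluorescence units)


def filter_peak_labels(
    peaks: List[Peak], window_bp: float = 2.5, min_excess: float = 0.05
) -> List[Peak]:
    """Keep only peaks that are not overshadowed by a higher neighbour.

    A peak is dropped if any other peak within window_bp (before or
    after it) is at least min_excess (5%) higher.
    """
    kept = []
    for i, (size, height) in enumerate(peaks):
        overshadowed = any(
            j != i
            and abs(other_size - size) <= window_bp
            and other_height >= height * (1.0 + min_excess)
            for j, (other_size, other_height) in enumerate(peaks)
        )
        if not overshadowed:
            kept.append((size, height))
    return kept


# Hypothetical peaks: a stutter-like shoulder next to a strong peak,
# and two neighbouring peaks of nearly equal height.
example_peaks = [(100.0, 1000.0), (101.0, 400.0), (150.0, 800.0), (152.0, 790.0)]
labelled = filter_peak_labels(example_peaks)
print(labelled)
```

In this example the 101-bp shoulder is removed, while the 150-bp and 152-bp peaks both keep their labels because neither exceeds the other by 5%.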

3. Results

The complete genome sequence of E. coli K12 MG1655 was analysed in silico to predict the number and size of the fragments expected when AFLP was carried out on it with any one of the seven primer combinations shown in table I. This table also shows the percentage of the genome represented by each primer pair. The sizes of predicted fragments and their location in the genome are given in table II. A diagram showing the location in the E. coli K12 genome of fragments amplified with one of these primer combinations is shown in figure 1. The fragments depicted in the figure were amplified with a primer for the MseI adaptor site which contained two further selective nucleotides (MseI + TA) and a non-selective primer for the EcoRI adaptor (EcoRI + 0), designated MseI + TA/EcoRI + 0. To determine the accuracy and reproducibility of AFLP, reactions using the seven selective primer pair combinations were performed independently five times on the same DNA extract.
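The kind of in silico prediction described above can be sketched as follows. This is not the Lasergene/MacVector and spreadsheet procedure used in this study; it is a minimal illustration under several assumptions: the sequence is treated as linear (the real chromosome is circular), only the top strand is scanned (both recognition sites are palindromic), fragment lengths are genomic distances between cut positions without the adaptor and primer adjustment, and all names (`find_cuts`, `predicted_fragments`, the toy sequence) are hypothetical. EcoRI cuts G^AATTC and MseI cuts T^TAA.

```python
import re
from typing import List, NamedTuple, Tuple

# Recognition site and cut offset (bases from the site start to the cut).
SITES = {"EcoRI": ("GAATTC", 1), "MseI": ("TTAA", 1)}


class Cut(NamedTuple):
    pos: int         # index of the first base to the right of the cut
    enzyme: str
    site_start: int  # index of the first base of the recognition site


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_cuts(seq: str) -> List[Cut]:
    """Locate every EcoRI and MseI cut, sorted by position."""
    cuts = []
    for enzyme, (site, offset) in SITES.items():
        for match in re.finditer(f"(?={site})", seq):
            cuts.append(Cut(match.start() + offset, enzyme, match.start()))
    return sorted(cuts)


def bases_inside(seq: str, cut: Cut, is_left_end: bool, k: int) -> str:
    """The k bases next to the site, read inward from that fragment end."""
    site_len = len(SITES[cut.enzyme][0])
    if is_left_end:
        start = cut.site_start + site_len
        return seq[start:start + k]
    return revcomp(seq[max(cut.site_start - k, 0):cut.site_start])


def predicted_fragments(
    seq: str, mse_selective: str = "TA", eco_selective: str = ""
) -> List[Tuple[int, int, int]]:
    """Return (left cut, right cut, length) of fragments that would amplify."""
    cuts = find_cuts(seq)
    fragments = []
    for left, right in zip(cuts, cuts[1:]):
        if left.enzyme == right.enzyme:
            continue  # MseI/MseI and EcoRI/EcoRI fragments are discarded
        amplifies = True
        for cut, is_left_end in ((left, True), (right, False)):
            wanted = mse_selective if cut.enzyme == "MseI" else eco_selective
            if bases_inside(seq, cut, is_left_end, len(wanted)) != wanted:
                amplifies = False
        if amplifies:
            fragments.append((left.pos, right.pos, right.pos - left.pos))
    return fragments


# Toy sequence: one EcoRI site followed by two MseI sites.
toy = "CCGAATTCAGGCCGGTATTAACCGGGTTAACC"
print(predicted_fragments(toy, "TA", ""))   # MseI + TA/EcoRI + 0
print(predicted_fragments(toy, "TA", "A"))  # MseI + TA/EcoRI + A
print(predicted_fragments(toy, "TA", "G"))  # MseI + TA/EcoRI + G
```

In the toy sequence the single EcoRI/MseI fragment is retained by the EcoRI + 0 and EcoRI + A primers and rejected by EcoRI + G, while the MseI/MseI fragment is never reported, mirroring the deletion of same-enzyme fragments described in the computer methods.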

The number of times each fragment was observed and its size, together with the number of times unpredicted fragments were seen, are also shown in table II. The number of fragments observed compared to those predicted is summarised for all primer combinations in table III.

For the MseI + TA/EcoRI + A primer combination (predicted fragments shown as a subset within table II), 18 out of the 19 predicted fragments were observed, and these 18 fragments were within one bp of the predicted size. Sixteen of the 18 predicted fragments were observed in all five experiments and two were seen in three of the five experiments. A 207-bp predicted fragment was not observed in any of the five experiments. Four non-predicted fragments were occasionally seen.

For the MseI + TA/EcoRI + G primer combination (predicted fragments shown as a subset within table II), 14 out of 15 predicted fragments were observed. Thirteen fragments were detected within one bp, and two fragments within four bp. Fourteen of the 15 predicted fragments were observed in all five experiments; the other was seen in four of five experiments. Eight non-predicted fragments were occasionally seen.

For the MseI + TA/EcoRI + C primer combination (predicted fragments shown as a subset within table II), all 15 predicted fragments were observed; 14 within one bp and one within four bp of the predicted size. Twelve of the 15 predicted fragments were observed in all five experiments; the others were seen in two of the five experiments (one fragment) and in four of the five experiments (two fragments). Seven non-predicted fragments were seen once or twice.

For the MseI + TA/EcoRI + T primer combination (predicted fragments shown as a subset within table II), all 13 predicted fragments were observed; 12 within one bp and one within five bp of the predicted size. Twelve of the 13 predicted fragments were observed in all five experiments, while the other one was seen in three of the five experiments. Six non-predicted fragments were seen once or twice.

Table II. Predicted fragment sizes following theoretical AFLP of E. coli MG1655 with an MseI + TA/EcoRI + 0 primer pair, with the percentage (in brackets) out of five FAFLP experiments that the fragment was observed experimentally (±1 bp).

Columns: (1) enzymes and fragment cut site position in genome; (2) selective base required for amplification on the 3' end of the EcoRI primer; (3) predicted fragment size in bp (%, out of 5 FAFLP experiments, fragment observed experimentally in brackets); (4) non-predicted fragment size in bp (%, out of 5 FAFLP experiments, fragment observed experimentally in brackets); (5) locus.

Cut sites                      Base  Predicted     Non-predicted  Locus
MseI 3510180 > 3511625 EcoRI   (A)   1474 n.a.     –              n.a.
EcoRI 4381625 > 4382435 MseI   (A)   837 n.a.      –              n.a.
EcoRI 2131180 > 2131690 MseI   (T)   537 n.a.      –              n.a.
MseI 1604798 > 1605306 EcoRI   (C)   537 n.a.      –              n.a.
EcoRI 337128 > 337586 MseI     (C)   485 n.a.      –              n.a.
EcoRI 3867813 > 3868212 MseI   (A)   426 n.a.      –              n.a.
MseI 3452284 > 3452673 EcoRI   (T)   418 n.a.      –              n.a.
EcoRI 4102368 > 4102741 MseI   (A)   400 n.a.      –              n.a.
MseI 1648315 > 1648672 EcoRI   (T)   386 (0) a     390 (60%)      relB
EcoRI 339976 > 340329 MseI     (C)   380 (100%)    –              b
–                              –     –             370 (20%)      –
MseI 1256069 > 1256391 EcoRI   (C)   351 (100%)    –              b
EcoRI 107921 > 108244 MseI     (G)   350 (100%)    –              secA, prRD, azi, pea
MseI 231580 > 231893 EcoRI     (A)   342 (100%)    –              JS0718 c
EcoRI 331598 > 331890 MseI     (A)   319 (100%)    –              S15178 c
MseI 2250987 > 2251274 EcoRI   (G)   316 (100%)    –              nfo
EcoRI 639143 > 639430 MseI     (G)   314 (100%)    –              ahpF
MseI 2573523 > 2573787 EcoRI   (C)   293 (100%)    –              S40535 c
–                              –     –             292 (20%)      –
EcoRI 3941180 > 3941444 MseI   (G)   291 (20%)     –              S30660 c
MseI 3831203 > 3831461 EcoRI   (C)   287 (100%)    –              d
EcoRI 393910 > 394159 MseI     (T)   276 (100%)    –              yaiH
MseI 1983188 > 1983430 EcoRI   (C)   271 (100%)    –              uspA
–                              –     –             267 (20%)      –
–                              –     –             250 (20%)      –
EcoRI 2036658 > 2036879 MseI   (C)   248 (100%)    –              lpxB, pgsB
MseI 2758654 > 2758869 EcoRI   (C)   244 (100%)    –              yfjL
–                              –     –             240 (20%)      –
EcoRI 2868790 > 2869000 MseI   (A)   237 (100%)    –              cysN
–                              –     –             233 (20%)      –
EcoRI 3647444 > 3647649 MseI   (A)   232 (100%)    –              IEEC5D, IEEC5B
MseI 1676244 > 1676442 EcoRI   (A)   227 (100%)    –              ebrE1
MseI 2972774 > 2972969 EcoRI   (G)   224 (100%)    –              lysA
MseI 4350813 > 4351006 EcoRI   (T)   222 (100%)    –              yjdL
MseI 2317505 > 2317698 EcoRI   (A)   222 (100%)    –              IS2
–                              –     –             219 (40%)      –
–                              –     –             217 (20%)      –
EcoRI 582007 > 582193 MseI     (A)   213 (100%)    –              ORF194, hypothetical
EcoRI 3529879 > 3530059 MseI   (A)   207 (40%)     –              envZ, ompB, perA, tpo
EcoRI 715349 > 715526 MseI     (A)   204 (100%)    –              pgm
EcoRI 1521607 > 1521781 MseI   (G)   201 (100%)    –              b
MseI 1909651 > 1909818 EcoRI   (G)   196 (100%)    –              b
EcoRI 4568398 > 4568562 MseI   (T)   191 (100%)    –              yjiT
MseI 2992848 > 2993010 EcoRI   (C)   191 (100%)    –              d
MseI 676135 > 676295 EcoRI     (G)   189 (60%)     –              b
MseI 1886038 > 1886197 EcoRI   (T)   188 (100%)    –              caiT
EcoRI 4010831 > 4010990 MseI   (A)   186 (100%)    –              b
MseI 1462569 > 1462726 EcoRI   (T)   186 (100%)    –              b
–                              –     –             183 (40%)      –
–                              –     –             177 (40%)      –
MseI 427372 > 427517 EcoRI     (C)   174 (100%)    –              secD
EcoRI 274271 > 274414 MseI     (A)   170 (100%)    –              b
MseI 610761 > 610900 EcoRI     (A)   168 (100%)    –              fepA, fep, feuB
EcoRI 1029011 > 1029151 MseI   (C)   167 (100%)    –              b
–                              –     –             152 (20%)      –
MseI 393795 > 393909 EcoRI     (T)   143 (60%)     –              yaiH
MseI 1648840 > 1648953 EcoRI   (G)   142 (100%)    –              d
MseI 152734 > 152846 EcoRI     (G)   141 (100%)    –              htrE
EcoRI 237338 > 237450 MseI     (A)   139 (80%)     –              d
–                              –     –             136 (40%)      –
EcoRI 1845917 > 1846022 MseI   (C)   132 (100%)    –              gdhA
EcoRI 1002961 > 1003065 MseI   (C)   131 (100%)    –              yraK
–                              –     –             119 (40%)      –
EcoRI 228920 > 229009 MseI     (G)   116 (100%)    –              yafB
MseI 4398430 > 4398516 EcoRI   (A)   115 (100%)    –              hflC, hflA
MseI 3096710 > 3096795 EcoRI   (C)   114 (100%)    –              yggL
MseI 2985145 > 2985229 EcoRI   (G)   113 (100%)    –              d
MseI 986171 > 986253 EcoRI     (T)   111 (60%)     –              ompF, tolF, cmlB, coa, cry
EcoRI 2321483 > 2321565 MseI   (A)   109 (100%)    –              atoS
MseI 3439037 > 3439114 EcoRI   (T)   106 (60%)     –              rplO
EcoRI 2077808 > 2077885 MseI   (A)   104 (80%)     –              aidA–1
MseI 2387726 > 2387798 EcoRI   (G)   101 (100%)    –              d
EcoRI 3834258 > 3834330 MseI   (C)   99 (60%)      –              nlpA
–                              –     –             95 (20%)       –
MseI 3136238 > 3136296 EcoRI   (A)   87 (100%)     –              hybC
MseI 1561858 > 1561916 EcoRI   (T)   87 (100%)     –              dppC
MseI 749215 > 749269 EcoRI     (A)   83 (100%)     –              d
EcoRI 4262697 > 4262753 MseI   (C)   83 (100%)     –              tyrB
EcoRI 3707268 > 3707323 MseI   (G)   82 (100%)     –              yhjY
EcoRI 2986871 > 2986926 MseI   (T)   82 (100%)     –              b
EcoRI 2985230 > 2985285 MseI   (G)   82 (100%)     –              d
EcoRI 871960 > 872014 MseI     (A)   81 (40%)      –              nikB
EcoRI 2793575 > 2793626 MseI   (T)   78 (100%)     –              d
EcoRI 1242506 > 1242557 MseI   (T)   78 (100%)     –              d
–                              –     –             71 (20%)       –
MseI 2782951 > 2782988 EcoRI   (A)   66 n.a.       –              n.a.
EcoRI 2573788 > 2573823 MseI   (G)   62 n.a.       –              n.a.
EcoRI 4580858 > 4580889 MseI   (A)   58 n.a.       –              n.a.
EcoRI 2762095 > 2762117 MseI   (G)   49 n.a.       –              n.a.
MseI 1817151 > 1817171 EcoRI   (T)   49 n.a.       –              n.a.
EcoRI 2028806 > 2028827 MseI   (A)   48 n.a.       –              n.a.
EcoRI 3561783 > 3561801 MseI   (C)   45 n.a.       –              n.a.
MseI 1515134 > 1515147 EcoRI   (A)   42 n.a.       –              n.a.
EcoRI 2656341 > 2656352 MseI   (T)   38 n.a.       –              n.a.
EcoRI 2555758 > 2555768 MseI   (T)   37 n.a.       –              n.a.
EcoRI 1289352 > 1289361 MseI   (C)   36 n.a.       –              n.a.

n.a., not analysed.
a Observed fragment was 390 bp.
b Denotes an uncharacterised locus in E. coli bearing sequence similarity to characterised loci in other bacteria.
c Denotes hypothetical genes, uncharacterised as yet.
d Denotes no matches found, indicating a non-coding region.

Figure 1. Diagram showing the relative positions in the E. coli K12 genome of fragments amplified with the MseI + TA/EcoRI + 0 FAFLP primer pair. In total, they constitute 0.25% of the genome. θ denotes an uncharacterised locus in E. coli bearing sequence similarity to characterised loci in other bacteria. u denotes no matches found, indicating a non-coding region.

For the MseI + CG/EcoRI + AG primer combination (data not shown), all three predicted fragments were observed and were within one bp of the predicted size. All three predicted fragments were observed in all five experiments. Fourteen unpredicted fragments were seen on several occasions.

For the MseI + CA/EcoRI + AG primer combination (data not shown), all five predicted fragments were observed and were within one bp of the predicted size. All five predicted fragments were observed in all five experiments. Seventeen unpredicted fragments were seen on several occasions.

For the MseI + TA/EcoRI + 0 primer combination (table II), all 62 predicted fragments were observed; 61 fragments were within one bp of their predicted size and one fragment was within 4 bp of the predicted size. Fifty-one of the 62 predicted fragments were observed in all five experiments; the others were seen in one of the five experiments (one fragment), in two of the five experiments (two fragments), in three of the five experiments (six fragments) and in four of the five experiments (two fragments). Sixteen unpredicted fragments were seen once or twice (table II).

When the primer sets MseI + CG/EcoRI + AG and MseI + CA/EcoRI + AG were used, a large number of unpredicted fragments were observed in addition to the predicted fragments. Data from these primer sets were therefore considered insufficiently informative for further analysis. Out of a total of 113 fragments expected for five replicates of the other five selective primer pair reactions, carried out independently, 107 fragments of the expected size were seen. Dividing the number of observed bands of the correct size by the number of predicted bands gives an accuracy of 95% for this AFLP. The mean number of unpredicted fragments per reaction was 2.8. The mean number of expected fragments per reaction was 22.6.

The amplified fragments seen on the gel originate from partial coding and non-coding regions throughout the genome (figure 1), and a list of the genes or operons that the fragments come from is given in table II. A diagram showing the location in the E. coli genome of MseI + TA/EcoRI + 0 fragments is shown in figure 1. An example of the electropherograms generated in fluorescent AFLP (using primer set MseI + TA/EcoRI + A) is shown in figure 2.

Table III. Summary of predicted and observed bands for the seven primer sets. Columns 5/5 to 1/5 give the number of predicted bands seen in that many of the five experiments.

Primer set                 Total no. of bands predicted   5/5   4/5   3/5   2/5   1/5   % 5/5
1 (MseI+TA/EcoRI+A)        19                             16    –     2     –     –     84
2 (MseI+TA/EcoRI+G)        15                             13    1     –     –     –     87
3 (MseI+TA/EcoRI+C)        15                             12    2     –     1     –     80
4 (MseI+TA/EcoRI+T)        13                             12    –     1     –     –     92
5 (MseI+CG/EcoRI+AG)       3                              3     –     –     –     –     100
6 (MseI+CA/EcoRI+AG)       5                              5     –     –     –     –     100
7 (MseI+TA/EcoRI+0)        62                             51    2     6     2     1     82
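The summary percentages reported here follow from simple ratios, which the sketch below reproduces from the counts given in table III and in the text. It is only an arithmetic check; the variable and function names are hypothetical.

```python
# Table III: primer set -> (total bands predicted, bands seen in 5/5 experiments)
table_iii = {
    "MseI+TA/EcoRI+A": (19, 16),
    "MseI+TA/EcoRI+G": (15, 13),
    "MseI+TA/EcoRI+C": (15, 12),
    "MseI+TA/EcoRI+T": (13, 12),
    "MseI+CG/EcoRI+AG": (3, 3),
    "MseI+CA/EcoRI+AG": (5, 5),
    "MseI+TA/EcoRI+0": (62, 51),
}


def percent(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


# The '% 5/5' column: share of predicted bands seen in all five experiments.
pct_all_five = {name: percent(seen, total) for name, (total, seen) in table_iii.items()}

# Accuracy for the five informative primer pairs (counts as stated in the text).
accuracy_five_pairs = percent(107, 113)

# Accuracy over all seven primer pairs (counts as stated in the abstract).
accuracy_seven_pairs = percent(111, 121)

# Mean number of expected fragments per reaction for the five primer pairs.
mean_expected = 113 / 5

print(pct_all_five)
print(accuracy_five_pairs, accuracy_seven_pairs, mean_expected)
```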

4. Discussion

The availability of complete genome sequences for certain bacterial species allows prediction of the DNA fragments that would be generated by AFLP. It is also possible to calculate the distribution of these fragments across the genome and to establish what they encode. Our data show that the fragments observed following AFLP with MseI + TA/EcoRI + A, MseI + TA/EcoRI + G, MseI + TA/EcoRI + C and MseI + TA/EcoRI + T primers represent 0.07, 0.06, 0.07 and 0.05% of the genome respectively (table I). The sum of these percentages, which represent the four subsets of the primer pair MseI + TA/EcoRI + 0, gives the percentage genome representation achieved with this primer pair as 0.25%. Our work also shows that the fragments are generated from coding and non-coding regions throughout the genome (figure 1, table II), including both conserved genes such as those coding for universal proteins, and variable regions such as those containing mobile elements like insertion sequences. This combination of fragments originating from conserved and variable regions, together with the percentage representation of the genome, indicates that AFLP satisfies theoretical requirements for strain identification and that it can potentially be used to type a broad range of bacterial genomes. Computer prediction of AFLP using enzyme combinations other than MseI/EcoRI (unpublished data) shows that the choice of restriction enzymes is an important determinant of the discriminatory power of AFLP analysis, and that the best combination of restriction enzymes can and should be modelled for each species for which a whole genome sequence is available.
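The genome representation figures can be recomputed from the base pair totals in table I. The sketch below assumes the 4.7 Mbp genome size given in the table heading; the names are hypothetical. It shows that the 0.25% quoted for MseI + TA/EcoRI + 0 is obtained by summing the four rounded subset percentages, as described above, whereas dividing the total of 11398 bp directly by 4.7 Mbp gives a value closer to 0.24%.

```python
GENOME_BP = 4_700_000  # genome size as stated in the heading of table I

# Total bp in predicted bands for the four MseI + TA/EcoRI + N subsets (table I).
subset_bp = {"A": 3241, "G": 2742, "C": 3281, "T": 2134}


def percent_of_genome(bp: int, genome_bp: int = GENOME_BP) -> float:
    """Percentage of the genome covered by bp base pairs, to two decimals."""
    return round(100 * bp / genome_bp, 2)


subset_pct = {base: percent_of_genome(bp) for base, bp in subset_bp.items()}

# Sum of the rounded subset percentages, as done in the text.
summed_pct = round(sum(subset_pct.values()), 2)

# Direct calculation from the combined base pair total.
total_bp = sum(subset_bp.values())
direct_pct = percent_of_genome(total_bp)

print(subset_pct, summed_pct, total_bp, direct_pct)
```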


Figure 2. Example of a composite of electropherograms showing the five individual MseI + TA/EcoRI + A E. coli K12 MG1655 FAFLP reactions referred to in table II. The fragment sizes attributed to the peaks recognised by Genotyper software are shown. Fragments < 70 and > 400 bp are screened out, or 'filtered', by the software. The predicted fragment sizes for this selective primer combination are given in table II. The Genotyper software filter used to remove PCR and background noise was: 'remove labels from peaks preceded by higher (at least 5%), labelled peak within 0–2.5 bp, and remove labels from peaks followed by higher (at least 5%), labelled peak within 0–2.5 bp'.


The results we obtained indicate that, following experimental AFLP analysis of E. coli K12 MG1655 with five MseI/EcoRI selective primer pairs, 95% accuracy was achieved for fragments with in silico predicted sizes between 70 and 400 bp. In addition to this precision, AFLP is faster and offers higher throughput than other molecular methods for bacterial strain typing, since it does not depend on time-consuming or labour-intensive steps such as Southern blotting or careful cell lysis in agarose.

Our results show that more unpredicted fragments occurred with primer combinations having two selective nucleotides on each primer, MseI + CG/EcoRI + AG and MseI + CA/EcoRI + AG. These combinations generated a mean of 5.6 and 5.2 unpredicted fragments per reaction respectively, compared with a mean of 2.8 for other, less selective primer pair combinations. Our finding may be compared with that of Vos et al. [18], who showed that, for radioactively labelled AFLP, primer specificity is good for primers with one or two selective nucleotides and still acceptable with primers having three selective nucleotides, but that specificity is lost with the addition of the fourth nucleotide. In contrast, less selective primer pairs generate more closely adjacent fragments towards the lower end of the gel. These may be difficult to discriminate, even though FAFLP has the advantage of using internal lane standards so that very accurate sizing of at least the larger fragments can be achieved. Doublets (more than one fragment of the same size) occur more frequently at the lower end of the size scale (below 90 bp). Associated signal strengths tend to be higher than that of a single fragment, but this is not always the case.

We observed that identical fragments were sized slightly differently (± 1 bp) if different internal size markers were used, and slight sizing variation occurred from gel to gel for identical fragments. This inaccuracy within one bp could be due to the sporadic addition of an extra nucleotide by Taq polymerase, unevenly spaced size markers in the internal lane standard (which would create regions of inaccurate sizing where there were few markers), or non-specific PCR artefacts. One or two other bands were reproducibly observed within ± 5 bp of their expected size. This could be due to inaccurate sizing, mutations in the E. coli strain MG1655, or passage.

As mentioned above, comparison between adjacent fragments can sometimes be confounded by the inability to discriminate single bp differences at the lower end of the gel. More selective primers will eliminate this problem by improving the overall distribution of fragments, so that few or no fragments are within a few bp of each other in size. However, this improvement in resolution may lower the discriminatory power of the technique, because the number of usable data points decreases. Some of the problems associated with adjacent or identically sized bands could be overcome by multiplexing differently labelled primers with three different fluorophores in one reaction tube. This greatly increases the number of useful data points obtained in a specific size range without producing regions of 'closely packed' fragments in the gel with the same label. Fragments of identical size, amplified from different regions of the genome, are also likely to be discriminated if a multiplex approach using multiple fluorophores is adopted.

As well as the taxonomic applications of FAFLP, it is envisaged that the discriminatory power of the technique can be applied to the resolution of outbreaks, especially those caused by organisms which are apparently clonal by less discriminatory methods. Important examples among the Enterobacteriaceae include Salmonella enteritidis PT4, S. typhimurium DT 104 and verocytotoxigenic E. coli O157 [15, 16]. We expect the FAFLP conditions optimised for E. coli in this report to be generically applicable to related enteric bacteria, and to provide a coherent basis for strain genotyping. The data generated by FAFLP are suitable for rapid electronic dissemination, manipulation and inter-laboratory comparison. They could be stored in national or international epidemiological databases for further analysis.


The FAFLP method generates accurately sized bands that are a representative sample of the whole genome (figure 1). We envisage that it may be useful in discriminating between pathogenic and non-pathogenic strains of a species. The presence or absence of specified fragments of precisely known size could, for example, identify strains bearing toxin genes. It might thus be possible to apply FAFLP to search for as yet uncharacterised genetic elements involved in toxin production, virulence or other determinants of pathogenicity.

In summary, we have compared virtual and experimental FAFLP data and determined the requirements for accurate identification of individual genome fragments (and therefore individual strains) with a singly-labelled AFLP reaction, i.e., approximately 10–75 fragments distributed across the size range 70–400 bp. Below 10 fragments, FAFLP reactions are less reliable and produce more bands than theoretically predicted. Above 75 fragments, there is a larger proportion of small fragments, and these are difficult to size. The predicted FAFLP fragments of strain MG1655 documented in this study are suitable for standardising and calibrating E. coli FAFLP profiles. Such a standardised method would be particularly useful for applications in molecular clinical microbiology and epidemiology.

Résumé (translated from the French): Predictive modelling of fluorescent AFLP (amplified fragment length polymorphism): a new approach to the molecular epidemiology of E. coli. Amplified fragment length polymorphism (AFLP) allows several loci located across a bacterial genome to be studied simultaneously, using primers specific for sequences comprising restriction sites and adaptors, under stringent conditions. Fluorescence detection instrumentation refines this methodology, allowing size markers of defined molecular weight to be included and hence the size of amplified fragments to be measured accurately and reproducibly. We have evaluated fluorescent AFLP (FAFLP) as an 'absolute' molecular typing method for bacteria, by comparing the MseI/EcoRI fragments obtained experimentally from the Escherichia coli K12 MG1655 genome with those predicted in silico from the published genome sequence. In silico, MseI/EcoRI digestion of this sequence produces 1200 fragments ranging in size from 36 to 2151 base pairs (bp). Subsets of fragments that would be amplified by seven different selective primer combinations (1 to 2 nucleotides added to the 3' end of the core primer sequence) were modelled. Depending on the primer pair, 3 to 54 fragments (from 70 to 400 bp) were predicted, while all seven primer pair combinations together generated a total of 121 fragments. When genomic DNA of strain MG1655 was subjected to experimental FAFLP with these seven primers, 111 correctly sized fragments were observed (± 1 bp) out of the 121 predicted (92% accuracy). Twenty-five unpredicted fragments were obtained, an average of four per primer pair. The size and number of fragments in FAFLP, and their distribution on the gel, were dictated by the choice of restriction endonucleases and the degree of primer selectivity. Our results show that FAFLP is accurate, discriminatory, reproducible and capable of standardisation. Under agreed conditions, this method has considerable potential as a universal, standardised molecular typing method for bacteria. The fragments predicted in silico to result from amplification of MseI/EcoRI-digested DNA with seven primer pairs are used here to develop an FAFLP analysis of E. coli. © Elsevier, Paris

AFLP / molecular typing / fluorescence / Escherichia coli / primer

Acknowledgments

The authors would like to thank Meeta Desai, Jon Goulding, Julie Logan and Nick Andrews for their help in this study.

References

[1] Arbeit R., Arthur M., Dunn R., Cheung K., Selander R.K., Goldstein R., Resolution of recent divergence among Escherichia coli from related lineages: the application of pulsed field gel electrophoresis to molecular epidemiology, J. Infect. Dis. 161 (1990) 230–235.
[2] Bates S.R.E., Knorr D.A., Weller J.W., Ziegle J.S., Instrumentation for automated molecular marker acquisition and analysis, in: Sobral B.W.S. (Ed.), The impact of plant molecular genetics, Birkhäuser, 1996, pp. 239–255.
[3] Blattner F.R., Plunkett III G., Bloch C.A., Perna N.T., Burland V., Riley M., Collado-Vides J., Glasner J.D., Rode C.K., Mayhew G.F., Gregor J., Davis N.W., Kirkpatrick H.A., Goeden M.A., Rose D.J., Mau B., Shao Y., The complete genome sequence of Escherichia coli K12, Science 277 (1997) 1453–1462.
[4] Boom R., Sol C.J.A., Salimans M.M., Jansen C.L., Wertheim-van Dillen P.M.E., van der Noordaa J., Rapid and simple method for purification of nucleic acids, J. Clin. Microbiol. 28 (1990) 495–503.
[5] Dijkshoorn L., Aucken H., Gerner-Smidt P., Janssen P., Kaufmann M.E., Garaizar J., Ursing J., Pitt T.L., Comparison of outbreak and non-outbreak Acinetobacter baumannii strains by genotypic and phenotypic methods, J. Clin. Microbiol. 34 (1996) 1519–1525.
[6] Fitzgerald C., Owen R.J., Stanley J., A comprehensive ribotyping scheme for the heat stable serotypes of Campylobacter jejuni, J. Clin. Microbiol. 34 (1996) 265–269.
[7] Grimont F., Grimont P.A.D., Ribosomal ribonucleic acid restriction patterns as potential taxonomic tools, Ann. Inst. Pasteur. 137B (1986) 165–175.
[8] Huys G., Coopman R., Janssen P., Kersters K., High-resolution genotypic analysis of the genus Aeromonas by AFLP fingerprinting, Int. J. Syst. Bacteriol. 46 (1996) 572–580.
[9] Huys G., Kersters I., Coopman R., Janssen P., Kersters K., Genotypic diversity among Aeromonas isolates recovered from drinking water production plants as revealed by AFLP analysis, Syst. Appl. Microbiol. 19 (1996) 428–435.
[10] Janssen P., Coopman R., Huys G., Swings J., Bleeker M., Vos P., Zabeau M., Kersters K., Evaluation of the DNA fingerprinting method AFLP as a new tool in bacterial taxonomy, Microbiology 142 (1996) 1881–1893.
[11] Janssen P., Dijkshoorn L., High resolution DNA fingerprinting of Acinetobacter outbreak strains, FEMS Microbiol. Lett. 142 (1996) 191–194.
[12] Keim P., Kalif A., Schupp J., Hill K., Travis S.E., Richmond K., Adair D.M., Hugh-Jones M., Kuske C.R., Jackson P., Molecular evolution and diversity in Bacillus anthracis as detected by amplified fragment length polymorphism markers, J. Bacteriol. 179 (1997) 818–824.
[13] Meunier J.R., Grimont P.A.D., Factors affecting reproducibility of random amplified polymorphic DNA fingerprinting, Res. Microbiol. 144 (1993) 373–379.
[14] Rafalski J.A., Tingey S.V., Williams J.G.K., RAPD markers - a new technology for genetic mapping and plant breeding, AgBiotech. News Inform. 3 (1991) 645–648.
[15] Stanley J., Baquar N., Threlfall E.J., Genotypes and phylogenetic relationships of Salmonella typhimurium are defined by molecular fingerprinting of IS200 and 16S rrn loci, J. Gen. Microbiol. 139 (1993) 1133–1140.
[16] Stanley J., Jones C.S., Threlfall E.J., Evolutionary lines among Salmonella enteritidis phage types are identified by insertion sequence IS200 distribution, FEMS Microbiol. Lett. 82 (1991) 83–90.
[17] Tenover F.C., Arbeit R.D., Goering R.V., Mickelsen P.A., Murray B.E., Persing D.H., Swaminathan B., Interpreting chromosomal DNA restriction patterns produced by pulsed-field gel electrophoresis: criteria for bacterial strain typing, J. Clin. Microbiol. 33 (1995) 2233–2239.
[18] Vos P., Hogers R., Bleeker M., Reijans M., van de Lee T., Hornes M., Frijters A., Pot J., Peleman J., Kuiper M., Zabeau M., AFLP: a new technique for DNA fingerprinting, Nucleic Acids Res. 23 (1995) 4407–4414.
[19] Welsh J., McClelland M., Fingerprinting genomes using PCR with arbitrary primers, Nucleic Acids Res. 18 (1990) 7213–7224.