Repeatability and reproducibility of ribotyping and its computer interpretation


Research in Microbiology 155 (2004) 154–161 www.elsevier.com/locate/resmic

Gwénola Lefresne a,b, Eric Latrille a, Françoise Irlinger a, Patrick A.D. Grimont b,∗

a UMR Génie et Microbiologie des Procédés Alimentaires INRA-INAPG, Institut National de la Recherche Agronomique, 78850 Thiverval-Grignon, France

b Unité de Biodiversité des Bactéries Pathogènes Emergentes, INSERM U 389, Institut Pasteur, 75724 Paris Cedex 15, France

Received 11 January 2002; accepted 28 November 2003 First published online 2 December 2003

Abstract

Many molecular typing methods are difficult to interpret because their repeatability (within-laboratory variance) and reproducibility (between-laboratory variance) have not been thoroughly studied. In the present work, ribotyping of coryneform bacteria was the basis of a study involving within-gel and between-gel repeatability and between-laboratory reproducibility (two laboratories involved). The effect of different technical protocols, different algorithms, and different software for fragment size determination was studied. Analysis of variance (ANOVA) showed, within a laboratory, that there was no significant added variance between gels. However, between-laboratory variance was significantly higher than within-laboratory variance. This may be due to the use of different protocols. An experimental function was calculated to transform the data and make them compatible (i.e., erase the between-laboratory variance). The use of different interpolation algorithms (spline, Schaffer and Sederoff) was a significant source of variation in one laboratory only. The use of either Taxotron (Institut Pasteur) or GelCompar (Applied Maths) was not a significant source of added variation when the same algorithm (spline) was used. However, the use of Bio-Gene (Vilber Lourmat) dramatically increased the error (within laboratory, within gel) in one laboratory, while decreasing the error in the other laboratory; this might be due to automatic normalization attempts. These results were taken into account for building a database and performing automatic pattern identification using Taxotron. Conversion of the data considerably improved the identification of patterns irrespective of the laboratory in which the data were obtained.

© 2003 Elsevier SAS. All rights reserved.

Keywords: Between-laboratory validation; Databases; Analysis of variance; Computer identification; Coryneform bacteria

1. Introduction

Ribosomal ribonucleic acid (rRNA) gene restriction pattern determination was the first universal typing method proposed [8]. The method was subsequently renamed ribotyping [24]. The basic idea was to associate the Southern technique [23] with a universal probe such as 16 + 23S rRNA [8]. Later, radioactive labeling was replaced by chemical labeling [9,14]. Recently, a digoxigenin-labeled 5-oligonucleotide mixture has been proposed as a probe [18]. The resolution of ribotyping differs according to the bacterial species and endonucleases [8]. Therefore, ribotyping has been used either for epidemiological typing [3,11,14,24] or for taxonomic identification [1,2,10,13,16].

As molecular typing (not limited to ribotyping) began to be used worldwide, it became obvious that visual comparison of patterns might be sufficient when a limited number of patterns were to be compared, but was impractical when many patterns in different gels or membranes had to be compared. Software packages are available for data extraction from gel/membrane images and for data comparison [1,6,12]. Within-laboratory repeatability and between-laboratory reproducibility have been defined by standard ISO 5725. These studies are essential to identify sources of variation and allow identification against databases [20]. Unfortunately, such studies have rarely been applied to molecular typing [4,5,15,26].

* Corresponding author. E-mail address: [email protected] (P.A.D. Grimont).

0923-2508/$ – see front matter © 2003 Elsevier SAS. All rights reserved. doi:10.1016/j.resmic.2003.11.010

G. Lefresne et al. / Research in Microbiology 155 (2004) 154–161

In the process of building a ribotype database for coryneforms, we found it impossible to interpret the data without knowledge of the error and sources of variation in fragment size determination. The purposes of this study were to (i) evaluate within-gel and between-gel variations, (ii) evaluate between-laboratory variation and derive a conversion function to make data compatible, and (iii) evaluate between-software fragment size variations. As a result, a tolerance of 5% in fragment size was found to be optimal for pattern identification.

2. Materials and methods

Experiments were carried out by the same worker in two different laboratories: the Unité de Biodiversité des Bactéries Pathogènes Emergentes, Institut Pasteur, Paris (IP), and the Laboratoire de Génie et Microbiologie des Procédés Alimentaires, INRA, Thiverval-Grignon (INRA).

2.1. Bacterial strains

Sixteen reference strains representing 15 coryneform species served as a source of DNA (Table 1). Citrobacter koseri CIP 101177 MluI-digested DNA served as fragment size marker.

2.2. DNA preparation

IP and INRA used their current methods to prepare bacterial DNA. Methodological differences are summarized in Table 2.

2.2.1. Protocol used at IP

Genomic bacterial DNA of C. koseri was prepared using “Autogen 540” (AutoGen Instruments, Beverly, MA) as described by Regnault et al. [18]. Strains of coryneform bacteria were grown at 30 °C for 18–24 h in tubes containing 10 ml tryptocasein soy broth (Bio-Rad, Marnes-la-Coquette, France) in aerobic conditions with mixing. Cultures were harvested at 6000 g for 10 min. Lysis protocol for Gram-positive bacteria [18] was used with the following modifications. Amounts of reagents were as described except that cells were collected from 10-ml broth (instead of 2-ml broth) and proteinase K and mutanolysin were added and incubated simultaneously.

2.2.2. Protocol used at INRA

C. koseri cells were grown at 30 °C overnight in an Erlenmeyer flask containing 7 ml of brain heart infusion broth (BHI, Difco Laboratories, Detroit, MI, USA). Lysis protocol was that of IP, except that DNA was manually extracted [2].

Table 1
Sources of DNA and endonucleases used in this study

Species                                      Collection number  Endonuclease
Citrobacter koseri                           CIP 105177         MluI
Arthrobacter citreus                         CIP 102363 T       PvuII
A. citreus                                   LMG 16124          BglI
A. globiformis                               CIP 81.84 T        PvuII
A. ilicis                                    LMG 3659 T         BglI
A. mysorens                                  LMG 16219 T        PvuII
A. pascens                                   LMG 16255 T        BglI
A. protophormiae                             ATCC 21040         PvuII
A. sulfureus                                 LMG 16694 T        PvuII
A. uratoxydans                               CIP 102367 T       PvuII
Brachybacterium paraconglomeratum            CIP 104398 T       PvuII
Corynebacterium variabilis                   G152               BglI
Micrococcus varians                          CIP 81.73 T        BglI
Rhodococcus rhodochrous                      CIP 64.30 T        BglI
Cellulomonas cellulans                       CIP 103404 T       BglI
Clavibacter michiganense subsp. nebraskense  CIP 105406         BglI
Curtobacterium pv. flaccumfaciens            LMG 3645 T         PvuII

ATCC = American Type Culture Collection, Rockville, MD; CIP = Collection de l’Institut Pasteur, Paris, France; LMG = Laboratorium voor Microbiologie, Universiteit Gent, Ghent, Belgium; G = Collection du LGMPA, INRA, Thiverval-Grignon, France.

Table 2
Differences in protocol and material between IP and INRA laboratories

Bacterial cultures
  IP: 18–24 h in TSB
  INRA: 24–48 h in BHI

Bacterial lysis
  IP: Protocol from Regnault et al. [18]
  INRA: Protocol modified from De Buyser et al. [2]

DNA extraction
  IP: Automatic extraction with Autogen
  INRA: Manual extraction, protocol from De Buyser et al., 1989a

DNA digestion
  INRA: Different water-baths at 37 °C

Electrophoresis
  IP: TBE; agarose standard (Quantum Biotechnologies); 50 V for 16 h; Unit Model A2 (OWL); migration at 20 °C
  INRA: TAE; LE agarose (Appligene), then agarose standard (Quantum Biotechnologies); 30 mA then 30 V for 20 h; Unit E-C (Maxicell); migration at 13 °C

Vacuum transfer
  IP: 2016 Blotting unit
  INRA: XL Blotting unit

Hybridization
  IP: Probe synthesized and labeled by Genset; oven HB-1D Techne; hybridization of 8 membranes simultaneously
  INRA: Probe synthesized and labeled by Sciencetec; oven Hybaid MIDI 14; hybridization of 4 membranes simultaneously; different water-baths at 41 °C


Fig. 1A. Example of electrophoresis gel with replicates of strains M: Standard lane; 1: A. citreus; 2: C. variabilis T; 3: M. varians T; 4: R. rhodochrous T; 5: C. cellulans T; 6: C. subsp. nebraskense T; 7: A. mysorens T; 8: C. pv. flaccumfaciens T. Sizes of fragments in standard lanes are 16752, 12482, 7330, 6552, 5752, 5098, 4405, 3023, 2778, 1696, 1444, and 1171 base pairs.

Strains of coryneform bacteria were grown at 30 °C for 24–48 h in Erlenmeyer flasks containing 7 ml of BHI broth in aerobic conditions with mixing. Cultures were harvested at 6000 g for 20 min. Then, 800 µl of TE buffer (Tris–HCl, 10 mM, pH 8.0; EDTA, 1 mM), 25 µl of lysozyme (200 mg/ml TE), and 25 U of mutanolysin were added, mixed and incubated at 37 °C for 2 h. Subsequently, 80 µl of 10% sodium dodecyl sulfate (SDS) and 20 µl of proteinase K (15–20 mg/ml) were added, mixed and incubated at 55 °C for 2 h. Finally, 140 µl of NaCl 5 M and 150 µl cetyl trimethylammonium bromide (CTAB) 270 mM/NaCl 0.7 M were added, mixed and incubated at 65 °C for 20 min. Subsequent steps were as published [2].

2.3. Gel electrophoresis

Endonucleases MluI and PvuII were from Pharmacia and BglI was from GibcoBRL. In each laboratory, bacterial DNA was digested at 37 °C for 4 h by one restriction endonuclease (Table 1), according to the supplier’s instructions. Digested DNA was used for a single lane in a single gel. After digestion, fragments ranging from 1 to 20 kb were separated by horizontal agarose gel electrophoresis, using 0.8% (w/v) agarose gel (agarose standard, Quantum Biotechnologies, Montreuil-sous-Bois, France) in 1× Tris-borate buffer (TBE: Tris-base, 0.089 M; boric acid, 0.089 M, EDTA-Na2, 2.5 mM, pH 7.2) for 16 h at 50 V at IP and in 1× Tris-acetate buffer (TAE: Tris-acetate, 40 mM; EDTA, 1 mM, pH 8.2) for 20 h, at 30 mA or at 30 V at INRA. Different horizontal electrophoresis units were used (Model A2, OWL, Portsmouth, NH, USA, containing 20 lanes at IP and E-C 360 Maxicell, Savant Instruments, Holbrook, NY, USA, containing 22 lanes at INRA).

Fig. 1B. Graph showing the fragments included in the ANOVA.

DNA of C. koseri was included in each gel. Four digested C. koseri DNA samples were placed in lanes 1, 7, 13 and 20 (IP) and in lanes 2, 8, 15, 21 (INRA) with lanes 1 and 22 left empty. A typical electrophoresis gel is shown in Fig. 1A.

2.4. Vacuum transfer and hybridization

DNA restriction fragments were transferred to a positively charged nylon membrane (Hybond-N+, Pharmacia, Uppsala, Sweden) using the following “VacuGene” vacuum transfer systems: 2016 blotting unit (Pharmacia) in IP and XL blotting unit (Pharmacia) in INRA, as described by Regnault et al. [18]. Hybridization of DNA fragments on nylon membranes with a digoxigenin-labeled OligoMix5 probe was done as published [18], except that washes after hybridization with DIG Easy were performed at 41 °C. Hybridization ovens were different in the two laboratories (HB-1D Techne, Cambridge, UK, for IP and Hybaid MIDI 14, Schleicher et Schuell Céra-Labo, Hettich, France, for INRA).

2.5. DNA fragment size determination

The membranes were scanned using “One-scanner” (Apple Computers, Cupertino, CA, USA) for IP and “Arcus II” (Agfa Gevaert, Rueil Malmaison, France) for INRA, generating a TIFF (tagged image file format) image. Interpretation of patterns was done using three different systems: Taxotron (version 98, Taxolab, Institut Pasteur, Paris, France), Bio-Gene (version 97.02, Vilber Lourmat, Marne-La-Vallée, France) and GelCompar (version 4.2, Applied Maths, Kortrijk, Belgium). The three systems have automatic band-finding options as well as molecular size determination, similarity/dissimilarity calculation and cluster analysis features.

With the Taxotron package, the RestrictoScan module automatically detected bands in each lane without any normalization. To do this, a density curve along a lane is drawn taking into account the average (or median) pixel value across the lane width. Peak tips above a given threshold are taken to indicate bands and their positions in pixels are recorded. When all lanes have been examined for bands, a “Mig” file containing migration data is generated (one file per gel). Fragment sizes were estimated by interpolation from migration data (in Mig file) by RestrictoTyper using the cubic spline [17], or Schaffer and Sederoff [19] algorithm (S&S), depending on the analysis. Fragment size data are saved in a “MW” file (one file per gel). With the RestrictoTyper module, the interpolation of fragment sizes uses both standard lanes that enclose each pattern. It was possible to check the relation between migration distance and molecular weight (function “fitting curve”). One example of fragment size determination is given in Fig. 1B.

With the Bio-Gene software, the first step is migration correction. In each gel, the front and the starting line of all standard lanes were indicated manually and the program aligned those lanes, modifying the whole image proportionally. Lanes and bands were automatically detected; the positions of bands were estimated from the average of pixel values under the peaks of the pixel intensity curve. Fragment sizes were interpolated using the cubic spline algorithm.

With the GelCompar software, a step of normalization was required to standardize migration. All bands of a standard are used in the process of normalization. The first standard of the first gel at IP was used as a reference for all other gels, which means that all standards were aligned (following a spline function) according to this reference pattern. Lanes and bands were then automatically detected.
Positions of bands were estimated by taking the most intense point of each scan, each point in a scan being the median value of all pixels on the lane width. Fragment sizes were interpolated using the cubic spline algorithm (other algorithms are available).

For statistical analysis, we retained bands which were common to all replicates and all computer analyses. Some weak bands which were not found on all gels and bands corresponding to fragments below 1171 bp (smallest standard fragment) were excluded from the comparison.

2.6. Analysis of variance (ANOVA)

An ANOVA with several factors including repeated experiment, gel, laboratory, algorithm, or software package was performed [22] using the TaxAnova module of the Taxotron package at IP. Since TaxAnova was not commercially available at the time of the study, all calculations were duplicated using Matlab, version 5-Statistics Toolbox, version 2.0 (The MathWorks, Natick, MA, USA). To determine whether a given factor generated an added variance component, F-ratios (Fisher coefficients) were calculated as mean squares (MS) of the factor to be studied divided by the MS of repeats (MS-within). The error is the square root of MS-within, often expressed in this work as percent of mean fragment size values.

2.7. Regression analysis

Homologous fragment sizes in IP and INRA data (obtained with Taxotron) were subjected to regression analysis in order to obtain a method for converting IP data into INRA data. Conversion of fragment sizes using a regression formula was done with a program (ConvertMW) written by P.A.D. Grimont.

2.8. Automatic identification

Taxotron (RestrictoTyper module) was used for automatic identification of ribotypes. A “MW” file was built to contain the diversity of observed ribotypes. Each ribotype was represented by a set of fragment size values, each value being the average of 4 size determinations obtained at INRA. The MW file was saved as a database (“bank” file). For automatic identification, RestrictoTyper compares a test pattern (set of fragment sizes) with the collection of reference patterns in a selected database [7]. A test fragment and a reference fragment are considered identical when their size difference does not exceed a preset percent tolerance value. Two patterns are considered identical when all fragments match according to these criteria. Nonmatching fragments are considered to be different. A distance coefficient (complement of the Dice index) is calculated. Distance values are 0 when all fragments match and 1 when no fragments match. Automatic identification was performed with a fragment size tolerance of 5%.
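The matching rule of Section 2.8 can be illustrated with a minimal Python sketch. It is not the RestrictoTyper implementation: the function names, the greedy one-to-one pairing, the choice to express the percent difference relative to the reference fragment, and the example fragment sizes are all assumptions made for illustration.

```python
def match_fragments(test_sizes, reference_sizes, tolerance_pct=5.0):
    """Pair test and reference fragments whose sizes agree within tolerance_pct.

    Assumptions: the percent difference is taken relative to the reference
    size, and each reference fragment is matched at most once (greedy pairing).
    """
    unmatched = sorted(reference_sizes)
    pairs = []
    for size in sorted(test_sizes):
        candidates = [ref for ref in unmatched
                      if abs(size - ref) / ref * 100.0 <= tolerance_pct]
        if candidates:
            closest = min(candidates, key=lambda ref: abs(size - ref))
            unmatched.remove(closest)
            pairs.append((size, closest))
    return pairs


def dice_distance(test_sizes, reference_sizes, tolerance_pct=5.0):
    """Complement of the Dice index: 0 when all fragments match, 1 when none do."""
    pairs = match_fragments(test_sizes, reference_sizes, tolerance_pct)
    return 1.0 - 2.0 * len(pairs) / (len(test_sizes) + len(reference_sizes))


def identify(test_sizes, database, tolerance_pct=5.0):
    """Return (name, distance) of the closest reference pattern in the database."""
    scored = [(dice_distance(test_sizes, ref, tolerance_pct), name)
              for name, ref in database.items()]
    distance, name = min(scored)
    return name, distance


# Hypothetical fragment sizes in base pairs, for illustration only.
database = {
    "pattern_A": [1500, 3200, 5100, 9000],
    "pattern_B": [1800, 2600, 7400],
}
test_pattern = [1530, 3150, 5200, 8800]
print(identify(test_pattern, database))
```

In this toy example every test fragment lies within 5% of a fragment of pattern_A, so the distance to pattern_A is 0 and the distance to pattern_B is 1.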

3. Results and interpretations

A nested analysis of variance was designed for the purpose of documenting the magnitude of fragment size variation within a gel (error), between gels within a laboratory, and between laboratories. Analysis of variance uses the hypothesis that variances are all equal. In order to check whether fragment sizes (large, medium, or small) influenced the interpretation of ANOVA, ANOVA computations on sets of fragment sizes calculated by RestrictoTyper with the Schaffer and Sederoff algorithm and subdivided into 3 size groups were done and compared to the ANOVA on global sets. The results led to the same interpretations (results not shown).
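As a rough illustration of the quantities used throughout this section (MS-within, error, MS-gel and the F-ratio defined in Section 2.6), the following Python sketch computes them for a single laboratory on a balanced design. It is a simplified reading of the design, not the TaxAnova or Matlab computation; the function names and the toy data are hypothetical.

```python
def mean(values):
    values = list(values)
    return sum(values) / len(values)


def gel_anova(sizes):
    """One-laboratory ANOVA sketch on a balanced design.

    sizes[f][g][r] is the size (bp) of fragment f on gel g, replicate r.
    """
    n_frag, n_gel, n_rep = len(sizes), len(sizes[0]), len(sizes[0][0])

    # MS-within: replicate scatter around each fragment-by-gel cell mean.
    ss_within = 0.0
    for fragment in sizes:
        for cell in fragment:
            cell_mean = mean(cell)
            ss_within += sum((x - cell_mean) ** 2 for x in cell)
    df_within = n_frag * n_gel * (n_rep - 1)
    ms_within = ss_within / df_within

    # MS-gel: scatter of the gel means around the grand mean (gel main effect).
    grand_mean = mean(x for fragment in sizes for cell in fragment for x in cell)
    gel_means = [mean(x for fragment in sizes for x in fragment[g])
                 for g in range(n_gel)]
    ss_gel = n_frag * n_rep * sum((m - grand_mean) ** 2 for m in gel_means)
    df_gel = n_gel - 1
    ms_gel = ss_gel / df_gel

    error = ms_within ** 0.5  # repeatability, in base pairs
    return {
        "ms_within": ms_within,
        "df_within": df_within,
        "ms_gel": ms_gel,
        "df_gel": df_gel,
        "error": error,
        "percent_error": 100.0 * error / grand_mean,
        "f_gel_within": ms_gel / ms_within,
    }


# Toy data: 2 fragments x 2 gels x 2 replicates (hypothetical sizes in bp).
toy = [
    [[1000, 1010], [1005, 1015]],
    [[2000, 2020], [2010, 2030]],
]
result = gel_anova(toy)
print(result["ms_within"], result["ms_gel"], result["f_gel_within"])
```

With 98 fragments, 2 gels and 2 replicates per gel, the same bookkeeping gives 1 and 196 degrees of freedom, which are the values quoted for the single-laboratory analyses in Table 3.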


3.1. Constant voltage versus constant current settings in electrophoresis

Repeatability of fragment size determination was affected by electric settings. Preliminary work done at INRA included electrophoresis with constant current or constant voltage settings with 4 repeats. Within-setting error on fragment size was 1.57% and F-ratio (setting contribution to variance over within-setting variance) was 36.78 (significant, P < 0.001) with 1 and 588 degrees of freedom (df). Error was 2.01% for constant current setting and 0.94% for constant voltage setting. Therefore, all experiments reported below were done with the constant voltage setting.

3.2. Schaffer and Sederoff versus cubic spline algorithms (Taxotron software)

The choice of S&S versus spline algorithm had a questionable influence on fragment size determination by the Taxotron software. A global ANOVA with two laboratories, two algorithms, and four repeats showed the error to be 1.15%. F-ratio (algorithm contribution to variance over within-laboratory within-algorithm variance) was 11.24 with 1 and 1176 df (significant at P < 0.01). Between-laboratory reproducibility will be examined below. When ANOVA was limited to INRA data, the F-ratio was 16.38 with 1 and 588 df (significant at P < 0.01). Error with spline algorithm alone was 1.09% and error with S&S was 0.94%. However, when ANOVA was limited to IP data, the F-ratio was 1.11 with 1 and 588 df (not significant at P = 0.05). Error with spline algorithm alone was 1.27% and error with S&S was 1.25%. For some reason, repeatability was better at INRA compared to IP, hence the significant between-algorithm variation (compared to repeatability) with INRA data.

3.3. Repeatability and reproducibility using different software packages

ANOVA was performed on the fragment sizes obtained with the 3 software packages with the cubic spline algorithm, the only algorithm common to all three software packages. ANOVA results using each of the three software packages are given in Table 3.

Repeatability of fragment size determination was assessed by within-gel duplicates of restriction patterns. The error ranged from 1.18 (Taxotron) to 1.46% (Bio-Gene) of fragment sizes. Thus, to be safe in automatic identification experiments, percent tolerance can be taken as three times the percent error, i.e., 3.5–4.4% of fragment size, corresponding to a confidence of 99.73%. Between-gel repeatability was also studied. When Taxotron or GelCompar was used, between-gel variance was not significantly different from within-gel variance (Table 3).

Table 3
Analysis of variance of ribotyping data

                            Taxotron      GelCompar     Bio-Gene
Global ANOVA
MS-within                   2726.09       3886.01       4331.92
Error                       52.21         62.34         65.82
Percent error               1.18          1.40          1.46
MS-gel                      913.15        331.89        71757.55
F gel/within (df 2, 392)    0.33 NS       0.09 NS       16.56***
MS-lab                      468540.25     700449.43     667547.36
F lab/within (df 1, 392)    171.87***     180.25***     154.10***

ANOVA of IP data
MS-within                   2840.82       3864.53       4364.33
Error                       53.30         62.17         66.06
Percent error               1.19          1.39          1.46
MS-gel                      1300.50       251.52        143494.90
F gel/within (df 1, 196)    0.46 NS       0.07 NS       32.88***

ANOVA of INRA data
MS-within                   2611.37       3907.48       4299.50
Error                       51.10         62.51         65.57
Percent error               1.16          1.42          1.46
MS-gel                      525.81        412.26        20.21
F gel/within (df 1, 196)    0.20 NS       0.11 NS       0.00***

Factors were 98 restriction fragments, 2 replicates per gel, 2 gels per laboratory, and 2 laboratories. Analyses were done separately with three different software packages. MS, mean squares; error, square root of MS-within (repeatability); F gel/within, ratio of MS-gel and MS-within; F lab/within, ratio of MS-lab and MS-within; *** P < 0.001; NS, nonsignificant.
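The relations stated in the footnote of Table 3 can be checked directly from the global ANOVA rows. The short Python sketch below recomputes the error and the F-ratios from the mean squares, and the 3× tolerance from the percent error. The dictionary layout and function name are illustrative; the 99.73% figure is recovered here under the assumption of normally distributed errors, which the text does not state explicitly.

```python
import math

# Global ANOVA rows of Table 3 (values copied from the table).
TABLE3_GLOBAL = {
    "Taxotron":  {"ms_within": 2726.09, "ms_gel": 913.15,
                  "ms_lab": 468540.25, "percent_error": 1.18},
    "GelCompar": {"ms_within": 3886.01, "ms_gel": 331.89,
                  "ms_lab": 700449.43, "percent_error": 1.40},
    "Bio-Gene":  {"ms_within": 4331.92, "ms_gel": 71757.55,
                  "ms_lab": 667547.36, "percent_error": 1.46},
}


def derived_values(row):
    """Recompute error, F-ratios and the 3x tolerance from one table row."""
    return {
        "error": math.sqrt(row["ms_within"]),             # bp
        "f_gel_within": row["ms_gel"] / row["ms_within"],
        "f_lab_within": row["ms_lab"] / row["ms_within"],
        "tolerance_pct": 3.0 * row["percent_error"],      # three times the error
    }


for software, row in TABLE3_GLOBAL.items():
    values = derived_values(row)
    print(software, {key: round(value, 2) for key, value in values.items()})

# Probability that a normal deviate lies within 3 standard deviations.
coverage_3_sigma = math.erf(3.0 / math.sqrt(2.0))
print(round(100.0 * coverage_3_sigma, 2))
```

The recomputed tolerances are 3.54% for Taxotron and 4.38% for Bio-Gene, which correspond to the 3.5–4.4% range quoted in the text.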

However, when Bio-Gene was used, between-gel variance was very significantly higher than within-gel variance with IP data and significantly lower with INRA data (Table 3). The between-laboratory reproducibility study pointed to problems. When each of the software packages was used, between-laboratory variance was significantly higher than within-gel variance (Table 3). This had also been observed in S&S versus spline experiments described above. Since the use of different gels did not significantly increase the variance when Taxotron or GelCompar was used, within- and between-gel repeats were pooled for each laboratory and each software (4 replicates per laboratory and per software). An ANOVA with two laboratories, two software packages, and 4 replicates was performed. MS-within was 3295.10 (error, 1.29%), MS-software was 2196.90 with F (software/within) = 0.667 with 1 and 1176 df (nonsignificant), and MS-laboratory was 1,157,372 (F = 351.2 with 1 and 1176 df, P < 0.001). Thus, whether GelCompar or Taxotron was used had no influence on size determination reproducibility in this dataset. More fragments would need to be studied to determine whether the slight advantage on the Taxotron side (apparent lower error) is in fact significant.


3.4. Regression models between the two laboratories

As the values of the fragment sizes were not identical for the two laboratories, a relationship between them was investigated. The fragment sizes, obtained with Taxotron in each laboratory, were compared and correlated. First, a linear regression was done on all fragment sizes (INRA values = IP values × slope + shift), producing the regression coefficients and the residual values. Plotting residuals against molecular sizes should show random dots with no structure if the regression model took all significant factors into account. Instead, the plot of residuals (Fig. 2) showed a specific structure that separated fragment sizes at 4500 bp. Based on these observations, a nonlinear relationship was investigated. A segmented regression with two linear regressions was established with a joint point at 4500 bp. The results of the parameter values are presented in Table 4. For fragment sizes lower than 4500 bp, there was only a shift in values since the slope was close to 1. For sizes larger than 4500 bp, the slope was lower than 0.94. Coefficients of regression allowed us to convert IP data. An ANOVA using IP converted data and INRA data showed insignificantly different between-laboratory variances (F = 1.47 with 1 and 392 df, P > 0.20).

Fig. 2. Plot of residuals in linear regression.

3.5. Automatic identification

Automatic identifications obtained with RestrictoTyper are shown in Table 5. The database was that obtained from INRA data. With the set of initial IP data, the type strains of A. citreus and A. pascens were not identified and that of A. globiformis was identified only when tolerance was above 5% of fragment size. With the set of converted data, all strains were identified with a maximum fragment size variation (with respect to sizes given in the database) ranging between 0.6 and 4.1%.
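The conversion described in Section 3.4 amounts to applying one of two straight lines depending on fragment size. The Python sketch below uses the slopes and constants reported in Table 4; it is not the ConvertMW program. The function names are hypothetical, and assigning a fragment of exactly 4500 bp to the upper segment is an assumption, as is fitting each segment by ordinary least squares.

```python
BREAKPOINT_BP = 4500.0
# (slope, constant) for each segment, as reported in Table 4.
LOWER_SEGMENT = (1.005, 38.15)
UPPER_SEGMENT = (0.9250, 320.7)


def convert_ip_to_inra(size_bp):
    """Convert an IP fragment size into an INRA-compatible size (bp)."""
    # Assumption: a fragment of exactly 4500 bp is treated as "upper".
    slope, constant = LOWER_SEGMENT if size_bp < BREAKPOINT_BP else UPPER_SEGMENT
    return size_bp * slope + constant


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = xs * slope + constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def fit_segmented(ip_sizes, inra_sizes, breakpoint_bp=BREAKPOINT_BP):
    """Fit one line below and one line at or above the breakpoint."""
    pairs = list(zip(ip_sizes, inra_sizes))
    lower = [(x, y) for x, y in pairs if x < breakpoint_bp]
    upper = [(x, y) for x, y in pairs if x >= breakpoint_bp]
    return fit_line(*zip(*lower)), fit_line(*zip(*upper))


# Example: converting two hypothetical IP fragment sizes.
for size in (2000.0, 10000.0):
    print(size, round(convert_ip_to_inra(size), 2))
```

A laboratory joining such a database would, under this scheme, estimate its own pair of lines against the reference laboratory with something like fit_segmented and then apply them to every fragment size before identification.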

4. Discussion

Reproducibility can have a major effect on classification and identification schemes [20,21]. Multicenter studies have shown that computer interpretation of pulsed-field gel electrophoresis data [26] or arbitrarily primed PCR data [25] exchanged among laboratories was a major problem, although visual examination of patterns and local interpretation of data produced by a single laboratory was generally satisfactory. Most multilaboratory studies have stressed the necessity for methodological standardization. At a given point in time, standardization is possible by defining given equipment, reagents, and parameters. However, standardization of molecular typing methods across time is virtually impossible. In the space of a few years, equipment may have improved, equipment companies may disappear or merge with other companies, and in any active laboratory, the electrophoretic equipment used today was not there five or ten years ago. Therefore, computer interpretation of laboratory data should be robust enough to maintain data compatibility across time. This is a reason for comparing restriction patterns by fragment size rather than track density. The fragment size approach attempts to estimate an absolute value (fragment size) which is written in the genome sequence, whereas track density is a relative, experiment-related measurement. Knowledge of a genome sequence can generate exact fragment size data but would only very approximately predict track density data.

In the present exercise, we tried to identify sources of variation in DNA fragment length measurements using ribotyping done in two laboratories using different protocols. Within-gel and between-gel repeatability and between-laboratory reproducibility were assessed using three different software packages.
When comparing different electrophoretic settings such as constant current versus constant voltage, we clearly demonstrated that a constant voltage setting reduced the error in fragment size determination by half. Different approaches are used by software for extracting migration data. Migration normalization is used by GelCompar and Bio-Gene and not by Taxotron. Normalization

Table 4
Parameters of the segmented regression of fragment size data: INRA data = IP data × slope + constant

Fragment sizes                             Lower than 4500 bp   Upper than 4500 bp
Determination coefficient R²               0.9982               0.9985
Coefficient: slope                         1.005                0.9250
Coefficient: constant                      38.15                320.7
Standard error: slope                      0.005826             0.005743
Standard error: constant                   17.49                39.89
Interval of confidence at 95%: slope       [0.9930–1.016]       [0.9133–0.9366]
Interval of confidence at 95%: constant    [3.080–73.21]        [240.0–401.3]


Table 5
Automatic identification of the 16 coryneform strains studied in IP against the database

Columns: tested organism (from IP); matched type in database when data from tested organisms are raw; matched type in database when data from tested organisms are converted.

Tested organism (from IP)            Match, raw data                             Match, converted data
A. citreus CIP 102363T               Not identified                              A. citreus CIP 102363T [2.6%]
A. citreus                           Not identified                              A. citreus [4.1%]
A. globiformis                       A. globiformis [5.2%]                       A. globiformis [0.6%]
A. ilicis                            A. ilicis [2.9%]                            A. ilicis [1.3%]
A. mysorens                          A. mysorens [2.8%]                          A. mysorens [1.1%]
A. pascens                           Not identified                              A. pascens [2.2%]
A. protophormiae                     A. protophormiae [2.8%]                     A. protophormiae [1.1%]
A. sulfureus                         A. sulfureus [3.3%]                         A. sulfureus [1.5%]
A. uratoxydans                       A. uratoxydans [3.4%]                       A. uratoxydans [2.3%]
B. paraconglomeratum                 B. paraconglomeratum [3.6%]                 B. paraconglomeratum [1.7%]
C. variabilis                        C. variabilis [2.1%]                        C. variabilis [3.1%]
M. varians                           M. varians [2.0%]                           M. varians [0.9%]
R. rhodochrous                       R. rhodochrous [3.2%]                       R. rhodochrous [2.5%]
C. cellulans                         C. cellulans [1.5%]                         C. cellulans [1.6%]
C. michiganense subsp. nebraskense   C. michiganense subsp. nebraskense [1.9%]   C. michiganense subsp. nebraskense [0.8%]
C. pv. flaccumfaciens                C. pv. flaccumfaciens [3.3%]                C. pv. flaccumfaciens [2.0%]

[]: Maximum size difference between homologous fragments of test and reference patterns.

with GelCompar uses a spline function across several gels, whereas normalization by Bio-Gene is linear and for a single gel. Normalization by Bio-Gene can be so efficient on a given gel that identical migration values can be obtained in different lanes, thus enabling within-gel variance to be very low or null. If normalization is less efficient between gels, then a large between-gel variance is obtained. This is what happened with IP data. The company providing Bio-Gene (Vilber Lourmat) was contacted and asked to repeat our data extraction. It was shown that we had correctly used the software. We would thus recommend that Vilber Lourmat examine this problem and correct the normalization algorithm implemented in Bio-Gene software.

The useful finding provided by this work is that the contribution of between-gel variance to within-laboratory variance is insignificant when Taxotron or GelCompar is used. Therefore, when patterns are replicated, these may be on the same gel or not. The study has also documented the error associated with fragment size measurement, enabling the setting of a tolerance of 3.5–4.4% on fragment size values for deciding whether two fragments may be considered to have the same size. A safe 5% tolerance is thus recommended for ribotype comparison. Raising the tolerance from 4 to 5% increases the risk that different fragments will be considered identical. Fortunately, Taxotron indicates for each matching pattern the largest percent size variation found between bands considered homologous (Table 5).

The fact that different ribotyping protocols (in particular two different electrophoretic buffers) were used in two different laboratories (IP and INRA) probably caused the significant between-laboratory added variance component. However, results from the two laboratories were correlated and a regression formula was calculated to transform IP data into INRA-compatible data.
Converting IP data reduced the between-laboratory variance to an insignificant component of the total variance. This conversion approach will be used further to merge data obtained in different laboratories and is expected to ease multicenter interpretation of ribotyping data. A regression formula may be needed for each laboratory (except for the reference laboratory). The results reported here show that data conversion considerably improves automatic identification of ribotype patterns obtained in different laboratories.

Interpolation of fragment size from migration distance can be done using different algorithms. Comparison between the variance components contributed by the cubic spline and Schaffer and Sederoff algorithms showed questionable significance (P = 0.05). This study was focused on repeatability and reproducibility. However, to conclusively choose the best algorithm, we would need to study accuracy (i.e., the ability to yield interpolated fragment size values which are as close as possible to exactly known sizes).

Although the error in fragment size determination associated with the use of Taxotron was lower than that associated with the use of GelCompar, ANOVA indicated that variances associated with these two software packages were not significantly different. Comparison of a larger set of data might be needed before a significant difference can be observed.

The lessons learnt from this repeatability and reproducibility exercise are presently being used for building a ribotype database for the identification of cheese-associated coryneform bacteria. Such a database should be used by different laboratories.

Acknowledgements

We thank Rémi Perrin (SOREDAB Recherche, La Boissière Ecole, France) for his valuable help in handling the GelCompar software and Charles Bahout (Vilber Lourmat, Marne-La-Vallée, France) for looking into our results with the Bio-Gene software.


The authors are indebted to Arilait Recherches (Paris) for constructive discussions and financial support. Part of the work done at the Institut Pasteur was supported by European Union contract QLK2-2000-01404 (Network for Automated Bacterial Strain Fingerprinting in Europe).

References

[1] R. Brosch, M. Lefèvre, F. Grimont, P.A.D. Grimont, Taxonomic diversity of pseudomonads revealed by computer-interpretation of ribotyping data, System. Appl. Microbiol. 19 (1996) 541–555.
[2] M.-L. De Buyser, A. Morvan, F. Grimont, N. El Solh, Characterisation of Staphylococcus species by ribosomal RNA gene restriction patterns, J. Gen. Microbiol. 135 (1989) 989–999.
[3] A. De Zoysa, A. Efstratiou, R.C. George, M. Jahkola, J. Vuopio-Varkila, S. Deshevoi, G. Tseneva, Y. Rikushin, Molecular epidemiology of Corynebacterium diphtheriae from northwestern Russia and surrounding countries studied by using ribotyping and pulsed-field gel electrophoresis, J. Clin. Microbiol. 33 (1995) 1080–1083.
[4] V. Foissaud, J.M. Puyhardy, J.C. Chapalain, H. Salord, J.J. Depina, M.C. Morillon, P. Nicolas, J.D. Perrier-Gros-Claude, Reproductibilité inter-laboratoire de l’électrophorèse en champ pulsé dans l’étude de 12 souches de Pseudomonas aeruginosa, Path. Biol. 47 (1999) 1053–1059.
[5] P. Gerner-Smidt, L.M. Graves, S. Hunter, B. Swaminathan, Computerized analysis of restriction fragment length polymorphism patterns: Comparative evaluation of two commercial software packages, J. Clin. Microbiol. 36 (1998) 1318–1323.
[6] L.M. Graves, B. Swaminathan, M.W. Reeves, S.B. Hunter, R.E. Weaver, B.D. Plikaytis, A. Schuchat, Comparison of ribotyping and multilocus enzyme electrophoresis for subtyping of Listeria monocytogenes isolates, J. Clin. Microbiol. 32 (1994) 2936–2943.
[7] P.A.D. Grimont, Taxotron User’s Manual, Institut Pasteur, Paris, 2000.
[8] F. Grimont, P.A.D. Grimont, Ribosomal ribonucleic acid gene restriction patterns as potential taxonomic tools, Ann. Inst. Pasteur/Microbiol. 137 B (1986) 165–175.
[9] F. Grimont, D. Chevrier, P.A.D. Grimont, M. Lefèvre, J.L. Guesdon, Acetylaminofluorene-labelled ribosomal RNA for use in molecular epidemiology and taxonomy, Res. Microbiol. 140 (1989) 447–454.
[10] F. Grimont, M. Lefèvre, E. Ageron, P.A.D. Grimont, rRNA gene restriction patterns of Legionella species: A molecular identification system, Res. Microbiol. 140 (1989) 615–626.
[11] K. Irino, F. Grimont, I. Casin, P.A.D. Grimont, the Brazilian Purpuric Fever Study Group, rRNA gene restriction patterns of Haemophilus influenzae biogroup aegyptius strains associated with Brazilian purpuric fever, J. Clin. Microbiol. 26 (1988) 1535–1538.
[12] F. Irlinger, J.L. Bergère, Use of conventional biochemical tests and analyses of ribotype patterns for classification of micrococci isolated from dairy products, J. Dairy Res. 66 (1999) 91–103.
[13] S. Jorks, Differentiation of Rhodococcus species by ribotyping, J. Basic Microbiol. 36 (1996) 399–406.
[14] S. Koblavi, F. Grimont, P.A.D. Grimont, Clonal diversity of Vibrio cholerae O1 evidenced by rRNA gene restriction patterns, Res. Microbiol. 141 (1990) 645–657.
[15] J. Machado, F. Grimont, P.A.D. Grimont, Computer identification of Escherichia coli rRNA gene restriction patterns, Res. Microbiol. 149 (1998) 119–135.
[16] S. Pignato, G.M. Giammanco, F. Grimont, P.A.D. Grimont, G. Giammanco, Molecular characterization of the genera Proteus, Morganella and Providencia by ribotyping, J. Clin. Microbiol. 37 (1999) 2840–2847.
[17] W.H. Press, B.P. Flannery, S.A. Teukolsky, W.T. Vetterling, Numerical Recipes, Cambridge Univ. Press, Cambridge, 1986.
[18] B. Regnault, F. Grimont, P.A.D. Grimont, Universal ribotyping method using a chemically labelled oligonucleotide probe mixture, Res. Microbiol. 148 (1997) 649–659.
[19] H.E. Schaffer, R.R. Sederoff, Improved estimation of DNA fragment lengths from agarose gels, Anal. Biochem. 115 (1981) 113–122.
[20] P.H.A. Sneath, V.G. Collins, A study in test reproducibility between laboratories: Report of a Pseudomonas working party, Antonie van Leeuwenhoek 40 (1974) 481–527.
[21] P.H.A. Sneath, R. Johnson, The influence on numerical taxonomic similarities of errors in microbiological tests, J. Gen. Microbiol. 72 (1972) 377–392.
[22] R.R. Sokal, F.J. Rohlf, Biometry, Freeman, San Francisco, 1969.
[23] E.M. Southern, Detection of specific sequences among DNA fragments separated by gel electrophoresis, J. Mol. Biol. 98 (1975) 503–517.
[24] T. Stull, J.J. LiPuma, T.H. Pennington, A broad-spectrum probe for molecular epidemiology of bacteria: Ribosomal RNA, J. Infect. Dis. 157 (1988) 280–286.
[25] A. van Belkum, J. Kluytmans, W. van Leeuwen, R. Bax, W. Quint, E. Peters, A. Fluit, C. Vandenbroucke-Grauls, A. van den Brule, H. Koeleman, et al., Multicenter evaluation of arbitrarily primed PCR for typing of Staphylococcus aureus strains, J. Clin. Microbiol. 33 (1995) 1537–1547.
[26] A. van Belkum, W. van Leeuwen, M.E. Kaufmann, B. Cookson, F. Forey, J. Etienne, R. Goering, F. Tenover, C. Steward, F. O’Brien, W. Grubb, P. Tassios, N. Legakis, A. Morvan, N. El Solh, R. De Ryck, M. Struelens, S. Salmenlinna, J. Vuopio-Varkila, M. Kooistra, A. Talens, W. Witte, H. Verbrugh, Assessment of resolution and intercenter reproducibility of results of genotyping Staphylococcus aureus by pulsed-field gel electrophoresis of SmaI macrorestriction fragments: A multicenter study, J. Clin. Microbiol. 36 (1998) 1653–1659.