FT-IR spectroscopy for identification of closely related lactobacilli


Journal of Microbiological Methods 59 (2004) 149 – 162 www.elsevier.com/locate/jmicmeth

Astrid Oust a,b,*, Trond Møretrø b, Carolin Kirschner b, Judith A. Narvhus a, Achim Kohler b

a Department of Chemistry, Biotechnology and Food Science, Agricultural University of Norway, P.O. Box 5003, N-1432 Ås, Norway
b Norwegian Food Research Institute, Matforsk AS, Osloveien 1, N-1430 Ås, Norway

Received 3 February 2004; received in revised form 13 June 2004; accepted 14 June 2004 Available online 11 August 2004

Abstract

Fourier Transform Infrared (FT-IR) spectroscopy was used to analyse 56 strains from four closely related species of Lactobacillus, L. sakei, L. plantarum, L. curvatus and L. paracasei. Hierarchical Cluster Analysis (HCA) was used to study the clusters in the data, but in the dendrogram, the spectra were not differentiated into four separate clusters corresponding to species. When the data were analysed with Partial Least Squares Regression (PLSR), the strains were differentiated into four clusters according to species. It was also possible to recognise strains that were incorrectly identified by conventional methods prior to the FT-IR analysis. PLSR was used to identify strains from three of the species, and the results were compared to two other multivariate methods, Soft Independent Modelling of Class Analogy (SIMCA) and K-Nearest Neighbour (KNN). The three methods gave equally good identification results. The results show that FT-IR spectroscopy in combination with PLSR, or other multivariate methods, is well suited for identification of Lactobacillus at the species level, even in quite large data sets.

© 2004 Elsevier B.V. All rights reserved.

Keywords: FT-IR spectroscopy; Lactic acid bacteria; Identification; Partial Least Squares Regression (PLSR)

* Corresponding author. Mailing address: Norwegian Food Research Institute, Matforsk AS, Osloveien 1, N-1430 Ås, Norway. Tel.: +47 64 97 01 00; fax: +47 64 97 03 33. E-mail address: [email protected] (A. Oust). 0167-7012/$ - see front matter © 2004 Elsevier B.V. All rights reserved. doi:10.1016/j.mimet.2004.06.011

1. Introduction

Rapid and reliable identification and characterisation of lactic acid bacteria are important since these microorganisms are widely used as starter cultures in fermentation processes in the food industry. In the food industry, phenotypic methods like API (BioMerieux, MO, USA) are commonly used for identification of Lactobacillus sp. Major disadvantages with these methods are that they may give misleading results if applied to unknown bacteria, and that they are quite time-consuming. For identification of Lactobacillus at the subspecies level, genetic fingerprinting techniques like Amplified Fragment Length Polymorphism (AFLP) can be used. These methods are tedious and need highly skilled staff, which limits their use in the industry.

Fourier Transform Infrared (FT-IR) spectroscopy has been used to analyse bacteria since the 1980s. The technique measures the total composition of bacterial cells in a nondestructive manner, producing an IR spectrum with bands from all cellular components, e.g., membrane and cell wall components, proteins and nucleic acids. The IR spectrum can therefore be considered as a fingerprint of the microorganism under study (Naumann et al., 1991). Various studies have shown that FT-IR spectra can be used to differentiate and identify a number of microorganisms at different taxonomic levels (Mariey et al., 2001). Because FT-IR spectroscopy is rapid, easy to use and cost-effective, the technique has a great potential for identification of Lactobacillus sp. in the food industry, but there are few reports on this subject. Curk et al. (1994) studied lactobacilli found in breweries while Amiel et al. (2000) analysed lactic acid bacteria involved in the soft cheese industry. Both studies gave good results, but the analysis was based on only a few strains from each of many species. Obviously, it is more difficult to identify species if there are many strains within each species than if there are few. A study of many strains from each species will therefore provide a more realistic view of the usefulness of FT-IR spectroscopy for identification of lactic acid bacteria in the food industry. The previous studies of identification of lactic acid bacteria with FT-IR spectroscopy focused on microorganisms from beer and soft cheese. Lactobacilli are also found in meat, and L. sakei and L. curvatus are two typical lactobacilli naturally found in meat which are also used as starter cultures in meat products (Hugas et al., 1993). These species have not been studied with FT-IR spectroscopy before. Since the amount of data is large, identification of microorganisms based on their FT-IR spectra requires use of multivariate statistical methods. 
To study the clustering of FT-IR spectra from microorganisms, Hierarchical Cluster Analysis (HCA) (Helm et al., 1991), Principal Component Analysis (PCA) (Kansiz et al., 1999) and Correspondence Analysis (CA) (Naumann et al., 1988) have been used. For identification of bacteria based on their FT-IR spectra, comparison of spectral similarity based on calculation of the Pearson's product moment correlation coefficient (Naumann et al., 1991), Canonical Variate Analysis (CVA) (Lefier et al., 1997), Linear Discriminant Analysis (LDA) (Maquelin et al., 2003), Soft Independent Modelling of Class Analogy (SIMCA) (Kansiz et al., 1999), K-Nearest Neighbour (KNN) (Kansiz et al., 1999) and Artificial Neural Networks (ANN) (Goodacre et al., 1996) have been used. So far, there are no studies of the use of multivariate regression techniques, like Partial Least Squares Regression (PLSR), for identification of microorganisms based on their FT-IR spectra. A major advantage of PLSR is that it searches for the variation in the data matrix X that is best suited for prediction of the reference data Y. That is, PLSR will search for the variation in the FT-IR spectra that is best suited for separation of the various microorganisms. The main purpose of this study was to evaluate the potential of using FT-IR spectroscopy in combination with PLSR for identification of closely related lactobacilli from meat and cheese at the species level. To evaluate the quality of the identification with PLSR, the results were compared to the identification results from two other multivariate methods, SIMCA and KNN.

2. Materials and methods

2.1. Strains

Fifty-six strains of Lactobacillus, mostly isolated from meat and cheese, were analysed with FT-IR spectroscopy (Table 1). Twenty-eight of the strains were L. sakei, 13 strains were L. plantarum, 10 strains were L. curvatus and five were L. paracasei. Prior to this study, eight of the strains were incorrectly identified (Table 1). The FT-IR analysis led to reidentification of these strains.

2.2. Growth conditions

The strains were cultivated aerobically on MRS agar (Oxoid, Basingstoke, UK) at 30 °C for 48 h. The strains were streaked out using the three-quadrant streak pattern. Three independent parallels of each strain were prepared.


Table 1
The 56 strains of Lactobacillus used in this study

Species       Strain        Run(a) Isolated from               Identified by                  Reference
L. sakei      23K           1, 2   Meat                        AFLP(b)/API                    (Berthier et al., 1996)
L. sakei      64F           1, 2   Meat                        AFLP/API                       (Berthier et al., 1996)
L. sakei      MF473         1, 2   Loin of pork                AFLP/16S rDNA/API              (Nissen and Dainty, 1995)
L. sakei      MF458         1, 2   Loin of pork                AFLP/16S rDNA/API              (Nissen and Dainty, 1995)
L. sakei      MF468         1, 2   Loin of pork                AFLP/16S rDNA/API              (Nissen and Dainty, 1995)
L. sakei      MF378         1, 2   Loin of pork                AFLP/16S rDNA/API              (Nissen and Dainty, 1995)
L. sakei      MF419         1, 2   Loin of pork                AFLP/16S rDNA/API              (Nissen and Dainty, 1995)
L. sakei      MF1058        1, 2   Cooked meat                 AFLP/API                       (Bredholt et al., 1999)
L. sakei      MF1060        1, 2   Cooked meat                 AFLP/API                       (Bredholt et al., 1999)
L. sakei      MF1180        1, 2   Smoked salmon               AFLP/API                       –
L. sakei      MF1186        1, 2   Smoked salmon               AFLP/API                       –
L. sakei      Lb790(c)      1, 2   Meat                        AFLP/API                       (Schillinger and Lücke, 1989)
L. sakei      Lb790X(c)     1, 2   Meat                        AFLP/API                       –
L. sakei      Lb706(c)      1, 2   Meat                        AFLP/API                       (Axelsson et al., 1993)
L. sakei      Lb706X(c)     1, 2   Meat                        AFLP/API                       (Axelsson et al., 1993)
L. sakei      MF1053        1, 2   Fermented fish              AFLP/API                       –
L. sakei      Lb16          1, 2   Meat                        AFLP/API                       (Holck et al., 1994)
L. sakei      LTH673        1, 2   Meat                        AFLP/API                       (Tichaczek et al., 1992)
L. sakei      MF459         1, 2   Meat                        AFLP/16S rDNA/API              (Nissen and Dainty, 1995)
L. sakei      FC(d)         1, 2   Starter culture             AFLP/API                       –
L. sakei      MF1265        2      Cooked meat                 AFLP/API                       (Bredholt et al., 1999)
L. sakei      DSMZ20017     2      Culture collection          –(e)                           –
L. sakei      VB286A        2      Vacuum-packed meat                                         (f)
L. sakei      MF1234        2      Minced meat                 (Schillinger and Lücke, 1987)  (Holck et al., 1994)
L. sakei      Lb674         2      Vacuum-packed lamb          (Schillinger and Lücke, 1987)  (Holck et al., 1994)
L. sakei      MI401         2      Sourdough                   (g)                            (Larsen et al., 1993)
L. sakei      HJ5(h)        2      Starter culture             PCR(i)                         –
L. sakei      CCUG34545(h)  2      Culture collection          PCR                            –
L. plantarum  NC7           1, 2   Starter culture silage      AFLP/API                       –
L. plantarum  NC8           1, 2   Silage                      AFLP/API                       (Aukrust and Blom, 1992)
L. plantarum  L2-1          1, 2   Starter culture             AFLP/API                       –
L. plantarum  LM-3          1, 2   Starter culture             AFLP/API                       –
L. plantarum  ATCC8014      1, 2   Culture collection          –                              –
L. plantarum  ATCC14917(j)  2      Culture collection          –                              –
L. plantarum  DSMZ20174(j)  2      Culture collection          –                              –
L. plantarum  NCDO1752(j)   2      Culture collection          –                              –
L. plantarum  PPLX(k)       2      Starter culture             AFLP/API                       –
L. plantarum  L-74          2      Starter culture             RAPD                           –
L. plantarum  LM-1(h)       2      Starter culture             16S rDNA                       –
L. plantarum  2756          2      Cheese                      16S rDNA/PCR/API               –
L. plantarum  INFLb15D      2      Cheese                      16S rDNA/PCR/API               –
L. curvatus   MF368         1, 2   Meat                        16S rDNA/API                   (Nissen and Dainty, 1995)
L. curvatus   MF379         1, 2   Meat                        16S rDNA/API                   (Nissen and Dainty, 1995)
L. curvatus   MF411         1, 2   Meat                        16S rDNA/API                   (Nissen and Dainty, 1995)
L. curvatus   MF432         1, 2   Meat                        16S rDNA/API                   (Nissen and Dainty, 1995)
L. curvatus   NCDO2739(l)   2      Culture collection          –                              –
L. curvatus   DSMZ20019(l)  2      Culture collection          –                              –
L. curvatus   NCIMB9710(l)  2      Culture collection          –                              –
L. curvatus   CCUG31332     2      Culture collection          –                              –
L. curvatus   LS-3          2      Commercial starter culture  –                              –
L. curvatus   INFLb532      2      Cheese                      PCR/API                        –
L. paracasei  INFLb59(m)    2      Cheese                      PCR                            –
L. paracasei  INFLb90(m)    2      Cheese                      PCR                            –
L. paracasei  INFLb458(m)   2      Cheese                      PCR                            (Østlie et al., 2004)
L. paracasei  INFLb486(m)   2      Cheese                      PCR                            (Østlie et al., 2004)
L. paracasei  INFLb530(m)   2      Cheese                      16S rDNA                       –

(a) 1: analysed in run 1; 2: analysed in run 2.
(b) AFLP: Amplified Fragment Length Polymorphism.
(c) L. sakei 790X and L. sakei 706X are L. sakei 790 and L. sakei 706, respectively, cured for plasmids.
(d) FC = Flora-Carn L2, Chr. Hansen, Hørsholm, Denmark.
(e) Identification method is not given for strains from culture collections or strains from commercial starter cultures.
(f) VB286A from J. Coventry, Food Science Australia, Werribee, VIC, Australia.
(g) Identification method not known.
(h) Previously classified as L. curvatus.
(i) Species-specific PCR.
(j) Originally same isolate which has been distributed to several strain collections.
(k) PPLX is originally a culture consisting of three species, including L. plantarum. L. plantarum PPLX is a pure culture isolated from the PPLX mixture.
(l) Originally same isolate which has been distributed to several strain collections.
(m) Previously classified as L. curvatus by API.

2.3. Experimental setup

Two separate experiments were carried out:

Run 1: Analysis of 29 strains from three species.
Run 2: Analysis of 56 strains from four species. Twenty-nine of these strains had been analysed previously in run 1. In addition, 27 new strains were analysed in run 2.

Information about the strains analysed in the different runs is given in Table 1. The experiments were carried out ca 4 months apart.

2.4. Sample preparation and FT-IR measurements

The sample preparation was carried out as described previously (Helm et al., 1991), with some modifications. After the cultivation on agar plates, a platinum loop was used to remove bacterial biomass from the third quadrant of each agar plate, and the biomass was suspended in 100 µl distilled water. The concentration of the suspension was adjusted so that the intensity of the amide I band (1655 cm⁻¹) in the IR spectrum was between 0.35 and 1.25, which is within the linear range of the DTGS detector. Of each suspension, 35 µl was transferred to an IR transparent optical crystal (ZnSe) in a multisample cuvette (Bruker Optics, Germany). The samples

were dried under moderate vacuum (0.1 bar) using anhydrous Silica Gel (Prolabo, France) in a desiccator to form films suitable for FT-IR analysis. The FT-IR measurements were performed with a Biomodule (Bruker Optics, Germany), specially designed for measurements of microorganisms, coupled to an Equinox 55 spectrometer (Bruker Optics, Germany). The spectra were recorded in the region between 4000 and 500 cm⁻¹ with a spectral resolution of 6 cm⁻¹ and an aperture of 5.0 mm. For each spectrum, 64 scans were averaged.

2.5. Preprocessing of data

The quality of each spectrum was assessed using a quality test in the Opus 4.0 software (Bruker Optics, Germany). The main purpose of the quality test is to make sure that the absorption in the spectrum is within the linear range of the DTGS detector, and that the signal to noise ratio in the spectrum is acceptable. In order to remove variations due to baseline shifts, the first derivative of every spectrum was calculated using a Savitzky–Golay algorithm with nine smoothing points (Savitzky and Golay, 1964; Helm et al., 1991). In addition, each spectrum was vector normalised in the region from 1780 to 720 cm⁻¹. This was done in order to remove variation due to differences in the biomass between the samples (Helm and Naumann, 1995). Preprocessing of the spectra by derivation and vector normalisation was chosen because this method is widely used and well suited for FT-IR spectra of microorganisms used for differentiation and identification (Mariey et al., 2001).

2.6. Methods for data analysis

In order to evaluate the natural clusters in the spectral data, Hierarchical Cluster Analysis (HCA) was used. The HCA was carried out using Ward's algorithm (Ward, 1963) with scaling to first range. This algorithm is widely used for HCA of bacterial FT-IR spectra (Mariey et al., 2001).

The main method for identification of Lactobacillus at the species level based on the FT-IR spectra was Partial Least Squares Regression (PLSR). The results from identification with PLSR were compared to identification results from Soft Independent Modelling of Class Analogy (SIMCA) and K-Nearest Neighbour (KNN), since SIMCA and KNN have been used for identification of microorganisms based on FT-IR spectroscopy before (Kansiz et al., 1999).

In PLSR, PLS components are extracted from a data matrix X in order to explain as much as possible of the variation in a reference matrix Y while at the same time accounting for the variation in the data matrix X (Næs et al., 2002). In this case, the data matrix X contained the bacterial FT-IR spectra while the reference matrix Y contained 0 and 1 as identifiers for the different Lactobacillus species. When PLSR is used for discrimination, it is also called discriminant PLSR. The optimal number of PLS components is found by locating the minimum Root Mean Square Error (RMSE) as a function of the number of PLS components and then checking whether the change in RMSE over the preceding components is substantial. RMSE is calculated by cross-validation. The significant variables in the regression are found by use of jack-knifing (Westad and Martens, 2000). To identify new samples, the PLSR model is used to predict the y-value of the new samples based on their x-data. The y-value for each class is calculated separately.
In this study, a sample was assigned to a class if the y-value of the new sample was above 0.5. If the y-value was below 0.5, it did not belong to the class.

In SIMCA, a Principal Component Analysis (PCA) for every class in a calibration set is performed (Martens and Martens, 2001). The sample that is to be identified is compared to each of the PCA models, and it is decided to which class or classes the new sample belongs. The decision of whether a sample belongs to a PCA model or not is based on the distance from the new sample to the center of the model and the distance from the new sample to the model.

In KNN, the distance from the unknown sample to every calibration sample is calculated (Ripley, 1996). Thereafter, one finds a defined number, K, of nearest calibration samples. The classes of the K nearest samples are used to decide to which class the unknown sample belongs. That is, if the calibration set consists of two classes, I and II, and K is 3, and two of the nearest neighbours to the unknown sample belong to class I, while one belongs to class II, the unknown sample is assigned to class I. In this study, K equalled 3. The identification was performed using the principal components of the calibration set.

Note that in PLSR and SIMCA, one sample may be assigned to more than one species, or it may not be assigned to any species at all. In this study, only the percentages of correctly and incorrectly identified samples are given. Therefore, in PLSR and SIMCA, the percentage of correctly and the percentage of incorrectly identified samples do not always add up to 100%.

In order to determine how well the models can predict future samples, they were validated. Validation of the PLSR models for identification of the species was carried out by cross-validation and test set validation (Næs et al., 2002). In cross-validation, objects are removed from the total data set and a submodel based on the remaining objects is calculated and used to predict the objects that were left out. Full cross-validation consists of removing only one object at a time. In this study, block-wise validation, where all parallels of one strain are removed at a time, was used.
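A minimal sketch of KNN with K = 3 on principal component scores, evaluated with block-wise cross-validation in which all parallels of one strain are left out together, could look as follows. It uses NumPy only. The Euclidean distance, the number of principal components, the synthetic data and all function names are assumptions for illustration; the in-house algorithm used in the study is not reproduced here.

```python
from collections import Counter

import numpy as np


def pca_scores(X_cal, X_new, n_components):
    """Project calibration and new samples onto the calibration PCs."""
    mean = X_cal.mean(axis=0)
    _, _, vt = np.linalg.svd(X_cal - mean, full_matrices=False)
    loadings = vt[:n_components].T
    return (X_cal - mean) @ loadings, (X_new - mean) @ loadings


def knn_predict(T_cal, labels_cal, T_new, k=3):
    """Majority vote among the k nearest calibration samples."""
    predictions = []
    for t in T_new:
        nearest = np.argsort(np.linalg.norm(T_cal - t, axis=1))[:k]
        votes = Counter(labels_cal[i] for i in nearest)
        # On a tie, the class of the closest neighbour wins.
        predictions.append(votes.most_common(1)[0][0])
    return predictions


def blockwise_cv(X, species, strains, n_components=2, k=3):
    """Leave out all parallels of one strain at a time; return the accuracy."""
    species = np.asarray(species)
    strains = np.asarray(strains)
    correct = 0
    for strain in np.unique(strains):
        held_out = strains == strain
        T_cal, T_new = pca_scores(X[~held_out], X[held_out], n_components)
        predicted = knn_predict(T_cal, species[~held_out], T_new, k)
        correct += sum(p == s for p, s in zip(predicted, species[held_out]))
    return correct / len(species)


# Synthetic data: 2 classes x 4 strains x 3 parallels, 10 variables.
rng = np.random.default_rng(1)
rows, species, strains = [], [], []
for name, centre in (("I", -2.5), ("II", 2.5)):
    for s in range(4):
        strain_mean = rng.normal(0.0, 0.3, 10)  # strain-to-strain variation
        strain_mean[0] += centre                # class difference
        for _ in range(3):
            rows.append(strain_mean + rng.normal(0.0, 0.05, 10))  # parallels
            species.append(name)
            strains.append(f"{name}-{s}")
X = np.array(rows)

accuracy = blockwise_cv(X, species, strains)
```

Grouping the folds by strain keeps parallels of the held-out strain out of the calibration data, which is the point of the block-wise scheme.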
Full cross-validation would be more optimistic, since parallels from the same strain would be used to predict the object that was left out. Test set validation is performed by making a model on one set of strains, the calibration set, and then predicting another set of strains, the test set. Preferably, the strains in the test set should be sampled at another time than the strains in the calibration set. An even more stringent test set consists of strains that are different from the strains in the calibration set. In this study, two different combinations of calibration and test sets were used for identification of strains at the species level:

(A) Calibration set: the 29 strains (three parallels) analysed in run 1. Test set: the same 29 strains (three parallels) analysed in run 2.
(B) Calibration set: the 29 strains (three parallels) analysed in run 1. Test set: the 27 new strains (three parallels) analysed in run 2.
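The two combinations above can be expressed as a small selection rule over a strain table. The sketch below uses seven strains taken from Table 1 purely as an example; the data structure and the function name `split` are hypothetical. For combination B it also drops species that are absent from the calibration set, an assumption based on the statement in this section that L. paracasei could not be tested.

```python
# strain -> (species, runs in which it was analysed); a few rows from Table 1
STRAINS = {
    "23K": ("L. sakei", (1, 2)),
    "MF1265": ("L. sakei", (2,)),
    "NC8": ("L. plantarum", (1, 2)),
    "ATCC14917": ("L. plantarum", (2,)),
    "MF368": ("L. curvatus", (1, 2)),
    "LS-3": ("L. curvatus", (2,)),
    "INFLb90": ("L. paracasei", (2,)),
}


def split(strains, combination):
    """Return (calibration, test) as lists of (strain, run) pairs."""
    in_run1 = {s for s, (_, runs) in strains.items() if 1 in runs}
    calibration = [(s, 1) for s in sorted(in_run1)]
    if combination == "A":
        # Same strains, measured again in run 2.
        test = [(s, 2) for s in sorted(in_run1)]
    else:
        # New strains only, restricted to species present in the calibration set.
        known_species = {strains[s][0] for s in in_run1}
        test = [(s, 2) for s in sorted(strains)
                if s not in in_run1 and strains[s][0] in known_species]
    return calibration, test


cal_a, test_a = split(STRAINS, "A")
cal_b, test_b = split(STRAINS, "B")
```

Each (strain, run) pair stands for the three parallels of that strain measured in that run.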

Note that the identification was tested for only three of the four species. Since there were no L. paracasei strains in run 1, it was impossible to test the identification of this species. The PCA, PLSR and SIMCA analyses were carried out in Unscrambler 7.6 (Camo, Norway) while the HCA was carried out in Opus 4.0 (Bruker Optics, Germany). The KNN was performed using an in-house developed algorithm according to Ripley (1996). Prior to PCA, PLSR, and SIMCA, the spectral data were weighted by dividing them by the standard deviation of the data for each wavenumber. This was done because weighted data were shown to give more stable models and approximately the same number of correlation coefficients as unweighted data.

2.7. Species-specific PCR and 16S rDNA analysis

The species-specific PCR (Berthier and Ehrlich, 1998; Østlie et al., 2004) and the 16S rDNA analysis (Rudi et al., 2002; Østlie et al., 2004) were performed as described previously.

3. Results

In this study, a total of 56 strains from four species of Lactobacillus were analysed with FT-IR spectroscopy for identification purposes. A typical FT-IR spectrum of lactobacilli is shown in Fig. 1. The assignment of the spectral regions is given in the legend. In this work, the region between 1400 and 720 cm⁻¹ was found to be best suited for identification of the Lactobacillus species based on their FT-IR spectra.

Fig. 1. An FT-IR spectrum of L. sakei 116. As indicated in the figure, FT-IR spectra of microorganisms are usually divided into five regions (Naumann et al., 1991). These regions contain information from different cell components: (1) 3000–2800 cm⁻¹: fatty acids in the bacterial cell membrane; (2) 1800–1500 cm⁻¹: amide bands from proteins and peptides; (3) 1500–1200 cm⁻¹: mixed region: proteins and fatty acids; (4) 1200–900 cm⁻¹: polysaccharides within the cell wall; (5) 900–500 cm⁻¹: "true" fingerprint region containing bands which cannot be assigned to specific functional groups.

3.1. Differentiation of species

In order to evaluate the possibility for identification of Lactobacillus at the species level based on FT-IR spectroscopy, the spectra were analysed with PLSR and HCA. When PLSR was used to analyse the FT-IR spectra of the 29 strains in run 1, the score plot showed that the strains formed three separate clusters corresponding to the three species in this run (results not shown). When the FT-IR spectra of the three parallels of the 56 strains in run 2 were first analysed with PLSR, the data set contained five strains of L. paracasei that were misidentified as L. curvatus by API. The score plot (Fig. 2) showed that the L. curvatus cluster was roughly divided into two subclusters. The correctly identified L. curvatus strains clustered in the subcluster to the left in the figure while the misidentified strains clustered in the subcluster to the upper right in the figure. Because the five strains in the latter subcluster, INFLb59, INFLb90, INFLb458, INFLb486 and INFLb530, had only been identified with API prior to the FT-IR analysis, they were selected for further identification with species-specific PCR. This led to reidentification of all the strains as L. paracasei. From the score plot, it could also be seen that one L. curvatus strain, LM-1, clustered together with the L. plantarum strains and that two L. curvatus strains, HJ5 and CCUG34545, clustered far into the L. sakei cluster, and because of this, these strains were also subject to further identification. L. curvatus LM-1 was analysed by 16S rDNA sequencing and then reidentified as L. plantarum with 99% identity. Lastly, species-specific PCR was used to reidentify L. curvatus HJ5 and CCUG34545 as L. sakei. Score plots from PLSR of the FT-IR spectra of the 56 strains after reidentification of the eight strains showed that the FT-IR spectra from the four Lactobacillus species were separated into four separate clusters (Fig. 3A and B). In this model, Principal Component (PC) 1 separated L. plantarum from the three other species, PC2 separated L. paracasei from the three other species while PC3 separated L. curvatus from the three other species.

Fig. 2. Score plot from the PLSR of the FT-IR spectra of the 56 strains that were analysed in run 2 before reidentification of eight strains. All three parallels of each strain were included in the regression. The L. curvatus cluster is roughly divided into two subclusters. The five strains in the subcluster to the upper right were reidentified as L. paracasei. The L. curvatus strain that clustered together with L. plantarum was reidentified as L. plantarum. The two L. curvatus strains that clustered far into the L. sakei cluster were reidentified as L. sakei. The explained variances in X by PC1 and PC2 were 28% and 9%, respectively.


Fig. 3. (A) PC2 plotted against PC1 and (B) PC3 plotted against PC2 for PLSR of the FT-IR spectra of the 56 strains that were analysed in run 2 after reidentification of eight strains. All three parallels of each strain were included in the regression. The explained variances in X by PC1, PC2 and PC3 were 28%, 12% and 7%, respectively.

In order to investigate whether unsupervised methods would also lead to differentiation of the FT-IR spectra into species clusters, the data were analysed with HCA. The dendrogram from HCA of the average spectra of the 29 strains in run 1 and the average spectra of the same 29 strains in run 2 showed that in this case, the three species, L. sakei, L. plantarum and L. curvatus, clustered according to species (Fig. 4).

Fig. 4. Dendrogram from the HCA of the average spectra of the 29 strains in run 1 and the average spectra of the same 29 strains in run 2. Three spectral regions were used in the clustering: 1400–1200, 1200–900 and 900–720 cm⁻¹. The regions were equally weighted.
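A clustering of this kind can be approximated with SciPy's Ward linkage. The sketch below is not the Opus implementation: the "scaling to first range" option is not reproduced, the three regions are simply concatenated as one way of weighting them equally, and the spectra, band positions and the function name `ward_clusters` are invented for the example.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage


def ward_clusters(spectra, wavenumbers, n_clusters,
                  regions=((1200, 1400), (900, 1200), (720, 900))):
    """Cut a Ward dendrogram, built on the selected regions, into n_clusters."""
    keep = np.zeros(wavenumbers.shape, dtype=bool)
    for low, high in regions:
        keep |= (wavenumbers >= low) & (wavenumbers <= high)
    tree = linkage(spectra[:, keep], method="ward")  # Ward on Euclidean distances
    return fcluster(tree, t=n_clusters, criterion="maxclust")


# Synthetic, already preprocessed "average spectra" with invented band positions.
rng = np.random.default_rng(2)
wavenumbers = np.arange(720.0, 1401.0, 4.0)
centres = {"L. sakei": 800.0, "L. plantarum": 1000.0, "L. curvatus": 1300.0}
spectra, species = [], []
for name, centre in centres.items():
    for _ in range(5):
        band = np.exp(-0.5 * ((wavenumbers - centre) / 20.0) ** 2)
        spectra.append(band + rng.normal(0.0, 0.01, wavenumbers.size))
        species.append(name)

labels = ward_clusters(np.array(spectra), wavenumbers, n_clusters=3)
# Cluster labels found for each species; one label per species means a clean split.
groups = {name: {int(l) for l, s in zip(labels, species) if s == name}
          for name in centres}
```

The tree returned by `linkage` can also be passed to `scipy.cluster.hierarchy.dendrogram` to draw the dendrogram itself.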

Furthermore, the dendrogram showed that the two average spectra of the same strain, which were sampled ca 4 months apart, clustered together for almost all strains. When the average spectra from the 56 strains in run 2 were analysed with HCA, the four species were not separated (results not shown). In the latter dendrogram, all the L. plantarum strains clustered together in one group while all the L. sakei strains clustered together in another group. Of the L. curvatus strains, some clustered together in a separate group, while some clustered together with the L. sakei strains. All five L. paracasei strains clustered together with the L. sakei strains.

3.2. Identification of species

Three methods, PLSR, SIMCA and KNN, and two different combinations of test and calibration sets, A and B, were used to identify Lactobacillus at the species level based on their FT-IR spectra. The number of PCs used in the PLSR models and the PCA models for SIMCA is given in Table 2. The results from the identifications using the two combinations of test and calibration sets, A and B, are given in Table 3A and B, respectively. Note that only the percentages of correctly and incorrectly identified samples are given.

Table 2
The number of principal components used in the PLSR models and the PCA models for SIMCA

                PLSR              SIMCA, run 1                   SIMCA, run 2
Species         Run 1    Run 2    First model   Improved model   First model   Improved model
L. sakei        5        4        7             6                10            6
L. plantarum    5        4        5             –                5             –
L. curvatus     5        4        4             3                6             5

(A) By use of PLSR and KNN, all the samples were assigned to the correct species. SIMCA enabled identification of most strains of L. plantarum and L. curvatus. The identification result for L. sakei was also satisfactory, but some of the samples were assigned to more than one species. None of the methods misidentified any sample.
(B) All three methods allowed identification of many samples of both L. sakei and L. plantarum. For L. curvatus, only a few strains were identified correctly. In PLSR and KNN, most of the L. curvatus samples were identified as L. sakei. Using SIMCA, the L. curvatus samples were identified as L. sakei or as both L. sakei and L. curvatus.

One way to improve the identification of L. sakei and L. curvatus with SIMCA was to run PLSR with jack-knifing for these two species in order to select the variables that were most important for differentiation of the species. A PCA of the significant variables for each of the two species was carried out, and the PCA models were used to identify the samples with SIMCA. When the same two combinations of calibration and test sets as above were used to validate the models, the results were as follows (Table 3A and B):

(A) The percentage of correctly identified samples was increased for both species, and no samples were misidentified. L. plantarum was not identified as L. sakei or L. curvatus with the new models.
(B) For L. sakei, the percentage of correctly identified samples increased and no samples were misidentified. For L. curvatus, none of the samples was identified correctly, and the same number of samples was misidentified as with the old models. L. plantarum was not identified as L. sakei or L. curvatus with the new models.
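The SIMCA principle used here, one PCA model per species and a membership decision based on the distance to the model centre and the distance to the model, can be sketched as follows. This is a simplified stand-in rather than the Unscrambler implementation: the acceptance limits are taken as 1.5 times the largest calibration distance instead of a statistical test, and the data, the `slack` parameter and all names are assumptions.

```python
import numpy as np


class ClassModel:
    """PCA model of one class with simple distance limits."""

    def __init__(self, X, n_components, slack=1.5):
        self.mean = X.mean(axis=0)
        _, _, vt = np.linalg.svd(X - self.mean, full_matrices=False)
        self.loadings = vt[:n_components].T
        centre_d, model_d = self.distances(X)
        # Assumed rule: accept up to slack times the largest calibration distance.
        self.max_centre = slack * centre_d.max()
        self.max_model = slack * model_d.max()

    def distances(self, X):
        """Distance to the model centre (in score space) and to the model (residual)."""
        centred = np.atleast_2d(X) - self.mean
        scores = centred @ self.loadings
        residual = centred - scores @ self.loadings.T
        return np.linalg.norm(scores, axis=1), np.linalg.norm(residual, axis=1)

    def accepts(self, x):
        centre_d, model_d = self.distances(x)
        return bool(centre_d[0] <= self.max_centre and model_d[0] <= self.max_model)


def simca_assign(models, x):
    """Return every class whose model accepts x (possibly none or several)."""
    return [name for name, model in models.items() if model.accepts(x)]


# Two synthetic classes that vary mainly along the first variable.
rng = np.random.default_rng(3)
spread = np.array([2.0, 0.1, 0.1, 0.1, 0.1, 0.1])
offset = np.array([0.0, 5.0, 0.0, 0.0, 0.0, 0.0])
models = {
    "I": ClassModel(rng.normal(0.0, spread, (12, 6)), n_components=1),
    "II": ClassModel(rng.normal(0.0, spread, (12, 6)) + offset, n_components=1),
}

near_class_one = np.zeros(6)
between_classes = np.array([0.0, 2.5, 0.0, 0.0, 0.0, 0.0])
assigned_near = simca_assign(models, near_class_one)
assigned_between = simca_assign(models, between_classes)
```

Restricting `X` to the variables found significant by jack-knifing before building each `ClassModel` would correspond to the improved SIMCA models described above.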

Table 3
Results from identification of Lactobacillus with PLSR, SIMCA and KNN. Tables A and B give the results from the two different combinations of calibration and test sets, A and B, respectively.

                Number of  PLSR                  KNN                   SIMCA                 SIMCA(a)
                samples    Correct    Incorrect  Correct    Incorrect  Correct    Incorrect  Correct    Incorrect
                           [%]        [%]        [%]        [%]        [%]        [%]        [%]        [%]
A
Total           87         100        0          100        0          78         0          –          –
L. sakei        60         100        0          100        0          72         0          98         0
L. plantarum    15         100        0          100        0          93         0          –          –
L. curvatus     12         100        0          100        0          92         0          100        0
B
Total           66         82         18         80         20         67         9          –          –
L. sakei        24         100        0          100        0          75         0          92         0
L. plantarum    24         100        0          100        0          96         0          –          –
L. curvatus     18         33         67         28         72         17         33         0          33

(a) Identification results using PCA models based on significant variables from a PLSR for L. sakei and L. curvatus.
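As a consistency check on Table 3, the "Total" rows can be recomputed as sample-weighted averages of the per-species percentages, and the sample counts can be derived from the number of strains times three parallels. The strain counts per species used below (20, 5 and 4 strains in run 1; 8, 8 and 6 new strains in run 2) are counted from Table 1 and are the only values not copied directly from Table 3.

```python
# Species order: L. sakei, L. plantarum, L. curvatus. Percentages are "Correct [%]".
TABLE_3 = {
    "A": {"n": (60, 15, 12),
          "PLSR": (100, 100, 100), "KNN": (100, 100, 100), "SIMCA": (72, 93, 92)},
    "B": {"n": (24, 24, 18),
          "PLSR": (100, 100, 33), "KNN": (100, 100, 28), "SIMCA": (75, 96, 17)},
}
REPORTED_TOTALS = {
    "A": {"PLSR": 100, "KNN": 100, "SIMCA": 78},
    "B": {"PLSR": 82, "KNN": 80, "SIMCA": 67},
}
# Strains per species counted from Table 1; each strain has three parallels.
STRAINS_PER_SPECIES = {"A": (20, 5, 4), "B": (8, 8, 6)}


def pooled_percentage(counts, percentages):
    """Sample-weighted average of per-species percentages, rounded to an integer."""
    identified = sum(n * p / 100 for n, p in zip(counts, percentages))
    return round(100 * identified / sum(counts))


recomputed = {
    combo: {method: pooled_percentage(rows["n"], rows[method])
            for method in ("PLSR", "KNN", "SIMCA")}
    for combo, rows in TABLE_3.items()
}
samples_from_strains = {combo: tuple(3 * s for s in strains)
                        for combo, strains in STRAINS_PER_SPECIES.items()}
```

For example, the PLSR total for combination B is (24 + 24 + 0.33 x 18) / 66, which rounds to 82%.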


4. Discussion

A typical FT-IR spectrum of lactobacilli and assignment of the various spectral regions are given in Fig. 1. In the data analysis, only those parts of the spectra with the most relevant variation for the given purpose should be used. The choice of spectral regions depends both on the purpose of the analysis and on the type of microorganism under examination. In this study, the spectral region between 1400 and 720 cm⁻¹, which covers the mixed region, the polysaccharide region and the "true" fingerprint region, was found to be best suited for differentiation and identification of the four Lactobacillus species based on their FT-IR spectra. This is more or less in accordance with a range of previous studies where a number of microorganisms have been differentiated and identified based on their FT-IR spectra (Helm et al., 1991; Curk et al., 1994; Amiel et al., 2000). In order to evaluate the natural clusters in the data, the FT-IR spectra were analysed with HCA. When HCA was used to differentiate between the average spectra of the 29 strains analysed in both run 1 and run 2, the dendrogram (Fig. 4) showed that the FT-IR spectra clustered in groups corresponding to the three species. The dendrogram also showed that the average spectra from run 1 and the average spectra from run 2, which were recorded ca 4 months apart, clustered together for almost all strains. This indicates that the reproducibility of the methods is good, which is important if FT-IR spectroscopy is to be used for identification of Lactobacillus at the species level. When the number of strains was increased from 29 to 56 in run 2, the clusters in the dendrogram from the HCA (result not shown) consisted of strains from different species. HCA is suited for analysis of data when the variation in the data reflects the variation between the different groups of samples.
The dendrogram from the HCA of all 56 strains indicates that HCA is not suited for differentiating species from FT-IR spectra of microorganisms when the main variation in the spectra is not coupled to variation between the species. Another drawback of HCA is that the results are difficult to validate. The HCA results are in accordance with previous work showing that FT-IR spectroscopy combined with unsupervised methods was unable to differentiate clearly between 14 species of Lactobacillus (Curk et al., 1994).

One way to differentiate between the 56 strains in run 2 is to use PLSR, a supervised method that searches for the variation in the spectra that is best suited for differentiation of the species. The score plot from PLSR of the 56 strains in run 2 (Fig. 3A and B) indicates that it is possible to use FT-IR spectra to differentiate between the four species, L. sakei, L. plantarum, L. curvatus and L. paracasei. PLSR even made it possible to differentiate between L. sakei and L. curvatus, which are known to be very closely related, both genetically and phenotypically (Torriani et al., 1996). Amiel et al. (2000) used Discriminant Analysis (DA), another supervised method, to clearly differentiate between dairy lactic acid bacteria. That data set consisted of 20 species from five genera, but with a maximum of three strains within each species. The results are therefore not directly comparable to those of the present study, where the data set consisted of many strains from a few closely related species.

In addition to differentiation of the species, the score plot from the PLSR (Fig. 2) was also used to identify strains that had been incorrectly identified by conventional methods prior to the FT-IR analysis. Five of the eight strains that were reidentified had only been identified with API prior to the FT-IR analysis. Today, API is widely used for identification of lactobacilli in the food industry. The results in the present study indicate that FT-IR spectroscopy, which is more reliable than API, may become an alternative to the use of API for identification of Lactobacillus.

For identification of the Lactobacillus species with PLSR based on their FT-IR spectra, the data set was divided into two combinations of calibration and test sets, A and B. The identification results from PLSR were compared to those from SIMCA and KNN.
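How a regression method such as PLSR can act as a classifier may be easier to see in a small sketch. The code below assumes that species membership is coded as 0/1 dummy variables, a common way of using PLSR for classification; this section does not restate the exact setup used in the study. It fits a one-component PLS1 model per species, which is a deliberate simplification. The three-band "spectra", the helper names and the 0.5 acceptance threshold are all hypothetical.

```python
from statistics import fmean


def fit_pls1(X, y):
    """Fit a one-component PLS1 model of a 0/1 response y on spectra X."""
    x_mean = [fmean(col) for col in zip(*X)]
    y_mean = fmean(y)
    Xc = [[v - m for v, m in zip(row, x_mean)] for row in X]
    yc = [v - y_mean for v in y]
    # Weight vector: the direction in X with maximal covariance with y.
    w = [sum(row[j] * yi for row, yi in zip(Xc, yc)) for j in range(len(x_mean))]
    norm = sum(v * v for v in w) ** 0.5
    w = [v / norm for v in w]
    # Scores of the calibration samples, then regression of y on the scores.
    t = [sum(a * b for a, b in zip(row, w)) for row in Xc]
    q = sum(ti * yi for ti, yi in zip(t, yc)) / sum(ti * ti for ti in t)
    return x_mean, y_mean, w, q


def predict_pls1(model, x):
    """Predicted membership value: near 1 for members, near 0 otherwise."""
    x_mean, y_mean, w, q = model
    score = sum((xi - mi) * wi for xi, mi, wi in zip(x, x_mean, w))
    return y_mean + q * score


def fit_species_models(X, labels):
    """One PLS1 model per species, each trained against a 0/1 dummy variable."""
    return {
        sp: fit_pls1(X, [1.0 if lab == sp else 0.0 for lab in labels])
        for sp in sorted(set(labels))
    }


def identify(models, x, threshold=0.5):
    """Return the best-scoring species, or None if no prediction reaches threshold."""
    preds = {sp: predict_pls1(m, x) for sp, m in models.items()}
    best = max(preds, key=preds.get)
    return (best if preds[best] >= threshold else None), preds


# Invented toy calibration set: two strains per species, three spectral bands.
calibration = [
    [1.0, 0.1, 0.0], [0.9, 0.0, 0.1],   # sakei
    [0.1, 1.0, 0.0], [0.0, 0.9, 0.1],   # plantarum
    [0.0, 0.1, 1.0], [0.1, 0.0, 0.9],   # paracasei
]
labels = ["sakei", "sakei", "plantarum", "plantarum", "paracasei", "paracasei"]
models = fit_species_models(calibration, labels)

print(identify(models, [0.95, 0.05, 0.05])[0])  # a spectrum close to the sakei strains
print(identify(models, [0.40, 0.30, 0.35])[0])  # a spectrum resembling no species in particular
```

Because each prediction is a continuous value rather than a bare label, a sample that resembles none of the calibrated species can be left unidentified instead of being forced into a class. Real spectra would need more latent variables and proper validation of the number of components.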
PLSR and KNN gave good identification results for all samples except for the identification of L. curvatus in combination B (Table 3B). The identification of L. curvatus probably failed because the calibration set, which consisted of only four strains, represented only a small part of the natural variation within the L. curvatus species. Since the test set consisted of samples from new strains, they fell outside the calibration models. The calibration set for L. plantarum was also small, consisting of only five strains, but in this case, all new samples were correctly identified. There are at least two reasons why the identification of L. curvatus failed while the identification of L. plantarum was successful. First, the L. plantarum calibration set covered the natural variation in the species better than the L. curvatus calibration set. Second, the problem caused by the small calibration set for L. curvatus was amplified because the L. curvatus and the L. sakei clusters bordered on each other, while the L. plantarum cluster was well separated from the other species clusters (Fig. 3A and B).

Using SIMCA, the identification of all species was satisfactory in most cases, but it was problematic to differentiate between L. sakei and L. curvatus. The identification of these two species was improved using selected variables (Table 3). The selection of variables also reduced the number of principal components, and thus led to more stable PCA models. The improvement of the results shows that a careful selection of variables in the calibration set is advantageous. The identification results for all species will probably improve if more strains are included in the calibration set.

The identification results show that in this case, PLSR was well suited for identification of closely related Lactobacillus at the species level. KNN gave equally good results as PLSR, but one disadvantage of KNN compared to PLSR is that the identification is not validated. That is, KNN will always identify the unknown sample, but the method does not provide any information about how good the identification is. SIMCA did not give as good identification results as PLSR. One advantage of SIMCA compared to PLSR is that each species has its own model.
This means that if new species are introduced, new models can be made without changing the models for the existing species.

In previous studies of identification of lactic acid bacteria with FT-IR spectroscopy, the data sets have consisted of few strains from each of many species. Curk et al. (1994) identified 53 strains from 14 species isolated from beer, while Amiel et al. (2000) studied 17 strains from five species from soft cheese. In both studies, the samples were identified using the spectral distance between new FT-IR spectra and FT-IR spectra in a spectral library. The results in the current study show that it is possible to use FT-IR spectroscopy for identification of closely related Lactobacillus even if there are many strains within each species. Amiel et al. (2000) also tested the identification of strains that were not included in the library. The identification was correct for 69% of the samples. In the current study, the identification of new samples with PLSR was correct for 82% of the samples. This identification result for PLSR will probably improve when the calibration set for L. curvatus is increased. One advantage of PLSR compared to spectral distances is that identification is based on the variation in the data set that is best suited for differentiation between the species, and not on the whole spectral range.

The results from the present study were based on strains of lactic acid bacteria isolated mainly from meat and cheese, while other authors have used strains from beer and soft cheese. If the food industry wants to use FT-IR spectroscopy as a means of identifying lactobacilli, it is necessary to build libraries with reference spectra. These libraries will probably expand over time, and as more strains and species are included in the library, many of the species will probably be closely related. In this study, PLSR, SIMCA and KNN were shown to be well suited for identification of closely related species even if there are many strains within each species.
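The remark that KNN always returns an identification, and the related idea of matching a new spectrum against a spectral library by distance, can be sketched as follows. This is a minimal illustration, assuming Euclidean distance and k = 3; neither the distance measure nor the value of k is stated in this section, and the library entries are invented toy values.

```python
from collections import Counter
from math import dist


def knn_identify(library, spectrum, k=3):
    """Majority vote among the k library spectra nearest to spectrum.

    library is a list of (species, reference_spectrum) pairs.
    Returns the winning species and the distances to the k neighbours.
    """
    ranked = sorted((dist(ref, spectrum), species) for species, ref in library)
    nearest = ranked[:k]
    votes = Counter(species for _, species in nearest)
    return votes.most_common(1)[0][0], [d for d, _ in nearest]


# Invented toy library: two reference spectra per species, three bands each.
library = [
    ("sakei", [1.0, 0.1, 0.0]), ("sakei", [0.9, 0.0, 0.1]),
    ("plantarum", [0.1, 1.0, 0.0]), ("plantarum", [0.0, 0.9, 0.1]),
    ("paracasei", [0.0, 0.1, 1.0]), ("paracasei", [0.1, 0.0, 0.9]),
]

# A spectrum close to the sakei references.
species, distances = knn_identify(library, [0.95, 0.05, 0.05])
# A spectrum far from every reference still receives a label.
outlier_species, outlier_distances = knn_identify(library, [5.0, 5.0, 5.0])

print(species, [round(d, 2) for d in distances])
print(outlier_species, [round(d, 2) for d in outlier_distances])
```

The vote alone gives no indication that the second spectrum lies far from everything in the library. Reporting the neighbour distances alongside the label, as done here, is one simple way to expose that, although it is not part of the KNN procedure described in the study.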

Acknowledgements

The authors want to thank Hilde Østlie (Agricultural University of Norway, Norway) for technical assistance and for providing us with strains, and Tove Maugesten (Matforsk AS, Norway) for technical assistance. We also want to thank Frank Westad, Kristine Naterstad and Lars Axelsson (Matforsk AS, Norway) for useful scientific discussions. Finally, we want to thank Dr. Herbert Seiler and Mareike Wenning (Technische Universität München, Germany) for introducing us to the methods used for FT-IR analysis of microorganisms. This work was supported by The Fund for the Research Levy on Agricultural Products.


References

Amiel, C., Mariey, L., Curk-Daubie, M.C., Pichon, P., Travert, J., 2000. Potentiality of Fourier transform infrared spectroscopy (FT-IR) for discrimination and identification of dairy lactic acid bacteria. Lait 80, 445–459.
Aukrust, T., Blom, H., 1992. Transformation of Lactobacillus strains used in meat and vegetable fermentations. Food Res. Int. 25, 253–261.
Axelsson, L., Holck, A., Birkeland, S.E., Aukrust, T., Blom, H., 1993. Cloning and nucleotide-sequence of a gene from Lactobacillus sake Lb706 necessary for sakacin A production and immunity. Appl. Environ. Microbiol. 59, 2868–2875.
Berthier, F., Ehrlich, S.D., 1998. Rapid species identification within two groups of closely related lactobacilli using PCR primers that target the 16S/23S rRNA spacer region. FEMS Microbiol. Lett. 161, 97–106.
Berthier, F., Zagorec, M., Champomier-Verges, M., Ehrlich, S.D., Morel-Deville, F., 1996. Efficient transformation of Lactobacillus sake by electroporation. Microbiol.-UK 142, 1273–1279.
Bredholt, S., Nesbakken, T., Holck, A., 1999. Protective cultures inhibit growth of Listeria monocytogenes and Escherichia coli O157:H7 in cooked, sliced, vacuum- and gas-packaged meat. Int. J. Food Microbiol. 53, 43–52.
Curk, M.C., Peladan, F., Hubert, J.C., 1994. Fourier-transform infrared (FT-IR) spectroscopy for identifying Lactobacillus species. FEMS Microbiol. Lett. 123, 241–248.
Goodacre, R., Timmins, E.M., Rooney, P.J., Rowland, J.J., Kell, D.B., 1996. Rapid identification of Streptococcus and Enterococcus species using diffuse reflectance-absorbance Fourier transform infrared spectroscopy and artificial neural networks. FEMS Microbiol. Lett. 140, 233–239.
Helm, D., Naumann, D., 1995. Identification of some bacterial cell components by FT-IR spectroscopy. FEMS Microbiol. Lett. 126, 75–80.
Helm, D., Labischinski, H., Schallehn, G., Naumann, D., 1991. Classification and identification of bacteria by Fourier-transform infrared spectroscopy. J. Gen. Microbiol. 137, 69–79.
Holck, A., Axelsson, L., Höhne, K., Krockel, L., 1994. Purification and cloning of sakacin 674, a bacteriocin from Lactobacillus sake Lb674. FEMS Microbiol. Lett. 115, 143–150.
Hugas, M., Garriga, M., Aymerich, T., Monfort, J.M., 1993. Biochemical characterization of Lactobacilli from dry fermented sausages. Int. J. Food Microbiol. 18, 107–113.
Kansiz, M., Heraud, P., Wood, B., Burden, F., Beardall, J., McNaughton, D., 1999. Fourier transform infrared microspectroscopy and chemometrics as a tool for the discrimination of cyanobacterial strains. Phytochemistry 52, 407–417.
Larsen, A.G., Vogensen, F.K., Josephsen, J., 1993. Antimicrobial activity of lactic acid bacteria isolated from sour doughs: purification and characterization of bavaricin A, a bacteriocin produced by Lactobacillus bavaricus MI401. J. Appl. Bacteriol. 75, 113–122.
Lefier, D., Hirst, D., Holt, C., Williams, A.G., 1997. Effect of sampling procedure and strain variation in Listeria monocytogenes on the discrimination of species in the genus Listeria by Fourier transform infrared spectroscopy and canonical variates analysis. FEMS Microbiol. Lett. 147, 45–50.
Maquelin, K., Kirschner, C., Choo-Smith, L.P., Ngo-Thi, N.A., van Vreeswijk, T., Stämmler, M., Endtz, H.P., Bruining, H.A., Naumann, D., Puppels, G.J., 2003. Prospective study of the performance of vibrational spectroscopies for rapid identification of bacterial and fungal pathogens recovered from blood cultures. J. Clin. Microbiol. 41, 324–329.
Mariey, L., Signolle, J.P., Amiel, C., Travert, J., 2001. Discrimination, classification, identification of microorganisms using FTIR spectroscopy and chemometrics. Vib. Spectrosc. 26, 151–159.
Martens, H., Martens, M., 2001. Multivariate Analysis of Quality: An Introduction. Wiley, Chichester.
Næs, T., Isaksson, T., Fearn, T., Davies, T., 2002. A User-Friendly Guide to Multivariate Calibration and Classification. NIR Publications, Chichester.
Naumann, D., Fijala, V., Labischinski, H., 1988. The differentiation and identification of pathogenic bacteria using FT-IR and multivariate statistical analysis. Mikrochim. Acta 1, 373–377.
Naumann, D., Helm, D., Labischinski, H., Giesbrecht, P., 1991. The characterization of microorganisms by Fourier-transform infrared spectroscopy. In: Nelson, W.H. (Ed.), Modern Techniques for Rapid Microbiological Analysis. VCH, New York, pp. 43–96.
Nissen, H., Dainty, R., 1995. Comparison of the use of rRNA probes and conventional methods in identifying strains of Lactobacillus sake and L. curvatus isolated from meat. Int. J. Food Microbiol. 25, 311–315.
Østlie, H.M., Eliassen, L., Florvaag, A., Skeie, S., 2004. Phenotypic and PCR-based characterization of the microflora in Norvegia cheese during ripening. Int. J. Food Microbiol. 94, 287–299.
Ripley, B.D., 1996. Pattern Recognition and Neural Networks. Cambridge University Press, Cambridge.
Rudi, K., Flateland, S.L., Hanssen, J.F., Bengtsson, G., Nissen, H., 2002. Development and evaluation of a 16S ribosomal DNA array-based approach for describing complex microbial communities in ready-to-eat vegetable salads packed in a modified atmosphere. Appl. Environ. Microbiol. 68, 1146–1156.
Savitzky, A., Golay, M.J.E., 1964. Smoothing and differentiation of data by simplified least squares procedures. Anal. Chem. 36, 1627–1639.
Schillinger, U., Lücke, F.K., 1987. Identification of lactobacilli from meat and meat products. Food Microbiol. 4, 199–208.
Schillinger, U., Lücke, F.K., 1989. Antibacterial activity of Lactobacillus sake isolated from meat. Appl. Environ. Microbiol. 55, 1901–1906.
Tichaczek, P.S., Nissen-Meyer, J., Nes, I.F., Vogel, R.F., Hammes, W.P., 1992. Characterisation of the bacteriocins curvacin A from Lactobacillus curvatus LTH1174 and sakacin P from L. sake LTH673. Syst. Appl. Microbiol. 15, 460–468.
Torriani, S., Van Reenen, C.A., Klein, G., Reuter, G., Dellaglio, F., Dicks, L.M.T., 1996. Lactobacillus curvatus subsp curvatus subsp nov and Lactobacillus curvatus subsp melibiosus subsp nov and Lactobacillus sake subsp sake subsp nov and Lactobacillus sake subsp carnosus subsp nov, new subspecies of Lactobacillus curvatus Abo-Elnaga and Kandler 1965 and Lactobacillus sake Katagiri, Kitahara, and Fukami 1934 (Klein et al. 1996, Emended Descriptions), respectively. Int. J. Syst. Bacteriol. 46, 1158–1163.
Ward, J.H., 1963. Hierarchical grouping to optimize an objective function. J. Am. Stat. Assoc. 58, 236–244.
Westad, F., Martens, H., 2000. Variable selection in near infrared spectroscopy based on significance testing in partial least squares regression. J. Near Infrared Spectrosc. 8, 117–124.