Talanta 63 (2004) 527–532
Forward selection radial basis function networks applied to bacterial classification based on MALDI-TOF-MS

Zhuoyong Zhang a,∗, Dan Wang b, Peter de B. Harrington c, Kent J. Voorhees d, Jon Rees d

a Department of Chemistry, Capital Normal University, Beijing 100037, PR China
b Faculty of Chemistry, Northeast Normal University, Changchun 130024, PR China
c Department of Chemistry and Biochemistry, Ohio University Center for Intelligent Chemical Instrumentation, Ohio University, Athens, OH 45701-2979, USA
d Department of Chemistry and Geochemistry, Colorado School of Mines, Golden, CO 80401, USA

Received 28 July 2003; received in revised form 11 November 2003; accepted 17 November 2003. Available online 28 January 2004
Abstract
A radial basis function (RBF) network improved by forward selection was applied to bacterial classification using data obtained by matrix-assisted laser desorption/ionization time-of-flight mass spectrometry (MALDI-TOF-MS). The classification of each bacterium cultured for different times was discussed, and the effect of the RBF network parameters was investigated. The new method uses forward selection to prevent overfitting, with generalized cross-validation (GCV) as the model selection criterion (MSC). The original data were compressed by wavelet transformation to speed up network training and to reduce the number of variables in the original MS data. The data were normalized before each network was trained and tested. Normalization defines the region in which the network is trained, accelerates training, and narrows the range over which the parameters must be selected. The one-out-of-n (leave-one-out) method was used to split the data set of p samples into a training set of size p−1 and a test set of size 1. With the improved method, the classification correctness rates for the five bacteria discussed in this paper are 87.5, 69.2, 80, 92.3, and 92.8%, respectively.
© 2003 Elsevier B.V. All rights reserved.
Keywords: Radial basis function network; Matrix-assisted laser desorption/ionization; Time-of-flight; Mass spectrometry; Bacterium; Classification
∗ Corresponding author. Fax: +86-10-68902320. E-mail address: Zhuoyong [email protected] (Z. Zhang).
0039-9140/$ – see front matter. doi:10.1016/j.talanta.2003.11.034

1. Introduction
Interest in the characterization of bacteria has increased in recent years [1,2]. Studies on the identification of microorganisms have two major goals. The main goal is to elucidate reproducible genus-, species-, and strain-specific 'fingerprints' for any given organism [3]. In addition, the chemical identification of microorganisms can provide useful information for biological studies. Since the paper of Anhalt and Fenselau [4], the applicability of mass spectrometry to bacterial identification has been under investigation. Matrix-assisted laser desorption/ionization time-of-flight (MALDI-TOF) mass spectra can be obtained from whole bacterial cells [5–7]. This makes the analyses simpler and much faster, and MALDI-TOF-MS has become a valuable tool for bacterial identification. Several algorithms have been developed that identify bacteria by comparing their MALDI-TOF-MS spectra with reference spectra [8,9]. MALDI-TOF-MS is highly sensitive to very small changes in chemical composition, so spectral reproducibility is a critical factor when bacterial spectra are compared. Even when the experimental conditions are carefully controlled, spectra produced from the same bacterium are rarely identical. This makes it harder to classify unknowns by comparing their spectra with reference spectra. Artificial neural networks (ANN) have proved to be a good tool for extracting useful information and revealing inherent relationships in massive, complicated data or in vague, incomplete information. Radial basis function (RBF) networks came into fashion after Broomhead and Lowe's seminal paper in 1988 [10]. An RBF network is a type of ANN applied to supervised learning problems, such
as regression [11–14] and pattern recognition [15–17]. A temperature-constrained cascade correlation network (TCCCN) has been used for bacterial identification based on MALDI-TOF-MS [18]. RBF networks are local approximation networks. They perform well in approximation, classification and learning speed, and their structure is simple. In a previous study, a conventional RBF network was applied to differentiate bacteria cultured for different times (24, 48, and 72 h) based on MALDI-TOF-MS data [19]. In this paper, an improved RBF network is applied to the same problem. The improved method uses forward selection. It was proposed by Orr and is called the forward selection radial basis function (FSRBF) network [11,20].
2. Theoretical bases

2.1. RBF network
A typical RBF network has three layers:
- a layer of input neurons that feeds the input vectors into the network;
- a single hidden layer of RBF neurons that computes the outputs of the basis functions;
- a layer of output neurons that computes a linear combination of the basis functions.

The number of input neurons equals the number of input variables. The RBF networks discussed in this paper have only one neuron in the output layer. RBF networks are often used to solve supervised learning problems. Supervised learning means estimating a function from a set of example input–output pairs, $\{(\mathbf{x}_i, y_i)\}_{i=1}^{p}$, with little or no knowledge of the form of the function. The function is learned from the examples that a teacher supplies, and the training set consists of paired input and output values. The functional relation between the input $\mathbf{x}$ (a vector) and the output $y$ (a scalar) is $y = f(\mathbf{x})$. The units in the input layer do not process information; they only distribute the input variables to the hidden layer. The RBF network can therefore also be regarded as a two-layer network. With a single hidden layer, the network model

$$f(\mathbf{x}) = \sum_{j=1}^{m} w_j h_j(\mathbf{x})$$

is linear in the hidden-to-output weights $\{w_j\}_{j=1}^{m}$, where $m$ is the number of hidden units. This linearity keeps the mathematics simple and the computation relatively cheap. The characteristic feature of an RBF network is the radial nature of the hidden-unit transfer functions $\{h_j\}_{j=1}^{m}$, which depend
only on the distance between the input $\mathbf{x}$ and the center $\mathbf{c}_j$ of each hidden unit, scaled by a metric $\mathbf{R}_j$:

$$h_j(\mathbf{x}) = \phi\left((\mathbf{x} - \mathbf{c}_j)^{\mathrm{T}} \mathbf{R}_j^{-1} (\mathbf{x} - \mathbf{c}_j)\right),$$

where $\phi$ is some function that is monotonic for nonnegative numbers. Traditionally, attention has been restricted to diagonal metrics and Gaussian basis functions, so that the transfer functions can be written as

$$h_j(\mathbf{x}) = \exp\left(-\sum_{k=1}^{n} \frac{(x_k - c_{jk})^2}{r_{jk}^2}\right).$$

Here $n$ is the number of input variables, and $x_k$ and $c_{jk}$ are the $k$th components of $\mathbf{x}$ and $\mathbf{c}_j$. The radius vector of the $j$th hidden unit is $\mathbf{r}_j$, and $r_{jk}$ is its $k$th component. A radial unit is therefore defined by its center and radius. The hidden units respond nonlinearly to the distance of an input from the center of the radial unit: the closer a sample is to the center, the stronger the response.

2.2. FSRBF network
A difficult problem often encountered in RBF network applications is the choice of centers, which greatly affects both the complexity and the performance of a network. If too few centers are used, the network may be unable to approximate the target function well. With too many centers, however, the network may overfit the data and fit misleading variations caused by imprecise or noisy data. FSRBF networks use forward selection to determine the centers of the RBFs. Forward selection [10,11] is a direct approach to controlling model complexity. It selects a subset of centers from a larger set consisting of all the input samples. If the whole of the larger set were used, the network model would be too complex and would be likely to overfit the data. The input vectors of the training set are used as the candidate centers. The method starts with an empty network and adds one neuron to the hidden layer at a time. The key equation is

$$\mathbf{P}_{m+1} = \mathbf{P}_m - \frac{\mathbf{P}_m \mathbf{f}_J \mathbf{f}_J^{\mathrm{T}} \mathbf{P}_m}{\mathbf{f}_J^{\mathrm{T}} \mathbf{P}_m \mathbf{f}_J}.$$

It relates $\mathbf{P}_m$, the projection matrix for the $m$ hidden units in the current subset, to $\mathbf{P}_{m+1}$, the projection matrix that results if the $J$th member of the full set is added. The vectors $\{\mathbf{f}_J\}_{J=1}^{M}$ are the columns of the design matrix for the entire set of candidate basis functions, $\mathbf{F} = [\mathbf{f}_1\ \mathbf{f}_2\ \cdots\ \mathbf{f}_M]$, where $M \gg m$. The responses of the $m$ hidden units of an RBF network to the $p$ inputs of the training set can be gathered in a design matrix $\mathbf{H}$ with elements $H_{ij} = h_j(\mathbf{x}_i)$.
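The Python/NumPy sketch below illustrates the quantities defined above. It builds the design matrix of Gaussian units, $H_{ij} = h_j(\mathbf{x}_i)$, with every training input as a candidate center. It then checks that the rank-one update for $\mathbf{P}_{m+1}$ gives the same matrix as computing the projection directly. The sketch makes two assumptions that the text does not spell out:
- a single scalar radius `r` shared by all units and input variables, which is a special case of the diagonal metric;
- the unregularized projection matrix $\mathbf{P} = \mathbf{I}_p - \mathbf{H}(\mathbf{H}^{\mathrm{T}}\mathbf{H})^{-1}\mathbf{H}^{\mathrm{T}}$.

The function names are hypothetical and are not taken from Orr's MATLAB functions.

```python
import numpy as np


def gaussian_design_matrix(X, centers, r):
    """H[i, j] = h_j(x_i) = exp(-||x_i - c_j||^2 / r^2), one radius r shared by all units."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / r**2)


def projection_matrix(H):
    """P = I_p - H (H^T H)^{-1} H^T; assumes H has full column rank."""
    p = H.shape[0]
    return np.eye(p) - H @ np.linalg.solve(H.T @ H, H.T)


def add_candidate(P_m, f_J):
    """Rank-one update P_{m+1} = P_m - P_m f_J f_J^T P_m / (f_J^T P_m f_J).

    P_m is symmetric, so P_m f_J f_J^T P_m equals the outer product (P_m f_J)(P_m f_J)^T.
    """
    Pf = P_m @ f_J
    return P_m - np.outer(Pf, Pf) / (f_J @ Pf)


# Every training input is a candidate center, so F is p x p here (M = p).
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 3))
F = gaussian_design_matrix(X, X, r=1.5)

P_2 = projection_matrix(F[:, :2])          # current subset: the first two candidates
P_3_update = add_candidate(P_2, F[:, 2])   # add the third candidate via the key equation
P_3_direct = projection_matrix(F[:, :3])   # recompute the projection from scratch
print(np.allclose(P_3_update, P_3_direct))  # True
```

The update is cheap because each candidate $J$ costs one matrix–vector product. Recomputing $(\mathbf{H}^{\mathrm{T}}\mathbf{H})^{-1}$ for every candidate would cost far more.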
If the $J$th candidate is chosen, $\mathbf{f}_J$ is appended as the last column of $\mathbf{H}_m$, the design matrix of the current subset. This column is renamed $\mathbf{h}_{m+1}$, and the new design matrix is $\mathbf{H}_{m+1}$. At each step of forward selection, the not-yet-selected candidate unit that most decreases the sum-squared error (SSE) is added to the current network. The SSE is

$$\mathrm{SSE} = \sum_{i=1}^{p} \left(\hat{y}_i - f(\mathbf{x}_i)\right)^2,$$

where $\hat{y}_i$ is equal to $y_i$ plus a small amount of unknown noise. In a conventional RBF network, hidden neurons are added until the output error reaches a preset error goal. The improved method (the FSRBF network) instead uses generalized cross-validation (GCV) to decide when to stop adding neurons. GCV serves as the model selection criterion (MSC) and estimates the prediction error during training:

$$\hat{\sigma}^2_{\mathrm{GCV}} = \frac{p\, \hat{\mathbf{y}}^{\mathrm{T}} \mathbf{P}^2 \hat{\mathbf{y}}}{\left(\operatorname{trace}(\mathbf{P})\right)^2}.$$

This quantity estimates how well the trained network will perform on future data. The hidden units are radial and the output unit is linear, so training an RBF network can be divided into two stages:
(i) learning the centers and widths in the hidden layer;
(ii) learning the connection weights from the hidden layer to the output layer.
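The forward-selection loop described above can be sketched as follows. The sketch is non-authoritative and rests on these assumptions:
- Gaussian units share one radius `r`, and every training input is a candidate center.
- The projection matrix is unregularized.
- At each step, the candidate with the largest SSE decrease $(\hat{\mathbf{y}}^{\mathrm{T}}\mathbf{P}_m\mathbf{f}_J)^2/(\mathbf{f}_J^{\mathrm{T}}\mathbf{P}_m\mathbf{f}_J)$ is added. This expression follows from the key equation, since $\hat{\mathbf{y}}^{\mathrm{T}}\mathbf{P}^2\hat{\mathbf{y}} = \hat{\mathbf{y}}^{\mathrm{T}}\mathbf{P}\hat{\mathbf{y}}$ for a projection.
- Units stop being added as soon as the GCV estimate would no longer decrease.
- Stage (i) is reduced to picking centers from the data with a fixed radius, and stage (ii) is an ordinary least-squares fit of the weights.
- The optional constant candidate column, used to mimic a bias unit, and the names `forward_select` and `predict` are illustrative only, not Orr's MATLAB code [20].

```python
import numpy as np


def gaussian_design_matrix(X, centers, r):
    """H[i, j] = exp(-||x_i - c_j||^2 / r^2), one radius r shared by all units."""
    sq_dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-sq_dist / r**2)


def candidate_matrix(Xq, X_train, r, bias):
    """One Gaussian column per training input, plus a constant column if bias is on."""
    F = gaussian_design_matrix(Xq, X_train, r)
    return np.column_stack([F, np.ones(len(Xq))]) if bias else F


def gcv(P, y):
    """p * y^T P^2 y / trace(P)^2; P is symmetric, so y^T P^2 y = ||P y||^2."""
    Py = P @ y
    return len(y) * (Py @ Py) / np.trace(P) ** 2


def forward_select(X, y, r, bias=False):
    p = len(y)
    F = candidate_matrix(X, X, r, bias)
    P = np.eye(p)                   # empty network: P_0 = I
    score = gcv(P, y)
    selected = []
    while len(selected) < p - 1:    # keep trace(P) = p - m > 0
        Py = P @ y
        best_J, best_gain = None, 0.0
        for J in range(F.shape[1]):
            if J in selected:
                continue
            Pf = P @ F[:, J]
            denom = F[:, J] @ Pf
            if denom < 1e-12:       # candidate already (numerically) in the span of H_m
                continue
            gain = (Py @ F[:, J]) ** 2 / denom   # SSE decrease if J were added
            if gain > best_gain:
                best_J, best_gain = J, gain
        if best_J is None:
            break
        Pf = P @ F[:, best_J]
        P_next = P - np.outer(Pf, Pf) / (F[:, best_J] @ Pf)   # key equation
        next_score = gcv(P_next, y)
        if next_score >= score:     # GCV no longer decreases: stop adding units
            break
        P, score = P_next, next_score
        selected.append(best_J)
    # Stage (ii): least-squares output weights for the selected units.
    w = np.linalg.lstsq(F[:, selected], y, rcond=None)[0] if selected else np.zeros(0)
    return selected, w


def predict(Xq, X_train, selected, w, r, bias=False):
    return candidate_matrix(Xq, X_train, r, bias)[:, selected] @ w


# Toy usage: two 1-D clusters with a 0/1 target, as for one sub-network.
X = np.array([[0.0], [0.2], [0.4], [5.0], [5.2], [5.4]])
y = np.array([0.0, 0.0, 0.0, 1.0, 1.0, 1.0])
selected, w = forward_select(X, y, r=1.0)
print(selected, predict(np.array([[0.1], [5.1]]), X, selected, w, r=1.0))
```

The greedy loop uses only the rank-one update, so no matrix inversion is needed while candidates are scored. The GCV check replaces the fixed error goal used by conventional RBF training.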
3. Experimental
The bacteria used in this work were (1) Bacillus licheniformis; (2) Bacillus sphaericus; (3) Bacillus cereus; (4) Bacillus subtilis; and (5) Staphylococcus aureus. Individual bacterial colonies were removed from the agar surface and spotted onto the MALDI sample probe. The dried cells were overlaid with a 1.5-μl aliquot of matrix solution (12.5 mg of solid crystal dissolved in 1 ml of a 17% formic acid/33% acetonitrile/50% H2O solution). The mass spectra were generated on a Voyager-DE STR (PerSeptive Biosystems, Inc., Framingham, MA) operating in positive linear mode. The following parameters were used: accelerating voltage 25 kV, grid voltage 90% of the accelerating voltage, extraction delay time 750 ms, and low-mass ion gate set to m/z 4000. The laser intensity (N2 laser, 337 nm) was set just above the ion generation threshold, and mass spectra were acquired by averaging 100 laser spots. The MALDI-TOF mass spectra were measured in the range m/z 4000–16,000, which contains the peaks of most characteristic biomarkers. The samples of each bacterium were classified into three classes according to culture time (24, 48, and 72 h).

Wavelet transformation was used to compress the original data from 13,826 to 328 points. This speeds up network training and reduces the number of variables in the original MS data, and the wavelet transformation preserves the characteristic peaks of the original spectra. In addition, the data were normalized before each network was trained and tested. Normalization defines the region in which the network is trained, accelerates training, and narrows the range over which the parameters must be selected. For each bacterium there are p samples. The leave-one-out method was used to split the data set of p samples into a training set of size p−1 and a test set of size 1. In this method, one sample is held out for prediction and the remaining samples form the training set. For the next test, a different sample is held out. The procedure is repeated until every sample has been used as the test sample once. Each sample of each bacterium was therefore used as a test sample once and as a training sample p−1 times.

The primary goal of the neural network is to estimate the underlying function and to classify unknown samples into a specific class based on previous examples from each class. The whole network used to classify the bacteria consists of three sub-networks, each with one output unit corresponding to a specific class. The discrete class labels of the training-set outputs are given numerical values. The kth class label is interpreted as a probability of 1 that the example belongs to that class and a probability of 0 that it belongs to any other class. Each FSRBF network has only one neuron in the output layer, so its output is a scalar. The classification is obtained by combining the outputs of the three network models.
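The sketch below shows the leave-one-out splitting and the 0/1 target vectors for the three sub-networks described above. It is a minimal illustration, and the labels are hypothetical culture times, not data from this study. The normalization step is omitted because the text does not say which normalization was used.

```python
import numpy as np

CULTURE_TIMES = (24, 48, 72)  # h; one sub-network per culture time


def one_vs_rest_targets(labels, classes=CULTURE_TIMES):
    """0/1 target vector for each sub-network: 1 if the sample belongs to that class, else 0."""
    labels = np.asarray(labels)
    return {c: (labels == c).astype(float) for c in classes}


def leave_one_out(p):
    """Yield (train_idx, test_idx) pairs: p - 1 training samples and 1 test sample."""
    all_idx = np.arange(p)
    for k in range(p):
        yield np.delete(all_idx, k), np.array([k])


labels = [24, 24, 48, 72, 72]        # hypothetical culture times of p = 5 spectra
targets = one_vs_rest_targets(labels)
print(targets[72])                   # [0. 0. 0. 1. 1.]
for train_idx, test_idx in leave_one_out(len(labels)):
    print(test_idx, train_idx)       # each sample is the test sample exactly once
```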
After training, each sub-net responds to a test sample with a continuous value that can be interpreted as proportional to the class probability. The classification threshold was set to 0.5: if an output exceeded 0.5, the sample was assigned to the corresponding class. The three outputs for one sample, from the three models for one bacterium, were then combined. A sample was considered classified into a specific class only if one of the three outputs exceeded 0.5 and the other two were both below 0.5. The root mean square error (RMSE),

$$\mathrm{RMSE} = \left(\frac{\sum_{i=1}^{p} (\hat{y}_i - y_i)^2}{p}\right)^{1/2},$$

was used to evaluate network performance at different radii for every bacterium. In this formula, $\hat{y}_i$ is the predicted value and $y_i$ is the true value. The algorithms were implemented in MATLAB 5.0 (MathWorks Inc.). The implementation used the internal neural network functions of the MATLAB toolbox together with a set of additional functions for improved RBF networks designed by Orr [20].
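The decision rule and the RMSE defined above can be sketched in a few lines. The sketch assumes that an output exactly equal to 0.5 does not count as exceeding the threshold, a case the text leaves open. The function names are illustrative.

```python
import numpy as np


def assign_class(outputs, classes=(24, 48, 72), threshold=0.5):
    """Combine the three sub-net outputs for one sample.

    Returns the class whose output exceeds the threshold when exactly one does;
    returns None (non-classified) when none or several exceed it.
    """
    above = [c for c, o in zip(classes, outputs) if o > threshold]
    return above[0] if len(above) == 1 else None


def rmse(y_pred, y_true):
    """RMSE = sqrt(sum_i (yhat_i - y_i)^2 / p)."""
    y_pred = np.asarray(y_pred, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


print(assign_class([0.91, 0.12, 0.30]))   # 24
print(assign_class([0.41, 0.22, 0.38]))   # None: all outputs below 0.5
print(assign_class([0.72, 0.64, 0.05]))   # None: two outputs above 0.5
print(rmse([0.9, 0.2, 0.1], [1, 0, 0]))   # about 0.1414
```

The rule separates the two ways a sample can end up non-classified: no output above the threshold, or more than one.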
4. Results and discussion

4.1. Effect of radius
The radius reflects the size of an RBF unit and therefore directly affects the response of the network to an input. The present work investigated how the radius affects RBF network performance. With small radii, the network responds weakly to all samples, because the hidden neurons are so narrow that most samples lie far from the centers. Networks with relatively larger radii respond strongly to some of the samples, which allows those samples to be differentiated. If the radii are too large, however, the responses to all samples increase and become similar, and it becomes difficult to distinguish among the classes. This is caused by the large overlap of the input regions of the radial basis neurons. The root mean square error was used as the criterion for investigating the effect of the radius of the hidden neurons. The results show that the radius has a complicated effect on the RMSE, which indirectly reflects its effect on the network output. The RMSE does not decrease or increase monotonically as the radius increases. Instead, in each case there is a value or range of radii at which the RMSE is minimum. The radius that minimizes the RMSE, i.e. the optimum radius for each class of every bacterium, is given in Table 1.

Table 1 Optimum radii for each class of every bacterium

                           Bias off                 Bias on
Bacterium                  24 h    48 h    72 h     24 h    48 h    72 h
Bacillus licheniformis     2       1.5     7.5      1.5     4.5     7
Bacillus sphaericus        8       8.5     10       7.5     3       4
Bacillus cereus            5       8.5     3        5       8.5     2
Bacillus subtilis          7       3       1.5      8.5     3       1.5
Staphylococcus aureus      8.5     5.5     7.5      8       4.5     9

4.2. Effect of bias unit
There is also an optional bias unit, which adds an extra term $b$ to the network model:

$$f(\mathbf{x}) = b + \sum_{j=1}^{m} w_j h_j(\mathbf{x}),$$

where the $w_j$ and $b$ are the parameters to be optimized. This is equivalent to adding an extra basis function that takes the value 1 for all $\mathbf{x}$, with an extra weight equal to $b$. With $h_{m+1}(\mathbf{x}) = 1$ and $w_{m+1} = b$, the model can be written as

$$f(\mathbf{x}) = \sum_{j=1}^{m+1} w_j h_j(\mathbf{x}).$$

When the bias parameter is on, a bias unit is included along with the candidate RBFs, so a bias unit will appear in the final network. The MSC of the network may therefore decrease when a bias is used in the RBF model. If there is a bias unit in the final network, the network will differ from the one obtained with the bias unit off, and the output of the RBF network for the same sample will differ as well.

4.3. Classification
The classification results for the bacteria are given in Table 2 for models without the bias and in Table 3 for models with it. When the bias is not used, the classification accuracies for the five bacteria were 85.7, 61.5, 80, 92.3, and 92.8%, respectively. For some samples, all three outputs are below 0.5, so these samples were not classified into any of the three classes: no. 7 of B. licheniformis, nos. 1 and 6 of B. sphaericus, no. 9 of B. cereus, and no. 2 of B. subtilis. Classification also fails when more than one output exceeds the threshold of 0.5 (nos. 5, 9, and 13 of B. sphaericus). Some samples were assigned to the wrong class: no. 8 of B. licheniformis, nos. 2 and 13 of B. cereus, and no. 11 of S. aureus.

When the bias is included in the RBF model, the classification accuracies for the five bacteria were 71.4, 69.2, 80, 92.3, and 92.8%, respectively. The predicted results are given in Table 3. Some samples were not classified: nos. 3, 6, and 9 of B. licheniformis, nos. 7, 9, and 10 of B. sphaericus, no. 9 of B. cereus, no. 2 of B. subtilis, and no. 11 of S. aureus. Some spectra were misclassified: no. 8 of B. licheniformis, no. 1 of B. sphaericus, and nos. 2 and 13 of B. cereus.

The incorrect classifications may be caused by many factors. The major factor may be the difference between individual samples of the same bacterium, whether cultured for the same time or for different times. Some microorganisms are sensitive to the culture environment. A small variation in the culture environment may cause changes in the microorganisms and thus poor repeatability in MALDI-TOF-MS detection.

Table 2 Predicted results for the bacterial identifications with forward selection RBF networks, bias off

Obtained identity          Predicted identity
                           24 h*   48 h*   72 h*   Non-classified
Bacillus licheniformis
  24 h                     5       0       0       0
  48 h                     0       2       1       1
  72 h                     0       0       5       0
Bacillus sphaericus
  24 h                     2       0       0       1
  48 h                     0       3       0       2
  72 h                     0       0       3       2
Bacillus cereus
  24 h                     4       1       0       0
  48 h                     0       4       0       1
  72 h                     0       1       4       0
Bacillus subtilis
  24 h                     3       0       0       1
  48 h                     0       5       0       0
  72 h                     0       0       4       0
Staphylococcus aureus
  24 h                     5       0       0       0
  48 h                     0       5       0       0
  72 h                     0       1       3       0

Table 3 Predicted results for the bacterial identifications with forward selection RBF networks, bias on

Obtained identity          Predicted identity
                           24 h*   48 h*   72 h*   Non-classified
Bacillus licheniformis
  24 h                     4       0       0       1
  48 h                     0       1       1       2
  72 h                     0       0       5       0
Bacillus sphaericus
  24 h                     2       0       1       0
  48 h                     0       4       0       1
  72 h                     0       0       3       2
Bacillus cereus
  24 h                     4       1       0       0
  48 h                     0       4       0       1
  72 h                     0       1       4       0
Bacillus subtilis
  24 h                     3       0       0       1
  48 h                     0       5       0       0
  72 h                     0       0       4       0
Staphylococcus aureus
  24 h                     5       0       0       0
  48 h                     0       5       0       0
  72 h                     0       0       3       1

4.4. Comparison
The classification results obtained by traditional RBF neural networks and by the improved (FSRBF) networks are given in Table 4. The FSRBF approach achieves better results. With the bias off, the FSRBF networks give better results than conventional RBF networks for B. licheniformis and B. subtilis. With the bias on, the classification accuracy for B. sphaericus and B. subtilis increased. The FSRBF models without bias gave the largest gains, raising the correctness rate for B. licheniformis from 71.4 to 85.7% and for B. subtilis from 84.6 to 92.3%.

Table 4 Comparison of classification correctness rates (%) obtained by traditional RBF and FSRBF

Bacterium                  RBF     FSRBF (bias off)    FSRBF (bias on)
Bacillus licheniformis     71.4    85.7                71.4
Bacillus sphaericus        61.5    61.5                69.2
Bacillus cereus            80      80                  80
Bacillus subtilis          84.6    92.3                92.3
Staphylococcus aureus      92.8    92.8                92.8

5. Conclusions
The improved RBF networks were applied to classify bacteria cultured for different times based on MALDI-TOF-MS spectra. FSRBF networks improved the accuracy of bacterial classification over conventional RBF models. Moreover, the improved method uses separate network models for different classes. This allows models that include or exclude the bias unit to be mixed, which could lead to better prediction results. However, the complicated biological effects on bacterial growth mean that the spectral reproducibility of microorganisms remains a problem. More rigid control of bacterial culture conditions seems to be a critical factor for improving the accuracy of bacterial classification.
References
[1] F. Xiang, G.A. Anderson, T.D. Veenstra, M.S. Lipton, R.D. Smith, Anal. Chem. 72 (11) (2000) 2475–2481.
[2] J.O. Lay Jr., Trends Anal. Chem. 19 (8) (2000) 507–516.
[3] A.J. Saenz, C.E. Petersen, N.B. Valentine, S.L. Gantt, K.H. Jarman, M.T. Kingsley, K.L. Wahi, Rapid Commun. Mass Spectrom. 13 (1999) 1580–1585.
[4] J.P. Anhalt, C. Fenselau, Anal. Chem. 47 (1975) 219–225.
[5] R.J. Arnold, J.P. Reilly, Rapid Commun. Mass Spectrom. 12 (1998) 630–636.
[6] R.D. Holland, C.R. Duff, F. Rafli, J.B. Sutherland, T.M. Heinze, C.L. Holder, K. Voorhees, O.L. Jackson Jr., Anal. Chem. 71 (1999) 3226–3230.
[7] R.J. Arnold, J.A. Karty, A.D. Ellington, J.P. Reilly, Anal. Chem. 71 (1999) 1990–1996.
[8] K.H. Jarman, S.T. Cebula, A.J. Saenz, C.E. Petersen, N.B. Valentine, M.T. Kingsley, K.L. Wahl, Anal. Chem. 72 (2000) 1217–1223.
[9] P.A. Demirev, Y.-P. Ho, V. Ryzhov, C. Fenselau, Anal. Chem. 71 (1999) 2732–2738.
[10] M.J.L. Orr, Introduction to radial basis function networks, 1996, http://www.anc.ed.ac.uk/∼mjo/papers/intro.ps.
[11] M.J.L. Orr, J. Hallam, Int. J. Neural Syst. 10 (5) (2000) 397–415.
[12] Y.L. Loukas, Anal. Chim. Acta 417 (2000) 221–229.
[13] J. Wang, J. Liu, Y. Pan, G. Chen, Comput. Appl. Chem. 17 (3) (2000) 262–266.
[14] Y. Li, X. Huang, M. Sha, X. Meng, Chin. J. Chromatogr. 19 (2) (2001) 112–115.
[15] A. Pulido, I. Ruisanchez, F.X. Rius, Anal. Chim. Acta 388 (1999) 273–281.
[16] M. Blumenstein, B. Verma, A neural network for real-world postal address recognition, citeseer.nj.nec.com/9875.html.
[17] Z.L. Guo, Z.Y. Liu, Res. Environ. Sci. 10 (4) (1997) 1–5.
[18] Z. Zhang, A. Urbas, P.B. Harrington, K.J. Voorhees, J. Rees, Chem. J. Chin. Univ. 23 (2002) 570–572.
[19] Z. Zhang, D. Wang, P.B. Harrington, K.J. Voorhees, J. Rees, Chem. Res. Chin. Univ. 18 (2002) 453–457.
[20] M.J.L. Orr, Matlab functions for radial basis function networks, 1999, http://www.anc.ed.ac.uk/∼rnjo/sotware/rbf2.zip.