International Journal of Food Microbiology 82 (2003) 233 – 243 www.elsevier.com/locate/ijfoodmicro

A hybrid Bayesian–neural network approach for probabilistic modeling of bacterial growth/no-growth interface

M.N. Hajmeer (a,*), I.A. Basheer (b)

(a) Department of Population Health and Reproduction, School of Veterinary Medicine, University of California, Davis, CA 95616, USA
(b) California Department of Transportation, Translab, 5900 Folsom Blvd., Sacramento, CA 95819, USA

Received 15 October 2001; received in revised form 1 June 2002; accepted 26 June 2002

Abstract

A hybrid probabilistic modeling approach that integrates artificial neural networks (ANNs) with statistical Bayesian conditional probability estimation is proposed. The approach combines the power of ANNs as highly flexible nonlinear mapping paradigms with Bayes' theorem, which is used to compute probabilities of bacterial growth with the aid of Parzen's probability density function estimators derived for the growth and no-growth (G/NG) states. The proposed approach produces models that can predict the probability of growth of a targeted microorganism as affected by a set of parameters pertaining to extrinsic factors and operating conditions. The models can also be used to define the probabilistic boundary (interface) between growth and no-growth, and as such can define and predict the values of critical parameters required to keep a pre-specified bacterial growth risk in check. A modular system incorporating the various computational modules was constructed to illustrate the application of the hybrid approach to the probabilistic modeling of growth of a pathogenic Escherichia coli strain as affected by temperature and water activity. The proposed approach was compared with other techniques, including traditional linear and nonlinear logistic regression. Results indicated that the hybrid approach outperforms the other approaches in accuracy as well as in its flexibility to extract the implicit interrelationships between the various parameters. Advantages and limitations of the approach are also discussed and compared with those of other techniques.

© 2002 Elsevier Science B.V. All rights reserved.

Keywords: Artificial neural networks; Bacterial growth; Bayesian probability estimation; Modeling; Parzen's probability distribution

* Corresponding author: M.N. Hajmeer. Tel.: +1-530-754-7373; fax: +1-530-752-5845.

0168-1605/02/$ - see front matter © 2002 Elsevier Science B.V. All rights reserved. PII: S0168-1605(02)00308-2

1. Introduction

Artificial neural networks (ANNs) have recently emerged as a novel and robust approach for modeling complex nonlinear systems in a broad range of applications in science and engineering. The attractiveness of ANNs as empirical modeling schemes lies in their ability to extract, with high accuracy and irrespective of the degree of nonlinearity between system variables, the intrinsic relationships between independent (explanatory) and dependent (response) variables through training of the network on examples representing the phenomenon to be modeled. In this paper, we propose to use ANNs as a classification tool for characterizing the growth and no-growth (G/NG) of bacteria and for deriving the G/NG interface. We also extend the application to estimating the probability of bacterial growth under a given set of operating conditions. The advantage of probabilistic growth models is that they can serve as vital components of a larger risk assessment system. While many researchers (e.g., Bolton and Frank, 1999; Presser et al., 1998; Salter et al., 2000; Tienungoon et al., 2000) have used logistic regression methods (Hosmer and Lemeshow, 1989) to characterize the interface between growth and no-growth of bacteria, we propose to use ANNs and a Bayesian method as a hybrid system for computing the posterior probabilities of growth. The proposed approach is a modular system in which the first module develops a prediction model that produces an output reflecting the extent of growth from a given combination of explanatory variables, while the other modules further process the model outputs to compute the probability of growth. The advantage of the hybrid system is that, unlike the logistic regression method, it makes fewer assumptions about the probability distributions of the explanatory and response variables.

Neural networks have recently been applied to modeling a number of problems in predictive microbiology; a literature review of such applications is given in Basheer and Hajmeer (2000). These applications include predicting bacterial growth parameters and modeling growth curves from a set of operating and extrinsic parameters (Hajmeer et al., 1997, 1998, 2000; Basheer and Hajmeer, 2000; Najjar et al., 1997; Lou and Nakai, 2001; Jeyamkondan et al., 2001; Geeraerd et al., 1998). The generalized ANN approach for simulating growth curves proposed by Hajmeer et al. (2000) was shown to be more versatile, flexible, and less restrictive, as it does not impose assumptions on the form of the functions to be fitted to bacterial growth data. Additionally, ANNs offer a distinct advantage over traditional approaches in simulating bacterial growth curves under dynamic conditions (Cheroutre-Vialette and Lebert, 2002).

The organization of this paper is as follows. First, the hybrid methodology for the probabilistic modeling of bacterial growth is presented, followed by a description of the proposed modular system that incorporates the various modules. This includes a brief discussion of ANNs and of the statistical models needed to derive the probability density functions of the model outputs and the subsequent conditional probability of growth. The proposed approach is then applied to data describing the growth of a pathogenic strain of Escherichia coli in laboratory media under the effect of temperature and water activity, in order to develop probabilistic models and derive the G/NG interface. Finally, the accuracy of the proposed hybrid ANN–Bayesian approach is compared with that of other techniques.

2. Methodology

The methodology proposed for developing the probabilistic bacterial growth model is summarized as follows:
(1) Develop a well-trained and validated supervised ANN (e.g., using the error backpropagation training algorithm) from a sufficiently large experimental database of examples characterizing the G/NG of a given microorganism under a selected set of independent (explanatory) variables (e.g., extrinsic and operating parameters).
(2) Using Parzen's method, derive the probability density function (PDF) for both the growth and the no-growth states, treating the ANN output (activation) as the random variable.
(3) Using Bayes' theorem, compute the posterior probability of growth with the aid of the derived Parzen G/NG PDF estimators and the known (or reasonably assumed) prior probabilities of G/NG.

A modular system with four modules linked in series was developed to perform all necessary computations. The hybrid system, which combines ANNs with the statistical Parzen–Bayesian approach, operates by first developing an ANN through training and testing on relevant examples in the first module. The Parzen growth and no-growth PDF estimators are then derived in the second module, and the third module computes the posterior probabilities of growth. A fourth module plots the computed posterior probabilities against the explanatory parameters (e.g., operating conditions) to show how the likelihood of growth is influenced by the relative magnitudes of those parameters. The first three modules are briefly discussed in the following subsections.

2.1. The ANN prediction module

This module is the workhorse of the proposed serial hybrid system; it is where a predictive ANN is developed. ANNs are computing devices made up of a large number of simple, highly interconnected processing elements (called nodes or neurons) that emulate the structure and operation of the biological nervous system. Because they are abstractions of the biological neural system, they operate in parallel, learn from examples, are robust, can handle imprecise and fuzzy information, and are able to generalize (Basheer and Hajmeer, 2000). Learning in ANNs is accomplished through special training algorithms based on learning rules presumed to mimic the learning mechanisms of biological systems. Neural networks come in many types and architectures that differ fundamentally in the way they learn; the details are well documented in the literature (Schalkoff, 1997). The suitability of a given training paradigm for solving a particular problem is problem-dependent. The backpropagation neural network, which was used in this study, is one of the most commonly used networks because of its precisely defined and well-understood learning laws and its distinguished ability to generalize. A backpropagation network consists of an input layer with input nodes representing the explanatory variables of the problem, an output layer containing output node(s) representing the response variable (i.e., the solution of the problem), and at least one intermediate layer, called the hidden layer, containing the hidden nodes necessary for handling the problem's nonlinearity. Normally, one hidden layer with a sufficient number of nodes is adequate to approximate any continuous function to an arbitrary degree of accuracy, and thus three-layer ANN architectures were adopted for the present study. In a fully connected architecture, the nodes of any one layer are connected (through connection links carrying weights) to all the nodes in the succeeding and preceding layers. Through training, the optimal number of hidden nodes as well as the connection weights and node biases (thresholds) (v_j and w_ji in Eq.
(1) below) are determined. For a developed three-layer ANN with n input nodes (x = x_1, x_2, ..., x_n), m hidden nodes, and one output node, the solution y = f(x) of an arbitrary problem is approximated by the ANN estimate, ŷ, computed from

$$\hat{y} = \sigma\left[\sum_{j=0}^{m} v_j\,\sigma\left(\sum_{i=0}^{n} w_{ji}\,x_i\right)\right] \qquad (1)$$

where σ is an activation function, such as the simple logistic function σ(z) = 1/(1 + e^{-z}), which is bounded between 0.0 and 1.0; x_i is the ith input variable, with x_0 = 1.0; w_ji is the weight of the link between the jth node in the hidden layer and the ith node in the input layer; and v_j is the weight of the link between the jth node in the hidden layer and the output node. Note that w_j0 = θ_j (for j = 1, ..., m) and v_0 = θ_0 are, respectively, the thresholds that determine the firing limits of the hidden and output nodes (the j = 0 term in Eq. (1) acts as a bias unit whose hidden-layer value is fixed at 1.0). Because an activation function (σ) is also applied in the output layer, the ANN output (ŷ) is often called the activation. A step-by-step, simplified account of the mathematics of the backpropagation training algorithm, with illustrative example computations, is given in Basheer and Hajmeer (2000) and Najjar et al. (1997), together with a number of issues that must be addressed to ensure successful ANN development. The mathematics and methods of building ANNs, diagnostic checks for improving training, ANN validation, and applications in predictive microbiology are also given in Basheer and Hajmeer (2000).

The network is trained on a number of 'training' examples and validated on a separate set of independent 'test' examples. The training procedure adopted in this study started with one node in the hidden layer, trained the network on the training examples, and then tested the ANN on the test examples to examine its prediction performance. The procedure was repeated, adding one new node to the hidden layer each time, until the best architecture and set of connection weights and thresholds were obtained. A computer program that we wrote for ANN training, based on the backpropagation algorithm, was used in this study. Because the problem involves classification into two classes (G/NG), the optimal network was selected by monitoring the variation of performance parameters derived from the confusion matrix (C-matrix). Because the ANN activation (ŷ) is continuous, it is first converted to 0 (no-growth) if ŷ ≤ 0.5 and to 1 (growth) if ŷ > 0.5. The C-matrix is stated as:

$$\text{C-matrix} = \begin{pmatrix} a & b \\ c & d \end{pmatrix} = \begin{pmatrix} \text{No. of 0s predicted as 0} & \text{No. of 0s predicted as 1} \\ \text{No. of 1s predicted as 0} & \text{No. of 1s predicted as 1} \end{pmatrix} \qquad (2)$$

where a, b, c, and d are the numbers of true negatives, false positives, false negatives, and true positives, respectively. Note that (a + d) is the total number of correctly classified cases and (b + c) the total number of misclassified cases. Two performance measures, computed for both the training and test sets each time a new ANN topology was examined, were used to gauge the performance of the network: the fraction correct, FC = (a + d)/(a + b + c + d), and the false alarm rate, FAR = b/(b + d). A perfect classifier network has b = c = 0, implying FC = 100% and FAR = 0.
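The threshold rule and the two performance measures can be sketched as follows. This is a minimal Python illustration, not the authors' program, and the function names are hypothetical. It applies the rule that ŷ ≤ 0.5 means no-growth, tallies a, b, c, and d as laid out in Eq. (2), and computes FC and FAR exactly as defined above (FAR here is b/(b + d), the share of predicted-growth cases that are false alarms).

```python
from typing import Iterable, Tuple

def confusion_counts(observed: Iterable[int], activations: Iterable[float],
                     threshold: float = 0.5) -> Tuple[int, int, int, int]:
    """Return (a, b, c, d) = (true neg., false pos., false neg., true pos.).

    An activation is classed as growth (1) only if it exceeds `threshold`,
    so y_hat <= 0.5 maps to no-growth (0), as in the text.
    """
    a = b = c = d = 0
    for obs, act in zip(observed, activations):
        if obs not in (0, 1):
            raise ValueError(f"observed label must be 0 or 1, got {obs!r}")
        pred = 1 if act > threshold else 0
        if obs == 0:
            if pred == 0:
                a += 1  # no-growth predicted as no-growth
            else:
                b += 1  # no-growth predicted as growth (false alarm)
        else:
            if pred == 0:
                c += 1  # growth predicted as no-growth
            else:
                d += 1  # growth predicted as growth
    return a, b, c, d

def fraction_correct(a: int, b: int, c: int, d: int) -> float:
    """FC = (a + d) / (a + b + c + d)."""
    return (a + d) / (a + b + c + d)

def false_alarm_rate(a: int, b: int, c: int, d: int) -> float:
    """FAR = b / (b + d); returned as 0.0 if nothing is predicted as growth."""
    return b / (b + d) if (b + d) else 0.0

# Combined-data counts reported for the hybrid model in Table 1.
counts = (75, 5, 6, 93)
print(round(100 * fraction_correct(*counts), 1))   # 93.9
print(round(100 * false_alarm_rate(*counts), 1))   # 5.1
```

Applying the same two functions to the other rows of Table 1 reproduces their FC and FAR columns, which is a quick way to check that FAR is being computed with b + d (predicted positives) in the denominator.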

2.2. Parzen's PDF estimator module

This module derives an approximation of the PDF of the ANN activation random variable from the limited number of activations obtained for the training and test examples of the problem. This is done to avoid assuming an arbitrary distribution (such as Gaussian) for the ANN activation (ŷ): assuming Gaussian-distributed activations when they deviate considerably from a Gaussian shape can lead to erroneous probabilities. The module uses Parzen's (1962) method to estimate the PDF of the random variable ŷ from a collection of ANN activations. For a sample of size n taken from a single population, the PDF estimator for the univariate case is

$$g(\hat{y}) = \frac{1}{n\lambda}\sum_{i=1}^{n} W\!\left(\frac{\hat{y} - \hat{y}_i}{\lambda}\right) \qquad (3)$$

where ŷ is the random variable, ŷ_i is the ith sample value drawn from the ANN activations, λ is a smoothing parameter that controls the width of the distribution, and W is a weighting function such as the bell-shaped Gaussian function. Choosing a Gaussian weighting function does not imply a Gaussian PDF for the random variable. Each ŷ_i is the center of a weighting function that spreads out on both sides of ŷ_i in proportion to λ. As Eq. (3) shows, Parzen's PDF estimator is a scaled average of the weighting functions over all sample cases, and it asymptotically approaches the true density function as the sample size increases (provided λ decreases suitably as n grows). There is no mathematically rigorous method for selecting an optimal λ, although some guidelines exist (e.g., Masters, 1995). Too small a λ lets individual samples exert too much influence, so that the PDF estimate has distinct modes at the locations of the training examples. Conversely, a very large λ causes significant blurring that badly distorts the details of the PDF. Between these two extremes lies a range of λ values that give an appropriate degree of interpolation between the sample points. The practical approach used in this study was to try a number of λ values and choose the one yielding the PDF estimate that most closely resembles the histogram of the data.

2.3. Bayesian conditional probabilities module

This module uses Bayes' theorem to derive the posterior probability that a given example belongs to a given class. In a classification problem with K mutually exclusive and exhaustive classes C_i, i = 1, 2, ..., k, ..., K, the posterior probability that class C_k was in effect when the random variable ŷ was sampled is determined from:

$$P(C_k \mid \hat{y}) = \frac{p_k\, L(\hat{y} \mid C_k)}{\sum_{i} p_i\, L(\hat{y} \mid C_i)} \qquad (4)$$

where L(ŷ|C_k) is the likelihood of observing ŷ given class C_k, and p_k is the prior probability of class C_k. The prior probabilities of all classes may be assumed equal (when no information is available to indicate otherwise), assigned distinct values based on experience, or set equal to the proportion of examples of each class in the available database; the last option was used in this study. Thus, in a two-class problem (e.g., growth/no-growth), if N_1 and N_0 are, respectively, the numbers of event and nonevent cases in the database, then p_1 = N_1/(N_0 + N_1) and p_0 = 1 − p_1. The likelihood function (L) in Bayes' theorem (Eq. (4)) can be replaced by the PDF, and because the true PDF is usually unknown, Parzen's PDF estimator, g(ŷ), is normally used instead. Therefore, for a two-class problem involving an event and a nonevent, the posterior probability that the 'event' class was in effect when the random variable (here, the ANN activation ŷ) was sampled is computed from:

$$P_1(\hat{y}) = \frac{p_1 L_1(\hat{y})}{p_0 L_0(\hat{y}) + p_1 L_1(\hat{y})} \approx \frac{p_1 g_1(\hat{y})}{p_0 g_0(\hat{y}) + p_1 g_1(\hat{y})} \qquad (5)$$

where L_1(ŷ) and L_0(ŷ) are the likelihood functions, and g_1(ŷ) and g_0(ŷ) the corresponding Parzen PDF estimators, of the random variable ŷ for the event and nonevent classes, respectively.
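Modules 2 and 3 can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' spreadsheet implementation: it assumes a standard Gaussian weighting function W (one of the choices mentioned in Section 2.2), takes the priors from the class counts as in this study, and uses hypothetical function names. The argument `lam` plays the role of the smoothing parameter λ in Eq. (3).

```python
import math
from typing import Sequence

def gaussian_weight(u: float) -> float:
    """Standard normal density, used here as the weighting function W."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def parzen_pdf(y: float, samples: Sequence[float], lam: float) -> float:
    """Eq. (3): g(y) = 1/(n*lam) * sum_i W((y - y_i) / lam)."""
    n = len(samples)
    return sum(gaussian_weight((y - yi) / lam) for yi in samples) / (n * lam)

def posterior_growth(y: float, growth_acts: Sequence[float],
                     nogrowth_acts: Sequence[float], lam: float) -> float:
    """Eq. (5): P1(y) = p1*g1 / (p0*g0 + p1*g1), with priors from class counts."""
    n1, n0 = len(growth_acts), len(nogrowth_acts)
    p1 = n1 / (n0 + n1)          # e.g. 99/179 in the E. coli study
    p0 = 1.0 - p1
    g1 = parzen_pdf(y, growth_acts, lam)
    g0 = parzen_pdf(y, nogrowth_acts, lam)
    denom = p0 * g0 + p1 * g1
    if denom == 0.0:             # both densities underflowed far from the data
        raise ValueError("y lies too far from both samples for this lam")
    return p1 * g1 / denom

# Toy activations (not the paper's data): growth cases high, no-growth low.
growth, nogrowth = [0.8, 0.9], [0.1, 0.2]
for y in (0.1, 0.5, 0.9):
    print(y, round(posterior_growth(y, growth, nogrowth, lam=0.18), 3))
```

With the ANN activations of the observed growth and no-growth examples as the two sample lists, evaluating `posterior_growth` over a grid of ŷ values between 0 and 1 traces a posterior-versus-activation curve; λ still has to be tuned separately, for instance against the histograms as described in Section 2.2.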


2.4. Step-by-step procedure

For an arbitrary dataset comprising a number of input–output examples of bacterial growth/no-growth, the proposed methodology is applied in the following steps.
(1) Divide the database of input vectors (e.g., T, pH, aw) and binary outputs (growth = 1, no-growth = 0) into a set of training examples and a set of test examples.
(2) Develop a high-performance prediction model from the data by training an ANN on the training examples and checking its validity on the test examples. Run the optimal ANN on the available data to generate outputs (ANN activations), which are continuous between 0.0 and 1.0.
(3) Separate all data, based on the experimental observations, into 'event = growth' and 'nonevent = no-growth' sub-databases. Using the ANN activations for the growth examples, derive the Parzen PDF estimator g_1(ŷ) for the ANN activation random variable with Eq. (3). Similarly, derive the Parzen PDF estimator g_0(ŷ) from the no-growth examples. We used spreadsheet software to perform these computations. Smooth growth and no-growth PDFs can be plotted by evaluating the PDF models at a large number of ŷ values.
(4) Compute the posterior 'event' probability P_1 for any given ANN activation ŷ using Bayes' rule (Eq. (5)) and the computed PDF values. The prior probabilities of growth and no-growth may be taken as the ratios of the numbers of growth and no-growth examples to the total number of examples in the database. Plot the relationship between P_1 and ŷ for a large number of ŷ values; the posterior probability of growth associated with the ANN activation computed for any given set of input variables can then be read from this relationship.
(5) Run the modular system on the available data, and plot and examine the relationship between the posterior growth probability, P_1, and the various input parameters.
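Step (2) can be illustrated with a minimal NumPy sketch of a three-layer network of the form of Eq. (1), trained by plain batch backpropagation, together with the constructive search over hidden nodes described in Section 2.1. The paper does not report its training settings, so the squared-error loss, learning rate, initialization, number of epochs, and all function names here are assumptions for illustration; this is not the authors' program. Inputs are assumed to be pre-scaled to roughly [0, 1].

```python
import numpy as np

def sigmoid(z):
    """Logistic activation sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + np.exp(-z))

def forward(X, W, v):
    """Eq. (1). X: (N, n) inputs; W: (m, n+1) hidden weights whose first
    column holds the thresholds w_j0; v: (m+1,) output weights, v_0 first."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])   # prepend x_0 = 1
    H = sigmoid(Xb @ W.T)                           # hidden-node outputs, (N, m)
    Hb = np.hstack([np.ones((H.shape[0], 1)), H])   # bias unit paired with v_0
    return sigmoid(Hb @ v), Xb, H, Hb

def train_backprop(X, y, m=2, lr=0.5, epochs=5000, seed=0):
    """Batch gradient descent on 0.5 * mean((y_hat - y)**2)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.5, size=(m, X.shape[1] + 1))
    v = rng.normal(scale=0.5, size=m + 1)
    for _ in range(epochs):
        y_hat, Xb, H, Hb = forward(X, W, v)
        delta_out = (y_hat - y) * y_hat * (1.0 - y_hat)           # (N,)
        delta_hid = np.outer(delta_out, v[1:]) * H * (1.0 - H)    # (N, m)
        v = v - lr * (Hb.T @ delta_out) / len(y)
        W = W - lr * (delta_hid.T @ Xb) / len(y)
    return W, v

def mse(X, y, W, v):
    """Half mean squared error of the network outputs."""
    return 0.5 * np.mean((forward(X, W, v)[0] - y) ** 2)

def grow_hidden_layer(X_tr, y_tr, X_te, y_te, max_nodes=5, **train_kw):
    """Add one hidden node at a time and keep the network with the best
    test-set fraction correct (activations > 0.5 counted as growth)."""
    best = None
    for m in range(1, max_nodes + 1):
        W, v = train_backprop(X_tr, y_tr, m=m, **train_kw)
        pred = (forward(X_te, W, v)[0] > 0.5).astype(float)
        fc = float(np.mean(pred == y_te))
        if best is None or fc > best[0]:
            best = (fc, m, W, v)
    return best
```

A larger network replaces the current best only when its test-set FC is strictly higher, so ties favor the smaller topology. The paper monitored both FC and FAR when choosing the architecture; a fuller version would track FAR as well.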


3. Results

The growth/no-growth data of E. coli R31 as a function of temperature (T) and water activity (aw), given in Salter et al. (2000), were used to develop the probabilistic hybrid system. A total of 179 combinations of temperature (T = 7.7–37.0 °C) and aw (aw = 0.943–0.987), representing the input variables, were tested for growth of E. coli R31 on laboratory media. The output variable is binary, with growth labeled 1 and no-growth labeled 0. Of the 179 (T, aw) combinations, 99 yielded growth and 80 showed no growth; thus, the prior probabilities are p_1 = 0.55 and p_0 = 0.45. The 179 examples were split into 143 examples for training the ANN and 36 examples (20% of the total) for testing its validity.

3.1. 2–2–1 ANN

Fig. 1 shows the variation of FC and FAR with the number of hidden nodes, for a maximum of 5000 training cycles. The topology was optimal with two nodes in the hidden layer; a much larger number of hidden nodes improved the prediction accuracy only slightly. The C-matrix coefficients and related performance measures for this ANN (referred to as the 2–2–1 ANN to denote the number of nodes in each layer) are summarized in Table 1 for the training, validation, and combined data. The FC of 93.3% indicates a reasonably good ANN classifier for this application. A mathematical expression for the ANN output (0.0 ≤ ŷ ≤ 1.0), based on Eq. (1), was derived using the optimal weights and biases obtained:

$$\hat{y} = \frac{1}{1 + \exp\left[\dfrac{19.281}{1 + \exp\left(-4.154T^{*} + 17.868a_w^{*} - 4.883\right)} + \dfrac{18.381}{1 + \exp\left(7.021T^{*} + 6.408a_w^{*} - 5.281\right)} - 7.037\right]} \qquad (6)$$

where T* and aw* are positive nondimensional temperature and water activity computed from T* = 0.029T (°C) − 0.143 and aw* = 10aw − 9.0.

Fig. 1. The effect of network topology on the performance measures fraction correct (FC) and false alarm rate (FAR) for both the training and test examples.

Table 1. Comparison of the four approaches using confusion matrix performance measures

| Modeling approach | Combined data (179 examples): C-matrix {a, b, c, d} | FC (%) | FAR (%) | Training data (143 examples): C-matrix {a, b, c, d} | FC (%) | FAR (%) | Validation data (36 examples): C-matrix {a, b, c, d} | FC (%) | FAR (%) |
|---|---|---|---|---|---|---|---|---|---|
| Hybrid ANN–Bayesian | {75, 5, 6, 93} | 93.9 | 5.1 | {65, 5, 4, 69} | 93.7 | 6.8 | {10, 0, 2, 24} | 94.4 | 0.0 |
| 2–2–1 ANN activations | {75, 5, 7, 92} | 93.3 | 5.2 | {65, 5, 5, 68} | 93.0 | 6.9 | {10, 0, 2, 24} | 94.4 | 0.0 |
| Nonlinear logistic regression | {67, 13, 4, 95} | 90.5 | 12.0 | n/a | n/a | n/a | n/a | n/a | n/a |
| Linear logistic regression | {58, 22, 19, 80} | 77.1 | 21.6 | n/a | n/a | n/a | n/a | n/a | n/a |

C-matrix = confusion matrix; FC = fraction correct; FAR = false alarm rate. n/a: not applicable, as the entire database was used in developing the logistic regression models.

3.2. Modular system

To facilitate the computations involved, provide an interactive means for graphical presentation of the results, and allow a large number of scenarios to be tested, a modular system was constructed with the aid of spreadsheets. The modular system combines four serially operating computational units: (1) the derived optimal prediction ANN module, (2) the Parzen PDF estimator module, (3) the Bayes' conditional probability computation module, and (4) a module that interactively plots the relationships between the computed posterior event probabilities and the explanatory parameters (i.e., T and aw) as these parameters are varied. The objective of the fourth module was to investigate how the model parameters affect the probability of growth by examining 2-D and 3-D plots.

Fig. 2. E. coli R31 growth and no-growth Parzen probability density function estimators for the 2–2–1 ANN activation random variable.

The ANN model (Eq. (6)) was run on the entire data set, and the ANN predictions were separated into two
sets: the 'event' set contained the cases in which the experimental observation indicated growth, and the 'nonevent' set contained the no-growth cases. The Parzen growth and no-growth PDF estimators for the ANN activation random variable ŷ were derived and are plotted in Fig. 2, with the optimal λ found to be 0.18 for both PDFs. Note that, according to Eq. (3), the g_1(ŷ) expression involves 99 terms and g_0(ŷ) involves 80 terms. As Fig. 2 shows, neither PDF resembles a Gaussian distribution, so any assumption of normality in an attempt to compute probabilities directly from the ANN activations can lead to erroneous results. The posterior probabilities of growth were computed according to Eq. (5) for ANN activations spanning 0.0 to 1.0. The variation of the posterior event probability, P_1, with the ANN activation ŷ is plotted in Fig. 3. An equation approximating Fig. 3 was derived as (R² = 0.999):

$$P_1 = -2.8391\,\hat{y}^3 + 4.0182\,\hat{y}^2 - 0.1994\,\hat{y} \qquad (7)$$

for 0.0 ≤ ŷ ≤ 1.0. In practice, one would first compute the ANN activation from Eq. (6) for a given set of T and aw, and then use Eq. (7) (or Fig. 3) to determine the probability of growth.

Fig. 3. The posterior probability of growth of E. coli R31 as a function of the 2–2–1 ANN activation.

Some researchers (e.g., Goodman and Harrell, 1998) have indicated that a three-layer ANN operates similarly to generalized nonlinear logistic regression, so it is of interest to compare the ANN activations with the posterior growth probabilities. In Fig. 3, the dashed line represents equality between the ANN activation and the computed posterior probability of growth (i.e., P_1 = ŷ). As seen, the two estimates can differ significantly (by up to 17%), and the difference may be much larger for other problems. The ANN activation exceeds the posterior probability of growth for ŷ < 0.4, indicating that treating ANN activations as event probabilities in this region is on the conservative side. For ŷ > 0.4, however, using ANN activations would underestimate the actual probability of growth.

Fig. 4. The probabilistic interface between E. coli R31 growth and no-growth at various probability-of-growth levels as affected by temperature and water activity using (a) the proposed hybrid ANN–Bayesian approach, (b) nonlinear logistic regression, (c) 2–2–1 ANN activations, and (d) linear logistic regression.

The relationship between the posterior event probability and T and aw is displayed in Fig. 4(a). Each (contour) curve connects a set of (T, aw) points with equal probability of growth (P_1); each curve therefore represents the boundary (interface) between growth and no-growth for the desired probability of growth. The curves were obtained by first running the modular system on 348 (T, aw) combinations spanning the T and aw ranges, with a T increment of 1.0 °C and an aw increment of 0.004. The resulting posterior probabilities of growth were then plotted against T and aw using contouring software to obtain equiprobability-of-growth contour lines for P_1 = 0.05, 0.10, 0.30, 0.50,
0.70, 0.90, and 0.95. As an example of the use of such curves, the P_1 = 0.05 equiprobability curve indicates that all (T, aw) combinations below that curve imply a probability of growth of at most 5%. Similarly, all (T, aw) combinations just above the 95% curve indicate at least a 95% chance of growth. Which probability curve to use for a specific application depends on the level of stringency required. A higher curve in Fig. 4(a) admits a larger variety of (T, aw) combinations (those lying below it), at the cost of a higher permitted probability of growth P_1. Lower curves reduce both the number of admissible (T, aw) combinations and the likelihood of growth (i.e., they are more conservative). The equiprobability plots (and related models) are effective tools for adjusting the value of one or more operating parameters to keep the probability of growth of a given microorganism in check when one or more parameters deviate from their planned values. This is of interest in process control, where the risk must remain below a specified acceptable limit. For example, according to the posterior probability plot (Fig. 4(a)), an 'intended' increase in aw from 0.945 to 0.970 requires that the temperature be reduced from 20 to 10 °C to keep the probability of growth at or below 10%. Such a probabilistic framework may be integrated with a larger process control system that automatically adjusts one or more parameters to keep the risk below a specified value.

3.3. Comparison with other methods

The proposed methodology was compared with three approaches: (i) ANN activations, (ii) a linear logistic regression model, and (iii) a nonlinear logistic regression model. Because the true probability of growth is normally unknown, the accuracy of the methods in predicting probability cannot be judged directly. Instead, the comparison with the other three approaches was carried out quantitatively using C-matrix performance measures, and qualitatively by comparing the conservatism of the various approaches with regard to the probability of growth. The first approach involved direct comparison of the ANN activations with the posterior probabilities. The second approach involved developing a linear logistic regression model relating the logit of the probability of growth to a linear combination of T and aw. We used a JavaScript program (Pezullo, 2001) for the logistic regression analysis. Linear logistic regression on the E. coli data yielded Logit(P_1) = −201.07 + 0.1911T + 206.25aw, where P_1 = [1 + e^{−Logit(P_1)}]^{−1}. The third approach was the nonlinear logistic regression model developed by Salter et al. (2000) on the same data and expressed as:

$$\text{Logit}(P_1) = 6.804 + 7.153\ln(a_w - 0.943) + 10.0\ln(T - 3.411) + 318.9\ln\left[1 - \exp\big(0.30(T - 49.23)\big)\right] \qquad (8)$$

The continuous-valued predictions were converted (using the 0.5 threshold) into a dichotomous variable representing whether or not growth occurred. The elements of the C-matrices and the two performance measures for the three approaches, in comparison with the hybrid approach, are summarized in Table 1 (using all 179 examples). Table 1 shows that the hybrid ANN–Bayesian approach outperforms the other methods in prediction accuracy. The hybrid approach also has the lowest false alarm rate, a major attribute when dealing with the economics of a plant process.

The second method for comparing the four approaches relied on equiprobability plots. Fig. 4(b)–(d) shows the equiprobability plots produced using nonlinear logistic regression, the ANN activations, and linear logistic regression, respectively, in comparison with Fig. 4(a) for the hybrid ANN–Bayesian approach. The hybrid approach yielded the narrowest band of probability contour lines of the four approaches, indicating results closer to the 1/0 (i.e., G/NG) experimental values. Comparison of the four equiprobability plots also reveals that (i) the hybrid ANN–Bayesian approach is more conservative (i.e., gives higher probabilities of growth for the same combination) than the Salter et al. (2000) logistic regression model for T between 17 and 29 °C, and less conservative for other T values; (ii) the 50% probability curves (dashed lines) for the ANN activation and the ANN–Bayesian methods are identical, but the hybrid system is more conservative for levels above 50% and less conservative for levels below 50%, as also shown in Fig. 3, where the two relationships cross at about 0.40; and (iii) the linear logistic regression model is far from realistic. For this E. coli study, the ANN activations are nearly as accurate as the hybrid approach when both are used as G/NG classifiers (see Table 1).

4. Conclusions

A hybrid approach for computing the probability of bacterial growth was presented. The approach combines the highly effective nonlinear mapping capability of ANNs with Bayesian methodology, in which the probability density functions of the ANN activations are estimated using Parzen’s method. The hybrid ANN–Bayesian approach was designed as a serial modular system in which separate computations are carried out in four modules. It was applied to data on the growth of E. coli R31 in laboratory media as affected by temperature and water activity. The objective of this application was to illustrate the computations involved and to provide a basis for comparing the proposed technique with other techniques, namely linear and nonlinear logistic regression and raw ANN activations. Results indicated that the hybrid approach outperforms the other approaches examined in classification accuracy.

The hybrid ANN–Bayesian approach also has further advantages over nonlinear logistic regression. Logistic regression involves many assumptions about the distribution of the input and output parameters, and violating these assumptions can have a major impact on model validity. In contrast, ANNs and Parzen’s PDF estimator do not require strict assumptions (e.g., linearity and normality) to be imposed, so models derived using the proposed approach can be more reliable. Additionally, selecting nonlinear terms in logistic regression is tedious, as there is no specific method for choosing the functional form of those terms, whereas an ANN can detect, on its own, the linear and nonlinear interactions between the model parameters. The hybrid system can be appended to a risk assessment model in which the probability of growth must be determined, and it can be automated so that probabilities are computed online, in real time, while the operating parameters of an actual process are monitored. Finally, an ANN was chosen as the prediction model because of its ability to model highly nonlinear, complex systems and to outperform statistical regression in many applications; however, it could be replaced by any other prediction tool of the user’s choice.
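The Bayesian step of the hybrid approach described in the conclusions (Parzen estimates of the growth and no-growth activation densities, combined through Bayes' theorem) can be illustrated with a short sketch. It is a minimal sketch under assumptions not stated in the text: a Gaussian Parzen kernel with a user-chosen bandwidth `sigma`, class priors equal to the training-class frequencies unless supplied, and ANN activations already available as plain numbers. The function names and the activation values are hypothetical, not the authors' implementation. Densities are combined in log space so that activations far from every training sample do not underflow to 0/0.

```python
import math
from typing import Optional, Sequence


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_parzen_pdf(x: float, samples: Sequence[float], sigma: float) -> float:
    """Log of a 1-D Parzen density estimate at x with a Gaussian kernel (assumed kernel)."""
    if not samples:
        raise ValueError("need at least one sample")
    if sigma <= 0.0:
        raise ValueError("sigma must be positive")
    terms = [-0.5 * ((x - s) / sigma) ** 2 for s in samples]
    m = max(terms)  # log-sum-exp trick: factor out the largest exponent
    log_sum = m + math.log(sum(math.exp(t - m) for t in terms))
    return log_sum - math.log(len(samples) * sigma * math.sqrt(2.0 * math.pi))


def posterior_growth(
    activation: float,
    growth_acts: Sequence[float],
    nogrowth_acts: Sequence[float],
    sigma: float,
    prior_growth: Optional[float] = None,
) -> float:
    """P(G | activation) via Bayes' theorem with Parzen class-conditional PDFs."""
    if not growth_acts or not nogrowth_acts:
        raise ValueError("both classes need at least one training activation")
    if prior_growth is None:
        # Assumption: priors estimated from the class frequencies in training.
        prior_growth = len(growth_acts) / (len(growth_acts) + len(nogrowth_acts))
    if not 0.0 < prior_growth < 1.0:
        raise ValueError("prior_growth must lie strictly between 0 and 1")
    log_g = math.log(prior_growth) + log_parzen_pdf(activation, growth_acts, sigma)
    log_ng = math.log(1.0 - prior_growth) + log_parzen_pdf(activation, nogrowth_acts, sigma)
    # pi_G f_G / (pi_G f_G + pi_NG f_NG) equals sigmoid(log_g - log_ng)
    return _sigmoid(log_g - log_ng)


if __name__ == "__main__":
    # Hypothetical ANN activations for training examples in each class.
    growth = [0.72, 0.81, 0.88, 0.93, 0.97]
    no_growth = [0.05, 0.11, 0.19, 0.34, 0.58]
    for a in (0.2, 0.5, 0.8):
        p = posterior_growth(a, growth, no_growth, sigma=0.1)
        print(f"activation={a:.2f}  P(growth)={p:.3f}  G/NG={int(p >= 0.5)}")
```

Because the posterior depends on `sigma`, the bandwidth would in practice be chosen by validation; the value 0.1 in the example is arbitrary.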
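The nonlinear logistic regression model of Eq. (8), against which the hybrid approach was compared, can be evaluated and converted to G/NG with the 0.5 threshold in a few lines. This sketch rests on assumptions the text does not state: T is in °C and a_w is water activity; points outside the domain where the logarithms are defined (a_w ≤ 0.943, T ≤ 3.411 °C, T ≥ 49.23 °C) are treated as no growth (P₁ = 0), matching the limit of Eq. (8) as each boundary is approached; a probability of exactly 0.5 is classified as growth; and the false alarm rate is taken as FP/(FP + TN) with growth as the positive class. All function names are hypothetical.

```python
import math
from typing import Sequence, Tuple

# Coefficients of Eq. (8):
# logit(P1) = B0 + B1 ln(aw - AW_MIN) + B2 ln(T - T_MIN) + B3 ln[1 - exp(C_COEF (T - T_MAX))]
B0, B1, B2, B3 = 6.804, 7.153, 10.0, 318.9
AW_MIN, T_MIN, T_MAX, C_COEF = 0.943, 3.411, 49.23, 0.30


def _sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0.0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def prob_growth_eq8(T: float, aw: float) -> float:
    """Probability of growth P1 from Eq. (8); T in deg C, aw = water activity."""
    if aw <= AW_MIN or T <= T_MIN or T >= T_MAX:
        return 0.0  # assumption: outside the domain, use the limiting value P1 = 0
    inner = 1.0 - math.exp(C_COEF * (T - T_MAX))
    if inner <= 0.0:  # guards against rounding just below T_MAX
        return 0.0
    z = B0 + B1 * math.log(aw - AW_MIN) + B2 * math.log(T - T_MIN) + B3 * math.log(inner)
    return _sigmoid(z)


def classify(p: float, threshold: float = 0.5) -> int:
    """1 = growth (G), 0 = no growth (NG); ties go to G (assumption)."""
    return 1 if p >= threshold else 0


def c_matrix(observed: Sequence[int], predicted: Sequence[int]) -> Tuple[int, int, int, int]:
    """Counts (TP, TN, FP, FN) with growth (1) as the positive class."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have equal length")
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    return tp, tn, fp, fn


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all cases classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def false_alarm_rate(tp: int, tn: int, fp: int, fn: int) -> float:
    """Assumed definition: share of no-growth cases predicted as growth."""
    return fp / (fp + tn)


if __name__ == "__main__":
    for T, aw in [(37.0, 0.99), (10.0, 0.97)]:
        p = prob_growth_eq8(T, aw)
        print(f"T={T:.1f} C  aw={aw:.3f}  P1={p:.4f}  G/NG={classify(p)}")
```

Running the block prints P₁ and the G/NG class for two illustrative (T, a_w) points; reproducing Table 1 would require the original 179 observations, which are not given here.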

References
Basheer, I.A., Hajmeer, M.N., 2000. Artificial neural networks: fundamentals, computation, design and application. Journal of Microbiological Methods 43, 3–31.
Bolton, L.F., Frank, J.F., 1999. Defining the growth/no-growth interface for Listeria monocytogenes in Mexican-style cheese based on salt, pH, and moisture content. Journal of Food Protection 62, 601–609.
Cheroutre-Vialette, M., Lebert, A., 2002. Application of recurrent neural network to predict bacterial growth in dynamic conditions. International Journal of Food Microbiology 73, 107–118.
Geeraerd, A.H., Herremans, C.H., Cenens, C., Van Impe, J.F., 1998. Application of artificial neural networks as a non-linear modular modeling technique to describe bacterial growth in chilled food products. International Journal of Food Microbiology 44, 49–68.
Goodman, P.H., Harrell Jr., F.E., 1998. Neural networks: advantages and limitations for biostatistical modeling. http://www.scs.unr.edu/nevprop/docs/JSMgoodmanpdf.pdf, 10 pp.
Hajmeer, M.N., Basheer, I.A., Najjar, Y.M., 1997. Computational neural networks for predictive microbiology: II. Application. International Journal of Food Microbiology 34, 51–66.
Hajmeer, M.N., Basheer, I.A., Fung, D.Y.C., Marsden, J.L., 1998. A nonlinear response surface model based on artificial neural networks for growth of Saccharomyces cerevisiae. Journal of Rapid Methods and Automation in Microbiology 6, 103–118.
Hajmeer, M.N., Basheer, I.A., Marsden, J.L., Fung, D.Y.C., 2000. New approach for modeling generalized microbial growth curves using artificial neural networks. Journal of Rapid Methods and Automation in Microbiology 8 (4), 265–284.
Hosmer, D.W., Lemeshow, S., 1989. Applied Logistic Regression. Wiley, New York.
Jeyamkondan, S., Jayas, D.S., Holley, R.A., 2001. Microbial growth modelling with artificial neural networks. International Journal of Food Microbiology 64, 343–354.
Lou, W., Nakai, S., 2001. Application of artificial neural networks for predicting the thermal inactivation of bacteria: a combined effect of temperature, pH, and water activity. Food Research International 34, 573–579.
Masters, T., 1995. Advanced Algorithms for Neural Networks. Wiley, New York.
Najjar, Y.M., Basheer, I.A., Hajmeer, M.N., 1997. Computational neural networks for predictive microbiology: I. Methodology. International Journal of Food Microbiology 34, 27–49.
Parzen, E., 1962. On estimation of a probability density function and mode. Annals of Mathematical Statistics 33, 1065–1076.
Pezullo, J., 2001. http://members.aol.com/johnp71/logistic.html.
Presser, K.A., Ross, T., Ratkowsky, D.A., 1998. Modelling the growth limits (growth/no growth interface) of Escherichia coli as a function of temperature, pH, lactic acid concentration, and water activity. Applied and Environmental Microbiology 64 (5), 1773–1779.
Salter, M.A., Ratkowsky, D.A., Ross, T., McMeekin, T.A., 2000. Modelling the combined temperature and salt (NaCl) limits for growth of a pathogenic Escherichia coli strain using nonlinear logistic regression. International Journal of Food Microbiology 61, 159–167.
Schalkoff, R.J., 1997. Artificial Neural Networks. McGraw-Hill, New York, USA.
Tienungoon, S., Ratkowsky, D.A., McMeekin, T.A., Ross, T., 2000. Growth limits of Listeria monocytogenes as a function of temperature, pH, NaCl, and lactic acid. Applied and Environmental Microbiology 66 (11), 4979–4987.