Mechanical Systems and Signal Processing 22 (2008) 1395–1411
www.elsevier.com/locate/jnlabr/ymssp
Uncertainty analysis of a neural network used for fatigue lifetime prediction

S. Gareth Pierce a, Keith Worden b, Abderrezak Bezazi c
a Department of Electronic and Electrical Engineering, Centre for Ultrasonic Engineering, University of Strathclyde, 204 George Street, Glasgow G1 1XW, UK
b Dynamics Research Group, Department of Mechanical Engineering, University of Sheffield, Mappin Street, Sheffield S1 3JD, UK
c Laboratoire de Mécanique et Structures, Université de Guelma, BP 401, Guelma 24000, Algeria

Received 4 April 2006; received in revised form 7 December 2007; accepted 16 December 2007
Available online 5 January 2008
Abstract

The application of interval set techniques to the quantification of uncertainty in a neural network regression model of fatigue lifetime is considered. Bayesian evidence training was implemented to train a series of multi-layer perceptron networks on experimental fatigue life measurements in glass fibre composite sandwich materials. A set of independent measurements, conducted 2 months after the training session and at intermediate fatigue loading levels, was used to provide a rigorous test of the generalisation capacity of the networks. The robustness of the networks to uncertainty in the input data was investigated using an interval-based technique. It is demonstrated that the interval approach provides an alternative to probabilistic confidence bounds on prediction accuracy. In addition, the technique serves as an alternative network selection tool, and yields an estimate of the lifetime prediction error that was found to be a significant improvement over the Bayesian-derived confidence bounds.
© 2007 Elsevier Ltd. All rights reserved.

Keywords: Lifetime prediction; Fatigue; Neural networks; Uncertainty; Interval analysis
1. Introduction

There exists a diverse list of applications of artificial neural networks (ANNs) in areas broadly subdivided into regression and classification problems. Regression models are common in control systems [1], financial modelling [2] and general time series modelling [3], whereas classification models are more common in medical applications [4], remote sensing [5], acoustic and ultrasonic damage detection [6,7] and general image processing, e.g., for face and handwriting recognition [8,9]. Despite the diverse range of application areas, there remains a significant issue with respect to the quantification of the reliability of a trained network, whether it is used for classification or regression. This is particularly acute for non-linear neural network models of high complexity, where the complete mapping between the input and output layers may be analytically intractable. This "black-box" characteristic can make ANNs resistant to traditional methods of certification, and thus often precludes their
use in safety-critical applications. This behaviour is in sharp contrast to "white-box" models, such as decision trees and k-nearest neighbour techniques, which often allow for direct interpretation of the model parameters [10]. Incomplete knowledge of the model response function becomes problematic when attempting to classify new data that may lie far (in a statistical sense) from the data used to train and test a classifier. In practice, great care should be exercised in ensuring that subsequent data streams presented to a trained network are drawn from a distribution statistically similar to the training data [10,11]. Failure to ensure this can lead to highly undesirable results if the network is subsequently presented with data of a type it is not familiar with [12].

This whole issue is fundamentally linked to the generalisation capacity of a network model, which is essentially a measure of the network's capacity to learn the underlying structure of the data rather than any noise present in it. A network with good generalisation performance will tend to produce acceptable error rates for both training and testing data, whereas a poorly generalising network gives a relatively low error on the training set and a relatively high error on a test dataset; such a network is said to be overfitted. Generalisation can be understood by considering the bias and variance of the network model [13]. There is a trade-off between bias and variance: a function too closely fitted to the training data (a large network with many parameters) will have a large variance and hence generalise poorly to new data. By smoothing the function (a simpler network with fewer parameters), the generalisation improves, but if taken too far this yields a model with insufficient complexity to describe the data, giving a high bias and hence a large error. The key to good neural network performance is to find the model with the best generalisation performance on new data, and this requires careful consideration of the quantity and quality of the training data (it should be statistically well sampled from the generative distribution) and its relation to the overall size of the network. Cross-validation and early stopping [3,13] using an independent validation dataset are often used as termination or selection criteria for network training, the final network performance being evaluated using a third independent test dataset.

Other methods for ensuring good generalisation have been developed through network regularisation techniques. The problem with overfitted networks can be understood as a tendency to converge to weight matrix solutions with high component values that tend to generate excessively sharp decision boundaries (where the sharpness is more characteristic of noise in the data) [13,14]. Regularisation seeks to penalise the large weight and bias values of the network that are associated with such sharp decision boundaries. Perhaps the simplest regularisation implementation is through additive weight decay in the error function [3]. More elaborate approaches include the Bayesian evidence technique for updating the network weights [15], which has the additional benefit of being able to estimate network confidence bounds directly. An important aspect of neural network model evaluation lies in the estimation of the likely error associated with a regression or classification prediction.
A range of techniques (including the Bayesian approach detailed above) have been developed for output confidence interval predictions [14–19]; they all adopt a probabilistic standpoint and therefore suffer from the common drawback that, since the probability distributions are usually estimated from the low-order moments of the data (typically mean and standard deviation), there is often no validation of the extremes of the distributions. Unfortunately, it is often the extreme events of the data that are likely to be associated with the unpredictable behaviour of greatest interest. An alternative approach to this issue has been investigated by the authors using a novel non-probabilistic approach to assess neural network robustness to uncertainty in input data [20]. The technique is based on the theory of information-gap uncertainty [21] and lies in presenting both crisp (single-valued) data and interval [22] data to a number of neural networks under evaluation. Previous work has focussed on the application of this technique to classification problems [23,24]. The present paper focuses attention on an application of the interval uncertainty technique to a regression problem comprising the prediction of fatigue lifetime of a sandwich composite material subjected to cyclic loading.

A variety of neural network models have been proposed for fatigue life prediction, based on the rationale that ANNs have the potential to provide a mechanism for dealing with the multi-variate, often noisy and possibly non-linear datasets associated with fatigue testing, where an exact analytic model is either intractable or too time consuming to develop [25]. Such applications have been discussed by Al-Assaf and El Kadi [26] using multi-layer perceptrons (MLPs), radial basis function (RBF) networks and modular networks [27], with four input parameters to the networks (number of cycles, maximum load, fibre orientation and stress ratio). Recent work has investigated using just a single input (strain energy) [28], and recurrent neural networks and polynomial classifiers [29]. The authors have recently reported on the implementation of Bayesian evidence
based modelling to generate lifetime predictions for a set of GFRP laminated PVC foam core sandwich samples [30]. The present paper takes this as a starting point to discuss the application of the interval uncertainty technique to this modelling approach. It is established that the benefits of the interval approach are three-fold. Firstly, it is possible to remove the reliance on probabilistic estimates of confidence bounds, as the interval set outputs are fully conservative; it is possible to establish worst case error predictions that comprise an inclusive bounded solution set (for a specified degree of input uncertainty to the network). This approach circumvents the inherent problem associated with all Monte-Carlo-based sampling techniques of the input space, where it is impossible to guarantee that all possible combinations of input space have been correctly investigated. Secondly, the technique can be used to identify a particular model which is intrinsically more robust to uncertainty on the input data than network solutions obtained by conventional training paradigms. Thirdly, the technique provides an alternative approach to estimating the error associated with the model lifetime prediction. This last application is compared (and found to be superior) to a set of error predictions based on Bayesian evidence techniques.

2. Materials and experimental technique

2.1. Materials and preparation

The materials under investigation comprised a series of sandwich construction samples with cross-ply laminate skins and PVC foam cores. The skins had a stacking sequence of (0₂/90₂)s, consisting of unidirectional glass fibre fabric with a surface density of 300 g/m² impregnated with epoxy resin SR 1500/SD. This lay-up sequence was chosen for the work since it was found to have strong fatigue resistance compared to other stacking sequences previously investigated by the authors [31]. The core comprised a 15-mm-thick expanded PVC foam (Herex C70.75) having a pore diameter of between 280 and 500 μm. The sandwich manufacture was carried out using a vacuum bag moulding technique. The sandwich, the skins and the joining of the core were manufactured at the same time, starting with the plies of one skin, then interposing the core and then the second skin. The sandwich was impregnated at room temperature, and was then held under vacuum at a pressure of 30 kPa for 10 h inside the mould. Before any tests, the plates were kept in an oven for 4 h at 40 °C in order to allow complete polymerisation of the epoxy resin. The specimens were cut out using a diamond saw, starting from plates of dimension 300 × 300 mm, according to standard ASTM C393-00. The specimens had dimensions of 300 × 40 × 19 mm, corresponding to the length, width and thickness, respectively.

2.2. Experimental test conditions

A three-point bending configuration was used to undertake static and dynamic fatigue tests using an MTS 858 servo-hydraulic testing machine, with control and data acquisition performed by a computer. The load unit was fatigue-rated at ±10 kN and could be operated at frequencies up to 30 Hz. The supports were of cylindrical shape, of 15 mm diameter for the two lower supports and 20 mm for the central support. Initial static rupture tests were conducted on five different samples using a loading rate of 5 mm/min. These tests enabled a mean value of the static rupture displacement dr to be established.
Fig. 1 illustrates the static load tests at (a) zero displacement, (b) intermediate displacement in the linear portion of the characteristic and (c) high displacement leading to total rupture of the top laminated skin. Dynamic fatigue tests were carried out under displacement control using a sinusoidal drive waveform of frequency 10 Hz. The mean displacement (dmoy) was maintained constant and equal to 50% of the mean rupture displacement dr from the static tests. The ratio of the distance between supports, l, to the thickness of the specimen, h, was l/h = 15. Initial dynamic tests were conducted at three different levels of loading, rd = 0.60, 0.70 and 0.80, where

$$ r_d = \frac{d_{\max}}{d_r}, \qquad (1) $$

with $d_{\max}$ the maximum displacement and $d_r$ the ultimate failure displacement from the static tests.
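As a purely illustrative sketch of how Eq. (1) fixes the drive setpoints for each loading level (in Python; the value of d_r below is made up, since the measured mean from the five static tests is not quoted here):

```python
import numpy as np

# Hypothetical mean static rupture displacement in mm (illustrative value only)
d_r = 10.0

# Loading levels used in the initial dynamic tests; Eq. (1): r_d = d_max / d_r
r_d = np.array([0.60, 0.70, 0.80])
d_max = r_d * d_r               # maximum displacement at each loading level
d_mean = 0.50 * d_r             # constant mean displacement (50% of d_r)
amplitude = d_max - d_mean      # amplitude of the 10 Hz sinusoidal drive

for r, dm, a in zip(r_d, d_max, amplitude):
    print(f"r_d = {r:.2f}: d_max = {dm:.1f} mm, drive amplitude = {a:.1f} mm")
```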
Fig. 1. Photographs illustrating the different stages of the static test from (a) the start of the test with zero load to (c) the rupture of the top skin of the sandwich.
For each of the three values of rd, nine samples were tested separately, either until failure (for the higher loading levels of 0.7 and 0.8) or until 1,000,000 cycles, whichever occurred first. The number of cycles N was recorded along with the reduction in the force required to produce the displacement (reduced force F/F0), F0 being the force at the first cycle. This data was then divided into separate training, validation and test1 datasets for ANN training and evaluation. The training, validation and testing sets comprised a total of 634, 257 and 264 data points, respectively. A completely separate set of dynamic tests was conducted 2 months after the above data was collected. This comprised loading at intermediate values of rd = 0.65 and 0.75, with four samples at each value of rd. This data (284 points in total) was used to provide a completely independent testing set against which to appraise the ANN model performance. This dataset was labelled test2 data.

3. ANN network topology and training

3.1. Network structure and training

MLP networks were implemented and trained using MATLAB™ in conjunction with the NETLAB toolbox developed by Nabney [14]. The data was presented to a series of MLP networks with a variable number of hidden nodes arranged in a single hidden layer. Each network had two input nodes, corresponding to the input variables N and rd, and a single output node corresponding to the value of F/F0. The output y from the second (hidden) layer was given by

$$ y = \sum_{j=1}^{M} w_{kj}^{(2)} \tanh\!\left( \sum_{i=1}^{d} w_{ji}^{(1)} x_i + b_j \right) + b_k, \qquad (2) $$

where $w_{ji}^{(1)}$ is the weight matrix of the first layer, $w_{kj}^{(2)}$ the weight matrix of the second layer, $b_j$ the bias vector of the first layer, $b_k$ the bias of the second layer, d the number of input nodes and M the number of hidden nodes. Two training regimes were investigated: maximum likelihood and Bayesian evidence based training [13,14]. For the maximum likelihood training, the error function used to assess the performance of the output prediction with respect to the true target value was given by the standard sum-of-squares error function with an additional term to provide weight regularisation:
$$ E_{\mathrm{ML}} = \frac{1}{2} \sum_{n=1}^{N} (t_n - y_n)^2 + \frac{\alpha}{2} \sum_{i=1}^{W} w_i^2, \qquad (3) $$
where $E_{\mathrm{ML}}$ is the (maximum likelihood) error, $t_n$ the target value, $y_n$ the network output prediction, $w_i$ the weights and $\alpha$ the hyperparameter controlling weight decay. The purpose of the second term was to implement penalisation of large weight values in the network [13]. When implementing Bayesian evidence training [13–15], the error function was modified to

$$ E_{\mathrm{BE}} = \frac{\beta}{2} \sum_{n=1}^{N} (t_n - y_n)^2 + \frac{\alpha}{2} \sum_{i=1}^{W} w_i^2, \qquad (4) $$

where $E_{\mathrm{BE}}$ is the (Bayesian evidence) error function and $\beta$ is the parameter describing the inverse variance of the noise model for the target data.
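For readers without access to NETLAB, the following minimal numpy sketch shows the forward pass of Eq. (2) and the two regularised error functions of Eqs. (3) and (4). It is illustrative only: the function and variable names are our own, not NETLAB's API, and the random weights stand in for trained ones.

```python
import numpy as np

def mlp_forward(x, W1, b1, W2, b2):
    """Eq. (2): single-hidden-layer MLP with tanh units and linear output.
    x: (n_samples, d) inputs; W1: (d, M); b1: (M,); W2: (M, 1); b2: (1,)."""
    hidden = np.tanh(x @ W1 + b1)
    return hidden @ W2 + b2

def error_ml(y, t, weights, alpha=0.01):
    """Eq. (3): sum-of-squares error with additive weight-decay regularisation."""
    w = np.concatenate([wi.ravel() for wi in weights])
    return 0.5 * np.sum((t - y) ** 2) + 0.5 * alpha * np.sum(w ** 2)

def error_be(y, t, weights, alpha=0.01, beta=100.0):
    """Eq. (4): Bayesian evidence error; beta is the inverse noise variance."""
    w = np.concatenate([wi.ravel() for wi in weights])
    return 0.5 * beta * np.sum((t - y) ** 2) + 0.5 * alpha * np.sum(w ** 2)

# Example: a 2-input, 3-hidden-node network with random (untrained) weights
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 3)), np.zeros(3)
W2, b2 = rng.normal(size=(3, 1)), np.zeros(1)
x = rng.uniform(-1, 1, size=(5, 2))      # normalised (N, r_d) inputs
t = rng.uniform(0, 1, size=(5, 1))       # reduced-force targets
y = mlp_forward(x, W1, b1, W2, b2)
print(error_ml(y, t, [W1, b1, W2, b2]), error_be(y, t, [W1, b1, W2, b2]))
```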
For a succession of training sets with different numbers of patterns N, it is clear that the first term becomes more and more dominant with increasing N, whilst the second term is independent of N. For fixed values of $\alpha$ and $\beta$, and increasingly large values of N, the maximum likelihood solution becomes an increasingly acceptable approximation to the most probable weight distribution $w_{\mathrm{MP}}$ calculated from the Bayesian approach [13–15]. Besides accommodating regularisation in a consistent framework, the Bayesian approach has the additional advantage of providing a mechanism to generate confidence bounds on the output prediction values. Assuming that the posterior distribution of the weight matrix is Gaussian in nature, it is possible to find the variance corresponding to the mean output $y(x, w_{\mathrm{MP}})$, i.e. the standard prediction output for the most probable weight distribution. This variance was given by [13]

$$ \sigma^2 = \frac{1}{\beta} + \mathbf{g}^{\mathrm{T}} \mathbf{A}^{-1} \mathbf{g}, \qquad (5) $$

where $\mathbf{A}$ is the Hessian matrix defining the second derivatives of the error function and $\mathbf{g}$ is the gradient of the network output with respect to the weights, evaluated at $w_{\mathrm{MP}}$. The standard deviation $\sigma$ of the predictive distribution can be interpreted as an error bar on the mean value $y_{\mathrm{MP}}$, with two contributions: the first arises from the intrinsic noise in the target data, the second from the posterior distribution of the network weights.
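In NETLAB the evidence framework provides this calculation directly; the schematic numpy version of Eq. (5) below assumes the Hessian A of the regularised error function and the output gradient g have already been evaluated at w_MP (the stand-in values are illustrative, not from a trained network):

```python
import numpy as np

def prediction_variance(g, A, beta):
    """Eq. (5): sigma^2 = 1/beta + g^T A^{-1} g for a single input point.
    g: gradient of the network output w.r.t. the weights at w_MP;
    A: Hessian of the regularised error function; beta: inverse noise variance."""
    return 1.0 / beta + g @ np.linalg.solve(A, g)

# Illustrative numbers only (a real A and g come from the trained network)
rng = np.random.default_rng(1)
n_weights = 13                       # e.g. a 2-3-1 network with biases
A = np.eye(n_weights) * 50.0         # stand-in, well-conditioned Hessian
g = rng.normal(scale=0.1, size=n_weights)
sigma = np.sqrt(prediction_variance(g, A, beta=100.0))
print(f"one-standard-deviation error bar: {sigma:.4f}")
```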
The ease of implementation of these powerful network training paradigms was a major consideration in employing the NETLAB toolbox to realise the network training. The number of nodes in the hidden layer was varied between 1 and 10. Each individual network structure was trained in 100 independent training sessions starting at different randomly chosen points on the error surface, so that a total of 1000 independent networks were evaluated. Up to 100 iterations of a scaled conjugate gradient optimisation were implemented, using a small hyperparameter α = 0.01 in maximum likelihood training to control weight decay. For the Bayesian evidence training, the same initial value of α = 0.01 was employed along with an initial inverse noise variance parameter β = 100. During the evidence update procedure of the network training [14], these hyperparameters were re-evaluated iteratively.

3.2. Data pre-processing

For each set of loading values (rd = 0.6, 0.7, 0.8), nine separate samples were fatigued. These results were grouped into training, validation and test sets in the ratio 5:2:2, respectively. Numerically, the variables ranged from 1 to 1,000,000 for the number of cycles N, and between 0.6 and 0.8 for the loading value rd. The output values of F/F0 ranged from 0 to 1. In order to fully utilise the maximum dynamic range of the tanh transfer function of the ANN networks, the input variables were first normalised. The values of N were logarithmically compressed (such that the values then lay between 0 and 6) and then N and rd were normalised to lie in the range −1 to +1 before presentation to the networks.

4. Conventionally trained network results

4.1. Neural network training and network selection

Both maximum likelihood and Bayesian evidence training algorithms were implemented in MATLAB using the NETLAB toolbox. The performance of the networks was evaluated by calculating the percentage mean square error of the network predictions from the true target values:

$$ \mathrm{MSE} = \frac{100}{N \sigma_t^2} \sum_{n=1}^{N} (t_n - y_n)^2, \qquad (6) $$

where $y_n$ represents the network outputs, $t_n$ the target values, N the number of samples and $\sigma_t^2$ the variance of the target data.
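A compact sketch of the pre-processing of Section 3.2 and the percentage error measure of Eq. (6) might read as follows (numpy; the normalisation constants simply restate the ranges given above):

```python
import numpy as np

def preprocess(N_cycles, r_d):
    """Log-compress the cycle count, then rescale both inputs to [-1, +1]."""
    logN = np.log10(N_cycles)                      # 1..1e6 cycles -> 0..6
    x1 = 2.0 * logN / 6.0 - 1.0                    # normalise 0..6 to -1..+1
    x2 = 2.0 * (r_d - 0.6) / (0.8 - 0.6) - 1.0     # normalise 0.6..0.8 to -1..+1
    return np.column_stack([x1, x2])

def percentage_mse(y, t):
    """Eq. (6): 100 / (N * var(t)) * sum of squared prediction errors."""
    y = np.asarray(y, dtype=float)
    t = np.asarray(t, dtype=float)
    return 100.0 * np.sum((t - y) ** 2) / (t.size * np.var(t))

# Example usage with made-up values
X = preprocess(np.array([10.0, 1e3, 1e6]), np.array([0.6, 0.7, 0.8]))
print(X)
print(percentage_mse([0.9, 0.8, 0.5], [0.92, 0.75, 0.55]))
```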
The data used to train the networks is plotted in Fig. 2. Note the inherent scatter present in the load reduction (F/F0) at each level of loading rd. This behaviour is typical of fatigue tests and is a motivating factor behind the use of the neural network modelling technique. For each of the 1–10 hidden node network structures, with 100 networks of each, the error was calculated for the training, validation and test datasets. These errors are tabulated in Tables 1 and 2 for maximum likelihood (ML) and Bayesian evidence (BE) training. In each case, the best performing network (from 1000) on the validation dataset is identified; within a conventional cross-validation network training framework, this result represents the best network choice from all the possibilities (see the sketch below). Included at the bottom of each of Tables 1 and 2 are the mean and variance of the error from forward propagation of the test set through all 100 networks of the selected structure. For maximum likelihood training, the best selected network had three hidden nodes (network no. 93); across all 100 networks with three hidden nodes, the test data returned a mean error of 33.5% with variance 0.593%. For Bayesian evidence training, the best selected network had two hidden nodes (network no. 85); across all 100 networks with two hidden nodes, the test data returned a mean error of 31.6% with variance 0.449%. Although the Bayesian evidence based training produced only a marginal improvement over the maximum likelihood training in this example, it should be remembered that in this case a large amount of training data was available to the networks. In situations with less training data, it is typically the case that the Bayesian evidence approach provides superior performance to maximum likelihood training.
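Schematically, the selection procedure can be written as a double loop over structures and restarts; here `train_network` and `net.predict` are hypothetical stand-ins for the NETLAB training and forward-propagation calls, and `percentage_mse` is the Eq. (6) function sketched earlier:

```python
import numpy as np

def select_best_network(train_network, X_tr, t_tr, X_val, t_val,
                        hidden_sizes=range(1, 11), n_restarts=100, seed=0):
    """Train n_restarts networks per structure and return the one with the
    lowest percentage MSE (Eq. (6)) on the validation set."""
    rng = np.random.default_rng(seed)
    best_err, best_net = np.inf, None
    for M in hidden_sizes:                  # 1..10 hidden nodes
        for restart in range(n_restarts):   # 100 random initialisations each
            net = train_network(X_tr, t_tr, n_hidden=M,
                                init_seed=int(rng.integers(1 << 31)))
            err = percentage_mse(net.predict(X_val), t_val)
            if err < best_err:
                best_err, best_net = err, net
    return best_err, best_net
```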
Fig. 2. Training set used for neural network modelling showing scatter of fatigue performance.
Table 1
Error rates for maximum likelihood training

No. of hidden nodes      1     2     3     4     5     6     7     8     9    10
Training data (%)     41.8  39.8  39.2  39.3  39.0  38.2  38.6  38.0  36.8  38.0
Validation data (%)   33.4  32.6  31.3  33.3  33.6  32.9  33.9  32.0  32.3  33.4
Test data (%)         33.4  32.5  32.1  32.0  32.5  32.6  32.4  32.6  32.1  32.5

Best network: three hidden nodes, network no. 93. Mean 33.5%, variance 0.593% across all 100 networks with n_hidd = 3 on the test data.
Table 2
Error rates for Bayesian evidence training

No. of hidden nodes      1     2     3     4     5     6     7     8     9    10
Training data (%)     41.8  34.5  30.2  30.3  30.6  30.3  30.6  30.8  30.8  30.7
Validation data (%)   33.8  27.4  28.1  28.7  29.2  29.1  29.9  31.3  30.3  30.1
Test data (%)         33.4  28.4  29.6  29.8  29.8  29.6  29.8  29.9  30.0  29.4

Best network: two hidden nodes, network no. 85. Mean 31.6%, variance 0.449% across all 100 networks with n_hidd = 2 on the test data.
Fig. 3. Training data and neural network prediction using Bayesian evidence training.
Additionally, the Bayesian approach provides for the generation of prediction error bars based on the density of the training data and the posterior distribution of the weight matrix. Figs. 3 and 4 illustrate the fit of the best selected models for Bayesian evidence and maximum likelihood training, respectively. It is clear that the Bayesian model provides a better fit to the data, especially at lower values of N. The real test of network performance came in evaluating the model fit to the test1 and test2 datasets. The test1 set was recorded at the same time as the training and validation data, whereas the test2 set was recorded 2 months later using intermediate values of load rd = 0.65 and 0.75. Since these loading levels lay within the range of the data used to train the network, it was anticipated that a well-generalised network would be able to provide a good prediction at these new loading levels. The results in Figs. 5 and 6 for the test1 and test2 datasets, respectively, confirm the excellent fit of the model to the data. The numerical values of fit between model and data were given by mean square errors of 31.5% for the test1 set and 27.8% for the test2 set. Following successful construction of the model, it was possible to generate standard S–N lifetime curves by interpolating the model output values for pre-determined values of reduced stiffness F/F0.
Fig. 4. Training data and neural network prediction using maximum likelihood training.
Fig. 5. Test1 data and neural network prediction using Bayesian evidence training.
These curves represent the number of load cycles required to reduce the stiffness by a given amount at a given loading level. For example, in Fig. 3 the line corresponding to a reduction in stiffness of 10% is shown (at F/F0 = 0.9); the corresponding curve in the S–N plane is designated the N10 curve, showing the number of cycles required to achieve a 10% reduction in stiffness at a given loading level.
Fig. 6. Test2 data and neural network prediction using Bayesian evidence training.
Fig. 7. S–N curves for N5, N10, N20 showing experimental data and model prediction for Bayesian model with two hidden nodes, no. 85.
Fig. 7 illustrates the N5, N10 and N20 S–N curves for the ANN model along with all the experimental datasets. It is clear that excellent agreement was obtained between the model and the training, validation and test datasets. Very good agreement was also obtained for the independent test2 dataset, the only major discrepancy being an overestimation of lifetime at the 0.75 loading level for the N10 curve.
Fig. 8. Model prediction and error bars for loading level of rd = 0.65.
4.2. Error predictions

A significant advantage of the Bayesian evidence based training employed in this work was the inherent capability to make estimates of the prediction accuracy of the model. Fig. 8 illustrates the model predicted output for a loading level of rd = 0.65 along with the test2 dataset. The dotted lines indicate the one standard deviation error bars calculated in accord with Eq. (5). Note that these error bars are larger than might be expected. Considering a lifetime prediction made for N10, i.e. F/F0 = 0.9 as illustrated in Fig. 8, it is clear that the model error bars produce a very large variation in lifetime, from 23 cycles to 1.5 × 10⁵ cycles. The reason the model prediction error bars are so large is that the model is not trying to fit the training data at just rd = 0.6; it is instead fitting all of the available training data points, recorded at the three values rd = 0.6, 0.7 and 0.8. Therefore, the associated model prediction error based on Eq. (5) must take into account the scatter of data across all the training datasets. This scatter is higher than in a single dataset, leading to the increase in the prediction error bars. Confirmation experiments [30] showed that training a network on just a single set of training data (e.g., at rd = 0.6) produced a set of model prediction error bars very similar to the variance of that dataset.

5. Interval-based networks and network robustness

5.1. Interval propagation and definition of interval error functions

Having investigated the network performance on crisp (i.e. single-valued) input data, the next step was to quantify the robustness of the models to variations in the input data. Since each model had different values of the weight and bias matrices, it was anticipated that different models would have different sensitivities to such variations. Identifying the network with the lowest such sensitivity involved testing each network for its robustness to changes in input conditions. One traditional approach to this problem would be to employ Monte-Carlo techniques, involving randomising the input data (within defined bounds) a large number of times and evaluating the change in output error for each discrete set of inputs.
This technique has a significant drawback, especially when applied to non-linear MLP networks, in that it is impossible to be sure of mapping all possible combinations of variation in input space to output space unless an unfeasibly large number of sample points are used [32]. Since interest lies in understanding the worst possible performance of the model in the presence of input data uncertainty, a Monte-Carlo approach cannot provide certainty of behaviour under all possible input data conditions. The other techniques discussed in the introduction have a similar flaw in that they generate a probabilistic view of the likelihood of the model performance with respect to input data fluctuation. It was to circumvent this problem that an interval number approach to the input uncertainty was adopted. Interval numbers [33] occupy a bounded range on the number line, and can be defined as an ordered pair of real numbers [a, b] with a < b such that

$$ [a, b] = \{ x \mid a \le x \le b \}. \qquad (7) $$

Interval numbers have specific rules for the standard arithmetic operations of addition, subtraction, multiplication and division [32], and their forward propagation through ANNs is calculable provided that the transfer functions associated with the networks are monotonic (the tanh function of Eq. (2) satisfies this condition). The MATLAB-compatible toolbox INTLAB [34] was used to implement the interval calculations required for forward propagation through the MLP networks. This toolbox incorporated a rigorous approach to rounding, which was critical when using finite precision calculations in order to preserve the true nature of the conservative interval bounds. Rigorous testing of the INTLAB toolbox against established interval calculation routines was undertaken before use on MLP network structures [35]. In practice, each input value $x_i$ of the datasets was intervalised by a parameter $\alpha$ such that

$$ [x_{ia}, x_{ib}] = [(x_i - \alpha),\ (x_i + \alpha)]. \qquad (8) $$

This interval was then forward propagated through the networks under test, and the interval output values $[y_a, y_b]$ were compared with the target values to calculate worst case (WC) and best case (BC) errors according to the following relations:

$$ \mathrm{WC} = \frac{100}{N \sigma_t^2} \sum_{n=1}^{N} \begin{cases} (y_{na} - t_n)^2 & \text{if } |y_{na} - t_n| \ge |y_{nb} - t_n| \\ (y_{nb} - t_n)^2 & \text{if } |y_{na} - t_n| < |y_{nb} - t_n| \end{cases}, \qquad (9) $$

$$ \mathrm{BC} = \frac{100}{N \sigma_t^2} \sum_{n=1}^{N} \begin{cases} 0 & \text{if } y_{na} < t_n < y_{nb} \\ (y_{na} - t_n)^2 & \text{else, if } |y_{na} - t_n| \le |y_{nb} - t_n| \\ (y_{nb} - t_n)^2 & \text{else, if } |y_{na} - t_n| > |y_{nb} - t_n| \end{cases}, \qquad (10) $$

where $y_{na}$ is the lower interval bound of the nth output, $y_{nb}$ is the upper interval bound of the nth output and $t_n$ is the target value. The worst case errors corresponded to the furthest output bound from the target, and the best case errors to the closest, with the proviso that the best case error was zero if the target was contained wholly within the output interval.
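Because tanh is monotonic, an interval can be pushed through the network of Eq. (2) by tracking lower and upper bounds, splitting each weight matrix by sign. The numpy sketch below illustrates the idea for the intervalisation of Eq. (8); unlike INTLAB, it does not use directed rounding, so the bounds are exact only in real arithmetic:

```python
import numpy as np

def interval_linear(x_lo, x_hi, W, b):
    """Tightest bounds on x @ W + b when each input lies in [x_lo, x_hi]:
    positive weights take the like bound, negative weights the opposite one."""
    Wp, Wn = np.maximum(W, 0.0), np.minimum(W, 0.0)
    lo = x_lo @ Wp + x_hi @ Wn + b
    hi = x_hi @ Wp + x_lo @ Wn + b
    return lo, hi

def interval_mlp(x, alpha, W1, b1, W2, b2):
    """Propagate [x - alpha, x + alpha] (Eq. (8)) through the tanh MLP of
    Eq. (2); tanh is monotonic increasing, so bounds map to bounds."""
    lo, hi = interval_linear(x - alpha, x + alpha, W1, b1)
    lo, hi = np.tanh(lo), np.tanh(hi)
    return interval_linear(lo, hi, W2, b2)
```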
5.2. Forward propagation of intervals through networks

Discrete values of the uncertainty size α were applied to the training, validation, test1 and test2 datasets, and the resulting intervalised sets propagated through all the networks under investigation. α took the values (0, 0.002, 0.004, 0.006, 0.008, 0.01, 0.012, 0.014, 0.016, 0.018, 0.02, 0.025, 0.03, 0.035, 0.04, 0.045, 0.05, 0.06, 0.07, 0.08, 0.09, 0.1). Since the datasets were normalised to unity, this implied a maximum uncertainty of 10% applied to the input data. Note that data was intervalised across both input variables (N and rd) to fully investigate variation in the input data space (it would be possible to consider the variables separately if desired). Fig. 9 illustrates the output of network no. 85 (two hidden nodes, Bayesian evidence training) for an interval size of α = 0.1 applied to the test2 dataset, with just a single loading level (rd = 0.75) shown for clarity.
Fig. 9. Interval propagation of test2 data through Bayesian evidence network with two hidden nodes, no. 85 (showing only rd = 0.75 for clarity).
Fig. 10. Worst case and best case errors as a function of interval size for test2 dataset, Bayesian evidence trained network with two hidden nodes, no. 85.
It is evident that the interval outputs always encompass the "crisp" solution, i.e. the model prediction output for which the interval size can be considered to be zero. The target values (circles in Fig. 9) are not necessarily bounded by the interval outputs.
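In code, the worst case and best case measures of Eqs. (9) and (10) follow directly from the interval output bounds; a numpy sketch consistent with those definitions (not the authors' INTLAB implementation) is:

```python
import numpy as np

def wc_bc_errors(y_lo, y_hi, t):
    """Eqs. (9) and (10): percentage worst case and best case errors.
    y_lo, y_hi: interval output bounds; t: target values."""
    y_lo = np.ravel(np.asarray(y_lo, dtype=float))
    y_hi = np.ravel(np.asarray(y_hi, dtype=float))
    t = np.ravel(np.asarray(t, dtype=float))
    scale = 100.0 / (t.size * np.var(t))
    d_lo, d_hi = np.abs(y_lo - t), np.abs(y_hi - t)
    # Worst case: furthest bound from the target (Eq. (9))
    worst = scale * np.sum(np.maximum(d_lo, d_hi) ** 2)
    # Best case: zero if the target lies inside the interval, else the
    # nearest bound (Eq. (10))
    inside = (y_lo < t) & (t < y_hi)
    nearest = np.minimum(d_lo, d_hi)
    best = scale * np.sum(np.where(inside, 0.0, nearest) ** 2)
    return worst, best
```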
By evaluating the worst case (WC) and best case (BC) error functions in accord with Eqs. (9) and (10), it is possible to plot these as functions of the interval size, as illustrated in Fig. 10. Such a plot is useful, as it allows the designation of a permitted level of input data fluctuation to guarantee a particular absolute worst case mean square error on the output. For example, in Fig. 10 a designated allowed worst case error of 80% corresponds to an uncertainty in the input parameters of 6%, a worst case error of 124% corresponds to 10%, and so on.

5.3. Robust network selection using interval techniques

Having established the definition of the worst case and best case error functions applied to a single network structure, the technique was extended to investigate propagation through multiple networks. In Fig. 11, the mean and range of the worst case and best case error functions are plotted for all 100 networks with two hidden nodes trained using the Bayesian evidence algorithm. The markers indicate the mean of the 100 values, and the error bars show the range of values obtained. It is obvious from Fig. 11 that there is a very large range (the error axis is plotted on a logarithmic scale) of worst case error values across the 100 networks: not all networks are equally robust to uncertainty in the input data. Fig. 11 identifies the best and worst networks at interval size α = 0.1 as network no. 85 and no. 89, respectively (the best network has the lowest worst case error for a given uncertainty value). In this fashion, it is possible to use the interval technique to discriminate between otherwise similarly performing networks, as sketched below. Consulting Table 2 shows that across all 100 Bayesian evidence trained networks, the error on the test1 set had a mean of 31.6% and variance of 0.449%; this implies that in a conventional network training and testing framework, there was little to choose between the different networks. However, reference to Fig. 11 shows that when the response to uncertainty is considered, the networks may have very widely differing performance. It is coincidental that in this case the best selected network (no. 85) from the conventional framework is very close to the network with the lowest worst case error performance (in fact network no. 78); in general, the authors have found that this is not the case [20,23,24,32].
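Robust selection then reduces to minimising the worst case error over the candidate networks at the chosen uncertainty level. The sketch below reuses `interval_mlp` and `wc_bc_errors` from the earlier sketches and assumes each trained network is stored as a hypothetical dictionary of weight and bias arrays:

```python
import numpy as np

def most_robust_network(networks, X, t, alpha=0.1):
    """Index of the network whose worst case error (Eq. (9)) is lowest when
    the inputs are intervalised by +/- alpha (Eq. (8)).
    networks: list of dicts with keys 'W1', 'b1', 'W2', 'b2' (hypothetical)."""
    worst_errors = []
    for net in networks:            # e.g. 100 trained two-hidden-node MLPs
        y_lo, y_hi = interval_mlp(X, alpha, net['W1'], net['b1'],
                                  net['W2'], net['b2'])
        wc, _ = wc_bc_errors(y_lo, y_hi, t)
        worst_errors.append(wc)
    return int(np.argmin(worst_errors))
```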
Fig. 11. Mean and range of error rates as a function of interval size: test2 dataset, Bayesian evidence networks, two hidden nodes.
Fig. 12. Interval propagation of test2 data through Bayesian evidence network no. 85 with two hidden nodes: (a) rd = 0.65, (b) rd = 0.75.
5.4. Interval technique for appraisal of network prediction errors

The interval approach also provides an interesting alternative for network error quantification. Consider Fig. 12, showing the test2 dataset, the model prediction outputs, and the interval outputs (at α = 0.16) for Bayesian network no. 85 with two hidden nodes; the loading rates are plotted on separate axes for clarity. The interval size was chosen such that the intervalised model output predictions wholly contained all of the actual experimental data points of the test2 dataset. Fig. 12 also shows dotted lines at F/F0 = 0.9, corresponding to the N10 lifetime prediction (following the same argument detailed in Section 4.2 and Fig. 8 for the error predictions from Bayesian evidence training). For the interval predictions, the lifetime range estimates are now 2.0 × 10³ to 33 × 10³ cycles for the rd = 0.65 data, and 0.25 × 10³ to 4.4 × 10³ cycles for rd = 0.75. For the rd = 0.65 data, this represents a huge reduction in lifetime prediction range compared to the Bayesian evidence derived error bars, which returned 23 to 1.5 × 10⁵ cycles. Furthermore, the interval output predictions were conservative in nature, and therefore represented the worst possible case for the given level of input uncertainty. In contrast, the Bayesian error bars of Fig. 8 correspond to a probabilistic model of the variance (Eq. (5)). A procedure of this kind is sketched below.
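The sketch below (reusing `interval_mlp` and the `preprocess` helper from the Section 3.2 sketch; the `net` dictionary and grid sizes are hypothetical) first finds the smallest interval size containing all measured targets, then reads off the conservative lifetime range where the interval bounds cross the chosen stiffness level:

```python
import numpy as np

def calibrate_alpha(X, t, net, alphas=np.arange(0.0, 0.5, 0.01)):
    """Smallest trial uncertainty for which every measured target lies inside
    the corresponding interval output (in the paper, alpha = 0.16 resulted
    for network no. 85 on the test2 data)."""
    t = np.ravel(np.asarray(t, dtype=float))
    for alpha in alphas:
        y_lo, y_hi = interval_mlp(X, alpha, net['W1'], net['b1'],
                                  net['W2'], net['b2'])
        if np.all((y_lo.ravel() <= t) & (t <= y_hi.ravel())):
            return alpha
    return None  # no trial value was large enough

def lifetime_bounds(net, r_d, alpha, level=0.9, n_grid=2000):
    """Conservative N10-style lifetime range: cycle counts at which the lower
    and upper interval bounds cross the chosen stiffness level. Assumes both
    bounds actually cross the level within the grid (np.argmax returns the
    index of the first True)."""
    N = np.logspace(0, 6, n_grid)                 # 1 to 1e6 cycles
    X = preprocess(N, np.full(n_grid, r_d))
    y_lo, y_hi = interval_mlp(X, alpha, net['W1'], net['b1'],
                              net['W2'], net['b2'])
    n_early = N[np.argmax(y_lo.ravel() < level)]  # lower bound crosses first
    n_late = N[np.argmax(y_hi.ravel() < level)]   # upper bound crosses last
    return n_early, n_late
```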
5.5. Summary of main results for interval techniques and network robustness

A number of distinct advantages were apparent when applying interval-based techniques to network analysis and performance appraisal, detailed as follows. Firstly, it was demonstrated that, through the definition of an interval worst case error function, it was possible to quantify the robustness of a particular network to uncertainty in the input dataset. The advantage of the interval technique over conventional Monte-Carlo-based simulations lay in the fact that, since the interval bounds on the worst case error function were fully conservative, the worst case error was indeed the worst possible case that could occur for the given level of input data uncertainty (Monte-Carlo techniques are probabilistic in nature, so this condition is not guaranteed when using them). Secondly, since the degree of robustness to input uncertainty varied for each separate network structure under investigation, it was possible to use the interval propagation technique to select the network most robust to input uncertainty. This provided an alternative approach to distinguishing between networks that superficially seemed to be similar when using conventional network selection protocols. Finally, the interval technique afforded an alternative approach to the quantification of network prediction errors. By setting the input interval size such that the output interval predictions contained all of the measured test data points, it was possible to use the model output interval bounds as conservative limits for the network prediction errors. This approach gave significantly tighter error bounds for the real data than the probabilistic error bounds calculated from the variance of the posterior distribution function using Bayesian-based network training algorithms, as detailed in Section 4.2.

6. Conclusions

The application of interval-based uncertainty techniques to a neural network regression model has been investigated. Experimental data from a series of fatigue tests of a set of foam core sandwich composite samples was used to train and test a series of multi-layer perceptron artificial neural networks. The importance of good network training and its relation to generalisation have been discussed, and the implemented networks consisted of trials of both conventional maximum likelihood and Bayesian evidence based training algorithms. It was established that the Bayesian-trained networks provided a superior fit to the available training data using a simpler network structure (two hidden nodes as opposed to three for the maximum likelihood training). The fit of the models was subjected to a stringent test using a second set of experimental measurements (conducted 2 months after the training data was recorded), at loading values intermediate to those used in the training phase. Again, excellent fit between the model predictions and the experimental data was obtained. The Bayesian training technique had an additional benefit in that it allowed for a direct calculation of the prediction error associated with a particular model output. However, it was demonstrated that because the Bayesian error bars were calculated to account for the scatter at all the different loading levels across the different datasets, the error bars so obtained were sufficiently large to be of little practical value: for the test2 dataset at a loading level of rd = 0.65, the N10 lifetime range was estimated as lying between 23 and 1.5 × 10⁵ cycles.

Following this conventional training and evaluation approach, the application of an interval uncertainty technique was introduced. Instead of forward propagating single-valued (or crisp) data through the network models, the input data was converted into an interval set. This data was then presented to the network structures under investigation, and the interval output predictions were compared with the target values to obtain a measure of the worst case and best case error performance of each network (for a given level of input uncertainty). It has been shown that there are three principal benefits to such an approach. Firstly, it was demonstrated that it was possible to remove the reliance on probabilistic estimates of confidence bounds, as the interval set outputs were, by their nature, fully conservative. It was possible to establish an absolute worst case error prediction that comprised an inclusive bounded solution set (for a specified degree of input uncertainty to the network).
This approach circumvented the inherent problem associated with Monte-Carlo-based sampling techniques of the input space, where it is generally unfeasible to guarantee that all possible combinations of input space have been correctly investigated. Secondly, the technique was used to identify a particular model that was shown to be intrinsically more robust to uncertainty on the input data than other network solutions that would appear indistinguishable when judged by conventional training paradigms. The third benefit associated with the interval technique was identified when using the approach to estimate bounds on the lifetime prediction from a neural network model. For the test2 dataset at a loading level of rd = 0.65, the N10 lifetime range was estimated as lying between 2.0 × 10³ and 33 × 10³ cycles, a very significant improvement over the Bayesian estimate.

Acknowledgements

This work was supported by EPSRC grant number RA 013700 in association with DSTL Farnborough, who are acknowledged for sample provision and assistance with data collection. The authors acknowledge the use
of the software packages NETLAB, developed by Ian Nabney of Aston University (http://www.ncrg.aston.ac.uk/netlab/), and INTLAB, developed by Siegfried M. Rump at Technical University Hamburg-Harburg, Germany (http://www.ti3.tu-harburg.de/rump/intlab/).
References

[1] F. Zorriassatine, J.D.T. Tannock, A review of neural networks for statistical process control, Journal of Intelligent Manufacturing 9 (3) (1998) 209–224.
[2] V. Kodogiannis, A. Lolis, Forecasting financial time series using neural network and fuzzy system-based techniques, Neural Computing & Applications 11 (2) (2002) 90–102.
[3] S. Haykin, Neural Networks, a Comprehensive Foundation, second ed., Prentice-Hall, 1999.
[4] P.J.G. Lisboa, A review of evidence of health benefit from artificial neural networks in medical intervention, Neural Networks 15 (2002) 11–39.
[5] M.F. Augusteijn, B.A. Folkert, Neural network classification and novelty detection, International Journal of Remote Sensing 23 (14) (2002) 2891–2902.
[6] G. Manson, K. Worden, D. Allman, Experimental validation of a structural health monitoring methodology. Part III. Damage location on an aircraft wing, Journal of Sound and Vibration 259 (2) (2003) 365–385.
[7] A. Masnata, M. Sunseri, Neural network classification of flaws detected by ultrasonic means, NDT&E International 29 (2) (1996) 87–93.
[8] M.J. Er, W.L. Chen, S.Q. Wu, High-speed face recognition based on discrete cosine transform and RBF neural networks, IEEE Transactions on Neural Networks 16 (3) (2005) 679–691.
[9] C. Neubauer, Evaluation of convolutional neural networks for visual recognition, IEEE Transactions on Neural Networks 9 (4) (1998) 685–696.
[10] S. Dreiseitl, L. Ohno-Machado, Logistic regression and artificial neural network classification models: a methodology review, Journal of Biomedical Informatics 35 (2002) 352–359.
[11] G. Schwarzer, W. Vach, M. Schumacher, On the misuses of artificial neural networks for prognostic and diagnostic classification in oncology, Statistics in Medicine 19 (2000) 541–561.
[12] Z. Zhang, K. Friedrich, Artificial neural networks applied to polymer composites: a review, Composites Science and Technology 63 (14) (2003) 2029–2044.
[13] C.M. Bishop, Neural Networks for Pattern Recognition, Oxford University Press, 1995.
[14] I.T. Nabney, Netlab—Algorithms for Pattern Recognition, Springer, 2002.
[15] D.J.C. MacKay, Information Theory, Inference, and Learning Algorithms, Cambridge University Press, 2003.
[16] G. Papadopoulos, P.J. Edwards, Confidence estimation methods for neural networks: a practical comparison, IEEE Transactions on Neural Networks 12 (6) (2001) 1278–1287.
[17] D. Lowe, C. Zapart, Point-wise confidence interval estimation by neural networks: a comparative study based on automotive engine calibration, Neural Computing & Applications 8 (1999) 77–85.
[18] G. Chryssolouris, M. Lee, A. Ramsey, Confidence interval prediction for neural network models, IEEE Transactions on Neural Networks 7 (1) (1996) 229–232.
[19] R. Shao, E.B. Martin, J. Zhang, A.J. Morris, Confidence bounds for neural network representations, Computers and Chemical Engineering 21 (Suppl.) (1997) S1173–S1178.
[20] S.G. Pierce, Y. Ben-Haim, K. Worden, G. Manson, Neural network robustness optimisation using information-gap theory, IEEE Transactions on Neural Networks 17 (6) (2006) 1349–1361.
[21] Y. Ben-Haim, Information-Gap Decision Theory: Decisions Under Severe Uncertainty, Academic Press, 2001.
[22] R.M. Moore, Interval Analysis, Prentice-Hall, 1966.
[23] S.G. Pierce, K. Worden, G. Manson, A novel info-gap technique to assess reliability of neural network based damage detection, Journal of Sound and Vibration 293 (1–2) (2006) 96–111.
[24] S.G. Pierce, K. Worden, G. Manson, Uncertainty propagation through radial basis function networks, part II: classification networks, EURODYN, 4–7 September 2005, Paris, France, Paper no. 569, 2005.
[25] J.A. Lee, D.P. Almond, A neural-network approach to fatigue-life prediction, in: B. Harris (Ed.), Fatigue in Composites: Science and Technology of the Fatigue Response of Fibre-Reinforced Plastics, University of Bath, UK, 2003.
[26] Y. Al-Assaf, H. El Kadi, Fatigue life prediction of unidirectional glass fiber/epoxy composite laminae using neural networks, Composite Structures 53 (1) (2001) 65–71.
[27] H. El Kadi, Y. Al-Assaf, Prediction of the fatigue life of unidirectional glass fiber/epoxy composite laminae using different neural network paradigms, Composite Structures 55 (2) (2002) 239–246.
[28] H. El Kadi, Y. Al-Assaf, Energy-based fatigue life prediction of fiberglass/epoxy composites using modular neural networks, Composite Structures 57 (1–4) (2002) 85–89.
[29] Y. Al-Assaf, H. El Kadi, Fatigue life prediction of composite materials using polynomial classifiers and recurrent neural networks, Composite Structures 77 (4) (2007) 561–569.
[30] A. Bezazi, S.G. Pierce, K. Worden, E.H. Harkati, Fatigue life prediction of sandwich composite materials under flexural tests using a Bayesian trained artificial neural network, International Journal of Fatigue 29 (4) (2007) 738–747.
[31] A. Bezazi, A. El Mahi, J.M. Berthelot, A. Kondratas, Investigation of cross-ply laminates behaviour in three point bending tests. Part II: cyclic fatigue tests, Materials Science 9 (1) (2003) 128–133.
[32] S.G. Pierce, K. Worden, G. Manson, Information-gap robustness of a neural network regression model, in: IMAC XXII, 26–29 January, Dearborn, MI, 2004.
[33] R.M. Moore, Interval Analysis, Prentice-Hall, 1966.
[34] S.M. Rump, INTLAB—INTerval LABoratory, in: T. Csendes (Ed.), Developments in Reliable Computing, Kluwer Academic Publishers, 1999, pp. 77–104.
[35] S.G. Pierce, Robust reliability of neural networks for engineering applications, Final Report EPSRC RA013700, Chapter 3, INTLAB Software Evaluation, University of Sheffield, 2006.