Expert Systems with Applications 38 (2011) 8497–8508
Wavelet energy signatures and robust Bayesian neural network for visual quality recognition of nonwovens

Jianli Liu (a,*), Baoqi Zuo (b,c), Xianyi Zeng (d,e), Philippe Vroman (d,e), Besoa Rabenasolo (d,e)

a College of Textiles and Clothing, Jiangnan University, Wuxi 211422, China
b College of Textile and Clothing Engineering, Soochow University, Suzhou 215123, China
c National Engineering Laboratory for Modern Silk, Soochow University, Suzhou 215123, China
d Univ Lille Nord de France, F-59000 Lille, France
e ENSAIT, GEMTEX, F-59056 Roubaix, France
Keywords: Visual quality recognition; Nonwovens; Wavelet energy signature; Outlier detection; Evidence criterion; Regularization parameter; Bayesian neural network
Abstract. In this paper, the visual quality recognition of nonwovens is treated as a pattern recognition problem and solved by a joint approach combining wavelet energy signatures, a Bayesian neural network, and outlier detection. In this research, 625 nonwoven images of 5 different grades, 125 per grade, are decomposed at 4 levels with the wavelet base sym6; then two energy signatures, the norm-1 L1 and norm-2 L2, are calculated from the wavelet coefficients of each high-frequency subband to train and test the Bayesian neural network. To detect outliers in the training set, the scaled outlier probability of the training set and the outlier probability of each sample are introduced. Committees of networks and the evidence criterion are employed to select the 'most suitable' model from a set of candidate networks with different numbers of hidden neurons. However, in our research with finite industrial data, we take both the evidence criterion and the actual performance into account to determine the structure of the Bayesian neural network. When the nonwoven images are decomposed at level 4, with 500 samples used to train the Bayesian neural network that has 3 hidden neurons, the average recognition accuracy on the test set is 99.2%. Experimental results on the 625 nonwoven images indicate that the wavelet energy signatures are expressive and powerful in characterizing the texture of nonwoven images and that the robust Bayesian neural network has excellent recognition performance. © 2011 Elsevier Ltd. All rights reserved.
1. Introduction

During the past three decades, much work has been conducted on automatic fabric defect detection and recognition using computer vision (Liu & Zuo, 2007). However, little related research has specifically addressed the visual quality of nonwovens, such as surface uniformity, defects, and color difference. As the direct reflection of the fibrous mass distribution and structural performance, the evaluation of visual quality is one of the vital measurements of nonwoven products. In recent years, even though some researchers have employed image analysis to describe the surface uniformity and mass distribution, which are just one aspect of the evaluation of nonwoven products, visual quality recognition using computer vision and pattern recognition is just at the beginning (Liu et al., 2009, 2010; Militký, Rubnerová, & Klicka, 1999). In this paper, a novel method to recognize the visual quality of nonwovens, involving the wavelet energy signature, a Bayesian neural network, outlier detection, and model selection, is presented. (* Corresponding author. E-mail address: [email protected] (J. Liu). doi:10.1016/j.eswa.2011.01.049) Each
nonwoven image is decomposed at level 4 with the wavelet base sym6, and two textural features, the norm-1 L1 and norm-2 L2 calculated from the wavelet coefficients of each high-frequency subband, are used as inputs to the Bayesian neural network for training and testing. To detect outliers in the training set, the scaled outlier probability is introduced, which makes the Bayesian neural network more robust. Considering the evidence criterion as a complementary evaluation parameter, the model selection method is also discussed to determine the suitable number of hidden neurons of the Bayesian neural network. The outline of this paper is as follows. In the next four sections, the wavelet energy signature, Bayesian neural network, outlier probability estimation, and evidence criterion for model selection are briefly discussed. In Section 5, experiments are carried out to solve the visual quality recognition of nonwovens. Finally, a general conclusion is given.

2. Wavelet energy signature

The central conception of image recognition using textural characters is that the specific information of an image can be interpreted and represented by textural features. In other words,
the surface quality of nonwovens can be portrayed and characterized through visual properties, i.e., different textural features based on various methods. In this paper, the wavelet energy signature is proposed. The basic idea of the wavelet energy signature is to generate textural features from the wavelet subband coefficients, or subimages, at each scale (Van de Wouwer, Scheunders, & Van Dyck, 1999). Since most relevant texture information has been removed by iterative low-pass filtering, the energy signature of the low-resolution image is generally not considered a texture feature, so in the wavelet energy signature framework the majority of features are extracted from the high-frequency subbands. Under the assumption that the energy distribution in the frequency domain identifies texture, we compute the energies of the wavelet subbands as texture features (Zhang & Wang, 2007). In this paper, the norm-1 energy, $L_1$, and the norm-2 energy, $L_2$, are used as measures with the wavelet base sym6, which has orthogonal and biorthogonal properties. The support width, filter length, and number of vanishing moments of sym6 are 11, 12, and 6, respectively. From the number of vanishing moments, we can infer that sym6 has sharp fluctuations from cuspate peak to bottom within 6 units, which makes it more suitable for capturing the minute differences between nonwovens. More specifically, given the wavelet coefficients $w^l_k(i,j)$ at level $l$ of subband $k$, the following two values are typically used as wavelet energy signatures:
$$L_1 = \frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left|w^l_k(i,j)\right| \quad (1 \le l \le J;\ k = h, v, d) \qquad (1)$$

$$L_2 = \left(\frac{1}{MN}\sum_{i=1}^{M}\sum_{j=1}^{N}\left(w^l_k(i,j)\right)^2\right)^{1/2} \quad (1 \le l \le J;\ k = h, v, d) \qquad (2)$$
where $L_1$ and $L_2$ are the two energy values of the texture projected onto the subspace at level $l$ for subband $k$; $J$ denotes the maximum decomposition level of the wavelet transform; $h$, $v$, and $d$ denote the horizontal, vertical, and diagonal high-frequency subbands; and $M \times N$ is the size of the coefficient matrix. Because the coefficient matrix is square, $M$ equals $N$. Note that this is a slight abuse of terminology, since strictly speaking the $L_1$ norm is not an energy function; it is sometimes chosen for its simplicity (Do & Vetterli, 2002). From a statistical viewpoint, $L_1$ and $L_2$ are essentially the mean and standard deviation of the magnitudes of the wavelet coefficient matrix, so a combined feature set of $L_1$ and $L_2$ emerges naturally. Another important parameter of Eqs. (1) and (2) is the decomposition level $l$. When the size of an image is fixed, with a given wavelet base, the maximum decomposition level is also fixed. It is given by:
$$(L_W - 1)\,2^{level} < L_S \qquad (3)$$
where $L_W$ is the length of the wavelet filter and $L_S$ is the length of an image row or column. In our research, because the filter length of sym6 is 12 and the size of a nonwoven image is 512 × 512, the maximum decomposition level is 4 in theory. Usually, for the 2-D case, the practical decomposition level is no more than 3; for comparison, we will also give some experimental results when the nonwoven image is decomposed at level 4. According to the 2-D discrete wavelet transform, when the nonwoven image is decomposed at level $j$, $3j$ subbands are generated and $6j$ energy signatures are extracted. So, for the Bayesian neural network involved in this research, the number of input neurons is 6, 12, 18, or 24, correspondingly, when the input features are calculated at the four different levels.
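As a concrete illustration, the feature-extraction step of Eqs. (1) and (2) can be sketched in a few lines of NumPy. The helper names below are ours, not the paper's; the function assumes the detail coefficients have already been obtained, e.g. as the list returned by `pywt.wavedec2(img, "sym6", level=4)[1:]` from the PyWavelets library, and the per-image zero-mean, unit-variance normalization described in the experimental section is included for completeness.

```python
import numpy as np

def normalize(img):
    """Per-image normalization to zero mean and unit variance,
    applied before the wavelet transform (see Section 5.2)."""
    img = np.asarray(img, dtype=float)
    return (img - img.mean()) / img.std()

def energy_signatures(detail_subbands):
    """Norm-1 (L1) and norm-2 (L2) energies of Eqs. (1)-(2) for every
    high-frequency subband.  `detail_subbands` is a list of (H, V, D)
    coefficient triples, one per level, such as the detail part of a
    pywt.wavedec2 decomposition."""
    feats = []
    for level in detail_subbands:
        for w in level:                              # horizontal, vertical, diagonal
            mn = w.size                              # M * N coefficients
            feats.append(np.abs(w).sum() / mn)       # L1, Eq. (1)
            feats.append(np.sqrt((w ** 2).sum() / mn))  # L2, Eq. (2)
    return np.asarray(feats)

# At decomposition level j, 3j subbands give 6j features (24 at level 4).
rng = np.random.default_rng(0)
demo = [tuple(rng.standard_normal((16, 16)) for _ in range(3)) for _ in range(4)]
print(energy_signatures(demo).shape)   # (24,)
```

The synthetic `demo` coefficients stand in for a real 4-level decomposition; with real images the feature vector would be the 24-dimensional network input described above.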
3. Robust Bayesian neural network

Generally, a Bayesian neural network for classification is considered a special case that uses a multi-layer perceptron as the classifier, united with the Bayesian evidence framework proposed by MacKay and the backpropagation algorithm used to optimize the weights (Lampinen & Vehtari, 2001; MacKay, 1995; Müller & Insua, 1998; Orre, Lansner, Bate, & Lindquist, 2000; Penny & Roberts, 1999; van Hinsbergen, van Lint, & van Zuylen, 2009). In this research, to detect outliers in the training set, the scaled outlier probability is also incorporated into the Bayesian neural network to increase its robustness. Its practical and commonly recognized benefits are:

1. Automatic complexity control: Bayesian inference techniques allow the value of the regularization factor, also called the weight-decay parameter, to be adjusted using the training data alone, without validation data.
2. The overfitting problem can be alleviated by introducing a regularization term into the cost function.
3. Prior information about the training data can be used, which enhances classification performance.
4. Robustness is incorporated via a probabilistic definition of an outlier.

3.1. Bayesian neural network

This section presents a summary of the key ideas of Bayesian learning techniques in the context of training multi-layer perceptrons (MLPs) on classification problems (MacKay, 1992a). At first, we define a network structure, $S$, which in this paper specifies the number of hidden units of an MLP network. We also define a prior distribution over the weights of network $S$, $p(w)$, which expresses our initial beliefs about the weights before any data have arrived. When the data, $D$, are observed, the prior distribution is updated to a posterior distribution according to Bayes' theorem
$$p(w|D,S) = \frac{p(D|w,S)\,p(w|S)}{p(D|S)} \qquad (4)$$
This posterior distribution combines the likelihood function, $p(D|w,S)$, which contains the information about $w$ coming from observation, and the prior $p(w|S)$, which contains the information about $w$ coming from background knowledge. The term in the denominator, $p(D|S)$, is known as the evidence for model $S$ and will sometimes be used as a criterion for neural network selection. Assume that we have a training set $D = \{X^u, T^u\}$, $u = 1, \ldots, N_t$, where $X^u$ denotes the $u$th feature vector, $T^u$ the corresponding desired target, and $N_t$ the size of the training set. $X^u = \{x_1, \ldots, x_m\}$, where $m$ is the number of features, or variables, used to describe each sample. $T^u = \{t^u_1, \ldots, t^u_{n_c}\}$ with $t^u_j = 1$ if $X^u \in C_j$ and $t^u_j = 0$ otherwise. This class labeling scheme is known as 1-of-$n_c$ coding, where $n_c$ denotes the number of classes. The hidden-unit activation function is the hyperbolic tangent. Thus, the output of the hidden neurons for a sample $X^u$ may be written as
$$h_k(X^u) = \tanh\left(\sum_{i=1}^{m} w^I_{ki}\,x_i + w^I_{k0}\right), \quad k = 1, 2, \ldots, n \qquad (5)$$
where $w^I_{ki}$ is the weight connecting input neuron $i$ and hidden neuron $k$, $w^I_{k0}$ is the threshold (bias) of hidden unit $k$, and $m$ and $n$ are the numbers of input and hidden neurons, respectively. From a mathematical viewpoint, the threshold of hidden neuron $k$ can also be treated as a bias weight acting on a constant input; if this constant input is set to $x_0 = 1$, Eq. (5) can be rewritten as:
$$h_k(X^u) = \tanh\left(\sum_{i=0}^{m} w^I_{ki}\,x_i\right), \quad x_0 = 1,\ k = 1, 2, \ldots, n \qquad (6)$$
If the activation function of the output layer is linear, the hidden neurons' outputs are weighted and summed, yielding the following network outputs,
$$t_j(X^u) = \sum_{k=1}^{n} w^H_{jk}\,h_k(X^u) + w^H_{j0}, \quad j = 1, 2, \ldots, n_c \qquad (7)$$

where $w^H_{jk}$ is the weight connecting hidden neuron $k$ and output neuron $j$, $w^H_{j0}$ is the threshold for the unbounded output $j$, and $n_c$ is the number of output neurons. Similarly, Eq. (7) can also be formulated more compactly as:
$$t_j(X^u) = \sum_{k=0}^{n} w^H_{jk}\,h_k(X^u), \quad h_0(X^u) = 1,\ j = 1, 2, \ldots, n_c \qquad (8)$$
To ensure that the outputs can be interpreted as probabilities, we use the normalized exponential transformation known as SoftMax (Bridle, 1990):
$$\hat t^u_j = \hat p(T^u_j|X^u) = \frac{\exp[t_j(X^u)]}{\sum_{j'=1}^{n_c} \exp[t_{j'}(X^u)]} \qquad (9)$$
where $\hat t^u_j$ is short for the estimated posterior probability that sample $X^u$ belongs to class $C_j$. We thus have the properties $0 \le \hat t^u_j \le 1$ and $\sum_{j=1}^{n_c}\hat t^u_j = 1$. Additionally, we assume that the prior probability that a sample belongs to a specific class $C_j$ is uniform and equal to $1/n_c$.

3.2. Outlier detection

In this paper, outliers are defined as samples whose target class label has been erroneously 'flipped' to another class. If a sample is for some reason erroneously registered, its label can have a random relation to the input pattern. Hence, we define a probability $e$ of a sample being assigned a random target label (Larsen et al., 1998; Sigurdsson et al., 2002). The outlier probability $e \in [0, 1]$ is assumed to be independent of both the 'true' class label and the input pattern value. When the outlier probability is considered, the posterior probability distribution can be formulated as
$$\hat p(T^u_j|X^u) = \hat p_0(T^u_j|X^u)\,(1 - e) + \frac{e}{n_c - 1}\sum_{i=1,\,i \ne j}^{n_c} \hat p_0(T^u_i|X^u) \qquad (10)$$
where $\hat p_0(T^u_j|X^u)$ is the posterior probability under zero outlier probability. The first term in Eq. (10) is the contribution when the input pattern $X^u$ is not an outlier, while the second term is the outlier contribution coming from classes other than $T^u_j$. Defining a scaled outlier probability $b = e/(n_c - 1)$, Eq. (10) can be rewritten as
$$\hat p(T^u_j|X^u) = \hat p_0(T^u_j|X^u)\,(1 - b\,n_c) + b \qquad (11)$$
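A small numerical sketch of this mixing step, together with the per-sample outlier posterior it induces (our reconstruction of Eq. (27) from the definitions above), may be helpful; the function names and example values are ours, not the paper's:

```python
import numpy as np

def outlier_adjusted(p0, b):
    """Eq. (11): mix the network posterior p0 with a uniform outlier
    term, where b = e / (nc - 1) is the scaled outlier probability."""
    nc = p0.shape[-1]
    return p0 * (1.0 - b * nc) + b

def outlier_posterior(p0_true, b, nc):
    """Probability that a labeled sample is an outlier: the outlier
    term's share of the adjusted posterior of the observed label
    (a reconstruction of Eq. (27))."""
    return b * (1.0 - p0_true) / (p0_true * (1.0 - b * nc) + b)

p0 = np.array([0.7, 0.2, 0.1])       # hypothetical network outputs
p = outlier_adjusted(p0, b=0.02)     # still sums to 1, with a floor of b
```

With $b = 0$ the adjustment is the identity; a sample whose observed label receives a very low network probability gets an outlier posterior close to 1, which is the basis of the 0.1 threshold used later in Section 3.5.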
$$E_D(w) = -\frac{1}{N_t}\sum_{u=1}^{N_t}\sum_{j=1}^{n_c} T^u_j \log \hat t^u_j \qquad (13)$$

$$E_w(w) = \frac{1}{2}\sum_{r=1}^{N_w} w_r^2, \quad N_w = (m+1)\,n + (n+1)\,n_c \qquad (14)$$
In this paper, $\alpha$ is the regularization factor, and the second term of Eq. (12) is also called weight decay in the neural network community, since it penalizes large weights (MacKay, 1992b). Substituting Eq. (11) into Eq. (13), $E_D(w)$ can be rewritten as follows:
$$E_D(w) = -\frac{1}{N_t}\sum_{u=1}^{N_t}\sum_{j=1}^{n_c} T^u_j \log\!\left[\hat p_0(T^u_j|X^u)\,(1 - b\,n_c) + b\right] \qquad (15)$$

In Eqs. (12) and (15), $E_D(w)$ is the cross-entropy function, which is appropriate when the network outputs are interpreted as probabilities, and $E_w(w)$ is the regularization term; they are given by Eqs. (13) and (14), respectively (Rojas, 1996).
Eq. (12) can then be formulated as
$$C_w(w) = -\frac{1}{N_t}\sum_{u=1}^{N_t}\sum_{j=1}^{n_c} T^u_j \log\!\left[\hat p_0(T^u_j|X^u)\,(1 - b\,n_c) + b\right] + \alpha E_w \qquad (16)$$
The gradient is

$$\frac{\partial C_w}{\partial w_l} = \frac{b\,n_c - 1}{N_t}\sum_{u=1}^{N_t}\sum_{j=1}^{n_c}\frac{T^u_j}{\hat t^u_j}\,\frac{\partial \hat t^u_j}{\partial w_l} + \alpha\,\frac{\partial E_w}{\partial w_l} \qquad (17)$$
and the Hessian,

$$\frac{\partial^2 C_w}{\partial w_l\,\partial w_s} = \frac{(b\,n_c - 1)^2}{N_t}\sum_{u=1}^{N_t}\sum_{j=1}^{n_c}\frac{T^u_j}{\hat t^u_j}\left[\frac{\partial^2 \hat t^u_j}{\partial w_l\,\partial w_s} - \frac{1}{\hat t^u_j}\,\frac{\partial \hat t^u_j}{\partial w_l}\,\frac{\partial \hat t^u_j}{\partial w_s}\right] + \alpha\,\frac{\partial^2 E_w}{\partial w_l\,\partial w_s} \qquad (18)$$
It is often desirable for computational reasons to use the Gauss-Newton approximation of the Hessian instead,

$$\frac{\partial^2 C_w}{\partial w_l\,\partial w_s} \approx \frac{(b\,n_c - 1)^2}{N_t}\sum_{u=1}^{N_t}\sum_{j=1}^{n_c}\frac{1}{\hat t^u_j}\,\frac{\partial \hat t^u_j}{\partial w_l}\,\frac{\partial \hat t^u_j}{\partial w_s} + \alpha\,\frac{\partial^2 E_w}{\partial w_l\,\partial w_s} \qquad (19)$$
Once the gradient and Hessian matrices are calculated with Eqs. (17) and (19), several optimization algorithms can be used for the weight update. In this paper, the weights are optimized using UCMINF, an algorithm for unconstrained nonlinear optimization proposed by Hans Bruun Nielsen (Nielsen, 2001). A brief description of weight updating with the UCMINF method is given below.
Since $0 \le \hat p_0(T^u_i|X^u) \le 1$ and $\sum_{i=1}^{n_c}\hat p_0(T^u_i|X^u) = 1$ because the classes are mutually exclusive, it follows that $b \le \hat p(T^u_j|X^u) \le 1 - e$ and $\sum_{j=1}^{n_c}\hat p(T^u_j|X^u) = 1$. From a statistical viewpoint, the scaled outlier probability $b$ can therefore be estimated within the finite range $[0, 1/n_c]$.

3.3. Weight optimization using UCMINF algorithm

As the core of learning in the multi-layer perceptron, training adjusts the network weights and thresholds so as to minimize the cost function. For a classification problem with multiple classes, the cost function $C_w(w)$ of the Bayesian neural network consists of a cross-entropy term and a regularization term (Hintz-Madsen, Hansen, Larsen, Pedersen, & Larsen, 1998):

$$C_w(w) = E_D(w) + \alpha\,E_w(w) \qquad (12)$$
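The regularized cost can be sketched as follows; this is a simplified illustration with hypothetical inputs, where `w_all` stands for all network weights flattened into one vector:

```python
import numpy as np

def cost(T, p0, w_all, alpha, b):
    """C_w(w) of Eq. (16): outlier-corrected cross entropy, Eq. (15),
    plus the weight-decay term alpha * E_w(w) of Eq. (14)."""
    nc = T.shape[1]
    p = p0 * (1.0 - b * nc) + b                     # Eq. (11)
    Ed = -np.mean(np.sum(T * np.log(p), axis=1))    # Eq. (15)
    Ew = 0.5 * np.sum(w_all ** 2)                   # Eq. (14)
    return Ed + alpha * Ew

# one sample of class 0, with the network fairly confident
T = np.array([[1.0, 0.0, 0.0]])
p0 = np.array([[0.9, 0.05, 0.05]])
w = np.array([0.5, -0.3])
```

With $\alpha = b = 0$ this reduces to the plain cross-entropy of Eq. (13); increasing either $\alpha$ or $b$ raises the cost of a confident, correct prediction, which is exactly the regularizing effect described above.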
1. With the training set, determine the initial neural network architecture, i.e., the numbers of neurons in the input, hidden, and output layers, denoted by $N_i$, $N_h$, and $N_o$, respectively.
2. Initialize the weights, e.g., with random weights drawn from a Gaussian distribution with variance $1/\sqrt{N_i}$. The initial weights, including those from the input layer to the hidden layer and from the hidden layer to the output layer, are denoted by $w^0$.
3. With the initial weights, regularization factor $\alpha^0 = N_i$, and scaled outlier probability $b^0 = 0$, calculate from Eqs. (16), (17), and (19) the initial value of the regularization cost function $C^0_w$, the first-order gradient matrix $G^0$, and the Hessian matrix $H^0$; then compute the inverse Hessian matrix $Z^0$.
4. Update the weights with the UCMINF algorithm.
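To make the initialization and update loop concrete, here is a deliberately simplified stand-in: a tiny two-class toy problem trained with plain gradient descent, with the outlier probability fixed at 0 and the hyperparameters held constant, instead of UCMINF with evidence-based hyperparameter updates. All data, names, and settings are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)

# toy data: one feature plus the constant input x0 = 1, two classes
X = rng.standard_normal((200, 1))
y = (X[:, 0] > 0).astype(int)
X = np.hstack([X, np.ones((200, 1))])
T = np.zeros((200, 2)); T[np.arange(200), y] = 1.0   # 1-of-nc coding

Ni, Nh, Nc, alpha, lr = 2, 4, 2, 0.01, 0.5
W1 = rng.normal(0.0, 1.0 / np.sqrt(Ni), (Ni, Nh))    # step 2: Gaussian init
W2 = rng.normal(0.0, 1.0 / np.sqrt(Nh), (Nh, Nc))

for epoch in range(300):                             # steps 3-4, simplified
    H = np.tanh(X @ W1)                              # Eq. (6)
    logits = H @ W2                                  # Eq. (8)
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    P = e / e.sum(axis=1, keepdims=True)             # SoftMax, Eq. (9)
    d2 = (P - T) / len(X)                            # grad of cross entropy
    g2 = H.T @ d2 + alpha * W2                       # plus weight decay
    d1 = (d2 @ W2.T) * (1.0 - H ** 2)                # backprop through tanh
    g1 = X.T @ d1 + alpha * W1
    W1 -= lr * g1; W2 -= lr * g2

acc = np.mean(P.argmax(axis=1) == y)
```

The real procedure differs in that each "epoch" is a full UCMINF optimization followed by updates of $\alpha$ and $b$; the sketch only shows the data flow through Eqs. (6), (8), and (9) and the weight-decay gradient of Eq. (17).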
The UCMINF algorithm is of quasi-Newton type, with BFGS updating of the inverse Hessian and a soft line search with trust-region-type monitoring of the input to the line-search algorithm. BFGS is the most popular quasi-Newton algorithm, named after its discoverers Broyden, Fletcher, Goldfarb, and Shanno, while line-search and trust-region methods both generate steps with the help of a quadratic model of the objective function (Nocedal & Wright, 2006). The algorithm is outlined as follows. The initial values of the important parameters are set as $w := w^0$; $G := G^0$; $Z := Z^0$; $D := D^0$; $t := t^0$. The search direction is

$$h^t := Z^t(-G^t) \qquad (20)$$

If $\|h^t\|_2 > D$, then

$$h^{t+1} := (D^t/\|h^t\|_2)\,h^t \qquad (21)$$

$$[\xi, t^{t+1}] = \mathrm{linesearch}(w^t, h^{t+1}, t^t) \qquad (22)$$

$$w^{t+1} = w^t + \xi\,h^{t+1} \qquad (23)$$

$Z$ and $D$ are then updated using the BFGS and line-search algorithms, until $\|G\|_\infty \le \varepsilon_1$, or $\|w^{t+1} - w^t\|_2 \le \varepsilon_2(\|w^{t+1}\|_2 + \varepsilon_2)$, or $t^{t+1} \ge t_{max}$. Here $D^0$ is the initial "trust region" radius, set to 1; $t^0$ and $t_{max}$ are the minimum and maximum numbers of function evaluations of the line-search algorithm, set to 1 and 1000, respectively. The notations $\|\cdot\|_2$ and $\|\cdot\|_\infty$ denote the 2-norm and the infinity norm of the input matrix. The two controlling parameters $\varepsilon_1$ and $\varepsilon_2$ are set to $10^{-4}$ and $10^{-8}$. With the UCMINF algorithm, we can find a local minimum of the regularization cost function, which is considered a quadratic and continuously differentiable function with the weights as independent variables.

3.4. Adapting hyperparameters

As mentioned above, $\alpha$ and $b$ describe the distribution of the regularization parameter and the scaled outlier probability, respectively; they are also called hyperparameters. The posterior distribution of the hyperparameters is given by

$$p(\alpha, b|D) = \frac{p(D|\alpha, b)\,p(\alpha, b)}{p(D)} \qquad (24)$$

Finding $\hat b$, an estimate of $b$, is done by minimizing

$$F(b) \propto -\log p(D|\alpha, b) = C_w(w_{MP}) + \tfrac{1}{2}\log|H| \qquad (25)$$

In Eq. (25), the terms independent of $b$ have been omitted. We suggest using Brent's method (Yildirim, Akçay, Okur, & Yildirim, 2003), approximating $F(b)$ by a quadratic function, to find $\hat b$. This is possible because $F(b)$ is a smooth function and we have upper and lower bounds on $b$ that set the range for the search of $\hat b$. As Brent's method does not use gradient information, we avoid evaluating $\partial C_w(w)/\partial b$, which has the unpleasant property of a zero denominator when $b = 0$. The hyperparameter $\alpha$ is computed by maximizing $\log p(D|\alpha, b)$; evaluating $\partial \log p(D|\alpha, b)/\partial \alpha$ gives the following update formula

$$\alpha = \frac{\gamma}{2E_w(w_{MP})} \qquad (26)$$

where $\gamma = N_w - \alpha\,\mathrm{Trace}(A^{-1})$ is the number of effective parameters ('well-determined' weights), i.e., the number of weights whose values are determined by the training data rather than by the prior. The number of effective parameters measures how many parameters in the network are effectively used in reducing the error function; it can range from zero to $N_w$. After training, we need to perform the following checks: (1) if $\gamma$ is very close to $N_w$, the network may not be large enough to properly represent the true model; in this case, we simply add more hidden neurons and retrain the network to obtain a larger network, and if the larger network has the same final $\gamma$, the smaller network is large enough; (2) if the network is sufficiently large, then a second, larger network will achieve comparable values of $\gamma$ (Xu et al., 2006).

3.5. Outlier detection

Having estimated the network from the training data, it is possible to evaluate the outlier probability of the labeled samples,

$$p_{outlier} = \frac{b\,\left(1 - \hat p_0(T^u_j|X^u)\right)}{\hat p_0(T^u_j|X^u)\,(1 - b\,n_c) + b} \qquad (27)$$

The estimated outlier probability can be calculated from

$$\hat p_{outlier} = \frac{\hat b\,\left(1 - \hat p_0(T^u_j|X^u)\right)}{\hat p(T^u_j|X^u)} \qquad (28)$$

In this paper, if the estimated outlier probability is larger than 0.1, the corresponding sample is considered an outlier. If the scaled outlier probability is less than 0.002, i.e., $1/N_t$, there is no outlier in the training set. Moreover, because the scaled outlier probability $b$ is estimated from the training set, Eq. (27) is only applicable to the calculation of outlier probabilities within the same training set.

3.6. Criteria for hyperparameter convergence

The training of the multi-layer perceptron based on the Bayesian evidence framework stops when the two hyperparameters converge according to the predefined criteria:

$$\frac{|\alpha^{(t+1)} - \alpha^{(t)}|}{\alpha^{(t+1)}} \le 10^{-5} \qquad (29)$$

$$\frac{|b^{(t+1)} - b^{(t)}|}{b^{(t+1)}} \le 10^{-5} \qquad (30)$$

We stop the training of the multi-layer perceptron when Eqs. (29) and (30) are met at the same time. If $\alpha^{(t+1)} = 0$, the convergence criterion for the regularization parameter becomes $|\alpha^{(t+1)} - \alpha^{(t)}| \le 10^{-5}$; the same applies to Eq. (30).

4. The evidence framework for model evaluation
In the Bayesian framework, it is natural to consider not just a single neural network model, but a whole ensemble of models. This leads to the use of committees of networks, where the overall prediction on a new data point is the combined prediction of many networks. In this paper, the evidence criterion of the robust Bayesian neural network is also evaluated with the finite data. The criterion, 'evidence', is a quantity which can be calculated from the training set only; the neural network with the highest 'evidence' is selected (Thodberg, 1996). W.D. Penny also pointed out that model selection using the evidence criterion is only tenable if the number of training examples exceeds the number of network weights by a factor of 5 or 10. For a Bayesian neural network as described above, the log of the evidence is
$$\log Ev = -C_w(w) + \log(Occ_w) + \log(Occ_\alpha) \qquad (31)$$

$$\log(Occ_w) = -\frac{1}{2}\ln|H| + \ln\!\left(n^k_h!\right) + n^k_h \ln 2 + \frac{N^k_w}{2}\ln \alpha^k \qquad (32)$$

$$\log(Occ_\alpha) = \frac{1}{2}\ln\frac{4\pi}{\gamma^k} - K\ln(\ln \Omega) \qquad (33)$$
The first term in Eq. (31) is the regularization cost function, i.e., Eq. (16). The next two terms in Eq. (31) are 'Occam factors'. $n^k_h$ and $\alpha^k$ are the number of hidden neurons of the $k$th neural network and its regularization factor at convergence, respectively; $\gamma^k$ is the number of well-determined weights of the $k$th Bayesian neural network; $K$ and $\Omega$ are the number of weight groups and the scale of the prior for each $\alpha^k$. The first Occam factor, defined in Eq. (32), is the Occam factor for the weights. Its first term is the negative log determinant of the Hessian matrix; this term measures the posterior volume in weight space and derives from the assumption that the posterior distribution, $p(w|D,S)$, is Gaussian. The next two terms in Eq. (32) arise from the redundancy of representation in a single-hidden-layer MLP with $n_h$ hidden neurons. The last term in Eq. (32) measures the prior volume in weight space. The second Occam factor, defined in Eq. (33), is the Occam factor for the weight-decay factor. Its first term captures the posterior uncertainty in the hyperparameters; its second term captures the prior uncertainty in the hyperparameters, expressed through the parameter $\Omega$, which captures our prior belief about the range of scales within which each hyperparameter lies. This is subjectively set to $10^3$, meaning that before any data are observed we believe we know the value of each $\alpha^k$ to within three orders of magnitude. Once a network has been trained, Eqs. (31)-(33) can be used to calculate the log evidence for that network, which is then used for model selection. In our experiment, the number of training samples is at least as large as the number of weights.
If 100 samples of each grade, 500 in total for the five grades, are selected to constitute the training set, the number of hidden neurons should be less than 11 when the nonwoven image is decomposed at level 4 and 24 features are extracted as input for the Bayesian neural network. So, in this paper, the number of hidden neurons ranges from 1 to 10, and the 10 corresponding networks in the committee are evaluated with the log evidence. However, if the Gaussian approximation of the posterior weight distribution is not valid, the evidence will not be accurately determined. In our experiment, although the initial weights are Gaussian distributed, the weights may no longer be Gaussian distributed once the Bayesian neural network has converged. In this condition, the evidence criterion is used only as a complementary criterion for model selection. Additionally, we find that the most suitable neural network is close to the one with the highest evidence when the average identification accuracy is considered as the main evaluation parameter. With this in mind, we can compress the selection space to some extent by observing the highest value of the evidence and its trend.
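The selection rule itself is simple; with invented log-evidence values for a committee of ten candidate networks (1 to 10 hidden neurons), it amounts to:

```python
import numpy as np

# hypothetical log-evidence values for networks with 1..10 hidden neurons
log_ev = np.array([-310.0, -250.0, -180.0, -150.0, -155.0,
                   -160.0, -170.0, -185.0, -200.0, -220.0])
best_nh = int(np.argmax(log_ev)) + 1   # hidden-neuron count with highest evidence
```

As stressed above, with finite data this choice is then cross-checked against the actual recognition accuracy of the neighboring candidates rather than taken at face value.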
5. Experiment and discussion

5.1. Experiment materials

Before taking photos, all nonwoven samples were classified into five grades by seven qualified experts, according to visual quality, especially surface uniformity, and also the condition of pilling, wrinkles, and defects of the reference samples provided by a nonwoven manufacturing company. In this experiment, the visual quality of the samples declines from grade A to grade E. To evaluate the proposed method, 625 nonwoven samples of 5 grades, 125 of each grade, are used. All samples of the same grade are taken from the same roll, formed by the needle-punching process with polyester fibers (fineness: 1.67 dtex, length: 60 mm); the basis weight and thickness are 230 g/m² and 1.55 mm, respectively. However, the processing parameters of the five grades are different. The images are captured under reflective light and saved as gray images in TIFF format with a resolution of 50 dpi. Figs. 1-5 show the 512 × 512 pixel samples taken from each class. All standard nonwoven images presented in this paper are the copyright property of GEMTEX in ENSAIT and are reproduced with permission.

From Figs. 1-5, we can see that these 5 images have similar texture, especially the first 4 images, even though they belong to different visual quality grades. From the real samples, however, we can observe that the surface of grade A is nearly uniform, without messy fibers or pilling. The grade B sample has a few messy fibers but no pilling. The grade C sample has obvious messy fibers and mild pilling. The grade D sample has severe messy fibers and mild wrinkles, while grade E has obvious wrinkles or defects.

Fig. 1. A sample of grade A.
Fig. 2. A sample of grade B.
Fig. 3. A sample of grade C.
Fig. 4. A sample of grade D.
Fig. 5. A sample of grade E.

5.2. Experiment procedure

The proposed approach is implemented on the MATLAB 2008A platform. All programs are run on a PC with the Windows XP (SP3) system, 3.25 GB of memory, and an Intel(R) Core(TM) 2 Duo CPU (E7500, 2.93 GHz). In our experiment, to eliminate the effect of the common range in the gray levels of images from a same grade and to make the identification of the Bayesian neural network less biased, each image is individually normalized to zero mean and unit variance before the wavelet transform. We employ the 2-D discrete wavelet transform with the orthogonal wavelet base sym6 at four levels. For each nonwoven image, the two energy-based features $L_1$ and $L_2$ are calculated from each of the 12 high-frequency subbands with Eqs. (1) and (2), so 24 features are calculated in total. Our hypothesis is that these 24 features capture important textural characteristics of the original image and have discriminative power for visual quality recognition.

The detailed operating procedure is as follows:

1. Each original image is normalized to zero mean and unit variance and decomposed at level $j$; $6j$ features (both $L_1$ and $L_2$) are calculated from the $3j$ high-frequency subbands, constituting a feature vector. Six hundred and twenty-five feature vectors are obtained from the corresponding images. The maximum decomposition level is 4.
2. Select 100 feature vectors of each grade randomly and combine them as the training set. The remaining ones, 25 of each grade, 125 in total, are used as the test set.
3. Construct and initialize a Bayesian neural network according to the algorithm described above. The class labeling scheme is 1-of-$n_c$ coding.
4. With the training set, optimize the weights using the UCMINF algorithm. We call each UCMINF optimization one epoch, or hyperparameter update. After each epoch, compute the important parameters, i.e., the average classification error and cross-entropy error of the training and test sets, the log evidence, and the three hyperparameters: the regularization factor (alpha), the scaled outlier probability (beta), and the number of well-determined weights (gamma); also record the computation time.
5. Repeat step 4 until the Bayesian neural network converges.
6. Change the number of hidden neurons of the Bayesian neural network from 1 to 10, repeat steps 1-5, and calculate the evidence to select the most suitable model.

If the nonwoven images are decomposed at level 3 and the number of hidden neurons is fixed, the above procedure is illustrated in Fig. 6 (nonwoven image; wavelet transform with sym6 at level 3; three subbands H, V, D at each of levels 1-3; norm-1 and norm-2 energy signatures; robust Bayesian neural network; grade of the input sample).

Fig. 6. The procedure for classification of visual quality of nonwovens.

5.3. Results and discussion

5.3.1. Without evidence criterion

In this part, we consider the number of hidden neurons fixed and equal to 5; the evidence framework for model selection will be discussed in Section 5.3.2. In this experiment, each nonwoven image is decomposed at 4 levels; here we take level 3 as an example. When decomposed at level 3, each nonwoven image provides 18 features to characterize itself. With the randomly selected training and test sets, within
12 epochs the Bayesian neural network converges, with a = 0.189545868578179 and b = 0.000566035100271378. The values of the two hyperparameters a and b over the 12 epochs are shown in Fig. 7. From the a curve in Fig. 7, we can see that, with the hyperparameter updating, the regularization factor a declines sharply at the beginning. The initial value of a is 18, and after one epoch it decreases to 0.731205040024474. After epoch 4, the decrement of a is less than 0.2, and the difference between the a values of the last two epochs reaches the convergence criterion of the Bayesian neural network, i.e., $10^{-5}$. Within the first two epochs, the value of b equals its initial value, i.e., 0. From epoch 3 to epoch 5, b increases noticeably; after epoch 5, b shows a trend similar to that of a. In the b curve, the maximum is less than $6 \times 10^{-4}$, within the permitted range. In a word, with the UCMINF optimization, a and b converge to the predefined criteria rapidly. The average classification error curves of the training and test sets are shown in Fig. 8. The two classification error curves have a similar decreasing trend, which means no overfitting occurs during the training process. After one epoch, the maximum average classification errors of the training and test sets are 0.054 and 0.24, respectively. When the Bayesian neural network converges, the average classification errors of the training and test sets are 0.002 and 0.016, respectively. What is very interesting is that after 3 epochs,
Fig. 9. The cross-entropy error curves of the training and test sets within hyperparameter updating.
Fig. 7. Alpha and beta values within hyperparameter updating.

Fig. 10. The number of well determined weights within hyperparameter updating.
Fig. 8. The average classification error curves of the training and test sets within hyperparameter updating.

Fig. 11. The outlier probability of the 500 training samples.
Fig. 12. The log evidence versus number of hyperparameter updates.

Fig. 15. The computation time (in seconds) of Bayesian neural networks with different structures.
Fig. 13. Log evidence versus number of hidden neurons (inneus = 6, 12, 18, 24).

Fig. 16. The number of well determined weights of the different networks (inneus = 6, 12, 18, 24).
Fig. 14. Histogram of the converged weights and the fitted Gaussian function.

Fig. 17. The ratio R = gamma/Nw versus number of hidden neurons (inneus = 6, 12, 18, 24).
the average classification error of both the training and test sets reaches its minimum. The cross-entropy error curves of the training and test sets, illustrated in Fig. 9, have a trend similar to that of the average classification error curves. In Figs. 8 and 9, for both the average classification error and the cross-entropy error, the value for the test set is slightly larger than that for the training set, and both reach their minima at the same epoch. The minimum cross-entropy errors of training and test are 0.0291553662205282 and 0.13896309169767, reached at the 8th epoch. From the cross-entropy error curves, we also note that no overfitting occurs, since the test-set error decreases together with the training-set error. The number of "well determined weights" is shown in Fig. 10. The total number of weights of the Bayesian neural network is 125 when the features extracted at level 3 are used as inputs. The number of training samples is 500, which satisfies the empirical rule that the number of training samples should be much larger than the total number of weights. From Fig. 10, we find that the final number of "well determined weights" is 35, giving a ratio of 0.28 to the total number of weights, i.e., no more than one third. From this we can conclude that the structure of the Bayesian neural network with 5 hidden neurons is compact, with little redundancy. Combining Figs. 8-10, we reach an evident conclusion: as the number of "well determined weights" increases, the performance of the Bayesian neural network improves, and the average classification error and the cross-entropy error decrease simultaneously. With Eq. (28), we can calculate the outlier probability of each training sample. In our experiment, the maximum outlier probability among the training samples is 0.0006, so there is no outlier in the training set.
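Eq. (28) is not reproduced in this excerpt. In the robust-classifier framework the paper builds on (Larsen et al., 1998; Sigurdsson et al., 2002), the class-label likelihood is mixed with a uniform outlier component, p(t|x) = (1 - beta) * y_t + beta/c, and the posterior outlier probability of a sample follows from Bayes' rule; the sketch below assumes that form and is not a verbatim transcription of the paper's equation.

```python
def outlier_probability(y_t, beta, n_classes=5):
    """Posterior probability that a training sample is an outlier.

    Assumes the robust-classifier mixture p(t|x) = (1 - beta) * y_t + beta/c,
    where y_t is the network's softmax output for the sample's true class t,
    beta is the (scaled) outlier probability, and c = n_classes. This form is
    an assumption based on the cited robust-classifier literature, not the
    paper's Eq. (28) itself.
    """
    uniform = beta / n_classes
    return uniform / ((1.0 - beta) * y_t + uniform)
```

Under this form, a sample whose true-class output y_t is large gets a near-zero outlier probability, matching the small values (max 0.0006) reported for the training set.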
When the Bayesian neural network is trained with 500 samples, each described by 18 features, the outlier probabilities of the 500 training samples are as shown in Fig. 11. We can observe four obvious peaks at the points 100, 200, 300, and 400 on the x-axis. The reason is that the 100 samples of each grade are stacked together in the training set, so when a sample of another grade arrives, its risk of being an outlier increases slightly compared with samples of the same grade. In our experiments, as the decomposition level increases from 1 to 4, the number of features, L1 and L2, increases from 6 to 24. Consequently, the structure of the Bayesian neural network becomes more complex due to the increased number of input neurons, although the numbers of hidden and output neurons are fixed. When the Bayesian neural network converges with the features extracted at each level from the same samples, some important parameters, i.e., the average classification error and cross-entropy error of training and test, a and b, the number of "well determined weights", and the computation time, are summarized in Table 1. In Table 1, since we only want to follow the trend of these parameters as the decomposition level increases, the numerical precision is 10^-6. The acronyms are defined as follows: CETR and CETE denote the classification errors of the training and test sets, respectively; CEETR and CEETE are short for the cross-entropy errors of the training and test sets; TNW means the total number of weights; and CPU time is the computation time in seconds.
From Table 1, we can conclude the following. With the increase of decomposition level, the average classification error decreases sharply, especially from level 2 to level 3. At level 4, the average classification error of the training set is zero, and that of the test set is 0.032 (3.2%). With the increase of decomposition level, the cross-entropy error of both the training and test sets also decreases markedly, especially from level 2 to level 3. We find that the average classification error and the cross-entropy error have similar decreasing trends, with the sharp descent occurring between levels 2 and 3. Looking at the regularization parameter a, from level 1 to 4 it increases slowly; this slight increment indicates that the neural network becomes more complex as the number of input features increases from 6 to 24. All values of the scaled outlier probability b are less than the predefined threshold 0.002, which means that no outlier sample exists in the training set. We also note that the number of well determined weights changes irregularly: its increase from level 1 to 2 means more weights are determined by the training data and used to reduce the network's error, and the same holds from level 3 to 4, where more input neurons are added for the additional texture features. Interestingly, the minimum of gamma appears at level 3. From the next column, we can see that the total number of weights (TNW) increases from level 1 to level 4, while the ratio of gamma to TNW descends from 0.6615 to 0.2516. From level 1 to 4, it takes more time for the Bayesian neural network to converge, even though from level 1 to 3 the number of iteration epochs decreases slightly. One direct explanation is that with the increase of input neurons, the neural network becomes more and more complicated.
Considering Table 1 as a whole, and taking the average classification error of the test set, the complexity of the model, and the computation time as the main evaluation criteria, the Bayesian neural network built at level 3 is the best. The recognition accuracy (in percentage) of the five grades at each decomposition level is listed in Table 2.
Table 2
Recognition accuracy of five grades at different levels (in percentage).

          Grade A   Grade B   Grade C   Grade D   Grade E   Average
Level 1      64        68        68        96       100      79.2
Level 2      88        72        72        96       100      85.6
Level 3      96       100        96       100       100      98.4
Level 4      92        92       100       100       100      96.8
Table 3
The total number of weights with the change of input and hidden neurons.

Input \ Hidden    1    2    3    4    5    6    7    8    9   10
6                17   29   41   53   65   77   89  101  113  125
12               23   41   59   77   95  113  131  149  167  185
18               29   53   77  101  125  149  173  197  221  245
24               35   65   95  125  155  185  215  245  275  305
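The entries of Table 3 follow from counting the weights and biases of a single-hidden-layer network with 5 outputs. A quick check, assuming fully connected layers with bias terms:

```python
def total_weights(n_in, n_hidden, n_out=5):
    """Total weights and biases of a one-hidden-layer network:
    (n_in + 1) * n_hidden input-to-hidden parameters plus
    (n_hidden + 1) * n_out hidden-to-output parameters."""
    return (n_in + 1) * n_hidden + (n_hidden + 1) * n_out

# Reproduce Table 3 row by row (rows: input neurons, columns: hidden neurons).
table3 = {n_in: [total_weights(n_in, h) for h in range(1, 11)]
          for n_in in (6, 12, 18, 24)}
```

For example, the largest configuration (24 inputs, 10 hidden neurons) gives 25 * 10 + 11 * 5 = 305 parameters, matching the table's maximum.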
Table 1
Important parameters of the Bayesian neural network.

          Epochs  CETR   CETE   CEETR     CEETE     Alpha     Beta      Gamma  TNW  CPU time (s)
Level 1     14    0.06   0.208  0.33969   0.875021  0.037122  8.31E-05    43    65     17
Level 2     15    0.014  0.144  0.10991   0.556578  0.054562  0.000295    55    95     25.17188
Level 3     12    0.002  0.016  0.029155  0.138963  0.189546  0.000566    35   125     26.96875
Level 4     13    0      0.032  0.024802  0.129342  0.323293  0.000894    39   155     57.71875
From Table 2, we can observe that at every level the recognition accuracy of grade E is always 100%. For grade D, the recognition accuracy at levels 1 and 2 is 96%, and at level 3 it increases to 100%. For grades A, B, and C, the recognition accuracy increases obviously from level 1 to level 4, except for grades A and B, whose accuracy drops at level 4. We also find that from level 3 to level 4, except for grade C, there is no increase in recognition accuracy. One reason for this phenomenon is that most of the texture discrimination information carried by the energy features of a nonwoven image lives in the first three scales of the wavelet decomposition; at level 4, the three additional subbands provide little useful, and even interfering, information, even though the theoretical maximum decomposition level is 4 for a nonwoven image of size 512 x 512. Taking the average recognition accuracy into consideration, we note that from level 3 to level 4 there is no increment, which is known as the Hughes phenomenon: keeping the number of training samples fixed while increasing the dimension of the features (input neurons) results in an increase of the classification error (Theodoridis & Koutroumbas, 2006).

5.3.2. Model evaluation with evidence criterion

In Tables 1 and 2, we used the decomposition level as a proxy for the number of input neurons to assess its effect on the performance of the Bayesian neural network. As concluded above, the decomposition level has a significant impact on the identification results. In the discussion below, which examines the relationship between network structure and log evidence, we will use the corresponding number of input neurons instead of the decomposition level of the wavelet transform; the number of input neurons is six times the decomposition level. When the log evidence is used as a criterion for model evaluation, if the numbers of input and output neurons of the Bayesian neural network are fixed, the change of structure depends only on the number of hidden neurons.

Keeping in mind the basic rule that the number of training samples should be larger than the total number of weights, we change the number of hidden neurons from 1 to 10. In our experiment, with the increase of decomposition level, the number of features used to describe a sample increases from 6 to 24. All Bayesian neural networks involved in this experiment have 5 output neurons. Table 3 shows the relationship between the number of input neurons, the number of hidden neurons, and the total number of weights. In Table 3, the maximum total number of weights is 305, which is less than the number of training samples. In our experiment, we note that when the numbers of input and hidden neurons are unchanged, the log evidence tends to converge to a minimum as the hyperparameters are updated; Fig. 12 shows the change of log evidence of a Bayesian neural network with 18 input neurons and 5 hidden neurons. The maximum, -113.6378, and minimum, -170.9687, are achieved at epochs 2 and 10, respectively. The curves of log evidence for the converged Bayesian neural networks are illustrated in Fig. 13, where the acronym 'inneus' means the number of input neurons. The four curves in Fig. 13 have nearly the same trend, an inverted-tick shape, except for the 'inneus = 24' curve, which has an obvious sink when the Bayesian neural network has 8 hidden neurons, as indicated by arrow 1. When the number of input neurons changes from 6 to 24, the log evidence increases obviously even when the number of hidden neurons is the same, except for the points marked by ellipses. If the log evidence is taken as a performance criterion to evaluate the discriminative power of a neural network, i.e., networks with the highest 'evidence' should be selected, then the suggestion that, for a fixed number of training samples and hidden neurons, more features improve the classification performance seems to be true. However, a classical and powerful counterexample is the Hughes phenomenon mentioned before, so how many features should be used for an object remains an open question. The maximum of the log evidence of the four curves is achieved when the number of hidden neurons is 2 or 3, which means that for the problem of visual quality recognition with five classes, even when the number of features extracted to capture the textural characteristics differs markedly, from 6 to 24, a Bayesian neural network with 2 or 3 hidden neurons is sufficient, at least from the viewpoint of evidence. However, the actual identification accuracy of Bayesian neural networks with different structures, shown in Table 4, reveals subtle differences. In Table 4, when the number of input neurons is fixed, there is no obvious difference in identification accuracy within the same row as the number of hidden neurons increases, except for the networks with 1 hidden neuron. When the number of hidden neurons is fixed, the identification accuracy increases as the number of input neurons increases from 6 to 18. When the number of input neurons increases from 18 to 24, as shown in the last two rows, there is no increment of identification accuracy for 4 to 10 hidden neurons. The two reasons for this have been mentioned in the discussion of Table 2.

When the number of input neurons is 18, the maximum identification accuracy is 98.4%, and it remains unchanged as the number of hidden neurons increases. In Table 4, the highest identification accuracy is 99.2%, located at the crossing point where the numbers of input and hidden neurons are 24 and 3, respectively. However, taking Fig. 13 into account, we find that for the network with 24 input neurons, i.e., the curve with '+', the largest log evidence is achieved not at 3 hidden neurons but at 2. From this phenomenon, we can infer that when the Bayesian neural network converges, the distribution of the weights is no longer the strict Gaussian of the initialization. Since the central tenet of the evidence framework is that the posterior distribution of the weights is Gaussian, the evidence criterion for model selection is not always tenable, especially when the number of training samples is not larger than the number of network weights by a factor of five or ten. In our experiment, this basic premise of the evidence criterion is not always met; for example, in Table 3, in most cases the number of weights is larger than 100, i.e., 20% of the number of training samples. Nevertheless, an interesting phenomenon is that the Bayesian neural network that gives the best results is close to the one with the highest evidence, except for the network with 18 input neurons, whose highest recognition accuracy is achieved with 9 hidden neurons, as indicated by arrow 2.
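In practice, the evidence criterion discussed here amounts to ranking the converged candidate networks by log evidence and keeping the best few for the committee. A minimal sketch, using hypothetical log-evidence values that echo the shape of the curves in Fig. 13 (peaking at 2 or 3 hidden neurons):

```python
def rank_by_evidence(log_evidence):
    """Rank candidate hidden-layer sizes by converged log evidence,
    highest first (the evidence criterion for model selection)."""
    return sorted(log_evidence, key=log_evidence.get, reverse=True)

# Hypothetical values for illustration only, not the measured ones.
candidates = {1: -160.0, 2: -108.0, 3: -112.0, 4: -120.0, 8: -150.0}
ranking = rank_by_evidence(candidates)
```

As the paper notes, this ranking should be cross-checked against actual test accuracy: the top-evidence model is usually near, but not always identical to, the best-performing one.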
Table 4
The identification results of Bayesian neural networks with different structures (in percentage).

Input \ Hidden    1     2     3     4     5     6     7     8     9    10
6              71.2  76.8  79.2  78.4  79.2  79.2  78.4  78.4  78.4  78.4
12             64.8  79.2  78.4  82.4  85.6  86.4  85.6  87.2  88.8  87.2
18             80.0  91.2  98.4  98.4  98.4  98.4  98.4  98.4  98.4  98.4
24             84.0  97.6  99.2  98.4  96.8  98.4  96.8  96.8  98.4  98.4
This is helpful in limiting the size of the committee for model selection. When the Bayesian neural network with 18 input and 9 hidden neurons converges, the histogram of its weights and the fitted Gaussian function, whose mean and standard deviation are estimated by the maximum likelihood method as described in Liu et al. (2009), are shown in Fig. 14. The mean and standard deviation of the initial weights are 0 and 0.2357, i.e., the reciprocal of the square root of Ni. When the Bayesian neural network converges, the mean and standard deviation of the weights are 0.0251 and 0.8472. Treating the weights as a Gaussian series, the estimated mean and standard deviation obtained by the maximum likelihood method are 0.0251 and 0.04913, respectively. Although the estimated mean matches the actual one, the standard deviations differ obviously, as demonstrated visibly in Fig. 14. The distinct differences between the histogram and the fitted Gaussian curve, especially at the tails, imply that the weights of the converged Bayesian neural network are not normally distributed, which is far from the central assumption of the evidence framework. So, for finite industrial data, the evidence framework for model selection is not always tenable. In our experiment, the computation time is also considered as an evaluation parameter. Fig. 15 shows the computation time of the Bayesian neural networks at convergence. With the increase of input and hidden neurons, the structure of the network becomes more and more complex, and it takes more time to adjust its weights to meet the convergence criterion. Comparing the four curves, we find that a network with more input neurons needs more time to converge, except for the red line with stars and the blue line with circles at the two points where the number of hidden neurons is 3 and 4, respectively.
If the number of input neurons is fixed, increasing the number of hidden neurons causes the network to need more time before convergence, especially when the number of hidden neurons is larger than 4. One important parameter of a Bayesian neural network is the number of well determined weights, i.e., ck. The ck of the 40 networks involved in our experiment is shown in Fig. 16. When the number of hidden neurons is larger than 3, the value of ck increases, except at the two points indicated by the arrows. From the trend of the four curves, we can infer that with the increase of the total number of weights, more and more weights are determined by the training data rather than by the prior. But the increase of well determined weights does not improve the performance of the network if its structure becomes more and more complex as more input and hidden neurons are added, which has been confirmed by the results in Table 4. Even though the number of well determined weights increases as the network becomes more complex, the ratio R between ck (gamma, the number of well determined weights) and Nw (the total number of weights) decreases, as illustrated in Fig. 17. From Fig. 17, we find that with the increase of the structural complexity of the Bayesian neural network, the proportion of well determined weights decreases sharply, except at the point indicated by the arrow. This is a sufficient and powerful argument to explain why the increment of gamma cannot improve the performance of the Bayesian neural network.

6. Conclusion

This research addresses the recognition of the visual quality of nonwovens with wavelet energy signatures and a robust Bayesian neural network. To detect outliers in the training set, the scaled outlier probability is introduced into the algorithm of the Bayesian neural network. The log evidence is used as a criterion to evaluate the performance of the committee candidates, in addition to the
identification accuracy. Experiments with 625 nonwoven images of five grades show that the proposed method is effective. When the nonwoven image is decomposed at level 3, with 18 features used to describe its texture, the identification accuracy of grades B, D, and E is 100%, and the average accuracy is 98.4%. When the nonwoven image is decomposed at level 4, i.e., 24 features are employed, the average accuracy of the five grades is 99.2%. The experiment in this paper can be considered a classification problem with finite samples and a Bayesian neural network. From the experiment, we can conclude that when the number of training samples is not much larger than the number of parameters of the neural network, model selection using the evidence criterion is not always applicable, but it is still helpful in limiting the selection space to some extent. With the increase of input and hidden neurons, the Bayesian neural network becomes more and more complex, while the ratio between the number of 'well determined' weights and the total number of weights decreases obviously, which is a sufficient and powerful argument for why the complicated neural network cannot function as expected. This is also an example of 'Occam's rule', i.e., when you have two competing models that make exactly the same predictions, the simpler one is the better (Jaynes, 2003).

Acknowledgement

This work was supported by "the Fundamental Research Funds for the Central Universities" [Grant No. JUSRP11103].

References

Bridle, J. S. (1990). Probabilistic interpretation of feedforward classification network outputs with relationships to statistical pattern recognition. Neurocomputing: Algorithms, Architectures and Applications, 6, 227-236.
Do, M. N., & Vetterli, M. (2002). Wavelet-based texture retrieval using generalized Gaussian density and Kullback-Leibler distance. IEEE Transactions on Image Processing, 11(2), 146-158.
Hintz-Madsen, M., Hansen, L. K., Larsen, J., Pedersen, M. W., & Larsen, M. (1998). Neural classifier construction using regularization, pruning and test error estimation. Neural Networks, 11, 1659-1670.
Jaynes, E. T. (2003). Probability theory: The logic of science (1st ed.). United Kingdom: The Press Syndicate of the University of Cambridge.
Lampinen, J., & Vehtari, A. (2001). Bayesian approach for neural networks - Review and case studies. Neural Networks, 14, 257-274.
Larsen, J., Nonboe, L., Hintz-Madsen, M., & Hansen, L. K. (1998). Design of robust neural network classifiers. In Proceedings of the 1998 IEEE international conference on acoustics, speech and signal processing (Vol. 2, pp. 1205-1208).
Liu, J. L., Zuo, B. Q., Vroman, P., Rabenasolo, B., & Zeng, X. Y. (2009). Visual quality recognition of nonwovens based on wavelet transform and LVQ neural network. In Proceedings of the 39th international conference on computers & industrial engineering (CIE39) (pp. 1885-1890).
Liu, J. L., Zuo, B. Q., Zeng, X. Y., Vroman, P., & Rabenasolo, B. (2010). Nonwoven uniformity identification using wavelet texture analysis and LVQ neural network. Expert Systems with Applications, 37(3), 2241-2246.
Liu, J. L., & Zuo, B. Q. (2007). The identification of fabric defects based on discrete wavelet transform and back-propagation neural network. Journal of the Textile Institute, 98(4), 355-362.
MacKay, D. J. C. (1992a). The evidence framework applied to classification networks. Neural Computation, 4, 720-736.
MacKay, D. J. C. (1992b). A practical Bayesian framework for back-propagation networks. Neural Computation, 4(3), 448-472.
MacKay, D. J. C. (1995). Bayesian neural networks and density networks. Nuclear Instruments and Methods in Physics Research A, 354, 73-80.
Militký, J., Rubnerová, J., & Klicka, V. (1999). Surface appearance irregularity of nonwovens. International Journal of Clothing Science and Technology, 11, 141-152.
Müller, P., & Insua, D. R. (1998). Issues in Bayesian analysis of neural network models. Neural Computation, 10(3), 749-770.
Nielsen, H. B. (2001). UCMINF - An algorithm for unconstrained, nonlinear optimization. Technical Report. IMM, Technical University of Denmark.
Nocedal, J., & Wright, S. J. (2006). Numerical optimization (2nd ed.). New York: Springer.
Orre, R., Lansner, A., Bate, A., & Lindquist, M. (2000). Bayesian neural networks with confidence estimations applied to data mining. Computational Statistics & Data Analysis, 34, 473-493.
Penny, W. D., & Roberts, S. J. (1999). Bayesian neural networks for classification: How useful is the evidence framework? Neural Networks, 12, 877-892.
Rojas, R. (1996). Neural networks: A systematic introduction. Berlin: Springer.
Sigurdsson, S., Larsen, J., Hansen, L. K., Philipsen, P. A., & Wulf, H. C. (2002). Outlier estimation and detection: Application to skin lesion classification. In Proceedings of the 2002 IEEE international conference on acoustics, speech, and signal processing (ICASSP '02) (Vol. 1, pp. 1049-1052).
Theodoridis, S., & Koutroumbas, K. (2006). Pattern recognition (3rd ed.). Singapore: Elsevier Pte Ltd.
Thodberg, H. H. (1996). A review of Bayesian neural networks with an application to near infrared spectroscopy. IEEE Transactions on Neural Networks, 7(1), 56-72.
Van de Wouwer, G., Scheunders, P., & Van Dyck, D. (1999). Statistical texture characterization from discrete wavelet representations. IEEE Transactions on Image Processing, 8(4), 592-598.
van Hinsbergen, C. P. IJ., van Lint, J. W. C., & van Zuylen, H. J. (2009). Bayesian committee of neural networks to predict travel times with confidence intervals. Transportation Research Part C, 17, 498-509.
Xu, M., Zeng, G. M., Xu, X. Y., Huang, G. H., Jiang, R., & Sun, W. (2006). Application of Bayesian regularized BP neural network model for trend analysis, acidity and chemical composition of precipitation in North Carolina. Water, Air, and Soil Pollution, 172, 167-184.
Yildirim, N., Akçay, F., Okur, H., & Yildirim, D. (2003). Parameter estimation of nonlinear models in biochemistry: A comparative study on optimization methods. Applied Mathematics and Computation, 140(1), 29-36.
Zhang, J. M., & Wang, X. G. (2007). Objective grading of fabric pilling with wavelet texture analysis. Textile Research Journal, 77(11), 871-879.