Advanced Engineering Informatics 16 (2002) 179–191 www.elsevier.com/locate/aei
Probabilistic inference with maximum entropy for prediction of flashover in single compartment fire

E.W.M. Lee, R.K.K. Yuen*, S.M. Lo, K.C. Lam

Department of Building and Construction, City University of Hong Kong, Tat Chee Avenue, Kowloon, Hong Kong, People's Republic of China

Received 12 December 2001; revised 7 February 2002; accepted 28 June 2002

* Corresponding author. E-mail address: [email protected] (R.K.K. Yuen).
Abstract

This paper presents the development of a new artificial neural network model using probabilistic mapping with maximum entropy (PEmap). By maximizing the entropy of the conditional probabilities between the input and output vectors, it can carry out prediction with minimum prejudice. In this study, PEmap was employed as a binary classifier for the prediction of flashover occurrence in a single compartment fire. The results are compared and verified against data obtained by fuzzy ARTMAP (FAM). The performance of PEmap in this study agrees extremely well with that of FAM. © 2002 Published by Elsevier Science Ltd.

Keywords: PEmap; Probabilistic inference; Entropy; Flashover; Fuzzy ARTMAP
1. Introduction

1.1. Flashover

Flashover is an intermediate stage in fire development. There are several definitions of flashover. In practice, the most frequently used criterion for determining its occurrence is that the hot gas temperature at 10 mm below the ceiling reaches 600 °C [1]. This criterion was used in the computer simulations of this research. A typical compartment fire growth curve is shown in Fig. 1. At the initial stage, the fire continues to grow in the presence of sufficient oxygen and fuel; this is called the growth phase. A fire plume forms and transfers heat energy to the upper part of the compartment. When the hot gas reaches the ceiling, it spreads out and accumulates in the upper part of the compartment. Heat radiation from the hot gas layer and the fire plume then reaches other combustible materials and raises their temperature. Should the temperature rise above the spontaneous ignition temperatures of the combustible materials, they will be ignited. In practice, there is a stage, called flashover, at which all the combustible materials inside the compartment ignite almost simultaneously.

There are many criteria for the occurrence of flashover. McCaffrey et al. [2] proposed a well-known empirical formula for predicting the hot gas temperature in a compartment fire. It can be used to estimate the occurrence of flashover by checking whether the estimated temperature reaches 600 °C. However, such an empirical equation is too simple to simulate the real situation. More sophisticated mathematical models (e.g. zone models and field models) have been developed based on fluid dynamics theory together with equations describing the physical and chemical reactions. Since the interactions between these models are so complicated, they can only be executed on a high-speed computer. Computational fluid dynamics (CFD) is a tool for this application: it divides the space into a finite number of small volumes and applies the mathematical models to simulate the interactions between the volumes. It should also be noted that CFD requires extensive computer resources for its execution.
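As an illustration of how an empirical correlation of the kind proposed by McCaffrey et al. [2] can be applied, the sketch below implements the commonly cited McCaffrey-Quintiere-Harkleroad (MQH) form of the hot-gas temperature estimate and tests it against the 600 °C flashover criterion. This is a minimal sketch only: the coefficient 6.85, the functional form and all the geometry and material values are taken from the general fire-science literature, not from this paper, and the compartment figures are hypothetical.

```cpp
#include <cmath>
#include <cstdio>

// Hot gas temperature rise (K) by the MQH correlation:
//   dT = 6.85 * [ Q^2 / (Ao * sqrt(Ho) * hk * At) ]^(1/3)
// Q  : heat release rate (kW)
// Ao : area of the opening (m^2),  Ho : height of the opening (m)
// hk : effective heat transfer coefficient of the boundaries (kW/m^2 K)
// At : total internal surface area excluding the opening (m^2)
double mqhTemperatureRise(double Q, double Ao, double Ho, double hk, double At) {
    return 6.85 * std::cbrt(Q * Q / (Ao * std::sqrt(Ho) * hk * At));
}

int main() {
    // Hypothetical single-compartment scenario (not from the paper).
    const double ambient = 20.0;                 // ambient temperature (deg C)
    const double Ao = 0.8 * 2.0;                 // 0.8 m x 2.0 m door opening
    const double Ho = 2.0;                       // opening height (m)
    const double hk = 0.030;                     // gypsum boundary (kW/m^2 K)
    const double At = 2.0 * (4.0*3.0 + 4.0*2.4 + 3.0*2.4) - Ao; // 4 x 3 x 2.4 m room

    for (double Q = 500.0; Q <= 4000.0; Q += 500.0) {
        double Tg = ambient + mqhTemperatureRise(Q, Ao, Ho, hk, At);
        std::printf("Q = %6.0f kW -> upper layer ~ %6.1f C %s\n",
                    Q, Tg, Tg >= 600.0 ? "(flashover criterion met)" : "");
    }
}
```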
Fig. 1. Profile of compartment fire growth. The temperature vs. time graph illustrates the profile of fire growth in a compartment fire, identifying the growth, flashover, fully developed and decay stages. The hidden line shows the profile of a fire whose heat release rate is not sufficient to cause flashover.
1.2. Fuzzy ARTMAP architecture

There are many different architectures of artificial neural network (ANN), with different algorithms. Their common characteristic for prediction is to simulate the general behavior of a phenomenon by training the network on the history of that phenomenon. There are, in general, two kinds of network learning, namely supervised and unsupervised. In supervised learning, the input and output data of the system history are known for network training. In unsupervised learning, training data are presented as patterns to be self-organized into clusters by the network itself.

FAM [3] is a powerful classification tool with supervised learning. It was developed on the basis of the adaptive resonance theory (ART) of Grossberg [4]. Fig. 2 shows the basic architecture of FAM. An input vector $a = \{a_1, a_2, \ldots, a_{M_a}\} \in \mathbb{R}^{M_a}$, with $a_i \in [0,1]$ for $i = 1, 2, \ldots, M_a$ and $M_a$ the dimension of the input vector, is complement coded in field $F_0^a$ as $A = \{a, a^c\} \in \mathbb{R}^{2M_a}$, where $a^c = \{1-a_1, 1-a_2, \ldots, 1-a_{M_a}\}$. The complement-coded vector $A$ is presented to field $F_1^a$ and compared with each node $j$ (where $j = 1, 2, \ldots, N_a$) in field $F_2^a$, for which a prototype $w_j^a \in \mathbb{R}^{2M_a}$ is stored in the links between all nodes at field $F_1^a$ and node $j$ at field $F_2^a$. The value $N_a$ is the number of prototypes stored in field $F_2^a$. The choice function

$$T_j = \frac{|A \wedge w_j^a|}{\alpha + |w_j^a|}$$

measures the similarity between $A$ and $w_j^a$ by fuzzy subsethood [5], where $\wedge$ is defined by $(p \wedge q)_i \equiv \min(p_i, q_i)$ and the norm $|\cdot|$ is defined by $|p| \equiv \sum_{i=1}^{m} p_i$ for $p = \{p_1, p_2, \ldots, p_m\}$. Prototype $J$ is chosen if $T_J = \max\{T_j : j = 1, \ldots, N_a\}$. When node $J$ at field $F_2^a$ is chosen, the activity vector of field $F_2^a$, $y^a = \{y_1, y_2, \ldots, y_{N_a}\}$, is updated by $y_J = 1$ and $y_j = 0$ for $j \neq J$. The activity vector $x^a$ at field $F_1^a$ is also updated by

$$x^a = \begin{cases} A & \text{if } F_2^a \text{ is inactive} \\ A \wedge w_J^a & \text{if the } J\text{th } F_2^a \text{ node is chosen} \end{cases}$$

Resonance occurs if the prototype $J$ also satisfies the match criterion $|A \wedge w_J^a|\,|A|^{-1} \geq \rho$, where $\rho \in [0,1]$ is the vigilance parameter, initially set to zero. The same algorithm in $\mathrm{ART}_b$, according to the output of the training pattern, creates a winning node $K$ at field $F_2^b$.

For the map field, denote by $x^{ab} = \{x_1^{ab}, x_2^{ab}, \ldots, x_{N_b}^{ab}\}$ the output of field $F^{ab}$, and by $w_j^{ab} = \{w_{j1}^{ab}, w_{j2}^{ab}, \ldots, w_{jN_b}^{ab}\}$ the weights linking the $j$th node at $F_2^a$ to node $x_j^{ab}$ at $F^{ab}$. The value of each element of $w_j^{ab}$ is either 1 or 0, which in fact records the first established matching path from the prototypes stored in $F_2^a$ to the prototypes stored in $F_2^b$.
Fig. 2. Fuzzy ARTMAP architecture. It shows the connections between ARTa, ARTb and the map field, as well as the signal path for match tracking. The vigilance parameters of ARTa and ARTb are adjusted during the course of winning-node searching and in case of a mismatch between ARTa and ARTb.
That is, if $j = J$ and $k = K$, where $J$ and $K$ are the winning nodes of $\mathrm{ART}_a$ and $\mathrm{ART}_b$, respectively, then $w_{jk}^{ab} = 1$; otherwise $w_{jk}^{ab} = 0$. Map-field activation is triggered by the activation of either $\mathrm{ART}_a$ or $\mathrm{ART}_b$. If node $J$ of $F_2^a$ is chosen, its weights $w_J^{ab}$ activate $F^{ab}$. If node $K$ in $F_2^b$ is active, node $K$ in $F^{ab}$ is activated through the one-to-one pathways between $F_2^b$ and $F^{ab}$. If both $\mathrm{ART}_a$ and $\mathrm{ART}_b$ are active, $F^{ab}$ becomes active only if $\mathrm{ART}_a$ predicts the same category as $\mathrm{ART}_b$ via the weights $w_J^{ab}$. The $F^{ab}$ output vector $x^{ab}$ follows

$$x^{ab} = \begin{cases} y^b \wedge w_J^{ab} & \text{if the } J\text{th } F_2^a \text{ node is active and } F_2^b \text{ is active} \\ w_J^{ab} & \text{if the } J\text{th } F_2^a \text{ node is active and } F_2^b \text{ is inactive} \\ y^b & \text{if } F_2^a \text{ is inactive and } F_2^b \text{ is active} \\ 0 & \text{if } F_2^a \text{ is inactive and } F_2^b \text{ is inactive} \end{cases}$$

According to the first condition of the above equation, $x^{ab} = 0$ if the prediction $w_J^{ab}$ does not agree with the output vector $y^b$ of $\mathrm{ART}_b$. The result is called a mismatch. The match-tracking algorithm is then triggered: the resonance of $\mathrm{ART}_a$ is reset, with the winning node $J$ of $\mathrm{ART}_a$ depressed by setting $T_J = 0$. Another prototype is then chosen by the choice function, and the match function is applied to the chosen prototype to determine the winning node, with a small increment in the vigilance parameter. If the winning nodes of $\mathrm{ART}_a$ and $\mathrm{ART}_b$ again fail to match, match tracking continues until the vigilance parameter reaches one. If no existing prototype of the ART modules matches, a new prototype is created to code the training sample. If one of the prototypes matches the training sample, that prototype is updated by $w_J^{(\mathrm{new})} = \beta (A \wedge w_J^{(\mathrm{old})}) + (1-\beta) w_J^{(\mathrm{old})}$, where $\beta$ is a learning parameter; fast learning corresponds to setting $\beta = 1.0$. The complete principle and operation of the FAM model are described in Refs. [3,6], from which readers can obtain a detailed picture of the whole FAM architecture.

PROBART, developed by Marriott and Harrison [7], is a modified version of FAM that records the frequency of matches between categories at fields $F_2^a$ and $F_2^b$ during network training and forms a probability distribution; the link with the highest frequency of matches is the principal mapping path from $\mathrm{ART}_a$ to $\mathrm{ART}_b$. It initiated the consideration of probability in the FAM architecture and has been proven effective. Boosted ARTMAP (BARTMAP), proposed by Verzi et al. [8], takes into account not only the frequency of matches between categories but also the matching error, by considering the probability of a category being wrongly chosen and the probability of $\mathrm{ART}_a$ and $\mathrm{ART}_b$ being wrongly mapped; it is a further step in applying probabilistic analysis to the FAM algorithm. Sánchez et al. [9] proposed a conditional probability mapping model in the ARTMAP architecture, μARTMAP, in which minimization of mutual information is used to constrain the evaluation of the matrix of conditional probabilities; the aim of using mutual information is to reduce the proliferation of categories in $F_2^a$ and $F_2^b$.

All the above models use the frequency (or counting) of matches between categories at $\mathrm{ART}_a$ and $\mathrm{ART}_b$ to build a probability distribution emulating the actual prior probability distribution. If the number of data sets is limited, however, the resulting distribution may not be accurate enough to describe the prior distribution. In this paper, the fuzzy C-means clustering (FCM) [10] technique is proposed to build the prior probability distribution.
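The complement-coding, choice and match steps described above are compact enough to sketch directly. The fragment below is an illustrative reading of the $\mathrm{ART}_a$ category-choice cycle only (no map field and no learning), with the choice parameter alpha and the vigilance rho left as free inputs; it is not the authors' implementation.

```cpp
#include <algorithm>
#include <vector>

using Vec = std::vector<double>;

// Complement coding: a in [0,1]^Ma  ->  A = (a, 1 - a) in [0,1]^(2 Ma).
Vec complementCode(const Vec& a) {
    Vec A(a);
    for (double ai : a) A.push_back(1.0 - ai);
    return A;
}

// City-block norm |p| = sum_i p_i used by fuzzy ART.
double norm1(const Vec& p) {
    double s = 0.0;
    for (double pi : p) s += pi;
    return s;
}

// Fuzzy AND: (p ^ q)_i = min(p_i, q_i).
Vec fuzzyAnd(const Vec& p, const Vec& q) {
    Vec r(p.size());
    for (size_t i = 0; i < p.size(); ++i) r[i] = std::min(p[i], q[i]);
    return r;
}

// One ART_a search cycle: return the first category that wins the choice
// function T_j = |A ^ w_j| / (alpha + |w_j|) AND passes the match criterion
// |A ^ w_J| / |A| >= rho, or -1 if none (a new category would be created).
int chooseCategory(const Vec& A, const std::vector<Vec>& w,
                   double alpha, double rho) {
    std::vector<bool> depressed(w.size(), false);   // T_J = 0 after a reset
    for (size_t trial = 0; trial < w.size(); ++trial) {
        int J = -1;
        double best = -1.0;
        for (size_t j = 0; j < w.size(); ++j) {
            if (depressed[j]) continue;
            double Tj = norm1(fuzzyAnd(A, w[j])) / (alpha + norm1(w[j]));
            if (Tj > best) { best = Tj; J = static_cast<int>(j); }
        }
        if (J < 0) break;
        if (norm1(fuzzyAnd(A, w[J])) / norm1(A) >= rho) return J; // resonance
        depressed[J] = true;   // mismatch: depress the winner, search again
    }
    return -1;
}
```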
1.3. Fuzzy C-means clustering

FCM is an algorithm that categorizes data by Picard iteration. The number of categories is predefined. It measures the inverse of the Euclidean distance between an input vector and the stored centroid vector of each category, expressed in the form of membership values. The input vector is classified into the category whose centroid lies at the shortest Euclidean distance from the input vector, and the centroid vector of that category is then updated to include the input vector. The general algorithm of FCM is as follows. Let

c: number of categories, with c ≥ 2;
n: number of input vectors to be categorized;
u_ik: membership value of the kth vector with respect to the ith cluster;
v_i: centroid vector of the ith cluster;
x_k: the kth input vector;
d_ik: Euclidean distance between the kth input vector and the ith cluster centroid vector.

All membership values u_ik should satisfy the following criteria:

$$u_{ik} \in [0,1], \quad 1 \le i \le c, \; 1 \le k \le n$$

$$\sum_{i=1}^{c} u_{ik} = 1, \quad 1 \le k \le n$$

$$\sum_{k=1}^{n} u_{ik} > 0, \quad 1 \le i \le c$$

For a good classification, the total sum of the Euclidean distances from the input vectors to the centroid vectors should be minimized. An objective function is defined as

$$J_m(U, v) = \sum_{k=1}^{n} \sum_{i=1}^{c} (u_{ik})^m \, |x_k - v_i|^2$$

where $1 \le m < \infty$ and $|\cdot|$ is any inner-product-induced norm on the category field. This function is to be minimized. Bezdek [11] proposed the following necessary conditions for minimizing $J_m(U, v)$: the membership values and centroid vectors of the categories are updated as

$$v_i = \frac{\sum_{k=1}^{n} u_{ik}^m \, x_k}{\sum_{k=1}^{n} u_{ik}^m} \quad \text{and} \quad u_{ik} = \left[ \sum_{j=1}^{c} \left( \frac{d_{ik}}{d_{jk}} \right)^{2/(m-1)} \right]^{-1}$$
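A compact sketch of this Picard iteration is given below, with fuzzifier m = 2 and a fixed iteration budget for simplicity. The initialization strategy, the iteration count and the small guard terms are illustrative choices, not the settings used in this study (which relied on the MATLAB fuzzy toolbox [17]).

```cpp
#include <cstdlib>
#include <vector>

using Vec = std::vector<double>;

// Squared Euclidean distance between a data point and a centroid.
double dist2(const Vec& x, const Vec& v) {
    double s = 0.0;
    for (size_t d = 0; d < x.size(); ++d) s += (x[d] - v[d]) * (x[d] - v[d]);
    return s;
}

// Fuzzy C-means with fuzzifier m = 2: alternate the Bezdek updates for the
// membership matrix U (c x n) and the centroids v (c vectors).
std::vector<Vec> fcm(const std::vector<Vec>& x, std::vector<Vec>& v,
                     int c, int iterations = 100) {
    const int n = static_cast<int>(x.size());
    const size_t dim = x[0].size();
    v.assign(c, Vec(dim));
    for (int i = 0; i < c; ++i) v[i] = x[std::rand() % n]; // crude initialization
    std::vector<Vec> U(c, Vec(n, 0.0));

    for (int it = 0; it < iterations; ++it) {
        // u_ik = [ sum_j (d_ik / d_jk)^(2/(m-1)) ]^(-1); with m = 2 the
        // exponent is 2, i.e. ratios of squared distances.
        for (int k = 0; k < n; ++k)
            for (int i = 0; i < c; ++i) {
                double s = 0.0;
                for (int j = 0; j < c; ++j)
                    s += dist2(x[k], v[i]) / (dist2(x[k], v[j]) + 1e-12);
                U[i][k] = 1.0 / (s + 1e-12);
            }
        // v_i = sum_k u_ik^m x_k / sum_k u_ik^m
        for (int i = 0; i < c; ++i) {
            Vec num(dim, 0.0);
            double den = 0.0;
            for (int k = 0; k < n; ++k) {
                double u2 = U[i][k] * U[i][k];
                for (size_t d = 0; d < dim; ++d) num[d] += u2 * x[k][d];
                den += u2;
            }
            for (size_t d = 0; d < dim; ++d) v[i][d] = num[d] / den;
        }
    }
    return U;
}
```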
Fig. 3. Architecture of PEmap. It shows the fields of the input vector, the input categories and the output categories. Fuzzy C-means clustering is employed to create membership values for all input categories at field $F_2^a$. The output category, for simplicity, is an extreme probability distribution in which the desired output class is one and the others are zero. The links between fields $F_2^a$ and $F_2^b$ are the elements of the conditional probability matrix.
1.4. Conditional probability matrix

Conditional probability gives the probability of occurrence of one event given the occurrence of another event. Denote by $p(u|v)$ the probability of occurrence of event $u$ given that event $v$ has occurred. The simplest form of the conditional probability is

$$p(u|v) = \frac{p(u \cap v)}{p(v)}$$

Assume events A and B contain m and n components, respectively, i.e. $A = \{a_1, a_2, \ldots, a_m\}$ and $B = \{b_1, b_2, \ldots, b_n\}$. Since every $b_k$, referring to field $F_2^b$ of Fig. 3, is connected to all components of A at field $F_2^a$, for every $b_k$

$$\begin{aligned} p(b_k) &= p[(b_k \cap a_1) \cup (b_k \cap a_2) \cup \cdots \cup (b_k \cap a_m)] \\ &= p(b_k \cap a_1) + p(b_k \cap a_2) + \cdots + p(b_k \cap a_m) \\ &= p(b_k|a_1)p(a_1) + p(b_k|a_2)p(a_2) + \cdots + p(b_k|a_m)p(a_m) \\ &= \sum_{j=1}^{m} p(b_k|a_j)\, p(a_j) \end{aligned} \tag{1}$$

In fact, Eq. (1) is a typical arrangement in a feedforward multilayer perceptron, forming the input of the activation function, with $p(b_k|a_j)$ acting as the weight from $p(a_j)$ to $p(b_k)$. The mapping can be written in matrix form as

$$\begin{bmatrix} p(b_1|a_1) & p(b_1|a_2) & \cdots & p(b_1|a_m) \\ p(b_2|a_1) & p(b_2|a_2) & \cdots & p(b_2|a_m) \\ \vdots & \vdots & \ddots & \vdots \\ p(b_n|a_1) & p(b_n|a_2) & \cdots & p(b_n|a_m) \end{bmatrix} \begin{bmatrix} p(a_1) \\ p(a_2) \\ \vdots \\ p(a_m) \end{bmatrix} = \begin{bmatrix} p(b_1) \\ p(b_2) \\ \vdots \\ p(b_n) \end{bmatrix}$$

or, in abbreviated form,

$$P(B|A)\, P(A) = P(B) \tag{2}$$

where $P(B|A)$ is the $n \times m$ matrix of conditional probabilities $p(b_k|a_j)$, $P(A) = [p(a_1), p(a_2), \ldots, p(a_m)]^T$ and $P(B) = [p(b_1), p(b_2), \ldots, p(b_n)]^T$.

1.5. Maximum entropy

There are many ways to find the mapping matrix $P(B|A)$ in Eq. (2). The most basic approach is error minimization. However, if the number of unknowns (i.e. the number of entries of the mapping matrix) exceeds the number of equations (i.e. the number of training data sets), there will be more than one solution.
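The two quantities used in the discussion that follows, the prediction error of a candidate mapping matrix under Eq. (2) and the conditional entropy introduced below as Eq. (3), can both be sketched in a few lines. The logarithm base is left as a parameter because the model later normalizes by log n; the example matrix and prior are hypothetical.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

using Matrix = std::vector<std::vector<double>>;
using Vec = std::vector<double>;

// Predicted output distribution: P(B) = P(B|A) P(A), Eq. (2).
Vec predict(const Matrix& PBgA, const Vec& PA) {
    Vec PB(PBgA.size(), 0.0);
    for (size_t k = 0; k < PBgA.size(); ++k)
        for (size_t j = 0; j < PA.size(); ++j)
            PB[k] += PBgA[k][j] * PA[j];
    return PB;
}

// Conditional entropy of the mapping, Eq. (3):
//   H[P(B|A)] = - sum_j p(a_j) sum_k p(b_k|a_j) log p(b_k|a_j)
// with h(0) = 0 by convention, log taken to the given base.
double conditionalEntropy(const Matrix& PBgA, const Vec& PA, double base) {
    double H = 0.0;
    for (size_t j = 0; j < PA.size(); ++j)
        for (size_t k = 0; k < PBgA.size(); ++k) {
            double f = PBgA[k][j];
            if (f > 0.0) H -= PA[j] * f * std::log(f) / std::log(base);
        }
    return H;
}

int main() {
    Vec PA = {0.9, 0.1};                      // hypothetical prior P(A)
    Matrix PBgA = {{0.8, 0.8}, {0.2, 0.2}};   // one zero-error candidate
    Vec PB = predict(PBgA, PA);
    std::printf("P(B) = (%.3f, %.3f), H = %.4f\n",
                PB[0], PB[1], conditionalEntropy(PBgA, PA, 2.0));
}
```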
Assume the input and output probability distributions are $P(A) = [1\;\; 0]^T$ and $P(B) = [1\;\; 0]^T$, respectively. According to Eq. (1), the entries of the mapping matrix in Eq. (2) can be obtained by solving

$$\begin{bmatrix} p(b_1|a_1) & p(b_1|a_2) \\ p(b_2|a_1) & p(b_2|a_2) \end{bmatrix} \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \begin{bmatrix} 1 \\ 0 \end{bmatrix}$$

It can be solved by introducing arbitrary values, so that the mapping matrix becomes

$$P(B|A) = \begin{bmatrix} 1 & \lambda_1 \\ 0 & \lambda_2 \end{bmatrix}$$

where $\lambda_1$ and $\lambda_2$ are arbitrary values. Since the number of unknowns (i.e. four) exceeds the number of equations (i.e. two), an infinite number of solutions satisfy the equation. In this case, the uncertainty embedded in $p(a_2) = 0$ (i.e. no information about component $a_2$ is provided to describe the system behavior) contributes nothing to the evaluation of the mapping matrix. This uncertainty is not reflected if $\lambda_1$ and $\lambda_2$ are allowed to be chosen arbitrarily, even though the predicted output error can be zero. Error minimization alone is therefore unable to describe the general relation between the two probability distributions.

Shannon [12] introduced entropy as a measure of the uncertainty embedded in information transmission. The Shannon entropy $h$ of a probability $p$ is defined as

$$h(p) = -p \log p, \qquad h(0) = 0$$

The entropy of a discrete probability distribution $P = \{p_1, p_2, \ldots, p_n\}$ can be expressed as $H(P) = -\sum_{i=1}^{n} p_i \log p_i$. The entropy of the conditional probability distribution in Eq. (2) is given by

$$H[P(B|A)] = -\sum_{j} p(a_j) \sum_{k} p(b_k|a_j) \log p(b_k|a_j) \tag{3}$$

The entropy of a probability distribution indicates, graphically, how close the distribution is to the uniform distribution (i.e. $p_i = 1/n$ for $i = 1, 2, \ldots, n$). Souza [13] stated that the minimally prejudiced probability distribution is the one that maximizes the entropy subject to the constraints supplied by the given testable information. In this study, the given constraint and the testable information [14] are the minimum prediction error and the training data, respectively. Souza [13] also stated that this principle encompasses, as a special case, the well-known Principle of Insufficient Reason, an axiom enunciated by P. Laplace: if we are ignorant of the ways an event can occur and therefore have no reason to believe that one way will occur preferentially to another, it will occur equally likely in any way. The necessity of employing entropy maximization in addition to the constraint of minimum error for the evaluation of the mapping matrix is illustrated in Appendix A.

2. PEmap model

PEmap is tailored for classification tasks. The basic architecture of PEmap is shown in Fig. 3. The input vector field $F_1^a$ receives the input data set, and the input vector is then categorized into the categories at $F_2^a$ by the FCM algorithm. Upon presentation of the input data, a membership value is evaluated for each category at $F_2^a$; these values constitute the discrete prior probability distribution at field $F_2^a$ with respect to the input vector at field $F_1^a$. For the output category field $F_2^b$, the probability distribution over the categories is 'extreme' (i.e. the probability of the winning category is one and the others are zero). According to Eq. (2), the probabilistic mapping matrix is evaluated from the two discrete probability distributions obtained (i.e. P(A) and P(B)). To obtain a mapping matrix with less prejudice in prediction, $H[P(B|A)]$ in Eq. (3) should be maximized; concurrently, the prediction error of the probabilistic mapping matrix should be minimized. The following two conditions are therefore defined for the evaluation of the probabilistic mapping matrix.

Condition 1: minimize the prediction error.
Condition 2: maximize the entropy of the probabilistic mapping.

For simplicity, $p(b_k|a_j)$ is denoted $f_{k|j}$. The sum-of-square error of prediction can be written as

$$E = \frac{1}{2} \sum_{k=1}^{n} \left[ \left( \sum_{j=1}^{m} p(a_j) f_{k|j} \right) - p(b_k) \right]^2 \tag{4}$$

The entropy of the probabilistic mapping matrix, by Eq. (3), can be written as

$$H = -\sum_{j=1}^{m} p(a_j) \sum_{k=1}^{n} f_{k|j} \log_n f_{k|j} \tag{5}$$

To minimize E and maximize H, the following objective function J is proposed:

$$J = \frac{E}{H} \tag{6}$$

This function has to be minimized to satisfy Conditions 1 and 2. The gradient descent technique has been employed to develop an iterative model for the evaluation of the probabilistic mapping matrix. The first step is to observe the change of J with respect to a change of $f_{k|j}$:

$$\frac{\partial J}{\partial f_{k|j}} = \frac{H \dfrac{\partial E}{\partial f_{k|j}} - E \dfrac{\partial H}{\partial f_{k|j}}}{H^2} \tag{7}$$
$\partial E / \partial f_{k|j}$ can be evaluated as follows:

$$\frac{\partial E}{\partial f_{k|j}} = \frac{\partial}{\partial f_{k|j}} \left\{ \frac{1}{2} \sum_{k=1}^{n} \left[ \left( \sum_{j=1}^{m} p(a_j) f_{k|j} \right) - p(b_k) \right]^2 \right\} = p(a_j) \left[ \left( \sum_{j=1}^{m} p(a_j) f_{k|j} \right) - p(b_k) \right] \tag{8}$$

$\partial H / \partial f_{k|j}$ can be evaluated as follows:

$$\frac{\partial H}{\partial f_{k|j}} = \frac{\partial}{\partial f_{k|j}} \left[ -\sum_{j=1}^{m} p(a_j) \sum_{k=1}^{n} f_{k|j} \log_n f_{k|j} \right] = -p(a_j) \left( \log_n f_{k|j} + \frac{f_{k|j} \cdot 1/f_{k|j}}{\log_e n} \right) = -p(a_j) \left( \frac{\log_e f_{k|j} + 1}{\log_e n} \right) \tag{9}$$

Substituting Eqs. (8) and (9) into Eq. (7), we have

$$\frac{\partial J}{\partial f_{k|j}} = \frac{H p(a_j) \left[ \left( \sum_{j=1}^{m} p(a_j) f_{k|j} \right) - p(b_k) \right] + E\, p(a_j) \dfrac{\log_e f_{k|j} + 1}{\log_e n}}{H^2}$$

Since $\left( \sum_{j=1}^{m} p(a_j) f_{k|j} \right) - p(b_k)$ is the error of the kth output category, which is related only to the training data sets, it can simply be denoted $\delta_k$, i.e.

$$\delta_k = \left( \sum_{j=1}^{m} p(a_j) f_{k|j} \right) - p(b_k)$$

$$\frac{\partial J}{\partial f_{k|j}} = \frac{p(a_j)}{H^2} \left( H \delta_k + E \frac{\log_e f_{k|j} + 1}{\log_e n} \right) \tag{10}$$

Eq. (10) describes the increase in J per unit increase in $f_{k|j}$. By the gradient descent approach, if $\partial J / \partial f_{k|j}$ is positive, $f_{k|j}$ should be decreased to reduce the value of J; conversely, if $\partial J / \partial f_{k|j}$ is negative, $f_{k|j}$ should be increased. Hence, the following iterative model is proposed:

$$f_{k|j}^{(\mathrm{new})} = f_{k|j}^{(\mathrm{old})} - \alpha^{(\mathrm{new})} \frac{p(a_j)}{\left( H^{(\mathrm{old})} \right)^2} \left( H^{(\mathrm{old})} \delta_k^{(\mathrm{old})} + E^{(\mathrm{old})} \frac{\log_e f_{k|j}^{(\mathrm{old})} + 1}{\log_e n} \right) \tag{11}$$

where $\alpha^{(\mathrm{new})}$ is a learning parameter controlling the change of $f_{k|j}$ during each iteration. For an adaptive iterative scheme, it should be suitably adjusted to avoid the iteration result jumping outside the valid range of $f_{k|j}$ (i.e. [0,1]). The proposed adjustment of $\alpha$ is

$$\alpha^{(\mathrm{new})} = \begin{cases} \alpha^{(\mathrm{old})}/2 & f_{k|j}^{(\mathrm{new})} < 0, \text{ redo the iteration step} \\ 1 & 0 \le f_{k|j}^{(\mathrm{new})} \le 1, \text{ next iteration step} \\ \alpha^{(\mathrm{old})}/2 & f_{k|j}^{(\mathrm{new})} > 1, \text{ redo the iteration step} \end{cases} \tag{12}$$

The stopping criterion of the iteration is

$$\left| f_{k|j}^{(\mathrm{new})} - f_{k|j}^{(\mathrm{old})} \right| \le \varepsilon \quad \forall k \in [1, n], \; j \in [1, m] \tag{13}$$

where $\varepsilon$ is an arbitrarily small positive real number. Eqs. (11)-(13) constitute the complete iterative model.

For a classification problem, each training pattern has its own corresponding winning class and mapping matrix. Simply averaging the mapping matrices entrywise may bias the prediction result toward the class to which the majority of the training patterns belong. It is therefore proposed first to average the matrices of each class to obtain a characteristic matrix for every class, and then to average the characteristic matrices to determine the mapping matrix. The following averaging models are proposed to obtain the probabilistic mapping matrix. Let $M_c = \{M_{c1}, M_{c2}, \ldots, M_{cN_c}\}$, where $c = 1, 2, \ldots, n$, be the set of matrices obtained from training data sets whose output class is c, with $N_c$ the number of matrices in $M_c$. The characteristic matrix of class c, $M_c^*$, is obtained by

$$M_c^* = \frac{\sum_{i=1}^{N_c} M_{ci}}{N_c} \tag{14}$$

The probabilistic mapping matrix, M, can then be obtained by averaging the characteristic matrices:

$$M = \frac{\sum_{i=1}^{n} M_i^*}{n} \tag{15}$$

where n is the total number of output categories.
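Eqs. (11)-(13) can be sketched directly in code. The fragment below is a minimal illustration of the iterative scheme for a single training pattern, assuming natural logarithms normalized by log n as in Eq. (5); the initial uniform mapping, the small floor that keeps log f finite and the iteration cap are illustrative assumptions, not settings from the paper (the authors used their own C++ implementation).

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

using Vec = std::vector<double>;
using Matrix = std::vector<Vec>;   // f[k][j] = p(b_k | a_j)

// Gradient-descent evaluation of the probabilistic mapping matrix
// (Eqs. (11)-(13)), minimizing J = E / H for one training pattern.
Matrix solveMapping(const Vec& PA, const Vec& PB,
                    double eps = 1e-6, int maxIter = 100000) {
    const size_t n = PB.size(), m = PA.size();
    const double logn = std::log(static_cast<double>(n));
    const double floorF = 1e-9;                 // keeps log f finite (assumption)
    Matrix f(n, Vec(m, 1.0 / n));               // start from the uniform mapping

    for (int it = 0; it < maxIter; ++it) {
        // E (Eq. (4)), H (Eq. (5)) and the per-category errors delta_k.
        double E = 0.0, H = 0.0;
        Vec delta(n, 0.0);
        for (size_t k = 0; k < n; ++k) {
            for (size_t j = 0; j < m; ++j) delta[k] += PA[j] * f[k][j];
            delta[k] -= PB[k];
            E += 0.5 * delta[k] * delta[k];
        }
        for (size_t j = 0; j < m; ++j)
            for (size_t k = 0; k < n; ++k)
                H -= PA[j] * f[k][j] * std::log(f[k][j]) / logn;

        double maxStep = 0.0;
        for (size_t k = 0; k < n; ++k)
            for (size_t j = 0; j < m; ++j) {
                // Eq. (11), with the alpha adjustment of Eq. (12): halve
                // alpha until the update stays inside [0, 1].
                double grad = PA[j] / (H * H)
                    * (H * delta[k] + E * (std::log(f[k][j]) + 1.0) / logn);
                double alpha = 1.0;
                double fNew = f[k][j] - alpha * grad;
                while (fNew < 0.0 || fNew > 1.0) {
                    alpha /= 2.0;
                    fNew = f[k][j] - alpha * grad;
                }
                maxStep = std::max(maxStep, std::fabs(fNew - f[k][j]));
                f[k][j] = std::max(fNew, floorF);
            }
        if (maxStep <= eps) break;              // stopping criterion, Eq. (13)
    }
    return f;
}
```

Per Eqs. (14) and (15), the matrices obtained for the individual training patterns would then be averaged within each output class, and the resulting characteristic matrices averaged across the classes to give M.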
3. Modeling

3.1. Fire modeling

Since it is considered expensive to obtain real fire data from full-scale experiments, computational fire simulations using computer software were employed to generate the data sets for PEmap training. For simplicity, the fire compartment is assumed to be rectangular in shape with a door opening, as illustrated in Fig. 4.
Fig. 4. Room configuration for simulation of a compartment fire. The fire is assumed to be located on the floor at the center of the compartment. Four parameters (i.e. room length, width, height and maximum fire size) are varied to obtain both training and test samples.
The relation between fire and environmental parameters was proposed by McCaffrey et al. [2]. In their model, the upper hot gas layer temperature is a function of the room geometry, the dimensions of the opening, the properties of the gas, wall conduction and the heat release rate. In this study, flashover is defined as the temperature of the upper hot gas layer exceeding 600 °C. This flashover criterion was input into the computer package FAST [15] for estimating the occurrence of flashover. The following parameters were generated randomly to create different scenarios for the network training:

1. Room length (varies randomly from 2 to 10 m)
2. Room width (varies randomly from 2 to 10 m)
3. Room height (varies randomly from 2 to 10 m)
4. Maximum heat release rate (varies randomly from 10 to 6000 kW)
A fast-growth t²-fire (i.e. 150 s from ignition to reach 1 MW) was assumed throughout the simulations. The ceiling and walls were assumed to be 5/8 in. gypsum and the floor 1/2 in. plywood. For each combination of room dimensions and maximum heat release rate, the occurrence of flashover was determined by FAST and recorded as historical data for the network training. A total of 375 data samples were generated for network training and testing, of which 190 are flashover and 185 non-flashover samples. Fig. 5 shows the distribution of the data sets.
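The parameter sampling just described is straightforward to reproduce in outline. The sketch below draws the four inputs uniformly from the stated ranges and evaluates the assumed fast t²-growth curve (1 MW at 150 s implies a growth coefficient of 1000/150² ≈ 0.044 kW/s²); the labelling step is only a placeholder, since the actual flashover labels came from FAST [15].

```cpp
#include <cstdio>
#include <random>

struct Scenario {
    double length, width, height;   // room dimensions (m)
    double maxHRR;                  // maximum heat release rate (kW)
};

// Fast t^2-fire: Q(t) = a t^2, capped at the scenario's maximum HRR.
// Reaching 1 MW at 150 s gives a = 1000 / 150^2 ~ 0.044 kW/s^2.
double heatReleaseRate(double t, double maxHRR) {
    const double a = 1000.0 / (150.0 * 150.0);
    double q = a * t * t;
    return q < maxHRR ? q : maxHRR;
}

int main() {
    std::mt19937 rng(42);
    std::uniform_real_distribution<double> dim(2.0, 10.0);    // m
    std::uniform_real_distribution<double> hrr(10.0, 6000.0); // kW

    for (int i = 0; i < 375; ++i) {            // 375 samples, as in the study
        Scenario s{dim(rng), dim(rng), dim(rng), hrr(rng)};
        // In the study each scenario was run through FAST and labelled
        // flashover / non-flashover by the 600 C upper-layer criterion.
        std::printf("%5.2f %5.2f %5.2f %7.1f  Q(60 s) = %6.1f kW\n",
                    s.length, s.width, s.height, s.maxHRR,
                    heatReleaseRate(60.0, s.maxHRR));
    }
}
```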
3.2. Network model

Flashover prediction is in fact a classification problem: judging the occurrence of flashover under a given set of parameters. The designed input parameters of this model are the dimensions of the rectangular compartment and the maximum heat release rate; the output of the network is the occurrence of flashover (i.e. yes or no). Therefore,

No. of nodes at $F_1^a$ = 4
No. of nodes at $F_2^b$ = 2

The number of nodes at $F_2^b$ is the number of output categories into which the input vector is to be classified. The ratio of 'flashover' to 'non-flashover' data samples was adjusted to observe the performance of PEmap under biased sample ratios. Of the 375 data sets, 250 were used for network training; the designed proportions of training data are listed in the first columns of Table 1. From test 1 to test 9, the number of neurons at layer $F_2^a$ was fixed at six. A further test was designed to observe the effect of the number of neurons at layer $F_2^a$ on the prediction results: the number of neurons was varied from 2 to 10 while the proportion of flashover to non-flashover data sets was fixed at 1:1. For network training and testing, MATLAB [16] with the fuzzy toolbox [17] was used to execute the fuzzy C-means clustering on the training data. The probabilistic mapping matrices with maximum entropy were evaluated by C++ source code developed by the authors.
4. Results and discussion

The distribution of the training samples and the prediction results are shown in Fig. 7. Table 1 shows the test results. The maximum percentage of successful prediction is 96.8%, at the sample ratio of 85/165 (i.e. 0.515); the prediction is very accurate. The results are presented graphically in Fig. 8, which shows that the percentage of overall successful prediction decreases as the number of non-flashover training data increases and the number of flashover training data decreases.
Fig. 5. Data distribution for network training and testing. The graph shows the flashover and non-flashover data created by FAST for PEmap network training and testing. Almost two-thirds of the input domain is occupied by flashover data.
This result is directly related to the data distribution density over the input domain. The training and test data sets obtained from FAST are overlaid in Fig. 5; the numbers of flashover and non-flashover data are 190 and 185, respectively. It can be observed that the area of the input domain (i.e. total room surface area and fire size) occupied by the flashover data is about twice that of the non-flashover data, so the data distribution density in the flashover region is about half that in the non-flashover region. Fig. 8 indicates that the maximum prediction accuracy is achieved when the data distribution densities of the two classes are almost the same. Appendix B provides a mathematical interpretation of this result.

Prediction errors with different numbers of neurons at layer $F_2^a$ were compared, as shown in Fig. 6, using the same data sets for network training and testing as test no. 5 of Table 1. It is observed that when the number of neurons at field $F_2^a$ exceeds five, the prediction error does not change significantly with any further increase in the number of neurons. This can be explained by the membership values created by FCM. Upon presentation of an input vector, membership values are created by FCM for every cluster (i.e. every neuron at field $F_2^a$).
Table 1
Result of prediction by PEmap

Test no.  Train non-FO  Train FO  Test non-FO (a)  Test FO (b)  Correct non-FO (c)  Correct FO (d)  % total [(c)+(d)]/[(a)+(b)]
1         85            165       100              25           100                 21              96.8
2         95            155       95               35           89                  30              95.2
3         105           145       80               45           80                  36              92.8
4         115           135       70               55           69                  45              91.2
5         125           125       60               65           54                  61              92.0
6         135           115       50               75           50                  65              92.0
7         145           105       40               85           39                  72              88.8
8         155           95        30               95           29                  84              90.4
9         165           85        20               105          20                  87              85.6

(FO = flashover; 'Train' columns are numbers of training samples, (a)-(b) numbers of test samples, (c)-(d) numbers of successful predictions.)

Nine tests were carried out. In every test, the ratio of the sample data of the two classes (i.e. columns 2 and 3) was varied to observe the performance of PEmap under biased data distribution densities. It shows that, in general, the prediction accuracy decreases with an increase in non-flashover samples and a decrease in flashover samples.
Fig. 6. Prediction error of PEmap with different numbers of neurons in field $F_2^a$. It can be observed that the percentage error becomes steady when the number of neurons exceeds five. Since the dimension of the input domain is four (i.e. room length, width, height and maximum fire size), the number of neurons required to identify an input vector is five in this case; further increases in the number of neurons are redundant.
Fig. 7. Distribution of the training data and the prediction results of PEmap. It shows the distribution of the training data (non-flashover and flashover) and the test results (non-flashover and flashover). The results predicted by PEmap agree very well with the training data.
Fig. 8. Comparison of the prediction results of PEmap and FAM. It shows that the performance of PEmap is competitive with that of FAM. However, it can be observed that uneven data distribution densities between the classes degrade the performance of both models.
The membership values, in fact, represent the ratios of the inverses of the distances from the locations of the neurons at field $F_2^a$ to the input vector. Assume two neurons $A = (a_1, a_2)$ and $B = (b_1, b_2)$ are created in a two-dimensional input domain. FCM creates membership values $m_a$ and $m_b$ for these neurons upon presentation of the input $X = (x_1, x_2)$. The only relation between A, B and X can be written as

$$\frac{\|X - A\|^2}{\|X - B\|^2} = \frac{(x_1 - a_1)^2 + (x_2 - a_2)^2}{(x_1 - b_1)^2 + (x_2 - b_2)^2} = \frac{1/m_a}{1/m_b}$$

This single equation is not sufficient to determine the coordinates of X, since the number of unknowns (i.e. $x_1$ and $x_2$) exceeds the number of equations available. Introducing a new neuron $C = (c_1, c_2)$, with $C \neq A$ and $C \neq B$, yields an additional equation:

$$\frac{\|X - A\|^2}{\|X - C\|^2} = \frac{(x_1 - a_1)^2 + (x_2 - a_2)^2}{(x_1 - c_1)^2 + (x_2 - c_2)^2} = \frac{1/m_a}{1/m_c}$$

Now the number of equations equals the number of unknowns, and the coordinates of X can be determined. In theory, n + 1 neurons are minimally required to determine the location of a vector in n dimensions. In this flashover problem, the dimension of the input vector is four (i.e. room length, width, height and fire size), so the minimum number of neurons required is five. Further increases in the number of neurons in field $F_2^a$ are redundant, as confirmed by the results of this experimental study (Fig. 6).
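The claim that FCM memberships encode inverse-distance ratios can be checked directly. The sketch below computes the m = 2 memberships of a point with respect to fixed centroids and verifies that $u_0/u_1 = \|X - C_1\|^2 / \|X - C_0\|^2$; the coordinates are hypothetical.

```cpp
#include <cstdio>

// Squared Euclidean distance in 2-D.
double d2(const double x[2], const double c[2]) {
    double dx = x[0] - c[0], dy = x[1] - c[1];
    return dx * dx + dy * dy;
}

int main() {
    // Hypothetical input point and three F2a neuron locations.
    double X[2] = {2.0, 1.0};
    double C[3][2] = {{0.0, 0.0}, {4.0, 0.0}, {0.0, 3.0}};

    // FCM memberships for m = 2: u_i proportional to 1 / ||X - C_i||^2.
    double u[3], s = 0.0;
    for (int i = 0; i < 3; ++i) s += 1.0 / d2(X, C[i]);
    for (int i = 0; i < 3; ++i) u[i] = (1.0 / d2(X, C[i])) / s;

    // The ratio u_0 / u_1 recovers only the squared-distance ratio, so two
    // centroids fix one equation in the two unknowns (x1, x2); a third
    // centroid supplies the second equation needed to pin X down.
    std::printf("u = (%.4f, %.4f, %.4f)\n", u[0], u[1], u[2]);
    std::printf("u0/u1 = %.4f,  |X-C1|^2/|X-C0|^2 = %.4f\n",
                u[0] / u[1], d2(X, C[1]) / d2(X, C[0]));
}
```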
It is also interesting to compare the predictions of PEmap and fuzzy ARTMAP. ART-GALLERY [18], a FAM simulation tool, was used for the FAM predictions; Fig. 8 shows the comparison. The prediction trends of the two models shown in Fig. 8 are roughly the same, as is the prediction accuracy. Both models perform better at sample ratios for which the data distribution densities of the flashover and non-flashover classes are almost the same. It can be observed that the data distribution density also affects the prediction accuracy of fuzzy ARTMAP.
5. Conclusions

The performance of PEmap is considered excellent and competitive with that of fuzzy ARTMAP. This study also confirms the feasibility of employing ANNs as an alternative approach in the analysis of fire phenomena. By considering the spread of the data distribution over the input domain under consideration, the prediction accuracy reached 96.8% when the data distribution densities of the two classes (i.e. flashover and non-flashover) were almost the same; a simple mathematical interpretation is given in Appendix B.

There are several modified fuzzy ARTMAP architectures (e.g. PROBART) that set up a probability mapping by a frequentist approach. However, for a particular class with more training samples than the others, the frequency of mapping between the pairs of neurons representing the links of that class will be higher than for the other classes. PEmap, which computes a characteristic mapping matrix for each class via the averaging models before determining the combined mapping matrix, is considered an effective approach to overcoming this bias in the training data.

The merits of the PEmap model may be summarized as follows. The network architecture of PEmap is very simple, and no parameter needs to be adjusted for network training. The number of internal neurons required depends only on the dimensions of the input and output vectors. The maximum entropy criterion ensures a less prejudiced prediction. The prediction results show that PEmap is an efficient classification tool for the determination of flashover with a high degree of accuracy. The model can be extended from off-line to on-line learning by introducing an adaptive clustering technique in place of the traditional fuzzy C-means clustering. The maximum entropy concept may also be applied to the discrete prior probability distribution at field $F_2^a$ to govern the growth of the number of neurons when the entropy exceeds a predefined threshold value.
Acknowledgements

The authors acknowledge the support of Research Grants No. 7001187 and No. 7001090 from the City University of Hong Kong. The authors also thank Dr C.P. Lim for his comments on the architecture and algorithm of PEmap.
Appendix A. Explanatory notes on the criteria of minimum error and maximum entropy

The following example illustrates the evaluation of the mapping matrix of Eq. (2) by PEmap. Assume Mr X drives home by either road A or road B. The probability of his choosing road A is 0.9 and that of road B is 0.1. The road he drives on is either blocked or unblocked. Mr X has experienced that the probability of road blockage is 0.8 and that of no blockage is 0.2, no matter which road he chooses. Mr X would like to know the probability of road blockage given that he has chosen road B. Let the input vector be $P(A) = [p(a_1)\; p(a_2)]^T$, where $p(a_1) = 0.9$ is the probability of road A being chosen and $p(a_2) = 0.1$ is the probability of road B being chosen. Also, let the output vector be $P(B) = [p(b_1)\; p(b_2)]^T$, where $p(b_1) = 0.8$ is the probability of blockage of the chosen road and $p(b_2) = 0.2$ is the probability of no blockage. The input and output vectors of this problem can thus be written as

Input vector: $P(A) = [0.9\;\; 0.1]^T$; Output vector: $P(B) = [0.8\;\; 0.2]^T$.

The task is to evaluate the conditional probability $p(b_1|a_2)$ of the mapping matrix P(B|A). An infinite number of matrices satisfy the condition $P(B|A)P(A) = P(B)$ if the only constraint is zero prediction error of the output vector. The following matrices are examples that satisfy the condition:

$$\begin{bmatrix} 0.8 & 0.8 \\ 0.2 & 0.2 \end{bmatrix}, \quad \begin{bmatrix} 0.805936 & 0.748767 \\ 0.194064 & 0.251232 \end{bmatrix}, \quad \begin{bmatrix} 0.878049 & 0.097561 \\ 0.121951 & 0.902439 \end{bmatrix}, \quad \text{etc.}$$

Matrices evaluated under the sole constraint of zero prediction error are not sufficient to reflect the uncertainty embedded in the training pattern (i.e. less experience of driving on road B). Instead, PEmap gives a more meaningful result:

$$\begin{bmatrix} 0.829209 & 0.537121 \\ 0.170791 & 0.462879 \end{bmatrix} \begin{bmatrix} 0.9 \\ 0.1 \end{bmatrix} = \begin{bmatrix} 0.80000 \\ 0.20000 \end{bmatrix}$$

The sum-of-square error of the output vector is zero, with the entropy of the mapping matrix equal to 0.6932. It can be observed that when $p(a_i)$ (where i = 1 or 2) is close to zero, the corresponding conditional probabilities $p(b_k|a_i)$, k = 1 or 2, are almost the same. That is, if Mr X never chose road R before (R being either A or B), he should have no idea about the blockage of road R, so the probabilities of blockage and no blockage should be the same (i.e. $p(b_k|a_i) = 0.5$). If $p(a_i)$ is close to unity, the corresponding conditional probabilities $p(b_k|a_i)$ are close to $p(b_k)$: if Mr X always chooses road R, the blockage probabilities should be the same as he experienced on road R. The result obtained by PEmap is thus a natural interpretation similar to human judgment.

What is the probability of road B blockage (i.e. $p(b_1|a_2)$) if Mr X has driven on road A nine times more often than on road B? According to Mr X's experience, any road he drives on is very likely blocked (i.e. $p(b_1) = 0.8$). He therefore believes, without prejudice, that the probability of road B blockage should be higher than 0.5. However, his experience of road B is much less than of road A (i.e. $p(a_2) = 0.1$), so he has less confidence in stating the blockage of road B. Without prejudice, the probability of road B blockage he can state should be only a little higher than 0.5; PEmap gives 0.537121. Mr X can also state the probability of road A being unblocked (i.e. $p(b_2|a_1)$). Since he drives on road A frequently (i.e. $p(a_1) = 0.9$) and the road he drives on is very likely blocked no matter which road he chooses (i.e. $p(b_1) = 0.8$), road A is unlikely to be unblocked; $p(b_2|a_1)$ should be much lower than 0.5. PEmap gives 0.170791. This example shows the less prejudiced prediction made by PEmap.
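The figures quoted above can be verified mechanically. The sketch below multiplies the PEmap matrix by P(A) and evaluates the Eq. (3) entropy with base-n logarithms (n = 2 here); it reproduces the zero prediction error and the entropy value of 0.6932.

```cpp
#include <cmath>
#include <cstdio>

int main() {
    const double PA[2]   = {0.9, 0.1};            // road choice probabilities
    const double PB[2]   = {0.8, 0.2};            // blockage probabilities
    const double f[2][2] = {{0.829209, 0.537121}, // PEmap mapping matrix
                            {0.170791, 0.462879}};

    // Prediction error under Eq. (2): P(B|A) P(A) should equal P(B).
    double E = 0.0;
    for (int k = 0; k < 2; ++k) {
        double pk = f[k][0] * PA[0] + f[k][1] * PA[1];
        E += 0.5 * (pk - PB[k]) * (pk - PB[k]);
        std::printf("p(b%d) predicted = %.6f (target %.1f)\n", k + 1, pk, PB[k]);
    }

    // Entropy of the mapping, Eq. (3), with log base n = 2.
    double H = 0.0;
    for (int j = 0; j < 2; ++j)
        for (int k = 0; k < 2; ++k)
            H -= PA[j] * f[k][j] * std::log(f[k][j]) / std::log(2.0);

    std::printf("sum-of-square error = %.2e, entropy = %.4f\n", E, H);
}
```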
Appendix B. Mathematical interpretation of the maximum prediction accuracy of a two-class classification problem with equal data distribution densities

For an ANN classification task, without loss of generality, the prediction accuracy increases when the number of available training samples increases (i.e. more historical data are available). It may be assumed that the prediction accuracy of an ANN model for class k, $\phi_k = \phi_k(N_k, A_k) \ge 0$, is a function of the independent variables $N_k$ and $A_k$, where $N_k$ and $A_k$ are the number of samples of class k and the domain coverage of class k, respectively. For simplicity, we may further assume that $\phi$ is a separable function written as $\phi_x = \eta N_x^p A_x^q$, where $\eta, p, q \in \mathbb{R}$. Consider a binary classification problem with classes a and b. The probabilities of choosing class a and class b, denoted P(a) and P(b), are given by

$$P(a) = \frac{A_a}{A_a + A_b}, \qquad P(b) = \frac{A_b}{A_a + A_b}$$

The expected accuracy, E, is

$$E = P(a)\phi(N_a, A_a) + P(b)\phi(N_b, A_b) = \frac{A_a \phi(N_a, A_a) + A_b \phi(N_b, A_b)}{A_a + A_b} \tag{B1}$$

To obtain the maximum of E, set $\nabla E = 0$, which implies

$$\frac{\partial E}{\partial N_a} = 0, \quad \frac{\partial E}{\partial N_b} = 0, \quad \frac{\partial E}{\partial A_a} = 0, \quad \frac{\partial E}{\partial A_b} = 0$$

To obtain the maximum of E with respect to $N_a$, consider the partial derivative

$$\frac{\partial E}{\partial N_a} = \frac{A_a}{A_a + A_b} \frac{\partial \phi(N_a, A_a)}{\partial N_a} + \frac{A_b}{A_a + A_b} \frac{\partial \phi(N_b, A_b)}{\partial N_a} = 0$$

$$A_a \frac{\partial (\eta N_a^p A_a^q)}{\partial N_a} + A_b \frac{\partial (\eta N_b^p A_b^q)}{\partial N_a} = 0$$

Since $N_a + N_b = \text{constant}$ (i.e. the total number of samples), $\partial N_b / \partial N_a = -1$ and therefore

$$p N_a^{p-1} A_a^{q+1} = p N_b^{p-1} A_b^{q+1}$$

$$N_a^p A_a^q \frac{A_a}{N_a} = N_b^p A_b^q \frac{A_b}{N_b}$$

$$\phi_a \frac{A_a}{N_a} = \phi_b \frac{A_b}{N_b}$$

$$\frac{\phi_a}{\phi_b} = \frac{N_a / A_a}{N_b / A_b} \tag{B2}$$

It should be noted that Eq. (B2) is independent of the values of $\eta$, p and q. The same result as Eq. (B2) is obtained by setting $\partial E / \partial N_b = 0$.

To obtain the maximum of E with respect to $A_a$, consider the partial derivative

$$\frac{\partial E}{\partial A_a} = \frac{1}{A_a + A_b} \frac{\partial [A_a \phi_a(N_a, A_a)]}{\partial A_a} + \frac{1}{A_a + A_b} \frac{\partial [A_b \phi_b(N_b, A_b)]}{\partial A_a} = 0$$

Since $A_a + A_b = \text{constant}$ (i.e. the total input domain), $\partial A_b / \partial A_a = -1$ and therefore

$$\eta N_a^p (q+1) A_a^q = \eta N_b^p (q+1) A_b^q$$

$$N_a^p A_a^q = N_b^p A_b^q$$

$$\phi_a = \phi_b \tag{B3}$$

It should be noted that Eq. (B3) is independent of the values of $\eta$, p and q. The same result as Eq. (B3) is obtained by setting $\partial E / \partial A_b = 0$. From Eqs. (B2) and (B3), we may conclude that the prediction accuracy of the ANN model is proportional to the data distribution density, and that the maximum overall prediction accuracy occurs when the data distribution densities of the two classes are the same.

References
[1] Hägglund B, Jansson R, Onnermark B. Fire development in residential rooms after ignition from nuclear explosions. FOA Report C 20016-D6(A3). Forskningsanstalt, Stockholm; 1974.
[2] McCaffrey BJ, Quintiere JG, Harkleroad MF. Estimating room temperatures and the likelihood of flashover using fire test data correlations. Fire Technol 1981;17:98-119; correction 18:122.
[3] Carpenter GA, Grossberg S, Markuzon N, Reynolds JH, Rosen DB. Fuzzy ARTMAP: a neural network architecture for incremental supervised learning of analog multidimensional maps. IEEE Trans Neural Networks 1992;3(5).
[4] Grossberg S. Adaptive pattern classification and universal recoding. II. Feedback, expectation, olfaction, illusions. Biol Cybern 1976;23:187-202.
[5] Kosko B. Fuzzy entropy and conditioning. Inform Sci 1986;40:165-74.
[6] Carpenter GA, Grossberg S. A self-organizing neural network for supervised learning, recognition, and prediction. IEEE Commun Mag 1992.
[7] Marriott S, Harrison RF. A modified fuzzy ARTMAP architecture for the approximation of noisy mappings. Neural Networks 1995;8(4):619-41.
[8] Verzi SJ, Heileman GL, Georgiopoulos M, Healy MJ. In: Proceedings of the IEEE World Congress on Computational Intelligence (Neural Networks); 1998.
[9] Sánchez GE, Dimitriadis YA, Cano Izquierdo JM, Coronado JL. In: Proceedings of the IEEE-INNS-ENNS International Joint Conference on Neural Networks (IJCNN 2000), vol. 6; 2000.
[10] Ball G, Hall D. A clustering technique for summarizing multivariate data. Behav Sci 1967;12:153-5.
[11] Bezdek JC. Fuzzy mathematics in pattern classification. PhD dissertation. Department of Applied Mathematics, Cornell University, Ithaca, NY; 1973.
[12] Shannon CE, Weaver W. The mathematical theory of communication. Urbana: University of Illinois Press; 1949.
[13] Souza RS. A Bayesian entropy approach to forecasting. PhD thesis. Statistics Department, University of Warwick, UK; 1978.
[14] Jaynes ET. Prior probabilities. IEEE Trans Syst Sci Cybern 1968;4:227-41.
[15] FAST ver. 3.1.6. National Institute of Standards and Technology (NIST), USA.
[16] MATLAB Release 12. The MathWorks Inc., USA.
[17] Fuzzy Toolbox of MATLAB. The MathWorks Inc., USA.
[18] ART-GALLERY ver. 1.0. Lars H. Liden, Boston University, Boston, MA, USA.
Eric W.M. Lee has worked in the building industry in Hong Kong for over 12 years. He received the BEng degree from the City University of Hong Kong in 2000 and is currently a PhD student in the Department of Building and Construction of the same university. His research interests include artificial neural network model development, fire modeling and fire risk assessment.
Richard K.K. Yuen obtained his PhD in Fire Dynamics and Engineering from the University of New South Wales. He has been in the Department of Building and Construction at the City University of Hong Kong since 1990. He is a Chartered and Registered Professional Engineer and member of various professional bodies including the HKIE, CIBSE and IEAust. His research interests include fire safety and engineering, pyrolysis and combustion, applications of computational fluid dynamics, neural network modeling applications in fire engineering, building energy conservation, lighting and ventilation, HVAC systems and indoor air quality.
S.M. Lo joined the Department of Building and Construction at the City University of Hong Kong in 1994. He is a Registered Professional Building Surveyor and an Authorized Person under the Hong Kong Government. He is currently an associate professor at the City University of Hong Kong. His research interests include human behavior, evacuation, fire risk assessment and fire engineering.
K.C. Lam obtained his PhD in construction economics from the University of New South Wales, Australia. He has been teaching at the City University of Hong Kong since 1990. Dr Lam is a registered professional engineer in both Hong Kong and Australia, and a chartered builder in the United Kingdom. He has been a member of the Building and Civil Engineering Industry Training Board of the Vocational Training Council, HKSAR, since 2000. He is also the director of the Construction Management Research Center and is actively involved in research projects related to construction economics modeling and artificial intelligence in construction management and civil engineering.