Fuzzy Sets and Systems 112 (2000) 371–380
Fuzzy regression by fuzzy number neural networks

James P. Dunyak^a,*, Donald Wunsch^b

^a Department of Mathematics, Texas Tech University, Lubbock, TX 79409, USA
^b Department of Electrical Engineering, Texas Tech University, Lubbock, TX 79409, USA

* Corresponding author. Tel.: +1 806 742 2566; fax: +1 806 742 1112; e-mail: [email protected].
Received April 1997; received in revised form October 1997
Abstract

In this paper, we describe a method for nonlinear fuzzy regression using neural network models. In earlier work, strong assumptions were made on the form of the fuzzy number parameters: symmetric triangular, asymmetric triangular, quadratic, trapezoidal, and so on. Our goal here is to substantially generalize both linear and nonlinear fuzzy regression using models with general fuzzy number inputs, weights, biases, and outputs. This is accomplished through a special training technique for fuzzy number neural networks. The technique is demonstrated with data from an industrial quality control problem. © 2000 Elsevier Science B.V. All rights reserved.

Keywords: Fuzzy regression; Neural networks; Back propagation
1. Introduction

Tanaka, Uejima, and Asai first posed the problem of fuzzy regression in [14]. They considered the fuzzy linear regression model

$$Y(x) = A_0 + A_1 x_1 + \cdots + A_n x_n \tag{1}$$
with symmetric triangular fuzzy parameters $A_i$ chosen to match given $n$-dimensional input vectors $x^j = (x_1^j, x_2^j, \ldots, x_n^j)$ with fuzzy output $y^j$, $j = 1, 2, \ldots, m$. In this paper, the notation $Y$ indicates a fuzzy set with membership $Y(y) : \Re \to [0, 1]$. The corresponding $\alpha$-cut sets are $Y^\alpha = \{y : Y(y) \ge \alpha\}$. A fuzzy number $Y$ has special structure: each $\alpha$-cut is a closed and convex subset of $\Re$, so each $\alpha$-cut can be written as a closed interval $Y^\alpha = [Y_1^\alpha, Y_2^\alpha]$. Only normal fuzzy numbers will be considered, so the $\alpha = 1$ cuts are not empty. The parameters in Eq. (1) were chosen, through a linear programming solution method, to meet, for each input-output pair $(x^j, y^j)$,

$$y^{jh} \subset Y(x^j)^h, \quad j = 1, 2, \ldots, m, \tag{2}$$
where $Y(x)^h$ is the $\alpha = h$ cut for a specified level $h$. This technique was extended by Tanaka and Ishibuchi to fuzzy numbers with quadratic membership functions [13] and to fuzzy numbers defined by a more general shape function $L(\cdot)$ [12]. The problem was simplified and recast as linear interval regression by Ishibuchi and Tanaka in [8]. These interval regression models are closely connected to standard linear fuzzy regression in Eqs. (1) and (2). In interval regression, the linear programming problem is

$$\text{Minimize } y_w(x^1) + y_w(x^2) + \cdots + y_w(x^m) \quad \text{subject to } y^{jh} \subset Y(x^j)^h, \quad j = 1, 2, \ldots, m, \tag{3}$$

where $y_w(x)$ is the width of the interval

$$Y(x) = A_0 + A_1 x_1 + \cdots + A_n x_n \tag{4}$$

and $Y$ and $A_i$ are interval variables. Fuzzy models with trapezoidal membership functions are easily derived from these interval models [9]. Nonlinear fuzzy regression techniques then followed by application of neural network models with interval weights and biases [7, 9, 10]. The nonlinear programming problem in Eq. (3) is solved by a form of backpropagation, with the linear model in Eq. (4) replaced by a neural network. Again, nonlinear fuzzy regression models may be developed from interval models [9].

In all of this fuzzy regression work, strong assumptions are made on the form of the fuzzy number parameters: symmetric triangular, asymmetric triangular, quadratic, trapezoidal, and so on. Our goal here is to substantially generalize both linear and nonlinear fuzzy regression using models with general fuzzy number inputs, weights, biases, and outputs. We consider $m$ input-output fuzzy pattern pairs $(x^j, y^j)$ with $j = 1, 2, \ldots, m$. Here $x^j$ and $y^j$ are of dimension $n$ and $k$, respectively. Each component of $x^j$ and $y^j$ is described by its own membership function. Of course, the fuzzy regression models can only approximate the input-output relationship, and other restrictions prevent fuzzy regression models from being universal approximators [1].

Let $w_{i1}(h)$ and $w_{i2}(h)$, $i = 1, 2, \ldots, k$, be nonnegative real weighting functions. Our model has $k$ fuzzy outputs with $\alpha = h$ cuts described by the intervals $[Y_{i1}(x)^h, Y_{i2}(x)^h] = Y_i(x)^h$ for $i = 1, 2, \ldots, k$. We pose the problem

$$\begin{aligned} \text{Minimize } & \sum_{j=1}^m \sum_{i=1}^k \int_0^1 w_{j1}(h)\,[y_{i1}^{jh} - Y_{i1}(x^j)^h]^2 + w_{j2}(h)\,[Y_{i2}(x^j)^h - y_{i2}^{jh}]^2 \, dh \\ \text{subject to } & y_i^{jh} \subset Y_i(x^j)^h, \quad j = 1, 2, \ldots, m, \ h \in [0, 1]. \end{aligned} \tag{5}$$

Note that the constraints in Eq. (5) require, as in fuzzy linear regression, that the $\alpha$-cuts of all of the output data $y^j$ be contained in the $\alpha$-cuts of the fitted model $Y(x^j)$. With this constraint in mind, the minimization looks for the best fit between the model and data. The first summation in the minimization is over the $m$ data input-output pairs. The second summation is over the $k$ elements of the model output. The integration provides a weighted measure of the fitting error. In concept we would like to calculate the integral over $h$ in Eq. (5); in practice, we have only a discrete representation of the fuzzy inputs, outputs, and parameters. The discretization we use is in terms of the interval endpoints of the fuzzy number $\alpha$-cuts. In this case, the integral in Eq. (5) is approximated by a summation.

2. Using neural networks for fuzzy regression

We now consider the model used in the minimization in Eq. (5). Neural networks provide a powerful and general method for modeling the relationships between data. Following the lead of [9, 10, 7], we first develop our concept of a neural network with fuzzy number inputs, weights, and outputs.
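Before developing the network, it is worth fixing the data structures this discretization implies. The sketch below is our own illustration, not code from the paper: a fuzzy number is stored as its $\alpha$-cut endpoints on a fixed grid of levels, and the objective in Eq. (5) becomes a finite weighted sum over that grid (the triangular constructor is just a convenient example input).

```python
import numpy as np

# Fixed grid of alpha levels 0 < h_1 < ... < h_n = 1, as used in Section 4.
LEVELS = np.linspace(0.1, 1.0, 10)

def triangular(center, spread, levels=LEVELS):
    """Alpha-cut endpoints (lower, upper) of a symmetric triangular
    fuzzy number; lower is nondecreasing and upper nonincreasing in h,
    so the cuts are properly nested."""
    return center - spread * (1 - levels), center + spread * (1 - levels)

def regression_error(y_lo, y_hi, Y_lo, Y_hi, w1=1.0, w2=1.0):
    """Discretized objective of Eq. (5): endpoint arrays are indexed
    (pattern j, output i, level h); w1 and w2 sample the weighting
    functions on the level grid."""
    dh = 1.0 / y_lo.shape[-1]  # rectangle rule for the integral over h
    return dh * np.sum(w1 * (y_lo - Y_lo) ** 2 + w2 * (Y_hi - y_hi) ** 2)
```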
As in our work in [6], we use the extension principle to generalize a crisp neuron

$$O = f\!\left(W_0 + \sum_{i=1}^k X_i W_i\right) \tag{6}$$

as

$$\bar O = f\!\left(\bar W_0 + \sum_{i=1}^k \bar X_i \bar W_i\right)$$

and

$$O(y) = \sup_{y = f\left(w_0 + \sum_{i=1}^k x_i w_i\right)} \min\left[W_0(w_0), W_1(w_1), \ldots, W_k(w_k), X_1(x_1), X_2(x_2), \ldots, X_k(x_k)\right].$$
Here $f(\cdot)$ is the nonlinearity, $\bar W_i$ are weights, and $\bar X_i$ are inputs. Now, consider inputs and weights that are all normal fuzzy numbers. We assume that the nonlinearity $f(\cdot)$ is strictly increasing and see, in terms of $\alpha$-cut endpoints [6], that

$$O^\alpha = [O_1^\alpha, O_2^\alpha] = \left[ f\!\left(W_{01}^\alpha + \sum_{i=1}^k \min(W_{i1}^\alpha X_{i1}^\alpha,\, W_{i1}^\alpha X_{i2}^\alpha,\, W_{i2}^\alpha X_{i1}^\alpha,\, W_{i2}^\alpha X_{i2}^\alpha)\right),\ f\!\left(W_{02}^\alpha + \sum_{i=1}^k \max(W_{i1}^\alpha X_{i1}^\alpha,\, W_{i1}^\alpha X_{i2}^\alpha,\, W_{i2}^\alpha X_{i1}^\alpha,\, W_{i2}^\alpha X_{i2}^\alpha)\right) \right]. \tag{7}$$
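For readers who want to see Eq. (7) operationally, the following is a minimal sketch of our own, assuming fuzzy quantities are stored as $\alpha$-cut endpoint arrays as above; `np.tanh` stands in for the paper's sigmoidal nonlinearity, and any strictly increasing $f$ works.

```python
def fuzzy_neuron(x_lo, x_hi, w_lo, w_hi, b_lo, b_hi, f=np.tanh):
    """One fuzzy number neuron evaluated per Eq. (7).

    x_lo, x_hi, w_lo, w_hi : (k, n) endpoint arrays for k inputs/weights
    b_lo, b_hi             : (n,) bias endpoint arrays
    f                      : strictly increasing nonlinearity
    """
    # The exact interval product is the min/max over the four endpoint
    # products; summing intervals is endpoint-wise.
    p = np.stack([w_lo * x_lo, w_lo * x_hi, w_hi * x_lo, w_hi * x_hi])
    return (f(b_lo + p.min(axis=0).sum(axis=0)),
            f(b_hi + p.max(axis=0).sum(axis=0)))
```

Because $f$ is increasing, each output endpoint depends monotonically on its argument, and multilayer networks are obtained by feeding these output cuts into the next layer's neurons.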
The output in Eq. (7) is also a normal fuzzy number. In the example below, the usual sigmoidal nonlinearity is used, but $f(x) = x$ may also be useful when a linear neuron is desired. Derivatives of output endpoints with respect to input and weight endpoints are easily calculated.

Multilayer networks are then easily constructed using this fuzzy number neuron. In Fig. 1, we show a network used in the example below. This example may be written as

$$\bar O_{\rm net} = f\!\left(\bar W_{310} + \sum_{i=1}^3 f\!\left(\bar W_{2i0} + \sum_{j=1}^1 \bar X_j \bar W_{2ij}\right) \bar W_{31i}\right). \tag{8}$$

Here $\bar W_{ijk}$ is the weight $k$ from node $j$ of level $i$, and $\bar X_i$ is the $i$th component of the fuzzy input $\bar X$. Despite the frequent discussions of fuzzy neural networks in the literature, few training techniques are available for even simple feedforward networks [2]. In [6], the authors developed a training technique for general fuzzy number neural networks; a straightforward modification of this technique will allow the development of nonlinear fuzzy number regression models based on fuzzy number neural networks.

Fig. 1. This neural network is used for the quality evaluation problem.

3. Training fuzzy number neural networks

In concept, the network may be trained to solve Eq. (5) by choice of appropriate weights. Unfortunately, fuzzy set theory imposes a constraint on the behavior of the fuzzy weights. In a fuzzy number network, each weight is described by its $\alpha$-cuts. Fuzzy set theory requires, for a fuzzy number $F$, that $F^{\alpha_1} \subset F^{\alpha_2}$
when $\alpha_1 > \alpha_2$. Thus,

$$F_1^{\alpha_2} \le F_1^{\alpha_1} \le F_2^{\alpha_1} \le F_2^{\alpha_2} \tag{9}$$

when $\alpha_1 > \alpha_2$. This requirement imposes an enormous number of constraints during training. Our training technique eliminates the weight constraints by use of an unconstrained weight representation, as described below. Normally, the $\alpha$-cuts of a fuzzy number weight are represented, for discretized levels $0 \le \alpha_1 \le \alpha_2 \le \cdots \le \alpha_n = 1$, in terms of interval endpoints by
$$[W_{ijk,1}^{\alpha_1},\, W_{ijk,1}^{\alpha_2},\, W_{ijk,1}^{\alpha_3},\, \ldots,\, W_{ijk,1}^{\alpha_{n-1}},\, W_{ijk,1}^{\alpha_n},\, W_{ijk,2}^{\alpha_n},\, W_{ijk,2}^{\alpha_{n-1}},\, \ldots,\, W_{ijk,2}^{\alpha_3},\, W_{ijk,2}^{\alpha_2},\, W_{ijk,2}^{\alpha_1}]. \tag{10}$$
To eliminate constraints, we use a transformation and represent the weight $\bar W_{ijk}$ as $\tilde W_{ijk} = [\tilde W_{ijk,1}, \tilde W_{ijk,2}, \ldots, \tilde W_{ijk,2n}]$ with the transformation

$$\begin{aligned}
\tilde W_{ijk,1} &= W_{ijk,1}^{\alpha_1}, \\
(\tilde W_{ijk,2})^2 &= W_{ijk,1}^{\alpha_2} - W_{ijk,1}^{\alpha_1}, \\
(\tilde W_{ijk,3})^2 &= W_{ijk,1}^{\alpha_3} - W_{ijk,1}^{\alpha_2}, \\
&\ \vdots \\
(\tilde W_{ijk,n})^2 &= W_{ijk,1}^{\alpha_n} - W_{ijk,1}^{\alpha_{n-1}}, \\
(\tilde W_{ijk,n+1})^2 &= W_{ijk,2}^{\alpha_n} - W_{ijk,1}^{\alpha_n}, \\
(\tilde W_{ijk,n+2})^2 &= W_{ijk,2}^{\alpha_{n-1}} - W_{ijk,2}^{\alpha_n}, \\
&\ \vdots \\
(\tilde W_{ijk,2n-1})^2 &= W_{ijk,2}^{\alpha_2} - W_{ijk,2}^{\alpha_3}, \\
(\tilde W_{ijk,2n})^2 &= W_{ijk,2}^{\alpha_1} - W_{ijk,2}^{\alpha_2}.
\end{aligned} \tag{11}$$
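A sketch of this transformation in code (our own, with hypothetical names): the first entry fixes the lowest left endpoint, and cumulative sums of squared entries then walk monotonically up through the remaining endpoints, so the nesting constraints of Eq. (9) hold for any unconstrained real vector.

```python
def weight_cuts(w_tilde):
    """Map an unconstrained vector of length 2n to alpha-cut endpoints
    (lower, upper) at levels h_1 <= ... <= h_n, per Eq. (11)."""
    n = w_tilde.size // 2
    steps = np.concatenate(([w_tilde[0]], w_tilde[1:] ** 2))
    path = np.cumsum(steps)            # nondecreasing after the first entry
    lower = path[:n]                   # W_1 at h_1, ..., h_n
    upper = path[2 * n - 1:n - 1:-1]   # W_2 at h_1, ..., h_n
    return lower, upper

lo, hi = weight_cuts(np.random.randn(20))  # n = 10 levels, 20 parameters
# Nested cuts, as required by Eq. (9):
assert np.all(np.diff(lo) >= 0) and np.all(np.diff(hi) <= 0) and np.all(hi >= lo)
```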
This unconstrained representation $\tilde W_{ijk}$ will always result in fuzzy weight $\alpha$-cuts that meet the constraints in Eq. (9). We now must consider the constraints in Eq. (5). Following [10], we use a penalty method to impose the constraint. Consider the indicator function

$$I_{a > b} = \begin{cases} 1, & a > b, \\ 0, & a \le b. \end{cases}$$

Let $\epsilon$ be a small number. Consider the problem

$$\begin{aligned} \text{Minimize } \sum_{j=1}^m \sum_{i=1}^k \int_0^1 & \left(\epsilon\, I_{y_{i1}^{jh} > Y_{i1}(x^j)^h} + I_{y_{i1}^{jh} < Y_{i1}(x^j)^h}\right) w_{j1}(h)\,[y_{i1}^{jh} - Y_{i1}(x^j)^h]^2 \\ {} + & \left(\epsilon\, I_{Y_{i2}(x^j)^h > y_{i2}^{jh}} + I_{Y_{i2}(x^j)^h < y_{i2}^{jh}}\right) w_{j2}(h)\,[Y_{i2}(x^j)^h - y_{i2}^{jh}]^2 \, dh. \end{aligned} \tag{12}$$
Evaluation of $Y(x)$ is accomplished by a multilayer neural network such as in Eq. (8). The minimization is then performed through choice of the fuzzy weights $\bar W_{ijk}$. This equation is written for a network with one hidden layer; larger networks simply add more summation terms in the definition of $Y(x)$. Again, for numerical implementation the integral is approximated with a sum at discrete levels. Note in Eq. (12) that choice of small $\epsilon$ heavily penalizes violation of the constraint, forcing the solution to (approximately) meet the constraint in Eq. (5), while the error measure in Eq. (5) is minimized within these constraints. The authors have verified this behavior in many numerical experiments, such as the one described below, as have the authors of [9, 10, 7]. Derivatives follow by repeated application of the chain rule, as in the standard back propagation algorithm. Gradient descent (or another algorithm) may then be used to select unconstrained weights. Because the fuzzy weights are now unconstrained, many popular and well-understood crisp neural network training methods may be applied to the fuzzy number regression problem. The training dynamics tend to have two distinct stages: the weights move rapidly to approximately satisfy the constraints, then move much more slowly to minimize the regression error. The smaller the value of $\epsilon$, the more accurately the constraints are met but the slower the reduction in regression error. To improve numerical efficiency, the penalty parameter $\epsilon$ is reduced in stages during training. The best choice for the reduction schedule is highly problem dependent.
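As an illustration of this two-stage behavior, here is a schematic of the discretized penalty objective in Eq. (12) together with the staged reduction of $\epsilon$ used in the example of Section 4. This is our own sketch; the back propagation step through the network is elided.

```python
def penalty_loss(y_lo, y_hi, Y_lo, Y_hi, eps, w1=1.0, w2=1.0):
    """Discretized Eq. (12): squared errors are scaled by eps where the
    model cut already contains the data cut, and by 1 where it does not,
    so violations dominate the objective as eps shrinks."""
    lo_scale = np.where(y_lo > Y_lo, eps, 1.0)  # y_lo >= Y_lo is feasible
    hi_scale = np.where(Y_hi > y_hi, eps, 1.0)  # Y_hi >= y_hi is feasible
    dh = 1.0 / y_lo.shape[-1]
    return dh * np.sum(lo_scale * w1 * (y_lo - Y_lo) ** 2
                       + hi_scale * w2 * (Y_hi - y_hi) ** 2)

# Reduction schedule from Section 4: halve eps every 200 steps
# until it reaches 0.5 / 2**6 = 0.0078125, then hold it fixed.
eps = 0.5
for step in range(1, 2001):
    # ... gradient step on penalty_loss via back propagation ...
    if step % 200 == 0:
        eps = max(eps / 2.0, 0.0078125)
```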
4. A quality evaluation problem

Since quality control often requires subjective rankings and evaluation, it provides an excellent example for fuzzy regression analysis. Kuroda and Shimohira [11] studied the quality evaluation problem in injection molding. (This problem was also analyzed with fuzzy regression by Ishibuchi and Tanaka [9].) Fifteen moldings were examined by three experts, who assigned each molding a rank in $\{1, 2, 3, 4, 5\}$ based on the depth of the gap in the molding. Rank 5 represented the highest evaluation (no evident gap) and rank 1 the lowest evaluation. The gap depths were then measured. Table 1 lists the data from the experiment, with the experts labeled A, C, and E as in [11]. Our goal is to model the relationship between the measured depths and the ranks. To fully demonstrate our algorithm, we will use fuzzy numbers for both the depths and the ranks. Due to the subjective nature of the ranking process, a fuzzy (possibilistic) model is appropriate.
Table 1
Gap depths and observation ranks

Sample   Measured depth (μm)   Rank by Expert A   Rank by Expert C   Rank by Expert E
1        0.9                   5                  5                  5
2        0.9                   5                  5                  5
3        0.9                   5                  4                  5
4        1.1                   4                  3                  3
5        1.2                   4                  4                  3
6        1.3                   4                  3                  4
7        1.4                   4                  3                  4
8        1.5                   3                  3                  3
9        1.7                   3                  2                  3
10       1.7                   4                  3                  4
11       1.8                   3                  2                  2
12       1.8                   3                  2                  2
13       2.6                   2                  2                  2
14       4.0                   1                  1                  1
15       4.0                   1                  1                  1
For a rank $r \in \{1, 2, 3, 4, 5\}$, we use the fuzzy variable $R$ with membership function

$$R(x) = \begin{cases} 2(x - r) + 2, & (r - 1) \le x \le (r - 0.5), \\ 1, & (r - 0.5) \le x \le (r + 0.5), \\ -2(x - r) + 2, & (r + 0.5) \le x \le (r + 1), \\ 0, & \text{otherwise}. \end{cases}$$

Fig. 2 shows the membership function for a fuzzy rank of 1. The other four rank membership functions have a similar shape and are simply shifted to the right. The gap measurements in Table 1 also have some uncertainty, which we represent with the standard possibilistic viewpoint. For a gap with crisp measurement $g$ in Table 1, we represent the fuzzy gap $G$ by the membership function

$$G(x) = \begin{cases} e^{-200(x - g + 0.05)^2}, & x \le (g - 0.05), \\ 1, & (g - 0.05) \le x \le (g + 0.05), \\ e^{-200(x - g - 0.05)^2}, & (g + 0.05) \le x. \end{cases}$$

Fig. 3 shows the membership function for $g = 0.9$ in Table 1. The membership functions for other gaps are simply shifted versions of Fig. 3. Now, we examine the relationship between the measured gap and the subjective ranking. Two different viewpoints will be used: a conservative rank estimate based on fuzzy regression and a least-squares error estimate.

4.1. A conservative fuzzy estimate of rank

We first address the classic fuzzy regression problem in Eq. (5), with the model determined by the weights in Eq. (8). This is a three-layer network, as shown in Fig. 1, with one input, three hidden layer nodes, and one output. The $\alpha$-levels were discretized in the training to $\{0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9, 1.0\}$. Thus only ten fuzzy weights, each described by 20 parameters, were determined in the training. (Much larger fuzzy number networks were demonstrated in [6].) The network was trained using back propagation with random initial weights, $\epsilon = 0.5$, and an initial learning rate of 0.2. Every 200 training steps, $\epsilon$ was reduced by half until a value of 0.0078125 was reached; at that point, $\epsilon$ was fixed for the remainder of the training.
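The fuzzy ranks and gaps enter the training only through their $\alpha$-cut endpoints on this grid. Inverting the membership branches above gives the endpoints in closed form; a sketch (our own, reusing the level grid defined earlier):

```python
def rank_cuts(r, levels=LEVELS):
    """Alpha-cuts of the fuzzy rank R: solve 2(x - r) + 2 = h on the
    left branch and -2(x - r) + 2 = h on the right branch."""
    return r - 1.0 + levels / 2.0, r + 1.0 - levels / 2.0

def gap_cuts(g, levels=LEVELS):
    """Alpha-cuts of the fuzzy gap G: solve exp(-200 * d**2) = h for the
    offset d beyond the flat core [g - 0.05, g + 0.05]."""
    d = np.sqrt(-np.log(levels) / 200.0)  # d = 0 at h = 1
    return g - 0.05 - d, g + 0.05 + d

# Table 1 as training data, e.g. sample 1 ranked by Expert A:
x_lo, x_hi = gap_cuts(0.9)
y_lo, y_hi = rank_cuts(5)
```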
Fig. 2. Uncertainty in subjective evaluation of rank is represented by a membership function. This plot shows the membership function for a rank of 1.
Fig. 3. Gap measurements also have experimental uncertainty, as reflected in this membership function centered at a measurement of 0.9 μm.
Fig. 4. A fuzzy regression model provides the closest fit that still contains the α-cuts of all the data points. This plot shows the boundaries of the α-cuts of the modeled rank Y(x) for fuzzy inputs centered between 0.5 and 4.5 μm. Note that the model does (approximately) meet the data constraint at the α = 1 cut.
The classic fuzzy regression viewpoint is a conservative one, as can be seen from the constraint in Eq. (5). This model may be viewed as a maximum uncertainty model; one would hope that a new set of measurement-rank data would fall within its $\alpha$-levels. In Fig. 4, we show the network output membership function as the center value of the weld gap measurement is varied. Note that the constraint in Eq. (5) is approximately maintained. The uncertainty in rank evaluation (as measured by the width of the model $\alpha$-cuts) is smaller for larger weld gaps. This is as expected, since larger gaps are much easier for the experts to see. The nonlinear nature of the model is also evident.

4.2. The best least-squares map

A nonlinear least-squares fit is also of interest. This follows easily, using the fuzzy number neural network, from the unconstrained problem

$$\text{Minimize } \frac{1}{2} \sum_{j=1}^m \sum_{i=1}^k \int_0^1 w_{j1}(h)\,[y_{i1}^{jh} - Y_{i1}(x^j)^h]^2 + w_{j2}(h)\,[Y_{i2}(x^j)^h - y_{i2}^{jh}]^2 \, dh. \tag{13}$$
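Relative to the penalty sketch above, Eq. (13) simply drops the indicator scaling; as a one-function illustration (again our own, not the authors' code):

```python
def least_squares_loss(y_lo, y_hi, Y_lo, Y_hi, w1=1.0, w2=1.0):
    """Discretized Eq. (13): every deviation is weighted equally, with
    no containment constraint on the model cuts."""
    dh = 1.0 / y_lo.shape[-1]
    return 0.5 * dh * np.sum(w1 * (y_lo - Y_lo) ** 2
                             + w2 * (Y_hi - y_hi) ** 2)
```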
Now no penalty method is required, and straightforward back propagation may be applied. This model may be viewed as the measurement-rank relationship with minimum error. Fig. 5 shows the modeled rank membership function as a function of the gap measurement. As can be seen from comparison with Fig. 4, the least-squares model has much smaller $\alpha$-cuts but does not meet the constraint in Eq. (5). Note that the natural relationship between smaller gap size and larger uncertainty in rank has also been lost. A least-squares fuzzy regression viewpoint has also been considered by Celmins [3, 4] and Diamond [5].
Fig. 5. The least-squares model provides much less rank uncertainty than is seen in the regression model in Fig. 4. Unfortunately, for many data points, the α = 1 cuts fall outside the least-squares model.
5. Conclusions

In this paper, we propose a fuzzy regression technique based on fuzzy number neural networks. The method allows the building of nonlinear regression models with general fuzzy number inputs, outputs, and parameters. This is in contrast to earlier techniques that heavily restricted the functional form of the membership functions. Training is accomplished with the general approach discussed in the authors' paper on fuzzy neural networks [6], with a modification to use penalty methods to meet the regression constraints. The method is demonstrated with an example relating actual measurements to subjective rank evaluations in a quality control problem.

References

[1] J.J. Buckley, Y. Hayashi, Can fuzzy neural networks approximate continuous fuzzy functions?, Fuzzy Sets and Systems 61 (1993) 43-52.
[2] J.J. Buckley, Y. Hayashi, Fuzzy neural networks: a survey, Fuzzy Sets and Systems 66 (1994) 1-13.
[3] A. Celmins, Least squares model fitting to fuzzy vector data, Fuzzy Sets and Systems 22 (1987) 245-269.
[4] A. Celmins, Multidimensional least-squares fitting of fuzzy models, Math. Modeling 9 (1987) 669-690.
[5] P. Diamond, Fuzzy least squares, Inform. Sci. 46 (1988) 141-157.
[6] J.P. Dunyak, D. Wunsch, Fuzzy number neural networks, Fuzzy Sets and Systems 108 (1999) 49-58.
[7] H. Ishibuchi, M. Nii, Fuzzy regression analysis by neural networks with non-symmetric fuzzy number weights, Proc. Internat. Conf. on Neural Networks, vol. II, 1996, pp. 1191-1196.
[8] H. Ishibuchi, H. Tanaka, Several formulations of interval regression analysis, Proc. Sino-Japan Joint Meeting on Fuzzy Sets and Systems, Beijing, China, 1990, B2-2.
[9] H. Ishibuchi, H. Tanaka, Fuzzy regression analysis using neural networks, Fuzzy Sets and Systems 50 (1992) 257-265.
[10] H. Ishibuchi, H. Tanaka, An architecture of neural networks with interval weights and its application to fuzzy regression analysis, Fuzzy Sets and Systems 57 (1993) 27-39.
[11] H. Kuroda, K. Shimohira, Quantification of weld-line and quantitative evaluation of its molding factors in injection molding (in Japanese), J. Japan Soc. Polymer Process. 2 (1990) 159-165.
[12] H. Tanaka, Fuzzy data analysis by possibilistic linear models, Fuzzy Sets and Systems 24 (1987) 363-375.
[13] H. Tanaka, H. Ishibuchi, Identification of possibilistic linear systems by quadratic membership functions of fuzzy parameters, Fuzzy Sets and Systems 41 (1991) 145-160.
[14] H. Tanaka, S. Uejima, K. Asai, Linear regression analysis with fuzzy model, IEEE Trans. Systems Man Cybernet. 12 (1982) 903-907.