INFORMATION SCIENCES 2, 15-33 (1994)
Fuzzy Logic Controlled Neural Network Learning

QING HU and DAVID B. HERTZ
Computer Information Systems Department, University of Miami, Coral Gables, Florida 33124
ABSTRACT

The slow and uncertain convergence of multilayer feedforward neural networks using the backpropagation training algorithm is caused mainly by the iterative nature of the dynamic process of finding the weight matrices with static control parameters. This study investigates the use of fuzzy logic in controlling the learning processes of such neural networks. Each learning neuron in the neural networks suggested here has its own learning rate dynamically adjusted by a fuzzy logic controller during the course of training according to the output error of the neuron and a set of heuristic rules. Comparative tests showed that such fuzzy backpropagation algorithms stabilized the training processes of these neural networks and, therefore, produced 2 to 3 times more converged tests than the conventional backpropagation algorithms. The sensitivities of the training processes to the variations of fuzzy sets and membership functions are examined and discussed.
1. INTRODUCTION

Multilayer feedforward neural networks have been the most widely used and studied artificial neural network systems since the backpropagation (BP) learning algorithm was formalized by Rumelhart et al. [14]. However, it also has been recognized that the convergence of such systems is often slow or uncertain. A number of modifications and alternatives to the backpropagation algorithm have been proposed to accelerate the convergence process [3, 7, 9, 12, 13, 16, 19]. Though many of the "adjustments" compared favorably with the original backpropagation algorithm in testing applications, few seem to be as widely used as the original. More general solutions, preferably with lower computational complexity and fewer parameters, are yet to be developed.
We propose, in this paper, a fuzzy logic solution to the slow and uncertain BP convergence problem. Our major objective has been to develop a method of dynamically adjusting the learning rate, a key parameter for convergence efficiency, of each learning neuron in a network using fuzzy logic control mechanisms in order to improve the convergence of the BP algorithm. In Section 2, we analyze the problem and the solutions proposed by others. Based on this analysis we introduce, in Section 3, fuzzy logic controlled backpropagation (FBP) and its implementations. In Section 4, we compare our FBP with the standard BP in a variety of mapping problems. In Section 5, we examine the sensitivities of the FBP training processes to the variations of fuzzy set definitions and membership functions. In Section 6, the conclusions from this study are presented.

2. ANALYSIS OF THE PROBLEM

The basic function of a neural network is to map a set of input vectors x_i ∈ R^n to a set of target vectors t_i ∈ R^m:

$$\varphi: X \to T, \tag{1}$$

where φ is an unknown mapping function, X = (x_1, ..., x_ξ), and T = (t_1, ..., t_ξ). For a given neural network with k hidden layers, N = (L^i, L^h_1, ..., L^h_k, L^o), this mapping is accomplished by

$$y_p = f^o\left(W^o f_k^h\left(W_k^h \cdots f_1^h\left(W_1^h x_p'\right) \cdots\right)\right), \tag{2}$$

where y_p is the neural network's estimate of the pth target vector t_p, x'_p is the transpose of the pth input vector, f^o(t) is the transfer function of the output layer, f^h_i(t) is the transfer function of the ith hidden layer, W^o is the weight matrix of the output layer, and W^h_i is the weight matrix of the ith hidden layer. Thus, instead of seeking the direct mapping function φ, which is often nonlinear and complex, a neural network tries to find W^o and W^h_i from initial, usually randomized, matrices. It approaches this with the iterative operations:
$$W^o(t+1) = W^o(t) + \Delta W^o, \tag{3}$$

$$W_i^h(t+1) = W_i^h(t) + \Delta W_i^h. \tag{4}$$
The convergence rate of a neural network depends largely on the value of each ΔW and its direction. The backpropagation algorithm uses the gradient descent of the output errors to find these ΔWs [5, 14]:

$$\Delta_p w_{kj}^o = -\eta \frac{\partial E_p}{\partial w_{kj}^o}, \tag{5}$$

$$\Delta_p w_{ji}^h = -\eta \frac{\partial E_p}{\partial w_{ji}^h}, \tag{6}$$
where η is the learning rate, E_p is the square error of the output neurons for pattern p, w°_kj is the weight on the connection between the kth output neuron and the jth hidden neuron, and w^h_ji is the weight on the connection between the jth hidden neuron and the ith input neuron. Equations (5) and (6) show where the problems may lie and how their resolutions may be approached (e.g., [12, 13, 18]). Here we will focus on one key variable: the learning rate η. It is clear that, other factors being equal, the learning rate η controls the convergence rate of the gradient descent method. In theory, the learning rate η has to be infinitesimal in order for the gradient descent method to be mathematically valid [14]. This implies infinite training time and, therefore, the condition must be relaxed. The consequences of this relaxation are that (1) the convergence becomes uncertain and (2) an η for every problem and every network configuration has to be provided by the user. Selecting an appropriate η for a specific problem is not trivial. An excessively small or large η may effectively cause a training process to be very slow or not to converge at all. For such a key parameter, an analytical approach to its determination would certainly be desirable. Thus far, only some heuristic [5, 10] or semianalytical methods [4] have been proposed.

A way to get around the η problem is to start with a random value and dynamically adjust it during the training process. Considering a neural network as a dynamic and parallel system where information is processed locally in each neuron, it is reasonable to assume that if every neuron (or every weight) were to have a dynamically adjusted η, the training process would be more efficient than the process with a fixed η for all neurons. This approach has been proven to be successful [1, 8, 9, 12, 19]. However, a problem with some of these methods is that, in order to dynamically adjust the learning rate, new ad hoc parameters are introduced (e.g., [9, 12, 19]).

Thus, the slow convergence problem of backpropagation neural networks lies in the iterative nature of the dynamic process of finding appropriate weight matrices using static process control parameters. Using
adaptive learning rates for every neuron seems to be a promising solution. However, the implementation of such a solution using traditional methods often leads to new problems. A method is needed that can dynamically adjust the learning rates without requiring user intervention for each new problem.

3. A FUZZY LOGIC SOLUTION

Fuzzy set theory was developed by Zadeh [20] in 1965. One of the most developed and widely used applications of fuzzy set and fuzzy logic theory is fuzzy logic control [2], in which heuristic and imprecise knowledge can be used to aid the control of complex processes. This is of particular interest because neural network learning is a dynamic process influenced by many process parameters about which we have only empirical knowledge. For example, it is desirable that the value of a learning rate depend on the magnitude of pattern errors [10]: if the error is large, η should be large, and if the error is small, η should be small. Using the conventional approach, this relationship does not provide much assistance in determining an efficient value of a learning rate, but under fuzzy logic control, such linguistic statements may be effectively utilized.

Arabshahi et al. [1] proposed a fuzzy controlled backpropagation method in which the learning rate η is incremented at every iteration based on the current network error and the change of the error. However, the suggested fuzzy control mechanism is not efficient and not well defined. Two system variables (E_n and CE_n) are used to generate an increment value to update a learning rate. It is not clear which learning rate is updated: one for the entire network, as in standard backpropagation, or one for every weight, as in Jacobs' delta-bar-delta method [9]. In this section we propose a fuzzy logic controlled backpropagation method in which a simpler but effective fuzzy control mechanism is used to dynamically adjust the learning rate of every learning neuron in a neural network. A relatively formal definition of the fuzzy control process and its implementations are presented.

3.1. FUZZY VARIABLES AND FUNCTIONS
A fuzzy logic controller uses three steps in fuzzy control: fuzzification of nonfuzzy input variables, fuzzy inference to generate fuzzy control outputs, and defuzzification of the fuzzy outputs to generate nonfuzzy control outputs (see [2, 10, 11]). In order to implement fuzzy logic control, we first
define the input and output variables to and from the controller and their fuzzy membership functions. The inputs I = {I_1, I_2, ..., I_ζ} are ζ independent variables used to represent the state of the target system. They are often sensor signals and, therefore, I_i ∈ R^n, where R^n is an n-dimensional real number space. The outputs O = {O_1, O_2, ..., O_ξ} are ξ control signals that adjust the state of the target system and, therefore, O_i ∈ R^m, where R^m is an m-dimensional real number space. For every variable V, such as I_i or O_i, defined in the system, a set of fuzzy membership functions {μ_j(v), v ∈ V} is defined in the universe of discourse of V:

$$\mu_j(v): v \to [0,1], \qquad j = 1, 2, \ldots, n, \tag{7}$$
where j is the jth fuzzy set defined in V, μ_j(v) is the fuzzy membership function of the jth fuzzy set, and n is the number of fuzzy sets defined in V. Therefore, a fuzzy membership function μ_j(v) maps a nonfuzzy input value v ∈ V to a membership value μ ∈ [0,1] of a fuzzy set j.

In this neural network system, since the goal is to control the learning rates of both the output neurons and the hidden neurons, we need two control signals that directly adjust the state of our system, that is, the two sets of learning rates defined by

$$\eta_k^o \in \eta^o, \qquad k = 1, 2, \ldots, m, \tag{8}$$

$$\eta_j^h \in \eta^h, \qquad j = 1, 2, \ldots, l, \tag{9}$$
where η° ⊆ [0,1], η^h ⊆ [0,1], m is the number of output neurons, and l is the number of hidden neurons.

As stated earlier, learning rate control is based on the output error of each neuron in the neural network. Thus, the input variables to the fuzzy controller should be directly related to the output errors of the neurons. Two error factors could be involved in the control process: the rate and the magnitude. In this study, only the magnitude factor is considered; the rate factor will be a subject for future research. The input variables for the learning rate control of the output neurons and the hidden neurons are defined as

$$e_{pk} = t_{pk} - y_{pk} \in e^o, \tag{10}$$

$$e_{pj}^h = \sum_{k=1}^{m} e_{pk} \, w_{kj}^o \in e^h, \tag{11}$$
where e° ∈ (−1, 1) and e^h ∈ (−∞, +∞), assuming that the sigmoid function is used as the transfer function for every neuron. The errors backpropagated to the output neurons [defined in (5) and (6)] could be, but are not, used as the input variables, because this would lead to the same problems in the backpropagation algorithm discussed in Section 2. Equations (10) and (11) provide more direct and sensitive control of the learning rates.

Given the variables η°, η^h, e°, and e^h as defined, the fuzzy membership functions for each may then be defined using our knowledge of these variables. For example, if the learning rate η° is 0.3, we may define it as small; it is defined as medium if it is 0.7 and large if it is 1.0. The same reasoning may be applied to η^h. Thus, we define the same set of fuzzy membership functions for both η° and η^h in the universe of discourse [0,1]. The graphic representation of these membership functions is shown in Figure 1, where ZE represents zero, S small, M medium, and L large.

[Figure 1. The fuzzy membership functions for learning rates.]

The fuzzy membership functions for e° may be defined in a similar fashion. For example, if e°_pk is 0.2, we may consider it a small error; if it is 0.4, we may consider it a medium error; if it is 0.6 or larger, we may consider it a large error. The graphic representation of these membership functions is shown in Figure 2, where N represents negative and P positive; the other labels are the same as in Figure 1.

[Figure 2. The fuzzy membership functions for output errors.]

Defining the fuzzy membership functions for e^h can be difficult, because we usually do not have direct experience with either the magnitude or the behavior of the errors of hidden neurons during a training process. We have observed that e^h, as defined by (11), is usually on the order of 10^-3 to 10^-4, whereas e°, as defined by (10), is usually on the order of 10^-1 to 10^-2. On the other hand, the effects of learning rates on the convergence of both output neurons and hidden neurons are the same. Therefore, rather than define a new set of membership functions for the hidden neurons, we use the maximum learning rate of all output neurons as the learning rate of all hidden neurons, that is,

$$\eta_{pj}^h = \max\{\eta_{pk}^o,\; k = 1, 2, \ldots, m\}, \qquad j = 1, 2, \ldots, l. \tag{12}$$
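To make these definitions concrete, the short sketch below encodes triangular membership functions for the learning-rate and output-error variables and maps a crisp error value to its fuzzy sets. Only the center values (S = 0.3, M = 0.7, L = 1.0 for the learning rates; 0.2, 0.4, and 0.6 for small, medium, and large positive errors) come from the text above; the lower and upper breakpoints are illustrative assumptions, since Figures 1 and 2 are not reproduced here, so the membership values it prints will not necessarily match the worked example in Section 3.2.

```python
# A minimal sketch of the fuzzy sets of Section 3.1 and of the fuzzification step.
# The (l, c, u) breakpoints are assumptions; only the center values come from the text.

def tri(v, l, c, u):
    """Triangular membership function with lower bound l, center c, and upper bound u."""
    if l <= v <= c:
        return (v - l) / (c - l) if c > l else 1.0
    if c < v <= u:
        return (u - v) / (u - c)
    return 0.0

# Learning-rate sets on [0, 1]; centers from the text: S = 0.3, M = 0.7, L = 1.0.
ETA_SETS = {
    "ZE": (0.0, 0.0, 0.3),
    "S":  (0.0, 0.3, 0.7),
    "M":  (0.3, 0.7, 1.0),
    "L":  (0.7, 1.0, 1.0),
}

# Output-error sets on (-1, 1); positive centers from the text: PS = 0.2, PM = 0.4, PL = 0.6.
ERR_SETS = {
    "NL": (-1.0, -0.6, -0.4), "NM": (-0.6, -0.4, -0.2), "NS": (-0.4, -0.2, 0.0),
    "ZE": (-0.2,  0.0,  0.2),
    "PS": ( 0.0,  0.2,  0.4), "PM": ( 0.2,  0.4,  0.6), "PL": ( 0.4,  0.6,  1.0),
}

def fuzzify(v, sets):
    """Map a crisp value to its (fuzzy set, membership value) pairs."""
    out = {}
    for name, (l, c, u) in sets.items():
        mu = tri(v, l, c, u)
        if mu > 0.0:
            out[name] = mu
    return out

print(fuzzify(0.32, ERR_SETS))   # one value can belong to two overlapping sets, e.g. PS and PM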
3.2. FUZZY CONTROL PROCESSES

The first process in fuzzy control is fuzzification, which maps an input value v ∈ V to a set of fuzzy membership values {μ_j ∈ [0,1]} according to the membership functions defined in the universe of discourse of V, as shown in (7):

$$\Phi: v \to \{(m_1, \mu_1(v)), \ldots, (m_f, \mu_f(v))\}, \tag{13}$$

where m_i is the membership of v in the ith fuzzy set, μ_i is the membership value of v in the ith fuzzy set, and f is the number of fuzzy sets applicable to v. The results of this fuzzification process are the fuzzy memberships {m_i} and the respective fuzzy membership values {μ_i} of the nonfuzzy input variables. For example, an input e°_pk = 0.32 may be classified as positive small with membership value 0.20 and positive medium with membership value 0.47. The fact that one input value may be mapped to more than one fuzzy set and, therefore, more than one fuzzy membership value is a powerful characteristic of fuzzy sets, accomplished by overlapping the domains of the fuzzy sets, as shown in Figures 1 and 2.

With the fuzzy set memberships and the fuzzy membership values available, fuzzy inference proceeds using a fuzzy production system, which provides fuzzy control actions by matching fuzzy information with fuzzy
actions:

$$\Gamma: \{(m_1, \mu_1), \ldots, (m_f, \mu_f)\} \to \{(c_1, \mu_1), \ldots, (c_f, \mu_f)\}, \tag{14}$$
where c_i is the center value of the ith fuzzy action set, determined by the fuzzy membership functions of the output variable O, and μ_i is the strength of the ith action, determined by the fuzzy membership functions of the input variable I. In this rule system, or fuzzy rule base, the heuristic and linguistic control knowledge may be coded in the form of IF conditions THEN actions. These rules represent expert knowledge and can be obtained through knowledge acquisition processes. For example, we might have the following fuzzy rule for our system:

IF error is small THEN learning rate is small.
Because the same set of membership functions is used for the two input variables (e° and e^h), the same set of membership functions is used for the two output variables (η° and η^h), and the control principles are the same for both η° and η^h, we use the same set of fuzzy rules for the fuzzy inferences of both sets of neurons. Instead of listing every rule defined for the system, a concise tabular representation is used here to represent the heuristic control knowledge. The method is similar to that introduced by Kosko [10], which is especially powerful when multiple input and/or output variables are involved. The complete fuzzy rule base for the system described here is shown in Table 1. In this table, e = {e°, e^h} and η = {η°, η^h}. The elements in the second row are conditions and the elements in the last row are the respective actions.

TABLE 1
The Fuzzy Rule Base for Learning Rate Control

Rule #    1     2     3     4     5     6     7
e         NL    NM    NS    ZE    PS    PM    PL
η         L     M     S     ZE    S     M     L

For any given input v ∈ V, one or more fuzzy rules will be fired and, therefore, one or more fuzzy control actions will be recommended, as indicated by (14), depending on how many fuzzy sets and fuzzy membership values are associated with v in the fuzzification process. For example, the input e°_pk = 0.3 will fire rule #5, which recommends η°_pk = small with a strength of 0.20, and rule #6, which recommends η°_pk = medium with a strength of 0.47. Therefore, the outputs from the fuzzy inference process are a set of fuzzy control actions associated with strengths.

However, the control signal for one variable cannot be fuzzy and must be single-valued. This is accomplished by the defuzzification process, in which a set of fuzzy information is mapped into a nonfuzzy control value:
$$\Psi: \{(c_1, \mu_1), \ldots, (c_f, \mu_f)\} \to \hat{\eta}, \tag{15}$$
where η̂ is the nonfuzzy control signal for the learning rate η. Many mapping functions have been proposed [2, 10]. Here we use the centroid defuzzification function [10] defined by

$$\hat{\eta} = \frac{\sum_{j=1}^{f} c_j \mu_j}{\sum_{j=1}^{f} \mu_j}. \tag{16}$$
Given e°_pk = 0.3, for example, then f = 2; c_S and c_M can be found equal to 0.3 and 0.7 from the fuzzy membership functions of η°, and μ_S and μ_M are 0.20 and 0.47, determined by the fuzzy membership functions of e° in the fuzzification process. The new learning rate for the output neuron k will be calculated as

$$\hat{\eta}_{pk}^o = \frac{0.3 \times 0.20 + 0.7 \times 0.47}{0.20 + 0.47} = 0.58. \tag{17}$$
The value 0.58 will be assigned to output neuron k as its learning rate for the next training epoch. The same procedure is applied to each output neuron and hidden neuron at every training epoch.
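The arithmetic of this worked example is easy to verify. The sketch below takes the two recommended actions from the fuzzy inference step, with the strengths 0.20 and 0.47 and the learning-rate centers c_S = 0.3 and c_M = 0.7 given above, and applies the centroid defuzzification of (16); it reproduces the value 0.58 of (17).

```python
# Centroid defuzzification, eq. (16): eta_hat = sum(c_j * mu_j) / sum(mu_j).
def defuzzify(actions):
    """actions: iterable of (center of the action's fuzzy set, strength of the action)."""
    num = sum(c * mu for c, mu in actions)
    den = sum(mu for _, mu in actions)
    return num / den

# Worked example of Section 3.2: rule #5 recommends "small"  (center 0.3) with strength 0.20,
#                                rule #6 recommends "medium" (center 0.7) with strength 0.47.
print(round(defuzzify([(0.3, 0.20), (0.7, 0.47)]), 2))   # 0.58, as in eq. (17)
```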
3.3. FUZZY BACKPROPAGATION ALGORITHMS

A fuzzy backpropagation algorithm (FBP) is designed based on the original backpropagation algorithm (BP) and the control mechanisms presented in the foregoing subsections:

FBP: (BP: η°_pk = Ψ(Γ(Φ(e°_pk))), η^h_pj = max{η°_pk, k = 1, 2, ..., m}, p = 1, 2, ..., ξ).
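To show where the fuzzy controller enters the computation, the following sketch implements one training epoch of this scheme for a network with a single hidden layer of sigmoid units. The fuzzy controller is passed in as a function `rate_of` that maps an output error to a learning rate (for example, the fuzzification, inference, and defuzzification chain of Section 3.2); the absence of bias terms and the per-pattern (online) update are simplifying assumptions of this sketch, not details taken from the paper.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fbp_epoch(X, T, Wh, Wo, rate_of):
    """One FBP training epoch (sketch).

    X, T    : arrays whose rows are the input and target patterns.
    Wh, Wo  : hidden- and output-layer weight matrices.
    rate_of : the fuzzy controller, mapping an output error e_pk to a learning rate.
    """
    for x, t in zip(X, T):
        h = sigmoid(Wh @ x)                  # hidden activations
        y = sigmoid(Wo @ h)                  # output activations
        e = t - y                            # output errors, eq. (10)

        eta_o = np.array([rate_of(ek) for ek in e])   # one rate per output neuron
        eta_h = eta_o.max()                           # shared hidden-layer rate, eq. (12)

        d_o = e * y * (1.0 - y)                       # output deltas (sigmoid derivative)
        d_h = (Wo.T @ d_o) * h * (1.0 - h)            # hidden deltas, backpropagated

        Wo += (eta_o * d_o)[:, None] * h[None, :]     # eq. (5), with a per-neuron rate per row
        Wh += eta_h * np.outer(d_h, x)                # eq. (6), with the shared hidden rate
    return Wh, Wo
```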
The FBP algorithm follows the precise procedures of the BP algorithm, except that it does not require a preselected η. In the FBP algorithm, the η°_pk are adjusted using the fuzzy logic controller at every training epoch, and the η^h_pj are set to the maximum of all output neurons' learning rates for all hidden neurons at every training epoch. Adding a momentum term to the weight update equations [14] of the two algorithms produces two variations, BPM and FBPM:
$$W(t+1) = W(t) + \Delta W + \alpha\, \Delta W(t), \tag{18}$$
where α is the coefficient of the momentum term. Evaluations of the four algorithms using the same set of problems and the same set of network configuration parameters are presented in the next section.

4. RESULTS OF COMPARATIVE TESTING

It is difficult to design satisfactory comparative tests that objectively and validly compare the performances of the different neural network algorithms proposed in various studies. One reason for this is that the implementation details (which usually have a significant effect on the convergence rate) of the proposed algorithms may not be fully disclosed in the published materials. The use of nonstandard terminologies makes this problem more difficult to resolve. In this study we do not directly compare our methods with others. Rather, we use the standard backpropagation algorithm [14] as a benchmark. The following definitions are used to detail the implementations of the suggested algorithms.

DEFINITION 1. For a given network N = (L^i, L^h_1, ..., L^h_k, L^o), a training set X = (x_1, ..., x_ξ), and a target set T = (t_1, ..., t_ξ), a training epoch, t, is defined as the completion of the following operations in sequence:
DEFINITION 2. For a given network N = (L^i, L^h_1, ..., L^h_k, L^o), a training set X = (x_1, ..., x_ξ), and a target data set T = (t_1, ..., t_ξ), N is considered converged if and only if

$$\max\{\max\{|e_{pk}^o|,\; k = 1, 2, \ldots, m\},\; p = 1, 2, \ldots, \xi\} \leq \epsilon \tag{20}$$
is reached within a predefined maximum number of training epochs Ω, where ε is a predefined error level of convergence for a given problem.

One major concern in comparing the BP algorithms with the FBP algorithms is that the BPs require preselected learning rates, which have a significant effect on the convergence rate. Thus, each BP or BPM network is pretested using learning rates 0.2, 0.4, 0.6, 0.8, and 1.0 on every testing data set. The learning rates that result in the fastest convergence are selected for the BP and the BPM in the subsequent comparative tests. The effect of the initial values of the two weight matrices on the convergence rate is another question. Experiments have shown that, other factors being equal, this effect can be significant. To minimize this side effect, every neural network is tested 25 times with the weight matrices randomly initialized in the same range [-r, r] for each test data set, using the system clock value as the random seed. Because the random number generator does not produce the same set of random values each time it is used, the training cycles at convergence vary from test to test even with the same test data set.

Under these conditions, a number of pattern mapping problems are used to test and compare each of the algorithms discussed in Section 3. Only the results of three representative problems, i.e., the parity problem, the character problem, and the ASCII problem, are presented here.

The parity problem [12, 14] requires neural networks to output 1 if the input pattern contains an odd number of 1s and 0 otherwise. Therefore, it is a more general form of the XOR problem. Our test network for the parity problem has three input neurons, six hidden neurons, and one output neuron. η = 1.0 produces the fastest convergence for both the BP and BPM algorithms; r = 0.1, ε = 0.1, Ω = 20,000, and α = 0.5. The training set consists of seven patterns.

Many applications of learning algorithms require large numbers of input and output neurons and training patterns. The algorithms that perform well on small test problems do not necessarily work well for large applications, especially those that require intensive computation and massive storage space for intermediate results. The character problem is designed as a medium range application problem, which requires the neural network to encode and decode the images of the 26 characters of the alphabet. Each image is represented by a 7 by 7 bit map. The networks for this problem have 49 input neurons, 15 hidden neurons, and 49 output neurons. η = 1.0 for BP and η = 0.6 for BPM produce the fastest convergence; r = 0.5, ε = 0.1, Ω = 500, and α = 0.5. The training set consists of 26 patterns.

The ASCII problem converts a character's 49-element image vector to
its 8-element ASCII code. Like the character problem, it is a typical medium range application problem. The network has 49 input neurons, 10 hidden neurons, and 8 output neurons. η = 1.0 produces the fastest convergence for both the BP and BPM algorithms; r = 0.5, ε = 0.1, Ω = 500, and α = 0.5. The training set consists of 26 patterns.

Table 2 shows the results of the tests conducted using these three problems and the four backpropagation algorithms. Note that the maximum and minimum training epochs are the values of the converged tests. If the training epochs of a test reached the limit Ω and the network had not converged, the test was aborted and counted as a nonconverged test.

TABLE 2
Test Results of the Fuzzy Algorithms

                       Parity                        Char                          ASCII
Algorithm     BP    BPM    FBP   FBPM       BP    BPM    FBP   FBPM       BP    BPM    FBP   FBPM
Tested        25     25     25     25       25     25     25     25       25     25     25     25
Converged      7      7     22     21        9      5     25     20       10     16     25     24
max         3445    698   1462    985      335    256    136    275      362    494    221    236
min          978    540   1006    634       44    103     54     42       81     48     76     52
mean        1343    619   1231    730      122    171     73     88      137    122    114    105
σ            858     54    127     82       89     61     19     47       77    112     36     43

The results of the three groups of comparative tests have demonstrated some interesting characteristics of the fuzzy backpropagation learning algorithms. First and foremost, the fuzzy logic controller stabilizes the training processes and reduces the probability of falling into local energy minima, which result in very slow or no convergence. This is clearly indicated by the larger number of converged tests out of the 25 trials and the smaller σ in the converged tests. The convergence ratios of the FBPs are about 2 to 3 times higher than those of the BPs in the three test problems. Here the use of the fuzzy logic controller greatly improves the uncertainty factor of the standard backpropagation algorithm.

By using the fuzzy logic controller to dynamically adjust the learning rates of every learning neuron, the need for a preselected learning rate is eliminated while good convergence rates are retained. The convergence rates of the FBPs match the best rates of the BPs. This is particularly important for applications where prior knowledge of the training data set is not available or exhaustive searches for the best learning rate are too costly. It also frees the user from nonproductive training processes due to the use of inappropriate learning rates.
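The convergence criterion and trial protocol behind these counts can be sketched in a few lines. In the sketch below, `build_network`, `train_epoch`, and `predict` are hypothetical placeholders for whichever of the four algorithms is being tested; the defaults follow the settings quoted above (ε = 0.1, 25 trials, weights drawn from [-r, r]), while the bookkeeping details are assumptions rather than the authors' code.

```python
import numpy as np

def converged(X, T, predict, eps):
    """Definition 2 / eq. (20): max over patterns and output neurons of |t - y| <= eps."""
    worst = max(np.max(np.abs(t - predict(x))) for x, t in zip(X, T))
    return worst <= eps

def run_trials(X, T, build_network, train_epoch, predict,
               eps=0.1, max_epochs=500, trials=25, r=0.5):
    """Repeat training from weights drawn uniformly in [-r, r]; report converged trials."""
    n_converged, epochs_needed = 0, []
    for _ in range(trials):
        net = build_network(r)                       # fresh random initialization
        for epoch in range(1, max_epochs + 1):
            train_epoch(net, X, T)
            if converged(X, T, lambda x: predict(net, x), eps):
                n_converged += 1
                epochs_needed.append(epoch)
                break
    return n_converged, epochs_needed
```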
It is also demonstrated in the results that the momentum term accelerates convergence but also decreases the stability of the training processes, as indicated by the fewer training epochs and fewer converged tests. This is consistent with the general knowledge of the effect of momentum terms in backpropagation algorithms.

The preceding results disguise an important effect of the learning rate on the convergence of BP and BPM. It seems that η = 1.0 is best for all the problems. In fact, a large learning rate usually results in an unstable training process that is more sensitive to other parameters, for example, the initial values of the weights. The consequence is that it may either converge faster or not at all. The results in Table 3 clearly demonstrate a typical pattern of η's influence on the convergence of backpropagation methods: a conflict between convergence rate and convergence ratio, with η = 0.6 for BP and η = 0.4 for BPM producing the most converged tests, while η = 1.0 for BP and η = 1.0 for BPM produce the fastest convergence rates.

TABLE 3
The Effect of Learning Rate on the Character Problem

                             BP                                  BPM
η              0.2    0.4    0.6    0.8    1.0      0.2    0.4    0.6    0.8    1.0
Converged        9     12     15     14      9        3     10      5      3      5
max            362    392    342    397    335      448    385    256    382    385
min            276    128     77     58     44      232    111    103     71     73
mean           311    202    160    138    122      336    210    171    238    217
σ               25     76     81    101     89       88     90     61    128    117

5. SENSITIVITY ANALYSIS

A significant issue in the fuzzy logic solution is the sensitivity of fuzzy logic controlled training processes to the variations of fuzzy rules and fuzzy membership functions. If a unique set of rules and membership function parameters had to be used for every new problem, then the fuzzy logic solution would have little practical value. In this section, we address this issue by examining the effect of variations of fuzzy set definitions on the convergence rate and the convergence ratio of the testing problems. We will not consider the effect of variations of the fuzzy rules because there are, in essence, only four rules in this system. They represent the common-sense relationship between the output error and the learning rate of a neural
network at any time in a training process. Significant variations seem unlikely to occur. On the other hand, there exist many ways to define the fuzzy membership functions and their parameters. Therefore, we focus on the variations of the fuzzy membership functions.

Any triangular fuzzy set defined in an x-y plane can be fully represented by a triple (l, c, u), where l, c, and u are the x coordinates of the fuzzy set's lower, center, and upper boundaries, assuming the height of the triangle is always 1. The fuzzy membership function defined by this fuzzy set is

$$m(v) = \begin{cases} \frac{v - l_k}{c_k - l_k}, & l_k \leq v \leq c_k, \\ \frac{u_k - v}{u_k - c_k}, & c_k \leq v \leq u_k, \end{cases} \tag{21}$$
where v is the fuzzy variable. In this paper, v is either e°, e^h, or η. Because we have two sets of fuzzy sets defined in this study, we will use subscript i for the fuzzy sets of the output errors and j for the fuzzy sets of the learning rate. The effect of changes of the fuzzy set definitions on the convergence rates of the fuzzy logic controlled training processes may be examined by the two derivatives

$$\frac{\partial \hat{\eta}}{\partial c_j}, \qquad \frac{\partial \hat{\eta}}{\partial \mu_j}. \tag{22}$$

From (16), and because at most two fuzzy sets overlap in the system, we get

$$\frac{\partial \hat{\eta}}{\partial c_j} = \frac{\mu_j}{\mu_j + \mu_{j+1}}, \tag{23}$$

$$\frac{\partial \hat{\eta}}{\partial \mu_j} = \frac{c_j \mu_{j+1} - c_{j+1} \mu_{j+1}}{(\mu_j + \mu_{j+1})^2}, \tag{24}$$

where c_j is the center value of the jth fuzzy set defined for the learning rate η, as shown in Figure 1, and μ_j is the strength of the action defined by the jth fuzzy set of the output error e° or e^h; μ_j = μ_i in this particular case. Because ∂η̂/∂c_j > 0, shifting the fuzzy sets in Figure 1 inward will decrease the value of the learning rate generated by the fuzzy controller at each training epoch, and shifting the fuzzy sets outward will increase the
value of the learning rate generated by the fuzzy controller at each training epoch. To determine the sign of the second derivative, more information about μ is required. The partial derivative of μ_i with respect to the center value of the ith fuzzy set of the output error is

$$\frac{\partial \mu_i}{\partial c_i} = \begin{cases} -\frac{v - l_i}{(c_i - l_i)^2} \leq 0, & l_i \leq v \leq c_i, \\ \frac{u_i - v}{(c_i - u_i)^2} \geq 0, & c_i \leq v \leq u_i. \end{cases} \tag{25}$$

If l_i ≤ v ≤ c_i, then ∂μ_i/∂c_i ≤ 0, c_i > c_{i-1}, c_j > c_{j-1}, and ∂η̂/∂μ_j ≥ 0. Therefore, shifting the fuzzy sets in Figure 2 outward will decrease the membership value μ_i and subsequently decrease the learning rate, and vice versa. If c_i ≤ v ≤ u_i, then ∂μ_i/∂c_i ≥ 0.
[Figure 3. The effect of fuzzy set variations on convergence rate. Panels: ASCII test, Char test, and Parity test.]
of neuron learning rates during the training process. Although the variations of the fuzzy sets indeed change the magnitude of these adjustments, the dynamic nature of the fuzzy control is not changed. We would expect that the variations of the fuzzy sets would not have a significant effect on the stability of the training processes. This expectation is supported by the empirical tests. The effects of the fuzzy set variations on the convergence ratios of the three problems are presented in Table 4. It can be seen that
TABLE 4
The Effect of Fuzzy Set Variations on Convergence Ratio

Fuzzy set        Parity          Char           ASCII         Parity-m        Char-m         ASCII-m
(e set)        o   c   i      o   c   i      o   c   i      o   c   i      o   c   i      o   c   i
o             25  25  25     25  24  25     25  25  24     25  25  24     24  24  22     25  25  25
c             25  25  25     25  25  24     25  25  25     25  23  21     20  19  23     25  24  24
i             25  23  23     25  23  23     25  25  25     22  19  13     18  15  15     22  23  21
the effect of the momentum term on the convergence ratio is more significant than the effect of the variations of the fuzzy sets.

6. CONCLUSION

The slow and uncertain convergence of neural network training processes originates from the iterative nature of finding the weight matrices using steepest gradient descent with static control parameters for these dynamic processes. One solution to this problem has been the use of dynamically adjusted learning rates for each of the learning neurons in the neural network systems. However, the implementations of such adaptive mechanisms with traditional analytical methods introduced new problems. Fuzzy logic offers a better mechanism of dynamic control with heuristic knowledge. The resulting fuzzy backpropagation algorithms have demonstrated significant improvement in the stability of backpropagation training in a variety of pattern mapping problems. Overall, the FBP algorithms have produced 2 to 3 times more converged tests than the BP algorithms while retaining the best convergence rate the BP algorithms can reach.

The results from the sensitivity analysis and testing clearly indicate that the variations in the definitions of the fuzzy sets have some effect on the convergence rates of the fuzzy logic backpropagation algorithms. However, the major advantage of the fuzzy algorithms, that is, the ability to avoid local energy minima in the training processes, is not significantly influenced. This is an important conclusion of this study. It implies that similar fuzzy set definitions can be used for a variety of problems without difficulty in obtaining converged training processes. However, extreme cases, such as a few fuzzy sets defined over a wide domain of a variable, are not considered. In the proposed system, the learning rate is defined in the domain [0,1], and the output error is defined in the domain [-1,1]. Under these constraints, it is not desirable to define fewer than three or more than five fuzzy sets for each variable.

The testing results of all three problems do not support the conclusion [1] that fuzzy backpropagation is much faster than standard backpropagation. Fuzzy logic does not change the nature of backpropagation, but helps the backpropagation method reach better performance through dynamic adjustment of the learning rates.

REFERENCES

1. P. Arabshahi, J. J. Choi, R. J. Marks, and T. P. Caudell, Fuzzy control of backpropagation, in: Proceedings of the IEEE International Conference on Fuzzy Systems (San Diego, CA), 1992, pp. 967-972.
2. H. R. Berenji, Fuzzy logic controllers, in: R. Yager and L. A. Zadeh (eds.), An Introduction to Fuzzy Logic Applications in Intelligent Systems, Kluwer Academic, Norwell, MA, 1992, pp. 69-96.
3. R. P. Brent, Fast training algorithms for multilayer neural nets, IEEE Trans. Neural Networks 2(3):346-354 (1991).
4. H. A. C. Eaton and T. L. Oliver, Learning coefficient dependence on training set size, Neural Networks 5:283-288 (1992).
5. J. A. Freeman and D. M. Skapura, Neural Networks: Algorithms, Applications, and Programming Techniques, Addison-Wesley, Reading, MA, 1991.
6. O. Fujita, Optimization of the hidden unit function in feedforward neural networks, Neural Networks 5:755-764 (1992).
7. M. S. Gyer, Adjuncts and alternatives to neural networks for supervised classification, IEEE Trans. Systems, Man, Cybernetics 22(1):35-46 (1992).
8. Q. Hu and D. B. Hertz, Developing fuzzy adaptive learning rates for neural networks, in: Proceedings of the First International Conference on Fuzzy Theory and Technology (Durham, NC), 1992.
9. R. A. Jacobs, Increased rates of convergence through learning rate adaptation, Neural Networks 1:295-307 (1988).
10. B. Kosko, Neural Networks and Fuzzy Systems, Prentice-Hall, Englewood Cliffs, NJ, 1992.
11. E. H. Mamdani and S. Assilian, An experiment in linguistic synthesis with a fuzzy logic controller, Int. J. Man-Machine Studies 7(1):1-13 (1975).
12. C. Noe and L. J. Kohout, Accelerating the backpropagation algorithm by increasing the sharpness of the sigmoid function, in: Proceedings of the 5th Florida Artificial Intelligence Research Symposium (Ft. Lauderdale, FL), 1992.
13. A. van Ooyen and B. Nienhuis, Improving the convergence of the back-propagation algorithm, Neural Networks 5:465-471 (1992).
14. D. E. Rumelhart, G. E. Hinton, and R. J. Williams, Learning internal representations by error propagation, in: D. E. Rumelhart and J. L. McClelland (eds.), Parallel Distributed Processing, vol. 1, MIT Press, Cambridge, MA, 1986.
15. P. Saratchandran, Dynamic programming approach to optimal weight selection in multilayer neural networks, IEEE Trans. Neural Networks 2(4):465-468 (1991).
16. S. Shah, F. Palmieri, and M. Datum, Optimal filtering algorithms for fast learning in feedforward neural networks, Neural Networks 5:779-787 (1992).
17. S. Singhal and L. Wu, Training feed-forward networks with the extended Kalman filter, in: Proceedings of the IEEE International Conf. on Acoustics, Speech and Signal Processing (Glasgow, Scotland), IEEE Press, New York, 1989, pp. 1187-1190.
18. W. S. Stornetta and B. A. Huberman, An improved three-layer backpropagation algorithm, in: Proceedings of the IEEE First International Conf. on Neural Networks (San Diego, CA), 1987.
19. T. P. Vogl, J. K. Mangis, A. K. Rigler, W. T. Zink, and D. L. Alkon, Accelerating the convergence of the back-propagation method, Biol. Cybernet. 59:257-263 (1988).
20. L. A. Zadeh, Fuzzy sets, Inform. Control 8:338-353 (1965).

Received April 1992; revised April 1993, September 1993; accepted November 1993