Common nature of learning between BP-type and Hopfield-type neural networks☆

Dongsheng Guo a,b,c, Yunong Zhang a,b,c,⁎, Zhengli Xiao a,b,c, Mingzhi Mao a,b,c, Jianxi Liu a,b,c
a School of Information Science and Technology, Sun Yat-sen University, Guangzhou 510006, China
b SYSU-CMU Shunde International Joint Research Institute, Shunde, Foshan 528300, China
c Key Laboratory of Autonomous Systems and Networked Control, Ministry of Education, Guangzhou 510640, China
Article history: Received 29 August 2014; received in revised form 14 April 2015; accepted 14 April 2015. Communicated by J. Zhang.

Keywords: Neural networks; Common nature of learning; BP-type; Hopfield-type; Problem solving

Abstract

As two famous classes of neural networks, error back-propagation (BP) algorithm based neural networks (i.e., BP-type neural networks, BPNNs) and Hopfield-type neural networks (HNNs) have been proposed, developed, and investigated extensively for scientific research and engineering applications. They differ from each other considerably in terms of network architecture, physical meaning, and training pattern. In this literature-review type paper, we present, in a relatively complete and creative manner, the common nature of learning between BP-type and Hopfield-type neural networks for solving various (mathematical) problems. Specifically, comparing the BPNN with the HNN for the same problem-solving task (e.g., matrix inversion as well as function approximation), we show that the BPNN weight-updating formula and the HNN state-transition equation turn out to be essentially the same. Such an interesting phenomenon promises that, given a neural-network model for a specific problem solving, its potential dual neural-network model can be developed accordingly.

© 2015 Elsevier B.V. All rights reserved.
1. Introduction

As a branch of artificial intelligence, artificial neural networks (ANNs) have attracted considerable attention as candidates for novel computational systems [1–5]. Benefiting from parallel-processing capability, inherent nonlinearity, distributed storage, and adaptive learning ability, a rich repertoire of ANNs (such as those shown in Fig. 1) has been developed and investigated [6–17]. They have been applied widely in many scientific and engineering fields, such as data mining, classification and diagnosis, image and signal processing, control-system design, and equation solving. Generally speaking, an ANN is an interconnected group of artificial neurons that uses a mathematical or computational model for information processing based on a connectionist approach to computation [4,5]. In view of this point, ANNs can also be termed simulated neural networks or simply neural networks. Note that, according to this definition of an ANN, the combination of inputs, outputs, neurons, and connection weights constitutes the architecture of a neural network.
☆ This work is supported by the 2012 Scholarship Award for Excellent Doctoral Student granted by the Ministry of Education of China (No. 3191004), and by the Foundation of Key Laboratory of Autonomous Systems and Networked Control, Ministry of Education, China (No. 2013A07). Kindly note that all authors of the paper are jointly of first authorship.
⁎ Corresponding author at: School of Information Science and Technology, Sun Yat-sen University, Guangzhou 510006, China. E-mail address: [email protected] (Y. Zhang).
Therefore, ANNs can be classified into different categories in terms of architecture. Specifically, according to the nature of connectivity, neural networks can be classified into two categories: feedforward neural networks and feedback neural networks [4,5]. In general, the working signal is not fed back between neurons of a feedforward neural network, whereas it is fed back between neurons of a feedback (i.e., recurrent) neural network. That is, feedforward neural networks have no loops, while feedback neural networks have loops because of feedback connections [3,5]. More specifically, in a feedforward neural network, signals propagate in only one direction, from an input stage through intermediate neurons to an output stage; data from neurons of a lower layer are thus sent to neurons of an upper layer through feedforward connections. By contrast, in a feedback neural network, signals can propagate from the output of any neuron to the input of any neuron, thereby bringing data from neurons of an upper layer back to neurons of a lower layer [4,5].

Being a classical feedforward neural network, the one based on the error back-propagation (BP) algorithm, i.e., the BP neural network, was developed through the work of Werbos [18] and of Rumelhart, McClelland and others [8] in the mid-1970s and 1980s. Following such inspiring thoughts, more and more neural networks based on the BP algorithm or its variants (termed BP-type neural networks, BPNNs) have been developed and involved in many theoretical analyses and real-world applications [13–15,19–29]. In general, with a number of artificial neurons connected, a feedforward neural network can be constructed, e.g., the one shown in the left of Fig. 1. Then, by means of the iterative BP training procedure, such a neural network can acquire remarkable approximation, generalization, and prediction abilities.
Fig. 1. Two main categories in the research of neural networks, where the term "BPNNs" represents BP-type neural networks, and the term "HNNs" represents Hopfield-type neural networks. Just as different neurons in a neural-network model connect with each other, there may also exist links between different types of neural networks. Thus, a special and interesting question can be asked: does there exist a connection/link between these two famous neural networks (i.e., BPNNs and HNNs)?

Note that, theoretically speaking, a three-layer feedforward neural network can approximate any nonlinear continuous function with arbitrary accuracy. Besides, for these BPNNs, the classical error back-propagation algorithm can be summarized simply as

W(k+1) = W(k) + ΔW = W(k) − η ∂e(W)/∂W |_{W=W(k)},    (1)

where W denotes a vector or matrix of neural weights, and k = 0, 1, 2, … denotes the iteration index during the training procedure. In addition, ΔW denotes the weight-updating value at the kth iteration of the training procedure, with η denoting the learning rate (or, learning step size), which should be small enough. Furthermore, e(W) denotes the error function that is used to monitor and control such a BP-training procedure. By iteratively adjusting the weights and biases of the neural network so as to minimize the network-output error e(W), such a BPNN can be trained for mathematical problem solving, function approximation, system identification, or pattern classification. Nowadays, the BPNN is one of the most widely used neural-network models in computational-intelligence research and engineering fields [13–15,21–29]. For example, Xu et al. developed a BPNN model to map the complex nonlinear relationship between microstructural parameters and the elastic modulus of a composite [25]. In [26], a BPNN improved by a genetic algorithm (GA) was established by Zhang et al. to model the relation between the welding appearance and the characteristics of the molten-pool shadows. Besides, integrating the BPNN with the GA, an iteration optimization approach was presented and investigated by Huang et al. [27].

Being another classical neural network but with feedback, the Hopfield neural network was originally proposed by Hopfield in the early 1980s [9]. This seminal work has inspired many researchers to investigate alternative neural networks for solving a wide variety of mathematical problems arising in numerous fields of science and engineering (e.g., matrix inversion in robotic redundancy resolution). Thus, many Hopfield-type neural networks (HNNs) have been proposed, developed, and investigated [4,5,10,15–17,30–33], e.g., the one shown in the right of Fig. 1. Note that such HNNs are sometimes called gradient neural networks, in the sense that the gradient-descent method is generally exploited in the HNN design process. Traditionally, to obtain an HNN for a specific problem solving, a norm-based scalar-valued lower-bounded energy function is first constructed such that its minimal point is the solution to the problem; secondly, an HNN is developed to evolve along the typical negative-gradient direction of the energy function until the minimum of the energy function is reached. In mathematics, the dynamics-model description of such an HNN for solving a specific problem f(X) = 0 [with f(·) being a smooth linear or nonlinear mapping] can be summarized simply and readily as
Ẋ = −γ ∂ε(X)/∂X,    (2)
where γ > 0 ∈ R is used to scale the convergence rate of the HNN model, and Ẋ denotes the time derivative of the state X of the HNN. In addition, the energy function ε(X) = ‖f(X)‖²/2, with ‖·‖ denoting the two-norm of a vector or the Frobenius norm of a matrix accordingly. Evidently, the resultant HNN models derived from (2) are generally depicted in explicit dynamics. In other words, such HNNs exhibit dynamical behavior; i.e., given an initial state, the state of an HNN evolves as time elapses [4,5]. If the HNN is stable without oscillation, an equilibrium state can eventually be reached, which is exactly the solution to the problem f(X) = 0. Recently, owing to the in-depth research on neural networks, the artificial neural-dynamic approach based on HNNs has been viewed as a powerful alternative for online computation and optimization [4,5,15–17,30–33]. In particular, being a class of HNNs, fractional-order neural networks have been designed, analyzed, and applied to different applications [34–39].

Comparing BPNNs and HNNs (with their general architectures shown in Fig. 1), we find that they differ from each other considerably, e.g., in their origins, concepts, definitions, physical meanings, structures, and training patterns. However, as we may realize, the presented BP algorithm (1) is essentially a gradient-descent based error-minimization method, which adjusts the weights to bring the neural-network input/output behavior into a desired mapping for some specific application tasks or environments. In addition, as mentioned previously, the gradient-descent method is also exploited to construct HNNs for problem solving. Thus, we have asked ourselves a special question: does there exist a relationship between BP-type and Hopfield-type neural networks for a specific problem solving? The answer appears to be YES, which is illustrated in this literature-review type paper through six positive and different examples together with an application to function approximation.

In detail, this paper presents the common nature of learning between BPNNs and HNNs for solving various mathematical problems encountered in science and engineering fields. More specifically, comparing the BPNN and the HNN for the same problem-solving task (e.g., matrix inversion or function approximation), we show that the BPNN weight-updating formula and the HNN state-transition equation turn out to be essentially the same (or, say, mathematically the same). In other words, these two neural networks can essentially possess the same mathematical expressions and computational results, though they are deemed completely different. This interesting phenomenon makes us believe that, given a neural-network model for a specific problem solving, its potential dual neural-network model can be developed correspondingly. Note that HNNs and BPNNs have stood in the field of neural networks for decades (specifically, since the work of Hopfield [9] in 1982, the work of Werbos [18] in 1974, and the work of Rumelhart, McClelland et al. [8] in 1986). Differing from the separate researches on these two different types of neural networks, this paper, in a relatively complete and creative manner, reveals the connections/links (or, common nature of learning) between BPNNs and HNNs for various (mathematical) problems solving, enriched with an application to function approximation. This is an evident state-of-the-art merit, as it establishes a significant and unique research bridge between BPNNs and HNNs.
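Although the detailed task-by-task comparison is given in the following sections, the formal reason behind this correspondence can be stated in one line. The short derivation below is added here for clarity (it is not quoted from the original paper) and assumes that the HNN dynamics (2) is discretized by the forward-Euler rule with step size Δt and that the learning rate is identified as η = γΔt:

\[
X(k+1) = X(k) + \Delta t\,\dot X(k)
       = X(k) - \gamma\,\Delta t\,\frac{\partial \varepsilon(X)}{\partial X}\bigg|_{X=X(k)}
       = X(k) - \eta\,\frac{\partial \varepsilon(X)}{\partial X}\bigg|_{X=X(k)},
\]

which has exactly the form of the BP update (1) once the HNN state X is identified with the BPNN weight W and the energy function ε with the training-error function e.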
2. BPNN and HNN for matrix inversion

In this section, let us take matrix inversion (which may be encountered, e.g., in robotic redundancy resolution) [4,5,40,41] as a case study, and develop the corresponding BPNN and HNN via their characteristic design procedures. As we know, in mathematics, the matrix-inversion problem can generally be formulated as AX = I, where A ∈ R^{n×n} denotes the coefficient matrix, I ∈ R^{n×n} denotes the identity matrix, and X ∈ R^{n×n} denotes the unknown matrix to be obtained. To lay a basis for further discussion, matrix A is assumed to be nonsingular hereafter (which implies that the transpose of A, i.e., A^T, is nonsingular as well).

On one hand, to obtain the inverse of the nonsingular matrix A ∈ R^{n×n} (i.e., A^{-1}), we can construct a two-layer BPNN model with the architecture depicted in Fig. 2. For structural and implementational simplicity, the linear activation function is adopted for all neurons of the BPNN model, with each threshold set to zero. According to Fig. 2, the input vector y ∈ R^{1×n}, the weight matrix W ∈ R^{n×n}, and the output vector z ∈ R^{1×n} of the neural network are defined respectively as

y = [y_1, y_2, …, y_n],  W = [w_{ij}] (i, j = 1, 2, …, n),  z = [z_1, z_2, …, z_n].

Fig. 2. Simple and effective two-layer BPNN architecture for matrix inversion.

Therefore, the relationship between the input y and the output z of the BPNN model can be formulated as z = yW. When the BPNN model shown in Fig. 2 is applied to computing the inverse of A, based on (1), the weight-updating formula for such a BPNN model is formulated explicitly as follows (with details given in Appendix A):

W(k+1) = W(k) − η A^T (A W(k) − I),    (3)

where k and η are defined as before. Evidently, compared with conventional BP learning algorithms as variants, the weight-updating formula (3), being the basis, shows a beautiful simplicity of the presented BPNN model. It is worth pointing out that, after the model shown in Fig. 2 is trained through the BPNN weight-updating formula (3) for a sufficient number of iterations, we have the following convergence result: lim_{k→+∞} W(k+1) = lim_{k→+∞} W(k) = W* = A^{-1}. In other words, this simple-structure BPNN model can solve the matrix-inversion problem in the manner that its weight matrix W converges to the theoretical inverse A^{-1} after a sufficient number of weight-updating iterations.

On the other hand, by defining the norm-based energy function ε = ‖AX − I‖²_F/2 (with ‖·‖_F denoting the Frobenius norm of a matrix) and exploiting the gradient-descent design method, we have the following continuous-time HNN model for matrix inversion [4,5,41]:

Ẋ(t) = −γ A^T (A X(t) − I),    (4)

where γ is defined as before [i.e., below (2)]. In addition, X(t) ∈ R^{n×n} denotes the state matrix corresponding to the theoretical inverse A^{-1}. Note that the architecture of such an HNN is similar to that of the HNN shown in the right of Fig. 1, and is thus omitted here (due to the figure similarity and space limitation). In order to compare with the BPNN model for matrix inversion, we can discretize HNN (4) via the Euler forward-difference rule [42] so as to obtain the corresponding discrete-time HNN model

X(k+1) = X(k) − η A^T (A X(k) − I),    (5)

where η > 0 ∈ R should be set appropriately small, and the iteration index k = 0, 1, 2, …. For presentation convenience, (5) is called the HNN model for matrix inversion in this paper. Besides, Eq. (5) is also termed the neural-state transition equation, showing the neural state changing from X(k) to X(k+1) in the presented HNN model. Evidently, after a sufficient number of neural-state-matrix (or, state-matrix) transitions, the following convergence result is guaranteed for HNN model (5): lim_{k→+∞} X(k) = X* = A^{-1}. In other words, the HNN model (5) can solve the matrix-inversion problem as well, in the manner that its state matrix X(k) converges to the theoretical inverse A^{-1} after a sufficient number of state-matrix transitions.

3. Unification, comparison and illustration

Comparing the BPNN weight-updating formula (3) and the HNN state-transition equation (5) carefully, we observe that, although the presented BPNN and HNN are completely different, they essentially possess the same mathematical expression. Specifically, for matrix inversion, the weight matrix W(k) of (3) in the BPNN corresponds exactly to the state matrix X(k) of (5) in the HNN. Therefore, by unifying them via the same computational-intelligence governing equation, we find a common nature of learning (or, connection, link, relation, unification) between BP-type and Hopfield-type neural networks. That is, the BPNN and HNN governing equations for a specific problem solving can finally be unified into the same one. More specifically, for matrix inversion, the unified governing equation is written as

U(k+1) = U(k) − η A^T (A U(k) − I),    (6)

where U(k) represents the weight [i.e., W(k)] in the presented BPNN model or the state [i.e., X(k)] in the presented HNN model.

For further investigation, comparison and illustration, the authors have simulated the BPNN and HNN unified governing equation (6) for matrix inversion, in which the matrix A is formulated as

A = [sin(2), cos(3); cos(4), sin(3)] ∈ R^{2×2}.    (7)

The corresponding numerical results are shown in Fig. 3. Specifically, as seen from Fig. 3(a), U(k) (corresponding to the weight matrix W in the BPNN model or the state matrix X in the HNN model) converges to a unique steady-state solution, which is almost the same as the theoretical inverse A^{-1} = [0.1820, 1.2767; 0.8430, 1.1726] (rounded to 4 decimal digits and written in MATLAB notation [15,42] for presentation convenience). In addition, from Fig. 3(b), we can see that the solution error synthesized by the BPNN and HNN unified governing equation (6) reaches the goal error (i.e., the error precision 10^{-10}) within 77 iterations. This illustrates the efficacy of the presented BPNN and HNN models for matrix inversion. Note that the MATLAB neural network toolbox can also be used to simulate the presented BPNN model for matrix inversion, with the related MATLAB code and numerical results given in Appendix B.
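To make this example easy to reproduce, a minimal toolbox-free MATLAB sketch of the unified governing equation (6) is given below. It is an illustrative sketch rather than the authors' original script: the learning rate eta, the iteration budget, and the zero initial state are assumed values, and the entries of A simply follow (7) as printed above, so the number of iterations needed to reach the 10^{-10} error level may differ from the 77 iterations reported for Fig. 3.

% Iterate the unified BPNN/HNN governing equation (6) for matrix inversion.
A   = [sin(2) cos(3); cos(4) sin(3)];   % test matrix, entries as printed in (7)
n   = size(A, 1);
U   = zeros(n);                         % initial weight/state matrix U(0) (assumed)
eta = 0.1;                              % learning rate (assumed value)
for k = 1:10000                         % iteration budget (assumed)
    E = A*U - eye(n);                   % residual A*U(k) - I
    if norm(E, 'fro') < 1e-10           % goal error used in the paper
        break;
    end
    U = U - eta*A'*E;                   % U(k+1) = U(k) - eta*A'*(A*U(k) - I)
end
disp(U);                                % converged weight/state matrix
disp(inv(A));                           % MATLAB's inverse, for comparison
disp(k);                                % number of iterations actually used

Reading U as W(k) reproduces the BPNN training of Section 2, while reading it as X(k) reproduces the HNN state transition (5).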
Fig. 3. Numerical results synthesized by the BPNN and HNN unified governing equation depicted in (6) for inverting the matrix A depicted in (7). (a) Solution trajectories. (b) Solution-error convergence.

Besides, it is worth pointing out that, by following the previous design procedure (for matrix inversion), we can develop different BPNN and HNN models to solve other mathematical problems (e.g., linear system solving and convex quadratic minimization) [15,43]. For better understanding and comparison, the governing equations of such BPNN and HNN models are shown in Table 1. As seen from the table, the significant result that there exists a common nature of learning between BP-type and Hopfield-type neural networks for each problem solving is confirmed again. Although the BPNN and the HNN are quite different from each other, their derived expressions (i.e., the weight-updating formula and the state-transition equation) turn out to be essentially the same (for a specific problem solving). Therefore, we may draw the important conjectural conclusion that, for the same problem-solving task, the BPNN and the HNN could mutually be dual models. In other words, once the HNN (or BPNN) is developed for solving a problem, the corresponding BPNN (or HNN) with the same governing equation can thus be developed, due to their common nature of learning.

4. Application to function approximation

For the purpose of further substantiating the common nature of learning between BPNNs and HNNs, two more general BPNN and HNN models are constructed and applied to function approximation in this section. Specifically, these two models for function approximation are shown in Fig. 4. As for the BPNN, we set all thresholds of the neural network to zero and fix all connecting weights from the input layer to the hidden layer to one. Besides, the input layer and the output layer are activated by linear functions. In addition, the hidden layer is activated by a group of order-increasing power functions, and the number of hidden neurons is 20, as determined by the weights-and-structure-determination algorithm [44,45] (i.e., n = 20 in Fig. 4). To lay a basis for further discussion, the hidden-layer activation functions are expressed as φ_j(τ) = τ^{j−1} with j = 1, 2, …, 20 and τ ∈ [−1, 1], and the data set of sample pairs is defined as {(τ_i, v_i)}_{i=1}^{m} with m = 2001 and the gap size being 0.001. Besides, in this paper, τ_i ∈ [−1, 1], corresponding to the sample argument, is the ith input, and v_i ∈ R, corresponding to the function value, is the ith target output. Therefore, the vectors of inputs and target outputs of the BPNN are written respectively as τ = [τ_1, τ_2, …, τ_m]^T and v = [v_1, v_2, …, v_m]^T. Note that the target function for investigation is selected as cos(3πτ)exp(τ)/(sin(τ) + 2) + 20. Then, the input–output relation for the BPNN shown in Fig. 4(a) is v = Σ_{j=1}^{n} w_j φ_j(τ), where w_j denotes the connecting weight from the jth hidden-layer neuron to the output-layer neuron. Thus, we obtain v = a^T(τ)w, where a(τ) = [φ_1(τ), φ_2(τ), …, φ_n(τ)]^T is the input-activation vector and w = [w_1, w_2, …, w_n]^T is the weight vector. Furthermore, based on the literature [46], the following weight-updating formula of the BPNN model for function approximation is obtained:

w(k+1) = w(k) − η A^T (A w(k) − b),    (8)

where A = [a(τ_1), a(τ_2), …, a(τ_m)]^T and b = v. In addition, η and k are defined as before.

On the other hand, as for the HNN, by defining the norm-based energy function ε = ‖Ax − b‖²₂/2 (with ‖·‖₂ denoting the two-norm of a vector and x denoting the state vector of the HNN) and exploiting the gradient-descent design method [4,5,41], we have the continuous-time HNN model ẋ = −γ A^T (Ax − b), where γ > 0 is defined as before and ẋ denotes the time derivative of x. With x_i defined as the ith element (or, processing element, neuron) of the state vector x, we have the following equation:

ẋ_i = dx_i/dt = −γ(ã_{i1} x_1 + ã_{i2} x_2 + ⋯ + ã_{in} x_n) + γ b̃_i,

where ã_{ij} denotes the ijth element of matrix A^T A. In addition, b̃_i = a_{1i} b_1 + a_{2i} b_2 + ⋯ + a_{mi} b_m, with a_{sp} and b_l denoting the spth and lth elements of matrix A and vector b, respectively. Furthermore, via the Euler forward-difference rule [42], the discrete-time HNN model is obtained as follows:

x(k+1) = x(k) − η A^T (A x(k) − b),    (9)

where the iteration index k and the learning rate η are defined as before. Note that (9) is called the HNN model for function approximation. Thus, we obtain

x_i(k+1) = x_i(k) − η(ã_{i1} x_1(k) + ã_{i2} x_2(k) + ⋯ + ã_{in} x_n(k)) + η b̃_i.

Comparing (8) and (9), we can see that the BPNN weight-updating formula and the HNN state-transition equation are essentially the same. In other words, for function approximation, the weight vector w(k) of (8) in the BPNN corresponds exactly to the state vector x(k) of (9) in the HNN. Therefore, we find the common nature of learning between BP-type and Hopfield-type neural networks again, now for function approximation.
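The shared iteration (8)/(9) can likewise be tried out directly. The following MATLAB sketch builds A from the power-function basis and b from the sampled target function described above, and then runs the common update; the learning rate and the iteration budget are assumptions made here, and because the power basis makes A^T A poorly conditioned, the learning rate must be small and a large number of iterations may be needed for high accuracy.

% Common BPNN/HNN iteration (8)/(9) for approximating
% f(tau) = cos(3*pi*tau).*exp(tau)./(sin(tau)+2) + 20 on [-1, 1].
m   = 2001;                       % number of samples (gap size 0.001)
n   = 20;                         % number of hidden neurons / basis functions
tau = linspace(-1, 1, m)';        % sample arguments tau_1, ..., tau_m
b   = cos(3*pi*tau).*exp(tau)./(sin(tau)+2) + 20;   % target outputs (b = v)
A   = zeros(m, n);
for j = 1:n
    A(:, j) = tau.^(j-1);         % A(i,j) = phi_j(tau_i) = tau_i^(j-1)
end
M   = A'*A;  c = A'*b;            % precompute, since A'*(A*u - b) = M*u - c
u   = zeros(n, 1);                % initial weight/state vector (assumed)
eta = 1e-4;                       % learning rate (assumed; must be small)
for k = 1:1000000                 % iteration budget (assumed); more iterations reduce the error further
    u = u - eta*(M*u - c);        % u(k+1) = u(k) - eta*A'*(A*u(k) - b)
end
fprintf('maximum approximation error: %g\n', max(abs(A*u - b)));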
Table 1
Different governing equations of BPNN and HNN for solving various mathematical problems, where U(k) [or u(k)] represents the weight in the BPNN model or the state in the HNN model at the kth iteration.

No.  Mathematical problem                                    Governing equation
1    Matrix inversion: AX = I                                U(k+1) = U(k) − η A^T (A U(k) − I)
2    Left Moore–Penrose inverse: A^T A X = A^T               U(k+1) = U(k) − η A^T A (A^T A U(k) − A^T)
3    Right Moore–Penrose inverse: X A A^T = A^T              U(k+1) = U(k) − η (U(k) A A^T − A^T) A A^T
4    Linear system: Ax = b                                   u(k+1) = u(k) − η A^T (A u(k) − b)
5    Convex quadratic minimization: min x^T Q x/2 + p^T x    u(k+1) = u(k) − η (Q u(k) + p)
6    Convex quadratic program: min x^T Q x/2 + p^T x,        u(k+1) = u(k) − η C^T (C u(k) + d), with C = [Q, A^T; A, 0],
     s.t. Ax = b                                             u(k) = [x^T(k), λ^T(k)]^T, and d = [p^T, −b^T]^T
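To make one more entry of Table 1 concrete, the sketch below runs the common BPNN/HNN update for entry 5 (unconstrained convex quadratic minimization); the particular Q, p, learning rate, and iteration budget are illustrative assumptions rather than values taken from the paper.

% Entry 5 of Table 1: u(k+1) = u(k) - eta*(Q*u(k) + p),
% i.e., gradient descent on x'*Q*x/2 + p'*x with Q positive definite.
Q   = [4 1; 1 3];                 % example positive-definite matrix (assumed)
p   = [-1; 2];                    % example linear-term vector (assumed)
u   = zeros(2, 1);                % initial weight/state vector
eta = 0.1;                        % learning rate (assumed)
for k = 1:1000                    % iteration budget (assumed)
    u = u - eta*(Q*u + p);        % common BPNN/HNN update for entry 5
end
disp(u);                          % converged minimizer
disp(-Q\p);                       % closed-form minimizer -inv(Q)*p, for comparison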
Fig. 4. Models of three-layer BPNN and HNN with common nature of learning for function approximation. (a) Three-layer BPNN model. (b) HNN model.
More importantly, the unified governing equation for the function approximation is formulated as

u(k+1) = u(k) − η A^T (A u(k) − b),    (10)

where u(k) represents the weight [i.e., w(k)] in the presented BPNN model or the state [i.e., x(k)] in the presented HNN model. From Table 1, we can confirm that (10) is also the unified governing equation for linear system solving; that is to say, the function-approximation problem is transformed into a linear system problem (and then solved). For further illustration, we simulate the BPNN and HNN unified governing equation (10) for function approximation, with the corresponding numerical results shown in Fig. 5. Specifically, Fig. 5(a) shows that the output synthesized by the unified governing equation (10) approximates the target function cos(3πτ)exp(τ)/(sin(τ) + 2) + 20 with acceptably small error. In addition, Fig. 5(b) shows that the approximation error decreases as the iteration index k increases. Besides, the final element values of u(k) for the function approximation are presented in Table 2. Evidently, these results illustrate the efficacy of the presented BPNN and HNN models for function approximation (or, equivalently, linear system solving). Therefore, from the above results, we conclude that the common nature of learning between the more general BP-type and Hopfield-type neural networks for the same problem-solving task still holds true. These results further substantiate the common nature of learning between BPNNs and HNNs for various (mathematical) problems solving.

5. Conclusions

This paper has presented and investigated the common nature of learning between BPNNs and HNNs for solving various (mathematical) problems. That is, there exists a common nature of learning between the BPNN, a feedforward neural network, and the HNN, a feedback neural network, for solving a specific problem (e.g., matrix inversion or function approximation). Such a novel and significant result makes us believe that links possibly exist among many different types of neural networks (just as different neurons within a neural network connect with each other); e.g., at least between the presented BP-type and Hopfield-type neural networks. More importantly, this work may open a new research direction (to see whether the same characteristic holds in many other cases), may provide a research method and theory for further investigation, and may become a major inspiration for future studies on neural networks.
Acknowledgments

The authors would like to thank the editors and anonymous reviewers for their valuable suggestions and constructive comments, which have helped the authors further improve the presentation and quality of the paper.
Fig. 5. Numerical results synthesized by the BPNN and HNN unified governing equation depicted in (10) for approximating the target function cos(3πτ)exp(τ)/(sin(τ) + 2) + 20. (a) Approximation trajectories. (b) Approximation-error convergence.
Table 2
Final element values of U(k) for approximating the target function.

20.4470   23.0542   95.9707    2.4503
 0.0241  180.8551   14.0349   54.2790
17.9721   30.6588   16.7354    7.9085
 3.7860   43.7943   10.7934   49.1492
105.0854   4.7758   58.6832    1.3665
Appendix A

A.1. Design procedure of BPNN for matrix inversion

As presented previously, we have developed the BPNN model shown in Fig. 2 for matrix inversion, together with its weight-updating formula (3). In this appendix, for completeness and better understanding, we present the detailed design procedure of such a BPNN for matrix inversion.

1. Network architecture: As shown in Fig. 2, the relationship between the input y and the output z of the BPNN model is formulated as

z = yW.    (A.1)

2. Training samples: As we know, the inverse of the matrix A ∈ R^{n×n} (i.e., A^{-1}) satisfies the equation AA^{-1} = I ∈ R^{n×n}, with I being the identity matrix. When the BPNN model is applied to computing the inverse of A, matrix A can be rewritten in the following form:

A = [a_11, a_12, …, a_1n; a_21, a_22, …, a_2n; …; a_n1, a_n2, …, a_nn] = [a_1; a_2; …; a_n],

where a_i ∈ R^{1×n} denotes the ith row vector of matrix A, with i = 1, 2, …, n. Correspondingly, matrix I can be rewritten in the following form:

I = [1, 0, …, 0; 0, 1, …, 0; …; 0, 0, …, 1] = [δ_1; δ_2; …; δ_n],

where δ_i ∈ R^{1×n} denotes the ith row vector of matrix I, with i = 1, 2, …, n. Thus, the row vector a_i is used as the ith sample input, y^{(i)}, for the presented BPNN model, and its corresponding sample output (or, target output, desired output) is z^{(i)} = δ_i. In mathematics, the ith sample input y^{(i)} and sample output z^{(i)} can be expressed in pairs as (a_i, δ_i) (with i = 1, 2, …, n), where

y^{(i)} = [y^{(i)}_1, y^{(i)}_2, …, y^{(i)}_n] = a_i,    (A.2)

z^{(i)} = [z^{(i)}_1, z^{(i)}_2, …, z^{(i)}_n] = δ_i.    (A.3)

Evidently, based on the relation (A.1) between the network input y and output z, and in view of the training samples (A.2) and (A.3), we can readily define a scalar-valued nonnegative (or at least lower-bounded) training-error function for the BPNN model to perform the training (or, adjusting).

3. Weight-updating formula: To lay a basis for further discussion, the neural-network weight matrix W ∈ R^{n×n} is rewritten in the form W = [w_1, w_2, …, w_n], where w_i ∈ R^{n×1} denotes the ith column vector of matrix W, with i = 1, 2, …, n. Then, the training-error function e(W) for the presented BPNN model is defined as follows (note that, in the neural-network terminology, such a BPNN is trained in a batch-processing mode):

e(W) = (1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} ( z^{(i)}_j − Σ_{k=1}^{n} y^{(i)}_k w_{kj} )²
     = (1/2) Σ_{i=1}^{n} Σ_{j=1}^{n} ( z^{(i)}_j − y^{(i)} w_j )²
     = (1/2) Σ_{i=1}^{n} (δ_i − a_i W)(δ_i − a_i W)^T
     = (1/2) tr( (I − AW)(I − AW)^T ).

By following the back-propagation idea of the training error e(W) via the gradient-descent method, the weight-updating formula for the BPNN model can be set as

ΔW = W(k+1) − W(k) = −η ∂e(W)/∂W |_{W=W(k)},

where the neural-network learning rate η > 0 should be set appropriately small, and the iteration index k = 0, 1, 2, ….
To obtain the first-order partial derivative of e(W) with respect to W [i.e., ∂e(W)/∂W], the following three lemmas [43] are presented and employed.

Lemma 1. Given matrices B_1 ∈ R^{m×m}, B_2 ∈ R^{n×m} and M ∈ R^{n×m}, we have

∂ tr(B_1 M^T B_2) / ∂M = B_2 B_1.

Lemma 2. Given matrices B_1 ∈ R^{n×n}, B_2 ∈ R^{m×n} and M ∈ R^{n×m}, we have

∂ tr(B_1 M B_2) / ∂M = B_1^T B_2^T.

Lemma 3. Given appropriately dimensioned matrices C_1, C_2, C_3 and M, if C_1 M C_2 M^T C_3 ∈ R^{n×n}, then we have

∂ tr(C_1 M C_2 M^T C_3) / ∂M = C_1^T C_3^T M C_2^T + C_3 C_1 M C_2.

Based on the above three lemmas, we have

∂e(W)/∂W = (1/2) ∂ tr( (I − AW)(I − AW)^T ) / ∂W
         = (1/2) ∂ tr( I − W^T A^T − AW + A W W^T A^T ) / ∂W
         = (1/2) ( −A^T − A^T + A^T A W + A^T A W )
         = A^T (A W − I).

Therefore, the weight-updating formula for the BPNN model depicted in Fig. 2 is formulated explicitly as

W(k+1) = W(k) − η A^T (A W(k) − I).

Besides, it is worth pointing out here that, by means of the above design procedure, various BPNNs can similarly be developed for solving other mathematical problems (e.g., those shown in Table 1). The corresponding design procedures are thus omitted due to the derivation similarity.
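As a quick numerical sanity check of this derivation (added here for illustration, not part of the original appendix), the analytic gradient A^T(AW − I) can be compared against a central-difference approximation of e(W); the test dimension, random seed, and perturbation size below are assumptions.

% Finite-difference check of the gradient de(W)/dW = A'*(A*W - I)
% for e(W) = 0.5*trace((I - A*W)*(I - A*W)').
rng(1);                               % fix the random seed (assumed)
n = 3;
A = randn(n);  W = randn(n);  I = eye(n);
e = @(W) 0.5*trace((I - A*W)*(I - A*W)');
G_analytic = A'*(A*W - I);            % gradient derived above
G_numeric  = zeros(n);
h = 1e-6;                             % perturbation size (assumed)
for r = 1:n
    for c = 1:n
        E = zeros(n);  E(r, c) = h;   % perturb the (r,c) entry of W
        G_numeric(r, c) = (e(W + E) - e(W - E)) / (2*h);
    end
end
disp(max(abs(G_analytic(:) - G_numeric(:))));   % expected to be very small (near roundoff level)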
Appendix B

B.1. Additional MATLAB verification of BPNN for matrix inversion

In this appendix, the MATLAB neural network toolbox is used to simulate the presented BPNN model (in Fig. 2).
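The authors' toolbox script itself is not reproduced in this text. As a stand-in, the plain-MATLAB sketch below performs the same kind of batch gradient-descent training that the script is described as performing (inputs are the rows of A, targets are the rows of I, and the mean square error with goal 10^{-10} is monitored); the learning rate and the maximum number of epochs are assumed values, so the epoch at which the goal is met need not equal the 153 epochs reported below for the toolbox run.

% Batch gradient-descent training of the two-layer linear BPNN of Fig. 2,
% mirroring the settings described in the text (gradient descent, mse goal 1e-10).
A = [sin(2) cos(3); cos(4) sin(3)];   % entries as printed in (7)
n = size(A, 1);
T = eye(n);                           % targets: rows of the identity matrix
W = zeros(n);                         % initial weight matrix (assumed)
lr = 0.3;                             % learning rate (assumed)
goal = 1e-10;                         % mse goal, as in the text
for epoch = 1:20000                   % maximum number of epochs (assumed)
    E = A*W - T;                      % network output errors for all samples
    perf = mean(E(:).^2);             % mse performance index
    if perf < goal
        break;
    end
    W = W - lr*A'*E;                  % gradient-descent weight update
end
fprintf('epoch = %d, mse = %.3g\n', epoch, perf);
disp(W);                              % compare with the theoretical inverse inv(A)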
Fig. B1. Numerical results of the presented BPNN model for matrix inversion achieved via MATLAB neural network toolbox, in which the final training error has met the prescribed error requirement.
In the authors' toolbox script, the learning function "learngd" and the training function "traingd" are employed to update the network weights based on the conventional gradient-descent method. In addition, "mse" (the mean square error) is set as the neural-network performance index, with the goal error being 10^{-10}; this setting is included intentionally to show that the model depicted in Fig. 2 does belong to the BP neural-network type. The corresponding numerical results of the BPNN model obtained via the MATLAB neural network toolbox are shown in Fig. B1, where the final training error is about 9.06 × 10^{-11}, which meets the prescribed error requirement. Note that, rounded to 5 decimal digits, the neural weight matrix W at the 153rd training epoch (or, in our research terminology, iteration) is

W(153) = [0.18199, 1.27671; 0.84295, 1.17265] ∈ R^{2×2},
which is quite consistent with and almost the same as the theoretical inverse A^{-1}. These results illustrate again the efficacy of the presented BPNN model for matrix inversion.

References

[1] W. Chen, L. Tseng, C. Wu, A unified evolutionary training scheme for single and ensemble of feedforward neural network, Neurocomputing 143 (2014) 347–361. [2] M.A.Z. Raja, R. Samar, Numerical treatment for nonlinear MHD Jeffery–Hamel problem using neural networks optimized with interior point algorithm, Neurocomputing 124 (2014) 178–193. [3] D.P. Mandic, J.A. Chambers, Recurrent Neural Network for Prediction, Wiley, Singapore, 2001. [4] Y. Zhang, Analysis and design of recurrent neural networks and their applications to control and robotic systems (Ph.D. dissertation), The Chinese University of Hong Kong, Hong Kong, 2002. [5] Y. Zhang, C. Yi, Zhang Neural Networks and Neural-Dynamic Method, NOVA Science Publishers, New York, 2011. [6] S.M. Siniscalchi, T. Svendsen, C. Lee, An artificial neural network approach to automatic speech processing, Neurocomputing 140 (2014) 326–338. [7] M.M. Khan, A.M. Ahmad, G.M. Khan, Fast learning neural networks using Cartesian genetic programming, Neurocomputing 121 (2013) 274–289. [8] D.E. Rumelhart, J.L. McClelland, PDP Research Group, Parallel Distributed Processing, MIT Press, Cambridge, 1986. [9] J.J. Hopfield, Neural networks and physical systems with emergent collective computational abilities, Proc. Natl. Acad. Sci. USA 79 (8) (1982) 2554–2558. [10] J.J. Hopfield, D.W. Tank, Computing with neural circuits: a model, Science 233 (4764) (1986) 625–633. [11] M. Arvandi, S. Wu, A. Sadeghian, On the use of recurrent neural networks to design symmetric ciphers, IEEE Comput. Intell. Mag. 3 (2) (2008) 42–53. [12] G. Tanaka, Complex-valued neural networks: advances and applications, IEEE Comput. Intell. Mag. 8 (2) (2013) 77–79.
[13] J.J. Rubio, P. Angelov, J. Pacheco, Uniformly stable backpropagation algorithm to train a feedforward neural network, IEEE Trans. Neural Netw. 22 (3) (2011) 356–366. [14] R. Zhang, Z.-B. Xu, G.-B. Huang, D. Wang, Global convergence of online BP training with dynamic learning rate, IEEE Trans. Neural Netw. Learn. Syst. 23 (2) (2012) 330–341. [15] Y. Zhang, D. Guo, Z. Li, Common nature of learning between BP and Hopfieldtype neural networks for generalized matrix inversion with simplified models, IEEE Trans. Neural Netw. Learn. Syst. 24 (4) (2013) 579–592. [16] D. Guo, Y. Zhang, A new variant of the Zhang neural network for solving online time-varying linear inequalities, Proc. R. Soc. A 468 (2144) (2012) 2255–2271. [17] D. Guo, Y. Zhang, Novel recurrent neural network for time-varying problems solving, IEEE Comput. Intell. Mag. 7 (4) (2012) 61–65. [18] P.J. Werbos, Beyond regression: new tools for prediction and analysis in the behavioral sciences (Ph.D. dissertation), Harvard University, Cambridge, MA, 1974. [19] J. Buessler, P. Smagghe, J. Urban, Image receptive fields for artificial neural networks, Neurocomputing 144 (2014) 258–270. [20] U. Seiffert, ANNIE—artificial neural network-based image encoder, Neurocomputing 125 (2014) 229–235. [21] J.J. Rubio, D.M. Vázquez, J. Pacheco, Backpropagation to train an evolving radial basis function neural network, Evol. Syst. 1 (3) (2010) 173–180. [22] A. Khashman, A modified backpropagation learning algorithm with added emotional coefficients, IEEE Trans. Neural Netw. 19 (11) (2008) 1896–1909. [23] Z. Man, H.R. Wu, S. Liu, X. Yu, A new adaptive backpropagation algorithm based on Lyapunov stability theory for neural networks, IEEE Trans. Neural Netw. 17 (6) (2006) 1580–1591. [24] F. Yu, X. Xu, A short-term load forecasting model of natural gas based on optimized genetic algorithm and improved BP neural network, Appl. Energy 134 (2014) 102–113. [25] Y. Xu, T. You, C. Du, An, integrated micromechanical model and BP neural network for predicting elastic modulus of 3-D multi-phase and multi-layer braided composite, Compos. Struct. 122 (2015) 308–315. [26] Y. Zhang, X. Gao, S. Katayama, Weld appearance prediction with BP neural network improved by genetic algorithm during disk laser welding, J. Manuf. Syst. 34 (2015) 53–59. [27] H.-X. Huang, J.-C. Li, C.-L. Xiao, A proposed iteration optimization approach integrating backpropagation neural network with genetic algorithm, Expert Syst. Appl. 42 (1) (2015) 146–155. [28] F. Zhang, Q. Zhou, Ensemble detection model for profile injection attacks in collaborative recommender systems based on BP neural network, IET Inf. Secur. 9 (1) (2015) 24–31. [29] R. Ahmed, M. El Sayed, S.A. Gadsden, J. Tjong, S. Habibi, Automotive internalcombustion-engine fault detection and classification using artificial neural network techniques, IEEE Trans. Veh. Tech. 64 (1) (2015) 21–33. [30] Y. Zhang, K. Chen, H.Z. Tan, Performance analysis of gradient neural network exploited for online time-varying matrix inversion, IEEE Trans. Autom. Control 54 (8) (2009) 1940–1945. [31] J. Dong, W. Ma, Sufficient conditions for global attractivity of a class of neutral Hopfield-type neural networks, Neurocomputing 153 (2015) 89–95. [32] T. Wang, S. Zhao, W. Zhou, W. Yu, Finite-time state estimation for delayed Hopfield neural networks with Markovian jump, Neurocomputing 156 (2015) 193–198. [33] Q. Liu, C. Dang, J. Cao, A novel recurrent neural network with one neuron and finite-time convergence for k-winners-take-all operation, IEEE Trans. 
Neural Netw. 21 (7) (2010) 1140–1148.
[34] C. Song, J. Cao, Dynamics in fractional-order neural networks, Neurocomputing 142 (2014) 494–498. [35] R. Rakkiyappan, J. Cao, G. Velmurugan, Existence and uniform stability analysis of fractional-order complex-valued neural networks with time delays, IEEE Trans. Neural Netw. Learn. Syst. 26 (1) (2015) 84–97. [36] S. Zhang, Y. Yu, H. Wang, Mittag–Leffler stability of fractional-order Hopfield neural networks, Nonlinear Anal.: Hybrid Syst. 16 (2015) 104–121. [37] H. Wang, Y. Yu, G. Wen, S. Zhang, J. Yu, Global stability analysis of fractional-order Hopfield neural networks with time delay, Neurocomputing 154 (2015) 15–23. [38] J. Yu, C. Hu, H. Jiang, X. Fan, Projective synchronization for fractional neural networks, Neural Netw. 49 (2014) 87–95. [39] F. Wang, Y. Yang, M. Hu, Asymptotic stability of delayed fractional-order neural networks with impulsive effects, Neurocomputing 154 (2015) 239–244. [40] D. Guo, Y. Zhang, Zhang neural network, Getz–Marsden dynamic system, and discrete-time algorithms for time-varying matrix inversion with application to robots' kinematic control, Neurocomputing 97 (2012) 22–32. [41] Y. Zhang, W. Ma, B. Cai, From Zhang neural network to Newton iteration for matrix inversion, IEEE Trans. Circuits Syst. I: Regular Papers 56 (7) (2009) 1405–1415. [42] J.H. Mathews, K.D. Fink, Numerical Methods Using MATLAB, fourth ed., Prentice-Hall, New Jersey, 2004. [43] G.A.F. Seber, A Matrix Handbook for Statisticians, Wiley-Interscience, New Jersey, 2007. [44] Y. Zhang, Y. Yin, D. Guo, X. Yu, L. Xiao, Cross-validation based weights and structure determination of Chebyshev-polynomial neural networks for pattern classification, Pattern Recogn. 47 (2014) 3414–3428. [45] Y. Zhang, X. Yu, D. Guo, Y. Yin, Z. Zhang, Weights and structure determination of multiple-input feed-forward neural network activated by Chebyshev polynomials of class 2 via cross-validation, Neural Comput. Appl. 25 (7–8) (2014) 1761–1770. [46] Y. Zhang, L. Li, Y. Yang, G. Ruan, Euler neural network with its weight-direct-determination and structure-automatic-determination algorithms, in: Proceedings of International Conference on Hybrid Intelligent Systems, 2009, pp. 319–324.
Dongsheng Guo received the B.S. degree in Automation in 2010 from Sun Yat-sen University, Guangzhou, China, where he is currently working toward the Ph.D. degree in Communication and Information Systems in the School of Information Science and Technology. He is also now with the SYSU-CMU Shunde International Joint Research Institute, Foshan, China, for cooperative research. He has been continuing his research work under the supervision of Prof. Y. Zhang since 2008. His current research interests include neural networks, numerical methods, and robotics.
Zhengli Xiao received the B.S. degree in Software Engineering from Changchun University of Science and Technology, Changchun, China, in 2013. He is currently pursuing the M.S. degree in Computer Science with the School of Information Science and Technology, Sun Yat-sen University, Guangzhou. He is also now with the SYSU-CMU Shunde International Joint Research Institute, Foshan, China, for cooperative research. His current research interests include neural networks, intelligent information processing, and learning machines.
Mingzhi Mao received the B.S., M.S., and Ph.D. degrees in Computer Science from Sun Yat-sen University, Guangzhou, China, in 1988, 1998, and 2008, respectively. He is currently a professor at the School of Information Science and Technology, Sun Yat-sen University, Guangzhou, China. His main research interests include intelligent algorithms, software engineering, and management information systems.
Jianxi Liu received the B.S. degree in Communication Engineering from Sun Yat-sen University, Guangzhou, China, in 2011. He is currently pursuing the M.S. degree in Automation with the School of Information Science and Technology, Sun Yat-sen University, Guangzhou. He is also now with the SYSU-CMU Shunde International Joint Research Institute, Foshan, China, for cooperative research. His current research interests include neural networks and their applications in population research.
Yunong Zhang received the B.S. degree from Huazhong University of Science and Technology, Wuhan, China, in 1996, the M.S. degree from South China University of Technology, Guangzhou, China, in 1999, and the Ph.D. degree from Chinese University of Hong Kong, Shatin, Hong Kong, China, in 2003. He is currently a professor with the School of Information Science and Technology, Sun Yat-sen University, Guangzhou. Before joining Sun Yat-sen University in 2006, he had been with the National University of Singapore, Singapore, the University of Strathclyde, Glasgow, UK, and the National University of Ireland, Maynooth, Ireland, since 2003. He is also now with the SYSU-CMU Shunde International Joint Research Institute, Foshan, China, for cooperative research. His main research interests include neural networks, robotics, computation, and optimization.