Neurocomputing 173 (2016) 1276–1287
Set-valued functional neural mapping and inverse system approximation

Jiann-Ming Wu*, Chun-Chang Wu, Jia-Ci Chen, Yi-Ling Lin

Department of Applied Mathematics, National Dong Hwa University, Shoufeng, Hualien 974, Taiwan

Article history: Received 4 January 2015; received in revised form 9 May 2015; accepted 2 September 2015; available online 12 September 2015. Communicated by M. Bianchini.

Abstract

This work explores set-valued functional neural mapping and inverse system approximation by learning state-regulated multilayer neural networks. Multilayer neural organization is extended to recruit a discrete regulating state in addition to predictive attributes in the input layer. The network mapping regulated over a set of finite discrete states translates a predictor to many targets. Stimuli and responses clamped at visible units are assumed to be mixtures of paired predictors and targets sampled from many joined elementary mappings. Unknown regulating states are related to missing exclusive memberships of paired training data to distinct sources. Learning a state-regulated neural network for set-valued mapping approximation involves retrieving unknown exclusive memberships and refining network interconnections. The learning process is realized by a hybrid of mean field annealing and Levenberg–Marquardt methods that simultaneously tracks expectations of unknown regulating states and optimal interconnections among consecutive layers along a physical-like annealing process. Numerical simulations show that the presented learning process well reconstructs many joined elementary functions for set-valued functional mapping and inverse system approximation. © 2015 Elsevier B.V. All rights reserved.

Keywords: Supervised learning; State-regulated neural networks; Set-valued mapping; Inverse neural systems; Mean field annealing; Levenberg–Marquardt learning

1. Introduction

A multilayer neural network [11,6,20,24], typically consisting of layer-structured nonlinear processing elements, performs parallel and distributed computations for algebraic elementary single-valued mapping. The layer-structured neural organization synchronously translates high-dimensional predictors in the input layer through hidden layers to desired targets in the output layer. Since nonlinear processing elements in the hidden layer ideally carry out radial basis functions [11,6] or projective basis functions [20,24] and adaptable interconnections among consecutive layers are multiplicative, the network mapping can be mathematically expressed as an adaptive algebraic function, F(x|θ), where θ collects adaptive interconnections, including receptive fields and posterior weights. The expression y = F(x|θ) therefore characterizes an adaptive single-valued mapping that delicately translates a high-dimensional predictor x to one and only one target y in the function range. Supervised learning of a multilayer neural network subject to paired training data mainly addresses the minimization of the mean square error of approximating targets by network responses to predictors with respect to θ. Generalization by supervised learning is typically verified by paired testing data during the testing phase.

* Corresponding author. Tel.: +886 3 8633531; fax: +886 3 8633510. E-mail addresses: [email protected], [email protected] (J.-M. Wu).

http://dx.doi.org/10.1016/j.neucom.2015.09.002 0925-2312/© 2015 Elsevier B.V. All rights reserved.

Data driven supervised learning stands for function approximation when paired predictors and targets for training and testing originate from an elementary single-valued mapping. Supervised learning based on powerful computational methodologies, including mean field annealing [17,19,26] for constrained optimization and gradient-based iterative approaches, such as the backpropagation method, the nonlinear conjugate gradient method, the Gauss–Newton method, and the Levenberg–Marquardt method [5,15,4] for unconstrained optimization, has been proposed in the field of neural networks and extensively applied to signal processing, pattern recognition, control and system identification in the past decade. However, learning an adaptive single-valued mapping is not feasible for discrete set-valued mapping approximation when paired predictors and targets are mixtures of samples originating from many joined elementary mappings. Similar predictors may be mapped to very different desired targets according to the constraints imposed by paired training data under the mixture assumption. A discrete set-valued mapping validly translates an identical predictor to many distinct targets. The problem of set-valued mapping approximation arises in applications to inverse control [10,18], complex economic data prediction [12,13] and system inverting, where the mapping underlying paired training data is no longer single-valued.


For discrete set-valued mapping approximation, the given training data are considered as mixtures of paired data sampled from many joined elementary mappings. Under the mixture assumption [8,16,22], the constraints imposed by paired training data allow translation of an identical predictor to many targets. A multilayer neural network equipped only with input units that receive predictive attributes performs an adaptive elementary single-valued mapping. Subject to mixed training data, refining interconnections is ineffective for reducing the mean square approximating error toward resolving set-valued mapping approximation. The difficulty cannot be resolved by improving learning methodologies or generalizing transfer functions of hidden units in the architecture. For set-valued mapping approximation, the previous work [23] has related distinct targets of a predictor to stable outputs of a recurrent multilayer neural network,

$$y_n = F(x, y_{n-1} \mid \theta), \qquad (1)$$

where the network output y_{n−1} is transmitted through delayed circular connections to the input layer. For fixed x and y_0, the recurrent relation (1) searches for stable outputs from different initializations [23], thereby retrieving multiple targets in response to a fixed x. However, difficulties, including effective optimization of interconnections, reliable enumeration of stable outputs and accurate extraction of the many elementary mappings embedded within the recurrent multilayer neural network, still challenge this line of research. Alternatively, set-valued mapping approximation has been approached by organizing regularization networks [21] and mixture density networks [2,3]. A regularization network [21] consists of two cascaded multilayer neural networks, the first translating a predictor to the coefficients of a polynomial whose zeros store the desired targets and the second mapping the retrieved coefficients to the desired targets. Learning a regularization network requires all targets corresponding to every predictor for determining the polynomial coefficients, but this requirement is not satisfied under the mixture assumption. The mixture density network [2,3], organized for conditional probability density function (pdf) reconstruction, mainly translates a predictor to the parameters of Gaussian mixtures, including variances, mean vectors and weights. The domain or support of the reconstructed conditional probability density function is expected to contain continuous targets in response to a given predictor. Learning mixture density networks is translated to a task of estimating Gaussian mixtures for conditional density function approximation. This work approaches discrete set-valued mapping approximation by learning a state-regulated multilayer neural network, which recruits a finite discrete regulating state in the input layer. The proposed state-regulated neural network inherits feedforward synchronous transmission through multilayer neural networks. Conditional to a fixed regulating state, a state-regulated neural network realizes an elementary mapping, insisting on translating a predictor to one and only one target in the range. The network mapping regulated over all finite discrete states essentially translates a predictor to many targets. It is notable that a state-regulated multilayer neural network maintains only one copy of adaptive interconnections among consecutive layers, instead of many multilayer neural networks [7], for discrete set-valued mapping approximation. This work proposes a hybrid of mean field annealing and Levenberg–Marquardt methods for learning a state-regulated multilayer neural network. Under the mixture assumption, each paired predictor and target has its own exclusive membership, which is encoded by an unknown regulating state and represented by a Potts variable [19,25], to the joined elementary functions. Supervised learning of a state-regulated multilayer neural network thus involves retrieving missing regulating states and optimizing adaptive interconnections for set-valued mapping approximation. Since the mean square approximating error is not differentiable with respect to discrete Potts variables, it


is minimized by a hybrid of mean field annealing and Levenberg–Marquardt (LM) methods. Under a physical-like annealing process, the proposed hybrid approach iteratively tracks the mean configuration of multi-state Potts variables and applies the LM method to minimize the mean square error of approximating desired targets by network responses to predictors and expectations of regulating states. The mean configuration of Potts variables is eventually forced to discrete regulating states representing exclusive memberships at the end of the annealing process. This paper is organized as follows. Section 2 presents learning a state-regulated multilayer neural network for discrete set-valued mapping approximation. The learning task is translated to a mixed integer programming problem and resolved by a hybrid of mean field annealing and LM methods. Section 3 explores the quantitative performance of the proposed learning approach for discrete set-valued mapping approximation by numerical simulations. Section 4 further extends the proposed learning approach for inverse system approximation. Conclusions are given in the final section.

2. Discrete set-valued mapping approximation

2.1. A mixed integer programming

Supervised learning of a state-regulated multilayer neural network is formulated as a mixed integer programming problem. The given training data are assumed to be mixtures of paired predictors and targets originating from many elementary functions. A discrete set-valued mapping, ξ = {f_i}_i in Fig. 1, contains many elementary functions, where f_i denotes an algebraic single-valued mapping, always translating a predictor to one and only one target. Let S_i = {(x[t], y[t])}_t collect paired predictors and targets sampled from the ith elementary function, where y[t] = f_i(x[t]) + n[t] and n[t] denotes noise. Then S = ⋃_i S_i denotes a mixture of paired predictors and targets originating from many elementary functions. A multilayer neural network equipped only with input units that receive attributes of predictors is unable to faithfully approximate the set-valued mapping ξ underlying S, since the network mapping translates a predictor to one and only one target. Discrete set-valued mapping approximation is essential for inverse system reconstruction. Figs. 4a and 5a show paired training data obtained by inverting the forward MIMO (multiple inputs and multiple outputs) system in Section 4. The forward MIMO system can be approximated by learning multilayer neural networks, but the coordinate mappings of the inverse system are set-valued and cannot be faithfully approximated by single-valued mappings. A high-dimensional predictor x is concatenated with a regulating state δ in the input layer of a state-regulated multilayer neural network, where δ = [δ_1, …, δ_K] is a Potts variable with δ_k ∈ {0, 1} and

$$\sum_{k=1}^{K} \delta_k = 1.$$

The mapping of a state-regulated multilayer neural network translates the concatenated x and δ to a target,

$$\hat{y} = F(x, \delta \mid \theta). \qquad (2)$$

The circular connection from the network output in Eq. (1) has been replaced with the discrete regulating state. Let δ ∈ Ξ_K = {e_1, …, e_K}, where e_k denotes a unitary vector with the kth bit one and all others zero. Regulated by δ = e_k, the network response to x is

$$\hat{y}_k \equiv F_k(x \mid \theta) = F(x, \delta = e_k \mid \theta), \qquad (3)$$

where F_k denotes an elementary mapping, always translating x to one target. Regulated over all finite discrete states in Ξ_K, F translates x to many targets, denoted by {ŷ_k}_k, for set-valued mapping


Fig. 1. Four problem sets for set-valued mapping approximation and the joined elementary functions for data generation.

approximation. With δ = e_k, F_k possesses specific posterior weights characterizing the single-valued mapping, as shown in Appendix A, where F is implemented by an RBF (radial basis function) neural network. In this implementation, all F_k share common radial basis functions but different posterior weights. Appendix B shows an implementation by an advanced neural network of multiple Mahalanobis-NRBF modules, where fixing δ = e_k induces a more flexible F_k for function approximation. Since each individual NRBF module measures Mahalanobis radial distances based on its own weight matrix, all F_k span more general elementary functions for set-valued mapping approximation.
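The following sketch illustrates the idea of a state-regulated forward pass for the RBF implementation of Appendix A: the predictor is concatenated with a one-hot regulating state, so all elementary mappings F_k share the same radial basis functions while the state only shifts the effective posterior weights. The centers, widths and weights below are random placeholders, not values from the paper.

```python
import numpy as np

def state_regulated_rbf(x, delta, centers_x, centers_d, sigma2, r):
    """F(x, delta | theta) for an RBF network whose input is [x; delta].

    x         : (d,) predictor
    delta     : (K,) one-hot regulating state
    centers_x : (M, d) predictor parts a_m of the centers
    centers_d : (M, K) state parts b_m of the centers
    sigma2    : (M,) variances of the radial basis functions
    r         : (M,) posterior weights
    """
    dist2 = np.sum((x - centers_x) ** 2, axis=1) + np.sum((delta - centers_d) ** 2, axis=1)
    return np.dot(r, np.exp(-dist2 / (2.0 * sigma2)))

rng = np.random.default_rng(0)
d, K, M = 2, 3, 10                       # input dimension, states, basis functions
theta = dict(centers_x=rng.normal(size=(M, d)),
             centers_d=rng.normal(size=(M, K)),
             sigma2=np.full(M, 0.5),
             r=rng.normal(size=M))

x = rng.normal(size=d)
# Regulated over all K discrete states, the same network maps x to K targets.
targets = [state_regulated_rbf(x, np.eye(K)[k], **theta) for k in range(K)]
print(targets)
```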

Under the mixture assumption, each paired predictor and target, (x[t], y[t]), in S has an exclusive membership, δ[t] ∈ Ξ_K, to the joined elementary functions. Let δ[t] denote the unknown regulating state that translates x[t] to y[t] by Eq. (2). Learning a state-regulated multilayer neural network is characterized by minimizing the following mean square approximating error:

$$E_S(\theta, \Lambda) = \frac{1}{N}\sum_t \| y[t] - F(x[t], \delta[t] \mid \theta) \|^2 \qquad (4)$$

$$= \frac{1}{N}\sum_t \sum_k \delta_k[t]\, \| y[t] - F(x[t], e_k \mid \theta) \|^2 \qquad (5)$$

$$= \frac{1}{N}\sum_t \sum_k \delta_k[t]\, \| y[t] - F_k(x[t] \mid \theta) \|^2,$$



Fig. 2. A hybrid of mean field annealing and LM learning for data driven set-valued mapping approximation.


where Λ collects all unknown regulating states. The second line (5) expresses the mean square error in terms of the difference between y[t] and the output of F_k in response to x[t], in a form that facilitates the derivation of mean field equations for resolving the mixed integer programming problem.

2.2. A hybrid of mean field annealing and Levenberg–Marquardt methods

Since the objective E(θ, Λ) simultaneously contains discrete and continuous variables, its minimization is resolved by a hybrid of mean field annealing and Levenberg–Marquardt methods. For fixed θ, the objective E is considered as a Hopfield-like energy function. The Potts variables in Λ constitute a physical-like system that obeys the Boltzmann distribution at thermal equilibrium [28],

$$\Pr(\Lambda) \propto \exp(-\beta E(\theta, \Lambda)), \qquad (6)$$

where β denotes the inverse of a temperature-like parameter. Deriving the thermodynamics of resolving a mixed integer programming problem (4) has been extensively explored in the field of neural networks [26,25]. Following Eq. (6), the Kullback–Leibler (KL) divergence [28] defines the quasi-distance between the product of marginal pdfs and the joint pdf of the Potts variables in Λ. The KL divergence can be rewritten as the following tractable free energy function ψ [28],

$$\psi(\theta, \langle\Lambda\rangle, u) = E(\theta, \langle\Lambda\rangle) + \sum_t \sum_k \langle\delta_k[t]\rangle\, u_{tk} - \frac{1}{\beta}\sum_t \ln\!\left(\sum_m \exp(\beta u_{tm})\right), \qquad (7)$$

where ⟨δ_k[t]⟩ denotes the expectation of the binary δ_k[t] and u_tk denotes an auxiliary variable. Now ψ depends on the mean configuration of the discrete Potts variables, the auxiliary variables and the adaptive interconnections. A tractable free energy function is differentiable with respect to all dependent variables. Setting ∂ψ/∂⟨δ_k[t]⟩ and ∂ψ/∂u_tk to zero attains the mean field equations,

$$u_{tk} = -\frac{\partial E(\theta, \langle\Lambda\rangle)}{\partial \langle\delta_k[t]\rangle} = -\| y[t] - F(x[t], e_k \mid \theta) \|^2 \qquad (8)$$

and

$$\langle\delta_k[t]\rangle = \frac{\exp(\beta u_{tk})}{\sum_h \exp(\beta u_{th})}, \qquad (9)$$

where E in Eq. (8) uses the form of Eq. (5). Mean field equations (8) and (9) characterize the saddle point of ψ for fixed θ. At each intermediate temperature, the mean configuration that satisfies Eqs. (8) and (9) is substituted into Eq. (4) for minimizing ψ(θ, ⟨Λ⟩), equivalently E(θ, ⟨Λ⟩), with respect to θ. A hybrid of mean field annealing and gradient descent methods has been applied in previous works [26,25] for resolving a mixed integer programming problem. The gradient descent method is here improved by the LM method for minimizing E(θ, ⟨Λ⟩) with respect to θ for fixed ⟨Λ⟩. The LM method has been shown to be powerful for learning a multilayer neural network [5,15] for function approximation. The LM method employs a hybrid of searching directions, respectively determined by the gradient descent method [9] and the Gauss–Newton method [1], to minimize the mean square approximating error with respect to the adaptive interconnections. The relative contribution of the two types of searching directions balances reliability and efficiency of seeking the global minimum. Details of the LM method are referred to previous works [5,15,29]. The objective E(θ, ⟨Λ⟩) minimized by the LM method is obtained by substituting ⟨δ[t]⟩ for δ[t] in the first line of Eq. (4). This form, consistent with the standard objective function of unconstrained optimization for learning a multilayer neural network, can be directly minimized by the LM method. Fig. 2 shows the interleaving processes of the mean field annealing and LM methods for learning a state-regulated multilayer neural network. By mean field Eqs. (8) and (9), the probability of δ[t] being e_k, or the expectation of δ_k[t], is proportional to exp(−β ‖y[t] − F_k(x[t])‖²), where the square error of approximating y[t] by F_k(x[t]) serves as the criterion for quantifying the competition among distinct states. The adaptive interconnections θ are refined by minimizing E(θ, ⟨Λ⟩) using the powerful LM method based on the current mean configuration ⟨Λ⟩. The learning process operates under a physical-like annealing process, where the inverse of a temperature-like parameter, β, is carefully scheduled from sufficiently low to high values. At each β, the learning process respectively updates ⟨Λ⟩ and θ by the mean field dynamics and the LM method. At sufficiently small β, ⟨δ[t]⟩ tends to consist of equal elements independent of the square errors of approximating the target by the outputs of different F_k. Along the annealing process, ⟨δ[t]⟩ gradually approaches a unitary vector of binary values. At sufficiently large β, the state inducing the minimal approximating error (8) eventually dominates the denominator of Eq. (9) following the winner-take-all principle. The stability of the mean activations is defined by

$$\chi = \frac{1}{N}\sum_t \sum_k \langle\delta_k[t]\rangle^2.$$

Since the square sum of the elements in each ⟨δ[t]⟩ increases along the annealing process, the stability χ eventually approaches one at sufficiently large β. The simulated learning process exits once the stability exceeds a predetermined threshold and the halting condition (HC) holds.
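A compact sketch of this interleaving loop is given below for a one-dimensional toy problem. It follows Eqs. (8) and (9) for the mean field update and, under the assumptions of fixed radial basis centers and the RBF form of Appendix A, replaces the full LM refinement of all interconnections by a Levenberg–Marquardt least-squares fit of only the state-dependent posterior weights via scipy.optimize.least_squares(method='lm'); the function names, toy data and annealing schedule are illustrative, not taken from the paper.

```python
import numpy as np
from scipy.optimize import least_squares

rng = np.random.default_rng(1)

# Toy mixture of two joined elementary functions: y = x^2 and y = -x^2 + 1.
N, K, M = 200, 2, 12
x = rng.uniform(-1.0, 1.0, N)
branch = rng.integers(0, K, N)
y = np.where(branch == 0, x**2, -x**2 + 1.0) + 0.01 * rng.normal(size=N)

centers = np.linspace(-1.0, 1.0, M)            # fixed RBF centers (assumption)
Phi = np.exp(-(x[:, None] - centers[None, :])**2 / (2 * 0.1**2))  # (N, M)

def predict(W):                                 # F_k(x[t]) for all t and k
    return Phi @ W                              # (N, K); W holds posterior weights

def residuals(w_flat, delta_mean):
    R = np.sqrt(delta_mean) * (y[:, None] - predict(w_flat.reshape(M, K)))
    return R.ravel()                            # weighted residuals of Eq. (5)

W = 0.01 * rng.normal(size=(M, K))
delta_mean = np.full((N, K), 1.0 / K)           # <delta_k[t]> at high temperature
for beta in np.geomspace(1.0, 1e4, 25):         # annealing schedule (assumption)
    # Mean field update, Eqs. (8) and (9): u_tk = -||y[t] - F_k(x[t])||^2.
    u = -(y[:, None] - predict(W))**2
    e = np.exp(beta * (u - u.max(axis=1, keepdims=True)))
    delta_mean = e / e.sum(axis=1, keepdims=True)
    # Levenberg-Marquardt refinement of the posterior weights for fixed <Lambda>.
    W = least_squares(residuals, W.ravel(), args=(delta_mean,), method='lm').x.reshape(M, K)
    if np.mean(np.sum(delta_mean**2, axis=1)) > 0.99:   # stability chi
        break

E_S = np.mean(np.min((y[:, None] - predict(W))**2, axis=1))  # Eq. (10)
print('training error E_S:', E_S)
```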

3. Numerical simulations Quantitative performance of learning a state-regulated neural network for discrete set-valued mapping approximation is explored by numerical simulations in this section. The simulated learning process is applied for set-valued mapping approximation subject to training data S and verified with testing data T. Both S and T are mixtures of paired predictors and targets sampled from joined elementary functions. All desired targets in S and T have been normalized by the maximal absolute value.


The goal of learning a state-regulated multilayer neural network is to minimize the training error E_S(θ, Λ) in (4). The halting criterion holds when the stability χ exceeds a predetermined threshold at sufficiently large β. In that case, ⟨δ[t]⟩ approaches a unitary vector that represents a discrete state regulating the network approximation to y[t]. For fixed Λ, E_S(θ, Λ) measures the mean square error of approximating y[t] by the network response to x[t] and ⟨δ[t]⟩, denoted by F(x[t], ⟨δ[t]⟩|θ), over t. By Eqs. (8) and (9), at sufficiently large β, if ⟨δ_i[t]⟩ is the only active bit among the K binary elements in ⟨δ[t]⟩, then u_ti maximizes {u_tk}_k and ‖y[t] − F(x[t], e_i|θ)‖ minimizes the K absolute errors. At the end of the annealing process, it follows that

$$\| y[t] - F(x[t], \langle\delta[t]\rangle \mid \theta) \|^2 = \sum_k \langle\delta_k[t]\rangle\, \| y[t] - F(x[t], e_k \mid \theta) \|^2 = \min_k \| y[t] - F(x[t], e_k \mid \theta) \|^2.$$

Substituting the above equation into Eq. (4) leads to the following mean square approximating error after the training phase,

$$E_S(\theta) = \frac{1}{N}\sum_t \min_k \| y[t] - F(x[t], e_k \mid \theta) \|^2. \qquad (10)$$
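As a small illustration of Eq. (10), the following lines compute the post-training error from the K network responses regulated over the discrete states; the array shapes are hypothetical.

```python
import numpy as np

def post_training_error(y, y_hat):
    """Eq. (10): mean over t of the minimal squared error over the K states.

    y     : (N, q) desired targets
    y_hat : (K, N, q) network responses F(x[t], e_k | theta) for every state e_k
    """
    sq_err = np.sum((y[None, :, :] - y_hat) ** 2, axis=2)   # (K, N)
    return np.mean(np.min(sq_err, axis=0))

# toy check: two states, one of them always matching the target exactly
y = np.array([[0.0], [1.0], [2.0]])
y_hat = np.stack([y, y + 0.5])
print(post_training_error(y, y_hat))   # 0.0
```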

Fig. 3. Acceptable disjoint subsets partitioned by the proposed learning and refining approach and the reconstructed set-valued mappings. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)


Table 1. Quantitative performance of discrete set-valued mapping approximation by the proposed learning and refining approach.

Map   K′   K   E_S (mean)   E_S (var)   E′_S (mean)   E′_T (mean)   Lower bound L_S
ξ1    3    3   1.18e-04     8.82e-10    0             0             2.09e-01
ξ2    2    2   2.17e-04     4.29e-09    5.00e-05      8.70e-05      2.48e-01
ξ3    2    2   1.57e-04     3.66e-09    2.30e-05      3.60e-05      1.77e-01
ξ4    3    4   3.79e-04     2.30e-08    8.00e-06      6.30e-05      1.12e-01

In the testing phase, the mapping by a well trained state-regulated neural network is set-valued. In response to x[t], a state-regulated neural network generates K targets upon the K distinct states, among which only the best approximation to y[t] contributes to the mean square testing error, denoted by E_T(θ), in Eq. (10). Numerical simulations summarize both training and testing errors for performance evaluation of the simulated learning process for set-valued mapping approximation. The proposed learning process has been implemented in Matlab and executed on an ultrabook equipped with an Intel(R) Core(TM) i7-2637M CPU for performance evaluation. Fig. 1 shows four sets of training data and the joined elementary functions for set-valued mapping approximation. Numerical simulations use the RBF network in Appendix A for organizing the state-regulated multilayer neural network and apply the popular K-means method to initialize the centers of the radial basis functions. The simulated learning process partitions the paired training data in S into non-empty disjoint subsets. According to the final regulating states, each subset is expressed by

$$Q_j = \{(x[t], y[t]) \mid x[t] \in S,\ \langle\delta[t]\rangle = e_j\},$$

and

$$S = \bigcup_{j=1}^{K'} Q_j, \qquad Q_i \cap Q_j = \emptyset \ \text{ for } i \neq j,$$

where K′ ≤ K denotes the number of non-empty subsets. The physical-like annealing process eventually forces ⟨δ[t]⟩ to a discrete regulating state in Ξ_K, which indicates an exclusive membership to the disjoint subsets. E_S(θ) sums up the mean square training errors of fitting the K′ disjoint subsets. Let E_{Q_j}(θ) denote the mean square training error summarized over Q_j. In comparison with a predetermined threshold, if E_S(θ) is acceptable, then all E_{Q_j}(θ) are acceptable. Paired training data in an acceptable Q_j are well approximated by F_j(x) = F(x, e_j|θ), which translates a predictor to one and only one target. So F_j can be refined by supervised learning of an advanced multilayer neural network subject to Q_j, which receives no regulating states in the input layer. The refined elementary function fitting Q_j, denoted by g_j, is obtained by learning a network of multiple Mahalanobis-NRBF (normalized radial basis function) modules [30] based on annealed KLD minimization [28]. As shown in Appendix B, each Mahalanobis-NRBF module is composed of normalized radial basis functions that measure Mahalanobis radial distances based on its own weight matrix. The refined elementary network mapping g_j employs manifold Mahalanobis distances for function approximation. The proposed learning and refining approach attains a set of refined elementary mappings, denoted by G = {g_j}_{j=1}^{K'}, for set-valued mapping approximation. After the refining process, the mean square approximating error over S is measured by

$$E'_S = \frac{1}{N}\sum_t \min_k \| y[t] - g_k(x[t]) \|^2, \qquad (11)$$


which is obtained by replacing F(x, e_k|θ) in (10) with g_k(x). Further replacing S in Eq. (11) with T leads to E′_T for quantifying the mean square testing error of the set-valued mapping G. Fig. 3 shows the disjoint subsets partitioned by the simulated learning process for the examples in Fig. 1. In the left column of Fig. 3, the colors of the training data indicate exclusive memberships of paired data to the K′ disjoint subsets. Each set-valued mapping is composed of many elementary functions. The refined set-valued neural mapping G for each example is shown in the right column of Fig. 3. Table 1 shows the quantitative performance of the proposed learning and refining approach subject to the training data for each example in Fig. 1. Statistics, including the mean and variance of the training and testing errors, E_S, E′_S and E′_T, are summarized over five executions for each example. Numerical results show that both training and testing errors have been significantly reduced for discrete set-valued mapping approximation subject to the training data for all examples. Supervised learning of a multilayer neural network equipped with exactly d units in the input layer fails to reduce the mean square approximating error. Without the regulating state, the network mapping results in a mean square training error whose empirical lower bound can be determined by numerical simulations. Let ȳ[t] denote the average response of the K elementary functions to x[t] in S,

$$\bar{y}[t] = \frac{1}{K}\sum_k f_k(x[t]),$$

which is the minimizer of the mean square error of approximating f_k(x[t]) over k. An empirical lower bound on the mean square approximating error of learning a single-valued elementary mapping subject to S can then be determined by

$$L_S = \frac{1}{N}\sum_t \| y[t] - \bar{y}[t] \|^2.$$

By supervised learning of a multilayer neural network [11,20,24] equipped with exactly d units receiving attributes in the input layer, the mean square approximating error is numerically shown to lie above L_S and is not listed in Table 1. In Table 1, the extremely low variances indicate high reliability of the simulated learning process, and the acceptable approximating errors E′_T over testing data reflect successful set-valued mapping approximation. E_S formulates a mixed objective function that contains discrete and continuous dependent variables. A hybrid of mean field annealing and LM methods is shown to be effective for minimizing E_S. Under a physical-like annealing process, the simulated learning process overcomes the serious local minimum problem. The simulated learning process for approximating ξ2, ξ3 and ξ4 attains acceptable training errors in comparison with a predetermined threshold, ϵ = 5 × 10⁻³, for these examples, and needs no further process for improvement. Numerical simulations show the training error E_S of approximating ξ1 to be unacceptable. Among different executions of approximating ξ1 by the simulated learning process, the partition of S always attains at least one non-empty subset whose approximating error is less than ϵ; there exists at least one k such that E_{Q_k} < ϵ. This property guarantees a significant reduction to S′ for further partition, where

$$S' = \bigcup_{\{k \mid E_{Q_k} \geq \epsilon\}} Q_k \qquad (12)$$

denotes the union of the unacceptable disjoint subsets. For approximating ξ1, E_S in Table 1 is the result of summing up the mean square approximating errors of the acceptable disjoint subsets separately derived by the partitions of S and S′. Numerical results show that only approximating ξ1 requires an additional partition of the reduced S′ by the learning process for extracting acceptable disjoint subsets.


4. Inverse system approximation

4.1. Inverting two-joint kinematics

The proposed learning and refining approach is extended to inverse function approximation. Forward kinematics are defined by the following coordinate functions:

$$p_1 = \kappa_1(a, b) = r_1 \cos(a + b) - \sin(a)\sin(b), \qquad (13)$$

$$p_2 = \kappa_2(a, b) = r_2 \sin(a + b) + \cos(a)\sin(b), \qquad (14)$$

where q = (a, b) within [−2π, 2π]² and p = (p_1, p_2) respectively collect the function inputs and outputs, and r_1 and r_2 are two constants. Forward kinematics (13) and (14) translate a given q to one and only one target p. Each coordinate function could be approximated by learning a multilayer neural network equipped only with input units receiving the attributes a and b. The goal of inverse function approximation is to reconstruct the two coordinate mappings of inverse kinematics subject to training data originating from forward kinematics. Let {(q[t], p[t])}_{t=1}^{N} collect paired data generated by forward kinematics, where all q[t] are uniformly distributed within [−2π, 2π]². Then

A = {(x[t], y[t]) | x[t] = p[t], y[t] = a[t]}_t,
B = {(x[t], y[t]) | x[t] = p[t], y[t] = b[t]}_t

represent two sets of training data for modeling inverse kinematics by set-valued mapping approximation. Figs. 4a and 5a respectively show the paired data in A and B. The coordinate mapping of inverse kinematics underlying the paired data in either A or B is apparently set-valued. Approximating the paired data in either A or B by learning a multilayer neural network equipped with only two input units suffers from unacceptable training errors. The domain of inverse kinematics, denoted by D, coincides with the range of forward kinematics.
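A minimal sketch of how such training sets can be produced, assuming r1 = r2 = 1 for illustration (the paper does not list the constants): sample q uniformly, push it through the forward kinematics (13) and (14), and swap the roles of predictors and targets.

```python
import numpy as np

def forward_kinematics(a, b, r1=1.0, r2=1.0):
    """Forward coordinate functions (13) and (14)."""
    p1 = r1 * np.cos(a + b) - np.sin(a) * np.sin(b)
    p2 = r2 * np.sin(a + b) + np.cos(a) * np.sin(b)
    return p1, p2

rng = np.random.default_rng(0)
N = 2000
a = rng.uniform(-2 * np.pi, 2 * np.pi, N)
b = rng.uniform(-2 * np.pi, 2 * np.pi, N)
p1, p2 = forward_kinematics(a, b)

# Training sets for the inverse mappings: predictors are the positions p,
# targets are the joint angles a (set A) and b (set B).
X = np.column_stack([p1, p2])
A = (X, a)
B = (X, b)
```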

Inverse function approximation first applies the simulated learning process to partition the paired data in B into two disjoint subsets, respectively denoted by B1 and B2. In Fig. 4b different colors indicate the exclusive memberships of paired data to the two disjoint subsets. For this example, K′ = K = 2. The experiment is executed once for inverting forward kinematics (13) and (14), since the extremely low variances in Table 1 have guaranteed high reliability of the proposed learning and refining approach. Numerical results of the proposed learning and refining approach in Table 2 show acceptable E_S and E′_S subject to the training data in B. Derived by the proposed learning and refining process subject to B, H_B = {g_k}_{k=1}^{2} presents a faithful set-valued mapping approximation to the coordinate mapping from the space of p to b. Fig. 4c shows the two refined elementary functions of H_B over D, which are derived by learning a network of multiple Mahalanobis-NRBF modules [30] separately subject to the paired data in B1 and B2 based on annealed KLD minimization [28]. Fig. 4d shows different perspectives of H_B. The derived set-valued neural mapping H_B is verified with a testing set T, which originates from the same process that generated the training data in B. Numerical results in Table 2 show acceptable training and testing errors, E_S, E′_S and E′_T, indicating successful set-valued mapping approximation subject to S = B. Numerical simulations further explore set-valued mapping approximation subject to S = A. Fig. 5b shows the two disjoint subsets, respectively denoted by A1 and A2, partitioned by the simulated learning process, where A = A1 ∪ A2. For this example, K′ = 2 and K = 3. Numerical results in Table 2 show unacceptable training and testing errors, E_S, E′_S and E′_T, for S = A relative to the errors derived subject to B in the first row. Since K > K′, increasing K does not help to reduce the unacceptable approximating error, which is in fact caused by the high curvatures of the functional surfaces of the set-valued mapping underlying the training data in A. Since all disjoint subsets are unacceptable, a two-level partition is introduced for resolving set-valued mapping approximation subject to A.

Fig. 4. (a) Training data of reconstructing the first coordinate mapping of the inverse of forward kinematics. (b) Disjoint subsets partitioned by the proposed learning and refining approach. (c) and (d) Reconstructed one-to-many mapping displayed from different perspectives. (For interpretation of the references to color in this figure caption, the reader is referred to the web version of this paper.)


Fig. 5. (a) Training data of reconstructing the second coordinate mapping of the inverse of forward kinematics. (b) Disjoint subsets partitioned by the proposed learning and refining approach. (c)–(f) The reconstructed set-valued mapping from distinct perspectives.

Table 2. Numerical results of the proposed learning and refining approach for inverse function approximation.

Training set   Levels   E_S        E′_S       Mapping          E′_T       Quality
B              1        3.21e-04   3.40e-05   H_B = {g_k}      3.40e-05   Acceptable
A              1        2.53e-03   2.07e-03   G = {g_k}        2.36e-03
A              2        –          –          H_A = {G_k}      1.50e-05   Acceptable
A1             1        3.40e-05   1.80e-05   G_1              –
A2             1        3.70e-05   1.20e-05   G_2              –

As described previously, the simulated learning process decomposes A into two disjoint subsets, denoted by A_k = {(x[t], y[t]) | δ[t] = e_k}, at the first level of partition, where A_k collects the paired data whose exclusive memberships are identical to e_k at the end of the annealing process. The second-level partition applies the simulated learning process to decompose every A_k. The second-level partition with K = 2 for each A_k attains disjoint subsets, {A_kj}_j, which induce the refined approximating functions G_k = {g_kj}_j. Table 2 shows acceptable training errors, E_S and E′_S, of the proposed learning and refining approach for approximating the paired data in every A_k.

The set-valued neural mapping G_k derived subject to A_k is piecewise for this example. The domain of g_kj refers to the support [14] of the predictors in A_kj. For this case, A_k1 and A_k2 have disjoint supports, which are respectively denoted by D_k1 and D_k2. A discriminating rule can be derived by learning a multilayer neural network for distinguishing the membership of a predictor x to D_k1 and D_k2. Learning a discriminating rule is a task of data driven classification and is resolved by the multi-class classification approach developed in [26]. The derived discriminating rule is employed to determine the membership of x to the disjoint supports, and the selected elementary mapping is then applied to generate the output of G_k,

$$G_k(x) = g_{kj}(x) \quad \text{if } x \in D_{kj}. \qquad (15)$$

Fig. 6. A tree diagram for illustrating multilevel learning of state-regulated neural networks for approximating shoulder-joint inverse kinematics.

Numerical simulations show that the training errors of deriving G_k subject to A_k, for k = 1 and 2, are acceptable. The set-valued mapping approximation subject to A is given by

$$H_A = \{G_k\}_k, \qquad (16)$$

where G_1 and G_2 share the same domain D. In Table 2, the testing error E′_T of H_A derived by the two-level partition is acceptable. Fig. 5 shows the functional surface of H_A from different perspectives. Numerical results show the two-level partition to be effective and reliable for approximating the second coordinate mapping of the inverse of forward kinematics (13) and (14). The first-level partition by the simulated learning process decomposes A into disjoint subsets. Suffering from the high curvatures of the mapping underlying each subset A_k, the reconstructed g_k results in an unacceptable training error, as shown in the second row of Table 2. The proposed learning and refining approach is again applied to process each A_k for the second-level partition. The obtained approximating mapping G_k of A_k is piecewise. Evaluating G_k is conditional on the membership of a predictor to the sub-domains, as shown in Eq. (15). Numerical results show both training and testing errors of the reconstructed G_k to be acceptable for all k.

4.2. Inverting three-joint kinematics

Inverting the forward kinematics of the waist, shoulder and elbow joints of the Puma 560 robotic manipulator [27] is explored by the proposed learning approach. Data driven construction of inverse kinematics is here provided with paired data generated by forward kinematics. Predictors and targets in the raw paired data originating from forward kinematics are exchanged in role for preparing training data for approximating inverse kinematics. A Matlab toolbox [27] that simulates the forward kinematics of the addressed robotic manipulator is employed to generate paired data for inverting forward kinematics. Let q = (a, b, c) denote a configuration of the waist, shoulder and elbow joints, where a ∈ [−160°, 160°], b ∈ [−110°, 110°] and c ∈ [−135°, 135°], and let p = (p_1, p_2, p_3) ∈ R³ denote the position of the wrist. Let {(q[t], p[t])}_{t=1}^{N} denote paired data generated by forward kinematics, where all q[t] are uniformly distributed feasible configurations. Forward kinematics can be effectively reconstructed by learning multilayer neural networks subject to {(q[t], p_i[t])}_{t=1}^{N} for all i. In contrast, the inverse kinematics underlying the training sets

$$A = \{(x[t], y[t]) \mid x[t] = p[t],\ y[t] = a[t]\}_t, \qquad (17)$$

$$B = \{(x[t], y[t]) \mid x[t] = p[t],\ y[t] = b[t]\}_t, \qquad (18)$$

$$C = \{(x[t], y[t]) \mid x[t] = p[t],\ y[t] = c[t]\}_t \qquad (19)$$

cannot be faithfully approximated by learning multilayer neural networks equipped with only three input units. In A, B and C, the predictors are positions of the wrist and the desired targets are joint angles normalized within [−1, 1]. The mean square training errors of Levenberg–Marquardt learning of advanced multilayer neural networks subject to A, B and C are respectively 2.218e-2, 1.938e-2 and 3.070e-2, which are unacceptable and cannot be improved by enhancing the transfer functions or the learning methodology of multilayer neural networks with only three input units, since the underlying mappings are set-valued. The obtained mean square training errors are compared with a threshold ϵ for determining whether the derived neural networks are acceptable or not. The threshold ϵ is here set to 1.5e-3, which is less than one tenth of the above three mean square errors as well as the value 5.0e-3 used in Section 3. If the mean square training error of learning a state-regulated neural network is not less than ϵ, the training set is decomposed into disjoint subsets according to the obtained exclusive memberships. The tree diagram in Fig. 6 is employed to illustrate the results of multilevel learning of state-regulated neural networks for inverse kinematics approximation. A node in the tree diagram has different meanings depending on the result of comparing the mean square training error E_S with ϵ. If E_S < ϵ, a node is labeled as a square whose only successor is a circle leaf with specified S and K′. Otherwise, a node is labeled as a rectangle other than a square, serving as an internal node with specified S and K′. A rectangle internal node decomposes the training set into disjoint subsets according to the obtained exclusive memberships and has more than one successor for invoking further learning processes. Numerical simulations have revealed three different outcomes of learning a state-regulated neural network for set-valued mapping approximation, which respectively induce all acceptable subsets, all unacceptable subsets as mentioned in the previous subsection, and at least one acceptable subset, as for approximating ξ1. All situations can be clearly expressed by internal nodes and circle leaves in a tree diagram. A step-wise procedure for handling the proposed learning and refining approach in the three different cases is summarized in Appendix C. The proposed learning approach is further employed for approximating the three-joint inverse kinematics. The set-valued mapping underlying the paired training data in A (17) or C (19) can be well approximated by the proposed learning approach. The training errors E_S in the first two rows of Table 3 show acceptable approximation of the inverse kinematics corresponding to the waist and elbow joints. The training and testing sets of A and C contain 1500 paired data and those of B are extended to contain 2000 paired data. For single-level learning, the testing error E_T, a result of directly verifying a state-regulated neural network F(x, δ|θ), is calculated by replacing S in (10) with T. The testing error E′_T derived by further refining is omitted in this subsection. The training set S = C is successfully decomposed into the acceptable disjoint subsets C_k, k = 1 and 2. The mean square training error of deriving a state-regulated neural network F(x, δ|θ) subject to C is acceptable, as shown in Table 3. The mean square testing error for approximating the elbow-joint inverse kinematics is measured as E_T = 1.10e-4. When S = A, the training set is also successfully decomposed into acceptable disjoint subsets A_k, where k = 1, 2 and 3. The state-regulated neural network derived subject to S = A attains an acceptable training error, as shown in Table 3.


Table 3. Numerical results of multilevel learning of state-regulated neural networks for three-joint inverse kinematics approximation.

Training set   K   K′   E_S        Quality
A              3   3    2.78e-04   Acceptable
C              3   2    5.2e-5     Acceptable
B              3   2    4.325e-3
B1             3   2    1.061e-3   Acceptable
B2             3   2    9.657e-3
B21            3   2    1.018e-3   Acceptable
B22            2   2    8.790e-4   Acceptable

The mean square testing error for approximating the waist-joint inverse kinematics is measured as E_T = 3.68e-04. Subject to the paired training data in B, the proposed learning process induces a state-regulated neural network whose training error is not acceptable. The tree diagram in Fig. 6 shows the multilevel learning for approximating the set-valued mapping subject to B. At the first level, the proposed learning approach, resulting in an unacceptable training error subject to B, plays the role of decomposing the training data into the disjoint subsets B_k, k = 1, 2. The proposed learning approach at the second level of the left sub-tree induces an acceptable mean square error subject to B1, directly inducing a two-state regulated neural network in a circle leaf. The proposed learning approach at the second level of the right sub-tree still attains an unacceptable mean square training error and invokes a third-level partition, inducing two terminal circle leaves. Fig. 6 shows two rectangle internal nodes and three circle leaves in total, where all state-regulated neural networks are integrated for approximating the inverse kinematics corresponding to the shoulder joint subject to the paired data in B. Let K_i denote the number of regulated states and φ_i denote the interconnections in a state-regulated neural network, where i indexes both the rectangle internal nodes and the circle leaves in Fig. 6. Let I_ik(x|φ_i) denote the obtained elementary functions, where k runs from 1 to K_i and

$$I_{ik}(x) = F(x, \delta = e_k \mid \varphi_i).$$

For K_i = 1, the training set used for deriving the interconnections in φ_i sketches a domain for evaluating I_ik. For K_i > 1, the exclusive memberships of the paired data indicate a partition of the training set into K_i disjoint subsets, each also sketching a domain for evaluating I_ik. So each I_ik has a well defined domain for evaluation. All I_ik are integrated to emulate the shoulder-joint inverse kinematics underlying the paired data in B. Given a predictor x, the responses of the emulated shoulder-joint inverse kinematics contain I_ik(x) if x belongs to the domain of evaluating I_ik, where i indexes all rectangle internal nodes and circle leaves in Fig. 6 and k runs from 1 to K_i. State-regulated neural networks in rectangle internal nodes are recruited to avoid the undefined blank fragments in Fig. 5 during testing. The mean square testing error for approximating the shoulder-joint inverse kinematics is measured as E_T = 3.881e-3. Table 3 lists the training errors of approximating the set-valued mappings subject to A, B and C, respectively defined by Eqs. (17), (18) and (19). The obtained training and testing errors show the proposed learning approach of state-regulated neural networks to be effective for approximating the inverse kinematics of the waist, shoulder and elbow joints of the Puma 560 robot arm.

5. Conclusions Different from previous works [8,16,22], this work has organized state-regulated multilayer neural networks for set-valued mapping approximation. The recruited discrete state in the input layer regulates a multilayer neural network to perform distinct elementary mappings.


Under the mixture assumption, a hybrid of mean field annealing and LM methods has been proposed for training state-regulated neural networks and for partitioning paired predictors and targets into non-overlapping subsets for reconstructing the joined elementary mappings. For real applications, multilevel learning of state-regulated neural networks has also been presented for inverse system approximation. Numerical simulations have shown the proposed learning approach to be reliable and effective for set-valued mapping and inverse system approximation. Numerical simulations have verified the applicability of the proposed learning approach for inverting two-joint and three-joint forward kinematics in addition to problems of set-valued mapping approximation. For approximating the shoulder-joint inverse kinematics, the proposed learning approach performs the essential data decomposition at the first two levels in Fig. 6. Multilevel learning eventually attains acceptable state-regulated neural networks at the terminal circle leaves for emulating the inverse kinematics corresponding to the shoulder joint. The derived two-joint or three-joint inverse kinematics can be directly applied to inverse control. The goal is to infer a proper configuration for driving the output of forward kinematics (13) and (14) to an upcoming desired position p. For this purpose, the derived two-joint inverse kinematics are employed to produce a Cartesian product, {H_A(p)} × {H_B(p)}, for approaching the desired position. Among the set-valued elements in the Cartesian product, the configuration nearest to the current one can be selected for smooth inverse control following the minimal tuning principle. Since the elementary mappings in both H_A and H_B are well defined in algebraic form, feasible configurations in the product can be efficiently enumerated for real-time control. It is remarkable that the acceptable means and extremely low variances of the testing errors in Tables 1 and 2 guarantee the effectiveness and reliability of the proposed learning and refining approach for set-valued mapping and inverse system approximation. The outstanding performance is a consequence of the collective decisions of mean field annealing and the efficient dynamic second-order minimization of Levenberg–Marquardt learning. The two powerful computational methodologies cooperatively operate under an annealing process toward resolving the mixed integer programming problem in (4) and (5). Learning state-regulated multilayer neural networks will be further applied to modeling the inverse kinematics [10] of six-joint robot arms and reconstructing Markov-chain embedded mappings, and the proposed neural architecture will be extended to recruit continuous regulating states [31] in the near future.

Acknowledgments This work was supported by the project NSC 102-2221-E-259007 of National Science Council.

Appendix A

Let F denote the network mapping of a radial basis function network,

$$F(x, \delta \mid \theta) = \sum_m r_m \exp\!\left(-\frac{\|x - a_m\|^2 + \|\delta - b_m\|^2}{2\sigma_m^2}\right),$$

where the stacked vector (a_m, b_m) denotes a center, σ_m² denotes the variance and r_m denotes a posterior weight. Substituting δ = e_k attains

$$F_k(x) = \sum_m r_m \exp\!\left(-\frac{\|x - a_m\|^2 + \|e_k - b_m\|^2}{2\sigma_m^2}\right)$$

$$= \sum_m \exp\!\left(-\frac{\|x - a_m\|^2}{2\sigma_m^2}\right) r_m \exp\!\left(-\frac{1 - 2b_{mk} + \|b_m\|^2}{2\sigma_m^2}\right)$$

$$= \sum_m w_{mk} \exp\!\left(-\frac{\|x - a_m\|^2}{2\sigma_m^2}\right),$$


where

$$w_{mk} = r_m \exp\!\left(\frac{b_{mk}}{\sigma_m^2}\right) \exp\!\left(-\frac{1 + \|b_m\|^2}{2\sigma_m^2}\right).$$

The elementary mappings {F_k}_k share {a_m}_m and {σ_m²}_m, and possess different posterior weights {w_mk}_k.
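A small numerical check of this identity, with randomly drawn placeholder parameters: evaluating the network on the concatenated input [x; e_k] and evaluating the reduced form with the state-dependent weights w_mk give the same value.

```python
import numpy as np

rng = np.random.default_rng(2)
d, K, M = 3, 4, 6
a = rng.normal(size=(M, d))        # predictor parts of the centers
b = rng.normal(size=(M, K))        # state parts of the centers
sigma2 = rng.uniform(0.5, 1.5, M)  # variances
r = rng.normal(size=M)             # posterior weights
x, k = rng.normal(size=d), 2
e_k = np.eye(K)[k]

# Direct evaluation of F(x, e_k | theta).
direct = np.sum(r * np.exp(-(np.sum((x - a)**2, 1) + np.sum((e_k - b)**2, 1)) / (2 * sigma2)))

# Reduced form F_k(x) = sum_m w_mk exp(-||x - a_m||^2 / (2 sigma_m^2)).
w_k = r * np.exp(b[:, k] / sigma2) * np.exp(-(1 + np.sum(b**2, 1)) / (2 * sigma2))
reduced = np.sum(w_k * np.exp(-np.sum((x - a)**2, 1) / (2 * sigma2)))

print(np.isclose(direct, reduced))   # True
```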

Appendix B

B.1. Multiple Mahalanobis-NRBF modules

An individual Mahalanobis-NRBF module consists of M normalized hidden units. Each hidden unit activates to represent an overlapping membership of x to non-overlapping regions partitioned by all w_m based on Mahalanobis distances measured by the weight matrix A. The normalized activation of each hidden unit in a module is given by

$$\phi_m(x) = \frac{\exp(-\beta\, d_A(x, w_m))}{\sum_{l=1}^{M} \exp(-\beta\, d_A(x, w_l))},$$

where β is the inverse of a temperature-like parameter and d_A(x, w_m) denotes the square Mahalanobis distance between x and w_m,

$$d_A(x, w_m) = \|x - w_m\|_A^2 = (x - w_m)^T A (x - w_m).$$

An individual module of normalized RBFs translates x to

$$y = F(x; W, A) = \sum_{m=1}^{M} r_m \phi_m(x),$$

where W = {w_m}_{m=1}^{M} and r_m denotes a posterior coefficient. A system of manifold Mahalanobis distances in a space is considered. A network of multiple Mahalanobis-NRBF modules translates x to

$$y = \sum_k F(x; W_k, A_k),$$

where A_k denotes the weight matrix in the kth module. Annealed KL divergence minimization for deriving the optimal {W_k}, {A_k} and β refers to the previous work [30].
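The sketch below implements the forward pass of such a network of Mahalanobis-NRBF modules as described above; the module count, unit count and parameter values are random placeholders, and each weight matrix is built as a positive semidefinite product so that the Mahalanobis distance is well defined.

```python
import numpy as np

def mahalanobis_nrbf_module(x, W, A, r, beta):
    """One module: normalized RBF activations with a shared weight matrix A."""
    diff = x - W                                     # (M, d)
    dist = np.einsum('md,de,me->m', diff, A, diff)   # d_A(x, w_m) for every unit
    act = np.exp(-beta * (dist - dist.min()))        # shift for numerical stability
    phi = act / act.sum()                            # normalized activations
    return np.dot(r, phi)

def multi_module_output(x, modules, beta):
    """Sum of the module outputs, y = sum_k F(x; W_k, A_k)."""
    return sum(mahalanobis_nrbf_module(x, W, A, r, beta) for (W, A, r) in modules)

rng = np.random.default_rng(3)
d, M, n_modules, beta = 2, 5, 3, 4.0
modules = []
for _ in range(n_modules):
    L = rng.normal(size=(d, d))
    modules.append((rng.normal(size=(M, d)),   # centers W_k
                    L @ L.T,                   # positive semidefinite A_k
                    rng.normal(size=M)))       # posterior coefficients r_k
print(multi_module_output(rng.normal(size=d), modules, beta))
```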

Appendix C

A general procedure for handling the proposed learning and refining approach is summarized for set-valued mapping approximation (a sketch of this control flow is given after the list).

1. Apply the simulated learning process to partition the training data in S into non-empty disjoint subsets Q_1, …, Q_{K′}.
2. If the training error E_{Q_k} < ϵ for all k, derive g_k for all k, set G = {g_k}_k and exit.
3. If the training error E_{Q_k} ≥ ϵ for all k, apply the simulated learning process to each Q_k for the second-level partition, form G_k = {g_kj}_j, derive a discriminating rule for identifying the exclusive membership to the disjoint D_kj for all j, set G = {G_k}_k and exit.
4. Set G = {g_k | E_{Q_k} < ϵ} and form the reduced training data S′ by Eq. (12).
5. Apply the proposed learning and refining approach to process S′, add the refined elementary mappings to G and exit.
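The following skeleton mirrors that procedure under stated assumptions: partition, refine and subset_error stand for the simulated learning process, the Mahalanobis-NRBF refinement and the mean square error of a fitted subset, and are passed in as callables rather than implemented here; the discriminating rule of step 3 is omitted and the second-level branch is flattened into a recursive call for brevity.

```python
def learn_and_refine(S, partition, refine, subset_error, eps, level=1, max_level=3):
    """Multilevel set-valued mapping approximation (sketch of Appendix C).

    partition(S)       -> list of non-empty disjoint subsets Q_1..Q_K'
    refine(Q)          -> refined elementary mapping g fitted to Q
    subset_error(Q, g) -> mean square training error of g on Q
    Returns a list G of refined elementary mappings.
    """
    subsets = partition(S)                                   # step 1
    fitted = [(Q, refine(Q)) for Q in subsets]
    acceptable = [g for Q, g in fitted if subset_error(Q, g) < eps]
    rejected = [Q for Q, g in fitted if subset_error(Q, g) >= eps]
    G = list(acceptable)                                     # steps 2 and 4
    if rejected and level < max_level:
        if len(rejected) == len(subsets):
            # step 3: all subsets unacceptable -> partition each subset again
            for Q in rejected:
                G.extend(learn_and_refine(Q, partition, refine, subset_error,
                                          eps, level + 1, max_level))
        else:
            # step 5: re-run on the union S' of the unacceptable subsets, Eq. (12)
            S_prime = [pair for Q in rejected for pair in Q]
            G.extend(learn_and_refine(S_prime, partition, refine, subset_error,
                                      eps, level + 1, max_level))
    return G
```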

References

[1] R. Battiti, First- and second-order methods for learning: between steepest descent and Newton's method, Neural Comput. 4 (1992) 141.
[2] C.M. Bishop, Mixture density networks, Technical Report NCRG/94/004, 1994, available from <http://www.ncrg.aston.ac.uk>.
[3] C.M. Bishop, Neural Networks for Pattern Recognition, Clarendon Press, Oxford, 1995.
[4] R. Fletcher, Practical Methods of Optimization, Wiley, 1987.
[5] M.T. Hagan, M.B. Menhaj, Training feedforward networks with the Marquardt algorithm, IEEE Trans. Neural Netw. 5 (6) (1994) 989–993.
[6] J. Hertz, A. Krogh, R.G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, 1991.
[7] R.A. Jacobs, M.I. Jordan, S.J. Nowlan, G.E. Hinton, Adaptive mixtures of local experts, Neural Comput. 3 (1) (1992) 79–87.
[8] C. Jayne, A. Lanitis, C. Christodoulou, Neural network methods for one-to-many multi-valued mapping problems, Neural Comput. Appl. 20 (6) (2011) 775–785.
[9] E.M. Johansson, F.U. Dowla, D.M. Goodman, Backpropagation learning for multilayer feed-forward neural networks using the conjugate gradient method, Int. J. Neural Syst. 2 (4) (1991) 291–301.
[10] R. Koker, A genetic algorithm approach to a neural-network-based inverse kinematics solution of robotic manipulators based on error minimization, Inf. Sci. 222 (2013) 528–543.
[11] J. Moody, C. Darken, Fast learning in networks of locally-tuned processing units, Neural Comput. 1 (1989) 281–294.
[12] J. Moody, Economic forecasting: challenges and neural network solutions, in: International Symposium on Artificial Neural Networks, Hsinchu, Taiwan, 1995, pp. 1–8.
[13] J. Moody, Forecasting the economy with neural nets: a survey of challenges and solutions, in: G.B. Orr, K.-R. Müller (Eds.), Neural Networks: Tricks of the Trade, 1996, pp. 347–371.
[14] M.M. Moya, D.R. Hush, Network constraints and multi-objective optimization for one-class classification, Neural Netw. 9 (3) (1996) 463–474.
[15] M. Nørgaard, O. Ravn, N.K. Poulsen, L.K. Hansen, Neural Networks for Modelling and Control of Dynamic Systems, Springer-Verlag, London, UK, 2000.
[16] D.K. Oh, S.H. Oh, S.Y. Lee, Learning one-to-many mapping with locally linear maps based on manifold structure, Signal Process. Lett. 18 (9) (2011) 521–524.
[17] C. Peterson, B. Söderberg, A new method for mapping optimization problems onto neural networks, Int. J. Neural Syst. 1 (1989) 3.
[18] R. Rohwer, J.C. van der Rest, Minimum description length, regularization, and multimodal data, Neural Comput. 8 (1996) 595–609.
[19] K. Rose, E. Gurewitz, G.C. Fox, Statistical mechanics and phase transitions in clustering, Phys. Rev. Lett. 65 (8) (1990) 945–948.
[20] F. Rosenblatt, Principles of Neurodynamics: Perceptrons and the Theory of Brain Mechanisms, Spartan Books, Washington, DC, 1962.
[21] M. Shizawa, Regularization networks for approximating multi-valued functions: learning ambiguous input–output mappings from examples, in: IEEE International Conference on Neural Networks, 1994.
[22] A. Thayananthan, R. Navaratnam, B. Stenger, P.H.S. Torr, R. Cipolla, Pose estimation and tracking using multivariate regression, Pattern Recognit. Lett. 29 (9) (2008) 1302–1310.
[23] Y. Tomikawa, K. Nakayama, Approximating many valued mappings using a recurrent neural network, in: Proceedings of the 1998 IEEE World Congress on Computational Intelligence, vol. 2, Anchorage, AK, USA, 1998, pp. 1494–1497.
[24] P.J. Werbos, Backpropagation: past and future, in: IEEE International Conference on Neural Networks, vol. 1, 1988, pp. 343–353.
[25] J.M. Wu, Z.H. Lin, Learning generative models of natural images, Neural Netw. 15 (3) (2002) 337–347.
[26] J.M. Wu, Natural discriminant analysis using interactive Potts models, Neural Comput. 14 (3) (2002) 689–713.
[27] Robotics Toolbox, <http://www.petercorke.com/Robotics_Toolbox.html>.
[28] J.M. Wu, P.H. Hsu, Annealed Kullback–Leibler divergence minimization for generalized TSP, spot identification and gene sorting, Neurocomputing 74 (12–13) (2011) 2228–2240.
[29] J.M. Wu, Multilayer Potts perceptrons with Levenberg–Marquardt learning, IEEE Trans. Neural Netw. 19 (12) (2008) 2032–2043.
[30] J.M. Wu, C.C. Wu, C.W. Huang, Annealed cooperative-competitive learning of Mahalanobis-NRBF neural modules for chaotic differential function approximation, Neurocomputing 136 (2014) 56–70.
[31] J.M. Zamarreño, P. Vega, State space neural network. Properties and application, Comput. Sci. Autom. Control 11 (1998) 1099–1112.

Jiann-Ming Wu was born in Taiwan in 1966. He received the B.S. degree in computer science in 1988 from National Chiao Tung University, and the M.S. and Ph.D. degrees in computer science and information engineering from National Taiwan University in 1990 and 1994, respectively. In 1996, he joined the faculty as an assistant professor at the Department of Applied Mathematics, National Dong Hwa University, where he became an associate professor in 1997 and currently serves as a full professor and a faculty member with lifetime-free evaluation. His research interests include neural networks and information technology applications. In the past two decades, he has developed effective learning methods for neural networks toward solving independent component analysis, self-organization, classification, function approximation, gene sorting, chaotic differential function approximation and density approximation. He has devised neural organizations of natural elastic nets, generalized adalines, multilayer Potts perceptrons and Mahalanobis-NRBF modules, and computational methodologies of annealed expectation-maximization, annealed Kullback–Leibler divergence minimization and annealed

cooperative-competitive learning. In 2014, he devised Sudoku Associative Memory for topological information encoding. He has been recruited to the Editorial Board of Advances in Artificial Neural Systems, Frontiers in Computational Neuroscience, and International Journal of Neural Networks and Advanced Applications.

Chun-Chang Wu was born in Taiwan in 1982. He received the M.S. degree in mathematics from National Central University, Taoyuan, Taiwan, in 2007, and the B.S. degree in applied mathematics from Providence University, Taichung, Taiwan, in 2004. He received the Ph.D. degree in applied mathematics from National Dong Hwa University, Hualien, Taiwan, in 2014.


Jia-Ci Chen was born in Taiwan in 1989. She received the B.S. degree in mathematics from National Dong Hwa University, Hualien, Taiwan, in 2011, and the M.S. degree in applied mathematics from National Dong Hwa University, Hualien, Taiwan, in 2013.

Yi-Ling Lin was born in Taiwan in 1989. She received the B.S. degree in mathematics from National Dong Hwa University, Hualien, Taiwan, in 2011, and the M.S. degree in applied mathematics from National Dong Hwa University, Hualien, Taiwan, in 2013.