Gradient Descent Learning for Quaternionic Hopfield Neural Networks

Masaki Kobayashi
Mathematical Science Center, University of Yamanashi, Takeda 4-3-11, Kofu, Yamanashi 400-8511, Japan

Neurocomputing (2017), doi: 10.1016/j.neucom.2017.04.025

Abstract


A Hopfield neural network (HNN) is a neural network model with mutual connections. A quaternionic HNN (QHNN) is an extension of the HNN, and several QHNN models have been proposed. The hybrid QHNN utilizes the non-commutativity of quaternions, and hybrid QHNNs with the Hebbian learning rule have been shown to outperform conventional QHNNs in noise tolerance. The Hebbian learning rule, however, is a primitive learning algorithm, and more advanced learning algorithms need to be studied. Although the projection rule is one of the few promising learning algorithms, it is restricted by network topology and cannot be applied to hybrid QHNNs. In the present work, we propose gradient descent learning, which can be applied to hybrid QHNNs, and compare its performance with that of the projection rule. In computer simulations, gradient descent learning outperformed the projection rule in noise tolerance: for small training-pattern sets, hybrid QHNNs with gradient descent learning produced the best performance, while QHNNs did so for large training-pattern sets. In future work, gradient descent learning will be extended to QHNNs with different network topologies and activation functions.


Keywords: Hopfield neural networks, quaternion, learning algorithm, gradient descent learning

1. Introduction

Models of neural networks have been extended and applied in a variety of areas. Complex-valued neural networks have been one of the most successful


extensions. Neural networks have more recently been extended to quaternionic versions, such as multi-layered perceptrons and Hopfield neural networks (HNNs) [1]-[12]. Moreover, quaternions have often been used in signal processing [13, 14].

An HNN is a recurrent neural network with mutual connections, and is one of the most successful neural network models. Complex-valued HNNs can represent phasor information, and have been applied to the storage of gray-scale images [15, 16]. To improve storage capacity and noise tolerance, several learning algorithms have been studied [17]-[19]. The projection rule is a fast learning algorithm, but it is restricted by network topology [20]-[22]. The gradient descent learning algorithm is iterative and flexible with respect to network topology, although its learning speed is slower [23]-[25].

Complex-valued HNNs have been extended to quaternionic HNNs (QHNNs). Several types of activation functions have been proposed for QHNNs. The split activation function is the simplest [26]. The multistate activation function has been applied to the storage of color images [27]. Attractors of QHNNs have also been studied, and a variety of network topologies have been proposed [28]. Bidirectional models with the multistate activation function were found to improve noise tolerance [29]. A hybrid QHNN is a model with the split activation function; it improves noise tolerance by utilizing the non-commutativity of quaternions [30]. Thus far, the Hebbian learning rule and the projection rule have been studied. The Hebbian learning rule, however, is primitive, with very little storage capacity [30]. The projection rule, meanwhile, cannot be applied to hybrid QHNNs, because it requires QHNNs whose connections are fully left- or fully right-multiplied.

In this study, we propose gradient descent learning for QHNNs. Gradient descent learning is flexible with respect to network topology, and can be applied to hybrid QHNNs. Signal processing provides a novel technique, the GHR calculus [31]-[33]; using the GHR calculus, gradient descent learning can be represented in an elegant form. We performed computer simulations to analyze storage capacity and noise tolerance, and show that either the QHNNs or the hybrid QHNNs with gradient descent learning outperformed the QHNNs with the projection rule.

This paper is organized as follows. Section 2 describes quaternions and QHNNs. In Section 3, gradient descent learning is derived. Section 4 describes the computer simulations, and Section 5 discusses the results. Finally, Section 6 presents the conclusion.


2. Quaternionic Hopfield Neural Networks


2.1. Hopfield Neural Networks

An HNN is a model of a recurrent neural network, and consists of neurons and symmetric mutual connections between them. A neuron receives inputs from other neurons or sensors and produces a single output, and the weighted sum models how the neuron copes with its inputs. In a quaternionic network, a neuron receives quaternions from the other neurons and produces a single quaternion as output. The strength of a connection is referred to as the connection weight. Neuron states and connection weights are represented by real numbers, with neurons taking the bipolar states ±1. We denote the state of neuron a and the connection weight from neuron b to neuron a by x_a and w_ab, respectively. The neurons asynchronously receive the weighted sum inputs and update their states using the activation function. The weighted sum input I_a to neuron a and the activation function f(I) are defined as
\[ I_a = \sum_{b \neq a} w_{ab} x_b, \qquad (1) \]
\[ f(I) = \begin{cases} 1 & (I > 0) \\ -1 & (I < 0). \end{cases} \qquad (2) \]
If I = 0, then the neuron keeps its state. The symmetric connection weights w_ab = w_ba ensure the convergence of HNNs.
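As a minimal illustration of these update dynamics, the following sketch (our own code and names, not from the paper) applies the activation (2) asynchronously until no neuron changes its state.

```python
import numpy as np

def hnn_recall(W, x, max_sweeps=100):
    """Asynchronous recall in a real-valued HNN with the sign activation (2).

    W: symmetric N x N weight matrix with zero diagonal; x: length-N vector of +1/-1 states.
    """
    x = x.copy()
    for _ in range(max_sweeps):
        changed = False
        for a in range(len(x)):
            I_a = W[a] @ x - W[a, a] * x[a]        # weighted sum input, eq. (1)
            if I_a > 0 and x[a] != 1:
                x[a], changed = 1, True
            elif I_a < 0 and x[a] != -1:
                x[a], changed = -1, True
            # if I_a == 0, the neuron keeps its state
        if not changed:                            # a fixed point has been reached
            break
    return x
```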


2.2. Quaternion

Quaternion algebra is a number system with four components; a quaternion is composed of one real part and three imaginary parts. The three imaginary units are represented by i, j, and k, and satisfy the following properties:
\[ i^2 = j^2 = k^2 = -1, \qquad (3) \]
\[ ij = -ji = k, \qquad (4) \]
\[ jk = -kj = i, \qquad (5) \]
\[ ki = -ik = j. \qquad (6) \]
A quaternion has the form q_0 + q_1 i + q_2 j + q_3 k. For two quaternions q = q_0 + q_1 i + q_2 j + q_3 k and q' = q'_0 + q'_1 i + q'_2 j + q'_3 k, the addition and the multiplication are defined as follows:
\[ q + q' = (q_0 + q'_0) + (q_1 + q'_1) i + (q_2 + q'_2) j + (q_3 + q'_3) k, \qquad (7) \]
\[ qq' = (q_0 q'_0 - q_1 q'_1 - q_2 q'_2 - q_3 q'_3) + (q_0 q'_1 + q_1 q'_0 + q_2 q'_3 - q_3 q'_2) i + (q_0 q'_2 - q_1 q'_3 + q_2 q'_0 + q_3 q'_1) j + (q_0 q'_3 + q_1 q'_2 - q_2 q'_1 + q_3 q'_0) k. \qquad (8) \]
From rules (4)-(6), we know that the multiplication is non-commutative. The associative and distributive laws are valid. For a quaternion q = q_0 + q_1 i + q_2 j + q_3 k, the conjugate \bar{q} and the norm |q| are defined as
\[ \bar{q} = q_0 - q_1 i - q_2 j - q_3 k, \qquad (9) \]
\[ |q| = \sqrt{q\bar{q}} = \sqrt{\bar{q}q} = \sqrt{q_0^2 + q_1^2 + q_2^2 + q_3^2}. \qquad (10) \]
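The quaternion operations (7)-(10) translate directly into code. The following sketch (our own helper names qmul, qconj, qnorm, not from the paper) implements the Hamilton product, conjugate, and norm, and checks the non-commutativity rule ij = -ji = k.

```python
import numpy as np

def qmul(p, q):
    """Hamilton product (8); quaternions are length-4 arrays [q0, q1, q2, q3]."""
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return np.array([p0*q0 - p1*q1 - p2*q2 - p3*q3,
                     p0*q1 + p1*q0 + p2*q3 - p3*q2,
                     p0*q2 - p1*q3 + p2*q0 + p3*q1,
                     p0*q3 + p1*q2 - p2*q1 + p3*q0])

def qconj(q):
    """Conjugate (9)."""
    return np.array([q[0], -q[1], -q[2], -q[3]])

def qnorm(q):
    """Norm (10)."""
    return np.sqrt(np.dot(q, q))

i = np.array([0.0, 1.0, 0.0, 0.0])
j = np.array([0.0, 0.0, 1.0, 0.0])
k = np.array([0.0, 0.0, 0.0, 1.0])
assert np.allclose(qmul(i, j), k) and np.allclose(qmul(j, i), -k)   # ij = -ji = k
```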


We briefly describe the GHR calculus, which provides a partial derivative with respect to a quaternion [31]-[33]. Let g be a function of a quaternionic variable q. Then, a GHR derivative is defined by
\[ \frac{\partial g}{\partial q} = \frac{1}{4} \left( \frac{\partial g}{\partial q_0} + \frac{\partial g}{\partial q_1} i + \frac{\partial g}{\partial q_2} j + \frac{\partial g}{\partial q_3} k \right). \qquad (11) \]
We describe two necessary formulas:
\[ \frac{\partial |\alpha q \beta + \gamma|^2}{\partial q} = \frac{1}{2} \left( |\alpha\beta|^2 q + \bar{\alpha} \gamma \bar{\beta} \right), \qquad (12) \]
\[ \frac{\partial |\alpha \bar{q} \beta + \gamma|^2}{\partial q} = \frac{1}{2} \left( |\alpha\beta|^2 q + \beta \bar{\gamma} \alpha \right), \qquad (13) \]
where α, β, and γ are quaternions. These formulas are essential to determining the gradient descent learning.
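Because the conjugation bars in (12) and (13) are easy to lose in print, a numerical sanity check is useful. The sketch below is our own code and assumes the form of (11) and (12) as reconstructed above; it evaluates the left-hand side of (12) from the component-wise definition (11) by central differences (exact for this quadratic function up to rounding) and compares it with the right-hand side.

```python
import numpy as np

def qmul(p, q):
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return np.array([p0*q0 - p1*q1 - p2*q2 - p3*q3,
                     p0*q1 + p1*q0 + p2*q3 - p3*q2,
                     p0*q2 - p1*q3 + p2*q0 + p3*q1,
                     p0*q3 + p1*q2 - p2*q1 + p3*q0])

def qconj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

rng = np.random.default_rng(0)
alpha, beta, gamma, q = (rng.standard_normal(4) for _ in range(4))

def g(w):
    """g(w) = |alpha w beta + gamma|^2."""
    r = qmul(qmul(alpha, w), beta) + gamma
    return float(np.dot(r, r))

# Left-hand side: derivative (11); g is real-valued, so the four partial
# derivatives become the four components of the resulting quaternion.
h = 1e-5
parts = []
for c in range(4):
    e = np.zeros(4)
    e[c] = h
    parts.append((g(q + e) - g(q - e)) / (2 * h))   # central difference in component c
lhs = 0.25 * np.array(parts)

# Right-hand side of (12): (1/2)(|alpha beta|^2 q + conj(alpha) gamma conj(beta)).
ab = qmul(alpha, beta)
rhs = 0.5 * (np.dot(ab, ab) * q + qmul(qmul(qconj(alpha), gamma), qconj(beta)))

assert np.allclose(lhs, rhs, atol=1e-6)
```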


2.3. Quaternionic Hopfield Neural Networks

A QHNN consists of quaternionic neurons and the conjugate, not symmetric, mutual connections between them. The neuron states and connection weights are represented by quaternions. We denote the state of neuron a and the connection weight from neuron b to neuron a as x_a and w_ab, respectively. We use the split activation function as described later, so there are 16 neuron states, and the set of neuron states is denoted as S = {±1 ± i ± j ± k}.

The weighted sum input to neuron a is given as
\[ I_a = \sum_{b \neq a} w_{ab} x_b. \qquad (14) \]

For a quaternionic weighted sum input I = I_0 + I_1 i + I_2 j + I_3 k, the activation function f(I), referred to as the split activation function, is defined as
\[ f(I) = \frac{I_0}{|I_0|} + \frac{I_1}{|I_1|} i + \frac{I_2}{|I_2|} j + \frac{I_3}{|I_3|} k. \qquad (15) \]


If I_c = 0 (c = 0, 1, 2, 3), then the corresponding part keeps its value. The connection weights are required to satisfy the stability condition w_ab = \bar{w}_ba, which is a condition to ensure the convergence of QHNNs [30]. Since we use the split activation function, the QHNNs can be regarded as ordinary HNNs. Utilizing the non-commutativity of quaternions, we can construct other models of QHNNs. We replace w_ab x_b in the weighted sum inputs with x_b w_ab, and then combine both orders of multiplication. In this way, we obtain another QHNN, referred to as a hybrid QHNN [30]. We now describe the weighted sum inputs of hybrid QHNNs in detail. We define the following sets:


\[ V_e = \{(a, b) \mid a \neq b,\ a + b \text{ is even}\}, \qquad (16) \]
\[ V_o = \{(a, b) \mid a + b \text{ is odd}\}. \qquad (17) \]
The weighted sum inputs of a hybrid QHNN are defined as
\[ I_a = \sum_{(a,b) \in V_e} w_{ab} x_b + \sum_{(a,b) \in V_o} x_b w_{ab}. \qquad (18) \]

Even in hybrid QHNNs, the stability condition w_ab = \bar{w}_ba is required.
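To make the two variants concrete, the following sketch uses our own data layout (a neuron state is a length-4 array, the weights form an N x N x 4 array, and qmul is the Hamilton product from the earlier sketch) to compute the weighted sum inputs (14) and (18) and apply the split activation (15).

```python
import numpy as np

def qmul(p, q):
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return np.array([p0*q0 - p1*q1 - p2*q2 - p3*q3,
                     p0*q1 + p1*q0 + p2*q3 - p3*q2,
                     p0*q2 - p1*q3 + p2*q0 + p3*q1,
                     p0*q3 + p1*q2 - p2*q1 + p3*q0])

def split_activation(I, old_state):
    """Split activation (15): the sign of each component; a zero component keeps its old value."""
    out = old_state.copy()
    nonzero = I != 0
    out[nonzero] = np.sign(I[nonzero])
    return out

def weighted_sum(W, x, a, hybrid=False):
    """Weighted sum input to neuron a.

    Plain QHNN, eq. (14): I_a = sum_{b != a} w_ab x_b.
    Hybrid QHNN, eq. (18): w_ab x_b when a + b is even, x_b w_ab when a + b is odd.
    W: (N, N, 4) quaternionic weights, x: (N, 4) neuron states.
    """
    I = np.zeros(4)
    for b in range(len(x)):
        if b == a:
            continue
        if hybrid and (a + b) % 2 == 1:
            I += qmul(x[b], W[a, b])
        else:
            I += qmul(W[a, b], x[b])
    return I
```

An asynchronous recall step then replaces x[a] by split_activation(weighted_sum(W, x, a, hybrid), x[a]) for each neuron in turn.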

3. Gradient Descent Learning

3.1. Gradient Descent Learning for Quaternionic Hopfield Neural Networks

We describe gradient descent learning for QHNNs. Let N and P be the numbers of neurons and training patterns, respectively. We denote the pth training pattern by x^p = (x_1^p, x_2^p, \cdots, x_N^p). The learning determines the connection weights that make the training patterns stable.


First, we determine the gradient descent learning for QHNNs. We define the error function E as
\[ E = \sum_{p=1}^{P} E^p, \qquad (19) \]
\[ E^p = 2 \sum_{c=1}^{N} |I_c^p - x_c^p|^2, \qquad (20) \]
\[ I_c^p = \sum_{d \neq c} w_{cd} x_d^p. \qquad (21) \]
I_c^p is the weighted sum input to neuron c for the pth training pattern. Let η be a small positive number, referred to as the learning rate. In ordinary gradient descent learning, the coefficient in (20) is 0.5, so that it cancels in the derivation of the learning algorithm; here we set it to 2 so that the factor 1/2 arising from the GHR calculus is canceled. By the GHR calculus [31]-[33], the gradient descent learning is given by
\[ \Delta w_{ab} = -\eta \frac{\partial E}{\partial w_{ab}} = -\eta \sum_{p=1}^{P} \frac{\partial E^p}{\partial w_{ab}}, \qquad (22) \]
\[ \frac{\partial E^p}{\partial w_{ab}} = 2 \left( \frac{\partial |I_a^p - x_a^p|^2}{\partial w_{ab}} + \frac{\partial |I_b^p - x_b^p|^2}{\partial w_{ab}} \right). \qquad (23) \]
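As a direct transcription of (19)-(21) for the plain QHNN (a sketch with our own helper names; qmul as in the earlier sketches), the error can be computed and monitored during learning:

```python
import numpy as np

def qmul(p, q):
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return np.array([p0*q0 - p1*q1 - p2*q2 - p3*q3,
                     p0*q1 + p1*q0 + p2*q3 - p3*q2,
                     p0*q2 - p1*q3 + p2*q0 + p3*q1,
                     p0*q3 + p1*q2 - p2*q1 + p3*q0])

def weighted_sums(W, x):
    """I_c = sum_{d != c} w_cd x_d for every neuron c, eq. (21)."""
    N = len(x)
    I = np.zeros((N, 4))
    for c in range(N):
        for d in range(N):
            if d != c:
                I[c] += qmul(W[c, d], x[d])
    return I

def error(W, patterns):
    """E = sum_p E^p with E^p = 2 sum_c |I_c^p - x_c^p|^2, eqs. (19)-(20)."""
    E = 0.0
    for x in patterns:                 # x: array of shape (N, 4)
        I = weighted_sums(W, x)
        E += 2.0 * np.sum((I - x) ** 2)
    return E
```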

We define J_{ab}^p and K_{ab}^p as
\[ J_{ab}^p = I_a^p - w_{ab} x_b^p - x_a^p, \qquad (24) \]
\[ K_{ab}^p = I_b^p - \bar{w}_{ab} x_a^p - x_b^p. \qquad (25) \]
Then, J_{ab}^p and K_{ab}^p do not include w_{ab}. By substituting α = 1, β = x_b^p, γ = J_{ab}^p, and q = w_{ab} into formula (12), we obtain the following equalities:
\[ \frac{\partial |I_a^p - x_a^p|^2}{\partial w_{ab}} = \frac{\partial |w_{ab} x_b^p + J_{ab}^p|^2}{\partial w_{ab}} \qquad (26) \]
\[ = \frac{1}{2} \left( |x_b^p|^2 w_{ab} + J_{ab}^p \bar{x}_b^p \right) \qquad (27) \]
\[ = \frac{1}{2} \left( w_{ab} x_b^p + J_{ab}^p \right) \bar{x}_b^p \qquad (28) \]
\[ = \frac{1}{2} \left( I_a^p - x_a^p \right) \bar{x}_b^p. \qquad (29) \]

By substituting α = 1, β = x_a^p, γ = K_{ab}^p, and q = w_{ab} into formula (13), we obtain the following equalities:
\[ \frac{\partial |I_b^p - x_b^p|^2}{\partial w_{ab}} = \frac{\partial |\bar{w}_{ab} x_a^p + K_{ab}^p|^2}{\partial w_{ab}} \qquad (30) \]
\[ = \frac{1}{2} \left( |x_a^p|^2 w_{ab} + x_a^p \bar{K}_{ab}^p \right) \qquad (31) \]
\[ = \frac{1}{2} \left( x_a^p \bar{x}_a^p w_{ab} + x_a^p \bar{K}_{ab}^p \right) \qquad (32) \]
\[ = \frac{1}{2} x_a^p \overline{\left( \bar{w}_{ab} x_a^p + K_{ab}^p \right)} \qquad (33) \]
\[ = \frac{1}{2} x_a^p \left( \bar{I}_b^p - \bar{x}_b^p \right). \qquad (34) \]
Therefore, we obtain the gradient descent learning for QHNNs
\[ \Delta w_{ab} = \eta \sum_p \left( (x_a^p - I_a^p)\, \bar{x}_b^p + x_a^p\, (\bar{x}_b^p - \bar{I}_b^p) \right). \qquad (35) \]
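Rule (35) can be turned into one learning sweep as sketched below. This is our own code, not the authors' reference implementation; it follows (35) as reconstructed above (the bars denote quaternionic conjugation) and enforces the stability condition w_ba = \bar{w}_ab, which the update preserves.

```python
import numpy as np

def qmul(p, q):
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return np.array([p0*q0 - p1*q1 - p2*q2 - p3*q3,
                     p0*q1 + p1*q0 + p2*q3 - p3*q2,
                     p0*q2 - p1*q3 + p2*q0 + p3*q1,
                     p0*q3 + p1*q2 - p2*q1 + p3*q0])

def qconj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

def weighted_sums(W, x):
    """I_c = sum_{d != c} w_cd x_d for every neuron c, eq. (21)."""
    N = len(x)
    I = np.zeros((N, 4))
    for c in range(N):
        for d in range(N):
            if d != c:
                I[c] += qmul(W[c, d], x[d])
    return I

def gd_step_qhnn(W, patterns, eta):
    """One gradient descent step for a plain QHNN following rule (35).

    W: (N, N, 4) with zero diagonal and W[b, a] = conj(W[a, b]);
    patterns: list of (N, 4) training patterns; eta: learning rate.
    """
    N = W.shape[0]
    dW = np.zeros_like(W)
    for x in patterns:
        I = weighted_sums(W, x)
        for a in range(N):
            for b in range(a + 1, N):
                dW[a, b] += qmul(x[a] - I[a], qconj(x[b])) \
                            + qmul(x[a], qconj(x[b]) - qconj(I[b]))
    for a in range(N):
        for b in range(a + 1, N):
            W[a, b] = W[a, b] + eta * dW[a, b]
            W[b, a] = qconj(W[a, b])     # stability condition w_ba = conj(w_ab)
    return W
```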


3.2. Gradient Descent Learning for Hybrid Quaternionic Hopfield Neural Networks

Next, we determine the gradient descent learning for hybrid QHNNs. I_c^p changes to
\[ I_c^p = \sum_{(c,d) \in V_e} w_{cd} x_d^p + \sum_{(c,d) \in V_o} x_d^p w_{cd}. \qquad (36) \]


We have only to determine the gradient descent learning in the case of (a, b) \in V_o. We define L_{ab}^p and M_{ab}^p as
\[ L_{ab}^p = I_a^p - x_b^p w_{ab} - x_a^p, \qquad (37) \]
\[ M_{ab}^p = I_b^p - x_a^p \bar{w}_{ab} - x_b^p. \qquad (38) \]

Then, L_{ab}^p and M_{ab}^p do not include w_{ab}. By substituting α = x_b^p, β = 1, γ = L_{ab}^p, and q = w_{ab} into formula (12), we obtain the following equalities:
\[ \frac{\partial |I_a^p - x_a^p|^2}{\partial w_{ab}} = \frac{\partial |x_b^p w_{ab} + L_{ab}^p|^2}{\partial w_{ab}} \qquad (39) \]
\[ = \frac{1}{2} \left( |x_b^p|^2 w_{ab} + \bar{x}_b^p L_{ab}^p \right) \qquad (40) \]
\[ = \frac{1}{2} \bar{x}_b^p \left( x_b^p w_{ab} + L_{ab}^p \right) \qquad (41) \]
\[ = \frac{1}{2} \bar{x}_b^p \left( I_a^p - x_a^p \right). \qquad (42) \]

Likewise, by substituting α = x_a^p, β = 1, γ = M_{ab}^p, and q = w_{ab} into formula (13), we obtain the following equalities:
\[ \frac{\partial |I_b^p - x_b^p|^2}{\partial w_{ab}} = \frac{\partial |x_a^p \bar{w}_{ab} + M_{ab}^p|^2}{\partial w_{ab}} \qquad (43) \]
\[ = \frac{1}{2} \left( |x_a^p|^2 w_{ab} + \bar{M}_{ab}^p x_a^p \right) \qquad (44) \]
\[ = \frac{1}{2} \left( w_{ab} \bar{x}_a^p + \bar{M}_{ab}^p \right) x_a^p \qquad (45) \]
\[ = \frac{1}{2} \overline{\left( x_a^p \bar{w}_{ab} + M_{ab}^p \right)} x_a^p \qquad (46) \]
\[ = \frac{1}{2} \left( \bar{I}_b^p - \bar{x}_b^p \right) x_a^p. \qquad (47) \]

Therefore, we obtain the gradient descent learning for hybrid QHNNs
\[ \Delta w_{ab} = \begin{cases} \eta \sum_p \left( (x_a^p - I_a^p)\, \bar{x}_b^p + x_a^p\, (\bar{x}_b^p - \bar{I}_b^p) \right) & ((a, b) \in V_e) \\ \eta \sum_p \left( \bar{x}_b^p\, (x_a^p - I_a^p) + (\bar{x}_b^p - \bar{I}_b^p)\, x_a^p \right) & ((a, b) \in V_o). \end{cases} \qquad (48) \]
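Analogously, a sketch of one learning sweep for the hybrid QHNN under rule (48) as reconstructed above (again our own code and helper names; the V_e and V_o cases differ only in the order of multiplication):

```python
import numpy as np

def qmul(p, q):
    p0, p1, p2, p3 = p
    q0, q1, q2, q3 = q
    return np.array([p0*q0 - p1*q1 - p2*q2 - p3*q3,
                     p0*q1 + p1*q0 + p2*q3 - p3*q2,
                     p0*q2 - p1*q3 + p2*q0 + p3*q1,
                     p0*q3 + p1*q2 - p2*q1 + p3*q0])

def qconj(q):
    return np.array([q[0], -q[1], -q[2], -q[3]])

def hybrid_weighted_sums(W, x):
    """Weighted sums (36): w_cd x_d when c + d is even, x_d w_cd when c + d is odd."""
    N = len(x)
    I = np.zeros((N, 4))
    for c in range(N):
        for d in range(N):
            if d == c:
                continue
            I[c] += qmul(W[c, d], x[d]) if (c + d) % 2 == 0 else qmul(x[d], W[c, d])
    return I

def gd_step_hybrid(W, patterns, eta):
    """One gradient descent step for a hybrid QHNN following rule (48)."""
    N = W.shape[0]
    dW = np.zeros_like(W)
    for x in patterns:
        I = hybrid_weighted_sums(W, x)
        for a in range(N):
            for b in range(a + 1, N):
                if (a + b) % 2 == 0:   # (a, b) in V_e: same form as rule (35)
                    dW[a, b] += qmul(x[a] - I[a], qconj(x[b])) \
                                + qmul(x[a], qconj(x[b]) - qconj(I[b]))
                else:                  # (a, b) in V_o: multiplication order reversed
                    dW[a, b] += qmul(qconj(x[b]), x[a] - I[a]) \
                                + qmul(qconj(x[b]) - qconj(I[b]), x[a])
    for a in range(N):
        for b in range(a + 1, N):
            W[a, b] = W[a, b] + eta * dW[a, b]
            W[b, a] = qconj(W[a, b])   # stability condition w_ba = conj(w_ab)
    return W
```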


4. Computer Simulations

We performed computer simulations to determine storage capacities and noise tolerance. In the simulations, we compared gradient descent learning and the projection rule. The projection rule cannot be applied to hybrid QHNNs.

We first describe the initial weights and the stopping condition for gradient descent learning. The initial weights were zero. After each update of the connection weights, we confirmed whether all the training patterns were stable; if they were, we stopped the learning. The training patterns were randomly selected from S^N. For each condition, 100 training-pattern sets were randomly generated. If not all the training patterns were stable within 1 000 iterations, the gradient descent learning was considered to have failed.

First, we performed computer simulations to ascertain the storage capacity. The storage capacity of the projection rule is N − 1. We investigated the storage capacity of gradient descent learning. The conditions of the computer simulation were N = 50 and η = 0.001. The procedure for the computer simulations was as follows:

1. We generated P training patterns randomly.

2. A QHNN or a hybrid QHNN attempted to learn the P training patterns by gradient descent learning.
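The experiment loop described above can be sketched as follows (our own scaffolding, not from the paper; learn_step and all_stable stand for the update and stability check of the chosen model, e.g. the sketches in Section 3 combined with the split activation):

```python
import numpy as np

def random_patterns(P, N, rng):
    """Draw P random patterns from S^N, where S = {±1 ± i ± j ± k} (every component ±1)."""
    return rng.choice([-1.0, 1.0], size=(P, N, 4))

def capacity_trial(learn_step, all_stable, P, N, eta, max_iters=1000, seed=0):
    """One storage-capacity trial: learn until every pattern is a fixed point, or give up.

    learn_step(W, patterns, eta) performs one learning step and
    all_stable(W, patterns) checks whether all training patterns are stable.
    """
    rng = np.random.default_rng(seed)
    patterns = random_patterns(P, N, rng)
    W = np.zeros((N, N, 4))                  # the initial weights are zero
    for _ in range(max_iters):
        if all_stable(W, patterns):
            return True                      # success: every training pattern is stable
        W = learn_step(W, patterns, eta)
    return bool(all_stable(W, patterns))     # failure if still unstable after 1 000 iterations
```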

The number P of training patterns was varied from 5 to 50 in steps of 5.

Figure 1: Storage capacities of QHNNs and hybrid QHNNs with gradient descent learning. The storage capacity of QHNNs with the projection rule was N − 1 = 49.

Figure 1 shows the results of the computer simulations. The horizontal and vertical axes indicate the number of patterns and the success rate, respectively. From the simulation results, we could estimate the storage capacities of gradient descent learning for QHNNs and hybrid QHNNs as approximately N and 0.6N, respectively. The storage capacities of QHNNs with gradient descent learning were almost the same as those with the projection rule. The storage capacities of hybrid QHNNs with gradient descent learning were smaller than those of QHNNs.

Next, we performed computer simulations for noise tolerance. The conditions of the computer simulations were N = 100 and η = 0.0005. Since these simulations used twice as many neurons as those for the storage capacity, we halved η. P had to be smaller than the storage capacity. Since the storage capacity of the hybrid QHNN was 0.6N = 60, P was varied from 10 to 50 in steps of 10. The procedure for the computer simulations was as follows:

1. A training pattern was selected from the P training patterns, and noise

was added. Each neuron state was replaced by a new state at the rate r, referred to as the noise rate. The new state was randomly selected from S.

2. If the original training pattern was retrieved, the trial was regarded as successful; otherwise, it was regarded as failed.
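A single noise-tolerance trial following this procedure can be sketched as follows (our own code and names; recall stands for running the chosen network, QHNN or hybrid QHNN, asynchronously until convergence, as in the earlier sketches):

```python
import numpy as np

def add_noise(x, r, rng):
    """Replace each neuron state with a random element of S at the noise rate r."""
    noisy = x.copy()
    flip = rng.random(len(x)) < r
    noisy[flip] = rng.choice([-1.0, 1.0], size=(int(flip.sum()), 4))
    return noisy

def noise_trial(recall, W, patterns, r, seed=0):
    """One noise-tolerance trial: corrupt a stored pattern and test whether it is retrieved.

    recall(W, x0) should run the network on the noisy initial state until
    convergence and return the final state.
    """
    rng = np.random.default_rng(seed)
    x = patterns[rng.integers(len(patterns))]        # pick one training pattern
    retrieved = recall(W, add_noise(x, r, rng))
    return bool(np.array_equal(retrieved, x))        # success only if the original is recovered
```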

Figure 2: Noise tolerance of the hybrid QHNN with gradient descent learning, the QHNN with gradient descent learning, and the QHNN with the projection rule, for P = 10, 20, 30, 40, and 50.

We performed 100 trials for each training-pattern set; therefore, the total number of trials was 10 000 for each P. Figure 2 shows the simulation results. The horizontal and vertical axes indicate the noise rate and the success rate, respectively. The noise tolerance of the hybrid QHNNs was the best from P = 10 to 40. In the case of P = 50, the noise tolerance of the QHNNs with gradient descent learning was the best. In addition, the results of the projection rule were the worst


in the cases of P = 30 and 40.

5. Discussion


First, let us consider the storage capacities. The storage capacity indicates how many training patterns the learning algorithm can successfully store for a given number of neurons. In this paper, we defined it as the maximal ratio of the number of successfully stored random training patterns to the number of neurons. The storage capacity of the Hebbian learning rule was below 0.06N [30]. Those of the hybrid QHNN and the QHNN with gradient descent learning were approximately 0.6N and N, respectively. Therefore, gradient descent learning improved the storage capacities. The storage capacities of QHNNs and hybrid QHNNs with the Hebbian learning rule were almost the same, while those with gradient descent learning were different, with QHNNs outperforming hybrid QHNNs. It is likely that the error functions of hybrid QHNNs are more complicated than those of QHNNs, and that gradient descent learning for hybrid QHNNs tends to be trapped at local minima. We speculate that the complicated error functions decreased the storage capacity. However, we cannot currently determine whether there exist connection weights that can store more training patterns.

Next, we discuss noise tolerance. From P = 10 to 40, the hybrid QHNNs outperformed the QHNNs. In the case of P = 50, the QHNNs with gradient descent learning outperformed the others. In all cases, either the QHNNs or the hybrid QHNNs with gradient descent learning outperformed the QHNNs with the projection rule. We surmise that the noise tolerance of hybrid QHNNs rapidly decreased near their smaller storage capacities. In fact, the other models also often demonstrated a rapid decrease in noise tolerance near their storage capacities [22].

6. Conclusions


In the present work, we proposed gradient descent learning and conducted computer simulations to investigate storage capacity and noise tolerance in QHNNs and hybrid QHNNs. The projection rule cannot be applied to hybrid QHNNs. In the case of QHNNs, the storage capacities of the projection rule and gradient descent learning were almost the same. The storage capacity of hybrid QHNNs with gradient descent learning was smaller than that of the others. When the number of training patterns was sufficiently small, hybrid


QHNNs showed the best noise tolerance. When the number of training patterns was around half the number of neurons, QHNNs with gradient descent learning had the best noise tolerance. Thus, although gradient descent learning should be used to enhance noise tolerance, the QHNN or the hybrid QHNN should be selected according to the number of training patterns. The gradient descent learning algorithm would also be applicable to the multistate quaternionic activation function, which has been used for the storage of color images [34].

References

[1] A. Hirose, Complex-valued neural networks: theories and applications, Series on Innovative Intelligence 5, World Scientific (2003).

[2] A. Hirose, Complex-valued neural networks, second edition, Series on Studies in Computational Intelligence, Springer (2012).

[3] A. Hirose, Complex-valued neural networks: advances and applications, The IEEE Press Series on Computational Intelligence, Wiley-IEEE Press (2013).

[4] T. Nitta, Complex-valued neural networks: utilizing high-dimensional parameters, Information Science Publishing (2009).

[5] P. Arena, L. Fortuna, G. Muscato, M. G. Xibilia, Neural networks in multidimensional domains: fundamentals and new trends in modelling and control, Lecture Notes in Control and Information Sciences 234, Springer (1998).

[6] N. Muramoto, T. Minemoto, T. Isokawa, H. Nishimura, N. Kamiura, N. Matsui, A scheme for counting pedestrians by quaternionic multilayer perceptron, Proceedings of the 14th International Symposium on Advanced Intelligent Systems (2013) F5C-3.

[7] T. Isokawa, T. Kusakabe, N. Matsui, F. Peper, Quaternion neural network and its application, Lecture Notes in Artificial Intelligence 2774 (2003) 318-324.

[8] T. Nitta, A quaternary version of the back-propagation algorithm, Proceedings of IEEE International Conference on Neural Networks 5 (1995) 2753-2756.

[9] T. Nitta, An extension of the back-propagation algorithm to quaternions, Proceedings of International Conference on Neural Information Processing 1 (1996) 247-250.

[10] M. Kobayashi, A. Nakajima, Twisted quaternary neural networks, IEEJ Transactions on Electrical and Electronic Engineering 7(4) (2012) 397-401.

[11] M. Kobayashi, Uniqueness theorem for quaternionic neural networks, Signal Processing, http://dx.doi.org/10.1016/j.sigpro.2016.07.021 (2016).

[12] F. Shang, A. Hirose, Quaternion neural-network-based PolSAR land classification in Poincare-sphere-parameter space, IEEE Transactions on Geoscience and Remote Sensing 52(9) (2014) 5693-5703.

[13] T. Nitta, Widely linear processing of hypercomplex signals, Proceedings of International Conference on Neural Information Processing (Lecture Notes in Computer Science) (2011) 519-525.

[14] Y. Xia, C. Jahanchahi, T. Nitta, D. P. Mandic, Performance bounds of quaternion estimators, IEEE Transactions on Neural Networks and Learning Systems 26(12) (2015) 3287-3292.

[15] S. Jankowski, A. Lozowski, J. M. Zurada, Complex-valued multistate neural associative memory, IEEE Transactions on Neural Networks 7(6) (1996) 1491-1496.

[16] G. Tanaka, K. Aihara, Complex-valued multistate associative memory with nonlinear multilevel functions for gray-level image reconstruction, IEEE Transactions on Neural Networks 20(9) (2009) 1463-1473.

[17] M. Kobayashi, Pseudo-relaxation learning algorithm for complex-valued associative memory, International Journal of Neural Systems 18(2) (2008) 147-156.

[18] M. K. Muezzinoglu, C. Guzelis, J. M. Zurada, A new design method for the complex-valued multistate Hopfield associative memory, IEEE Transactions on Neural Networks 14(4) (2003) 891-899.

[19] Y. Suzuki, M. Kobayashi, Complex-valued bipartite auto-associative memory, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E97-A(8) (2014) 1680-1687.

[20] D. L. Lee, Improvements of complex-valued Hopfield associative memory by using generalized projection rules, IEEE Transactions on Neural Networks 17(5) (2006) 1341-1347.

[21] M. Kitahara, M. Kobayashi, Projection rule for complex-valued associative memory with large constant terms, Nonlinear Theory and Its Applications 3(3) (2012) 426-435.

[22] M. Kitahara, M. Kobayashi, Projection rule for rotor Hopfield neural networks, IEEE Transactions on Neural Networks and Learning Systems 25(7) (2014) 1298-1307.

[23] D. L. Lee, Improving the capacity of complex-valued neural networks with a modified gradient descent learning rule, IEEE Transactions on Neural Networks 12(2) (2001) 439-443.

[24] M. Kobayashi, H. Yamada, M. Kitahara, Noise robust gradient descent learning for complex-valued associative memory, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E94-A(8) (2011) 1756-1759.

[25] M. Kobayashi, Gradient descent learning rule for complex-valued associative memories with large constant terms, IEEJ Transactions on Electrical and Electronic Engineering 11(3) (2016) 357-363.

[26] T. Isokawa, H. Nishimura, N. Kamiura, N. Matsui, Associative memory in quaternionic Hopfield neural network, International Journal of Neural Systems 18(2) (2008) 135-145.

[27] T. Minemoto, T. Isokawa, H. Nishimura, N. Matsui, Quaternionic multistate Hopfield neural network with extended projection rule, Artificial Life and Robotics (2015) 1-6.

[28] M. Kobayashi, Rotational invariance of quaternionic Hopfield neural networks, IEEJ Transactions on Electrical and Electronic Engineering 11(4) (2016) 516-520.

[29] T. Minemoto, T. Isokawa, M. Kobayashi, H. Nishimura, N. Matsui, On the performance of quaternionic bidirectional auto-associative memory, Proceedings of International Joint Conference on Neural Networks (2015) 2910-2915.

[30] M. Kobayashi, Hybrid quaternionic Hopfield neural network, IEICE Transactions on Fundamentals of Electronics, Communications and Computer Sciences E98-A(7) (2015) 1512-1518.

[31] D. Xu, Y. Xia, D. P. Mandic, Optimization in quaternion dynamic systems: gradient, Hessian, and learning algorithms, IEEE Transactions on Neural Networks and Learning Systems 27(2) (2016) 249-261.

[32] D. Xu, D. P. Mandic, The theory of quaternion matrix derivatives, IEEE Transactions on Signal Processing 63(6) (2015) 1543-1556.

[33] D. Xu, C. Jahanchahi, C. C. Took, D. P. Mandic, Enabling quaternion derivatives: the generalized HR calculus, Royal Society Open Science 2(8) (2015) 150255.

[34] T. Minemoto, T. Isokawa, H. Nishimura, N. Matsui, Quaternionic multistate Hopfield neural network with extended projection rule, Artificial Life and Robotics 21(1) (2016) 106-111.

Masaki Kobayashi is a professor at the University of Yamanashi. He received the B.S., M.S., and D.S. degrees in mathematics from Nagoya University in 1989, 1991, and 1996, respectively. He became a research associate and an associate professor at the University of Yamanashi in 1993 and 2006, respectively. He has been a professor since 2014.
