Layered neural networks with non-monotonic transfer functions

Physica A 317 (2003) 270 – 298

www.elsevier.com/locate/physa

Katsuki Katayama, Yasuo Sakata, Tsuyoshi Horiguchi*
Department of Computer and Mathematical Sciences, GSIS, Tohoku University, Sendai 980-8579, Japan
Received 11 March 2001

Abstract

We investigate the storage capacity and generalization ability of two types of fully connected layered neural networks with non-monotonic transfer functions; random patterns are embedded into the networks by a Hebbian learning rule. One of them is a layered network in which the non-monotonic transfer function of the even layers is different from that of the odd layers. The other is a layered network with intra-layer connections, in which the non-monotonic transfer function of the inter-layer updates is different from that of the intra-layer updates, and the inter-layer neurons and the intra-layer neurons are updated alternately. We derive recursion relations for the order parameters of these layered networks by the signal-to-noise ratio method. We clarify that the storage capacity and the generalization ability of these layered networks are enhanced in comparison with those of networks with a conventional monotonic transfer function when the non-monotonicity of the transfer functions is selected optimally. We also point out that chaotic behavior appears in the order parameters of the layered networks when the non-monotonicity of the transfer functions increases. © 2002 Elsevier Science B.V. All rights reserved.

PACS: 05.50.+q; 75.10.N

Keywords: Layered neural network; Non-monotonic transfer function; Hebb rule; Storage capacity; Generalization ability; Signal-to-noise ratio method

1. Introduction

Extensive research has been performed on the storage capacity and generalization ability of various artificial neural networks within the framework of statistical physics [1–24]. For artificial neural networks, there exist mainly two types of

Corresponding author. Tel.: +81-22-217-5842; fax: +81-22-217-5851. E-mail address: [email protected] (T. Horiguchi).


network architectures. One of them is a recurrent network and the other is a layered (feed-forward) network. There also exists a hybrid network obtained by combining a recurrent network with a layered network; we hereafter call it a layered network with intra-layer connections in the present paper.

The storage capacity of an asynchronous fully connected recurrent network, the so-called Hopfield neural network (HNN) [25], was investigated by Amit et al. [1–3] and others [4–6] using the replica method [26–28], for random patterns embedded into the network by the Hebbian learning rule [29]. The critical value α_C of the storage capacity α (= p/N) is about 0.138 at zero temperature, where N is the total number of neurons and p the total number of embedded patterns. The generalization ability of the asynchronous fully connected HNN at zero temperature was investigated by Fontanari [7,8] using the replica method; it was clarified that the network is able to extract a set of concepts from example patterns. The generalization ability of the asynchronous fully connected HNN at finite temperature was investigated by Krebs and Theumann [9] using the replica method.

The storage capacity of a synchronous fully connected layered neural network (LNN) was investigated by Domany et al. [10–13] using the signal-to-noise ratio method; the critical value α_C of the storage capacity is about 0.269 at zero temperature. The generalization ability of the synchronous fully connected LNN was investigated by Dominguez and Theumann [14,15] using the signal-to-noise ratio method. Coolen and Viana [16] investigated the storage capacity of a fully connected LNN with intra-layer connections using the replica method, in which the inter-layer neurons and the intra-layer neurons are updated simultaneously. They clarified that the storage capacity of the LNN with intra-layer connections is enhanced at zero temperature in comparison with that of the recurrent network and that of the LNN, due to competition between the inter-layer connections and the intra-layer connections; the critical value α_C of the storage capacity is about 0.317 at zero temperature. Katayama and Horiguchi [17] investigated the storage capacity of a fully connected LNN with intra-layer connections using Q (≥ 2)-state clock neurons, for random Q-valued patterns embedded by the Hebbian learning rule.

It has been known that neural networks with a non-monotonic transfer function yield a larger critical value of the storage capacity than those with a conventional monotonic one. The increase of α_C due to a non-monotonic transfer function was first pointed out by Morita et al. [18] by numerical simulations. The storage capacity of the asynchronous fully connected HNN with a non-monotonic transfer function was investigated by Yoshizawa et al. [19], Shiino and Fukai [20] and Inoue [21]. Nishimori and Opris [22] investigated the retrieval dynamics of the synchronous fully connected HNN with a non-monotonic transfer function using the extended Amari–Maginu theory; they obtained an increase of α_C in the equilibrium state at zero temperature and an oscillatory behavior of the dynamical order parameters when the non-monotonicity of the transfer function increases. It has also been known that neural networks with a non-monotonic transfer function improve the generalization ability in comparison with those with a conventional monotonic one.
For example, Dominguez and Theumann [14,15] investigated the generalization ability of the synchronous fully connected LNN with a non-monotonic transfer function using the signal-to-noise ratio method, and obtained some improvement of the generalization ability. They also found that chaotic behavior appears in the order parameters as a function of the recursion step when the non-monotonicity of the transfer function increases. Hence, we are interested in the storage capacity and the generalization ability of the LNN when the non-monotonic transfer function of the even layers is different from that of the odd layers. We are also interested in the storage capacity and the generalization ability of the LNN with intra-layer connections when the non-monotonic transfer function of the inter-layer updates is different from that of the intra-layer updates.

The present paper is organized as follows. In Section 2, we formulate the LNN in which the non-monotonic transfer function of the even layers is different from that of the odd layers; the storage capacity and the generalization ability of the LNN with the two different non-monotonic transfer functions are investigated by the signal-to-noise ratio method. In Section 3, we formulate the LNN with intra-layer connections, in which the non-monotonic transfer function of the inter-layer updates is different from that of the intra-layer updates, and the inter-layer neurons and the intra-layer neurons are updated alternately; the storage capacity and the generalization ability of this LNN with the two different non-monotonic transfer functions are also investigated by the signal-to-noise ratio method. Concluding remarks are given in Section 4.

2. Layered neural network

In this section, we consider a fully connected LNN in which the non-monotonic transfer function (NMTF) of the even layers is different from that of the odd layers. We investigate the storage capacity and the generalization ability of the LNN with the two different NMTFs for even and odd layers. A schematic description of the relation among the (2n − 1)th layer, the 2nth layer and the (2n + 1)th layer (n = 1, …, ∞) of the LNN is shown in Fig. 1. We assume that there exist N neurons in each layer and that each neuron in the (t + 1)th layer receives inputs from all the neurons in the tth layer (t = 1, …, ∞). The update rule for each neuron in the (t + 1)th layer is defined as follows:

\sigma_i^{t+1} = F_{t+1}\Bigl( \sum_{j=1}^{N} J_{ij}^{t}\, \sigma_j^{t} \Bigr) ,   (1)

where σ_i^t represents the state of neuron i in the tth layer and J_{ij}^t the connection from neuron j in the tth layer to neuron i in the (t + 1)th layer. The NMTF F_t(x) is assumed to be given as follows:

F_t(x) = \begin{cases} G_o(x) , & t = 2n - 1 , \\ G_e(x) , & t = 2n , \end{cases}   (2)

where the functions G_o(x) and G_e(x) are defined by

G_\kappa(x) = \begin{cases} \mathrm{sgn}(x) , & |x| \le \theta_\kappa , \\ 0 , & |x| > \theta_\kappa , \end{cases}   (3)

Fig. 1. A schematic description of a relation among the (2n − 1)th layer, the 2nth layer and the (2n + 1)th layer in the LNN.

where κ stands for e and o, and θ_o and θ_e are the thresholds of the NMTF in the odd layers and in the even layers, respectively. We note that G_κ(x) reduces to the conventional monotonic transfer function (MTF) sgn(x) when θ_κ → ∞.

2.1. Storage capacity

We first investigate the storage capacity of the LNN in which the NMTF of the even layers is different from that of the odd layers. We define the connection J_{ij}^t from neuron j in the tth layer to neuron i in the (t + 1)th layer by the Hebbian learning rule as follows:

J_{ij}^{t} = \frac{1}{N} \sum_{\mu=1}^{p} \xi_i^{\mu,t+1} \xi_j^{\mu,t} ,   (4)

where p is the total number of embedded patterns. We assume p = αN as usual, where α is the load parameter. The component ξ_i^{μ,t} is assumed to be a quenched random variable taking ±1 independently with equal probability:

P(\xi_i^{\mu,t}) = \tfrac{1}{2}\,\delta(\xi_i^{\mu,t} - 1) + \tfrac{1}{2}\,\delta(\xi_i^{\mu,t} + 1) .   (5)

The degree of retrieval is measured by the retrieval overlap between an embedded pattern ξ_i^{μ,t} and the state of the neurons σ_i^t, defined as follows:

r^{\mu,t} = \frac{1}{N} \sum_{i=1}^{N} \xi_i^{\mu,t} \sigma_i^{t} .   (6)
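To make the setting concrete, the following short sketch propagates a finite-N state through one Hebbian layer, Eqs. (1)–(4), and measures the overlap of Eq. (6). It is our own illustration rather than code from the paper; the function name, the system size and the choice θ_o = θ_e = 1.515 are ours, and finite-size fluctuations make the result only approximate.

```python
import numpy as np

rng = np.random.default_rng(1)

def one_layer_update(sigma, xi_in, xi_out, theta):
    """Apply Eq. (1) with Hebbian connections (Eq. (4)) and the NMTF of Eq. (3);
    return the states of the next layer and their overlap (Eq. (6)) with pattern mu = 1."""
    N = sigma.size
    J = xi_out.T @ xi_in / N                      # J_ij = (1/N) sum_mu xi_i^{mu,t+1} xi_j^{mu,t}
    h = J @ sigma                                 # local fields of layer t+1
    nxt = np.where(np.abs(h) <= theta, np.sign(h), 0.0)
    return nxt, float(xi_out[0] @ nxt) / N

N, p, layers = 1000, 200, 10                      # load parameter alpha = p/N = 0.2
xi = rng.choice([-1.0, 1.0], size=(layers, p, N)) # independent random patterns for every layer
sigma = xi[0, 0].copy()                           # start from the condensed pattern of layer 1
for t in range(layers - 1):
    sigma, r = one_layer_update(sigma, xi[t], xi[t + 1], theta=1.515)
    print(f"layer {t + 2}: overlap r = {r:.3f}")
```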

We assume that all neurons of each layer retrieve only the pattern with μ = 1. Hence, we call the pattern with μ = 1 the condensed pattern and the patterns labeled by μ = 2, …, p the uncondensed patterns in this subsection. Under this assumption, we have r^{1,t} = O(1) and r^{μ,t} = O(1/√N) for μ = 2, …, p. We define the following order parameters r^t, R^t and Q^t for the LNN:

r^{t} \equiv \langle r^{1,t} \rangle_{\xi} ,   (7)

R^{t} \equiv \frac{1}{\alpha N} \sum_{\mu=2}^{\alpha N} \langle (R^{\mu,t})^{2} \rangle_{\xi} ,   (8)

Q^{t} \equiv \frac{1}{N} \sum_{i=1}^{N} \langle (\sigma_i^{t})^{2} \rangle_{\xi} ,   (9)

where R^{μ,t} = √N r^{μ,t} and ⟨X⟩_ξ denotes the pattern average of a quantity X. Performing the analysis by the signal-to-noise ratio method, we obtain the following recursion relations for the order parameters in the thermodynamic limit N → ∞:

r^{t+1} = \bigl\langle F_{t+1}\bigl( r^{t} + \sqrt{\alpha R^{t}}\, x \bigr) \bigr\rangle_{x} ,   (10)

R^{t+1} = Q^{t+1} + (\nu^{t+1})^{2} R^{t} ,   (11)

Q^{t+1} = \bigl\langle \bigl\{ F_{t+1}\bigl( r^{t} + \sqrt{\alpha R^{t}}\, x \bigr) \bigr\}^{2} \bigr\rangle_{x} ,   (12)

where

\nu^{t+1} = \bigl\langle F'_{t+1}\bigl( r^{t} + \sqrt{\alpha R^{t}}\, x \bigr) \bigr\rangle_{x} .   (13)

Here F'_{t+1}(x) represents the derivative of F_{t+1}(x), and the average ⟨φ(x)⟩_x of a quantity φ(x) is given by

\langle \varphi(x) \rangle_{x} = \int_{-\infty}^{\infty} \frac{dx}{\sqrt{2\pi}} \exp\Bigl( -\frac{x^{2}}{2} \Bigr)\, \varphi(x) .   (14)

Now we investigate the performance of the LNN with the two different NMTFs by using the obtained recursion relations (10)–(13) for the order parameters. We solve recursion relations (10)–(13) numerically by means of an iterative method after substituting r^1 = 1 into Eqs. (10)–(13) as the initial value of r^t. We have confirmed that the critical value α_C of the storage capacity for the LNN with the NMTFs of θ_o = θ_e = ∞ is about 0.269. We show the behavior of the retrieval error E for the LNN with the different NMTFs of θ_o = ∞ and θ_e = 0.8 in Fig. 2, where E = lim_{t→∞}(1 − r^t) with r^t defined by Eq. (7). We find that chaotic behavior of the retrieval appears in both even layers and odd layers, even for small values of the thresholds θ_o and θ_e. We show the storage capacity in the θ_o–θ_e plane in Fig. 3; we have put α_C = 0 when the chaotic behavior of the retrieval appears. We find a maximum value 0.299 of the storage capacity at θ_o = θ_e = 1.515. We show the behavior of the retrieval error for the NMTFs with θ_o = θ_e = 1.515 and for the MTF in Fig. 4.
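The recursion relations (10)–(13) are simple enough to iterate directly. The sketch below is our own illustration of such an iteration, not the authors' code: it assumes the equations exactly as reconstructed above, takes R^1 = 1 as the initial noise value (an assumption, since only r^1 = 1 is stated), and uses closed-form Gaussian averages of the piecewise-constant transfer function that we derived ourselves; the function names and the scan over α are illustrative.

```python
import math

def phi(z):                       # standard normal density
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):                       # standard normal distribution function
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def averages(m, s, theta):
    """Gaussian averages <G>, <G^2>, <G'> of G_theta(m + s*x), x ~ N(0,1),
    for the transfer function of Eq. (3), evaluated in closed form."""
    g1 = Phi((theta - m) / s) + Phi((-theta - m) / s) - 2.0 * Phi(-m / s)
    g2 = Phi((theta - m) / s) - Phi((-theta - m) / s)
    gp = (2.0 * phi(m / s) - phi((theta - m) / s) - phi((theta + m) / s)) / s
    return g1, g2, gp

def retrieval_error(alpha, theta_o, theta_e, layers=200):
    """Iterate the reconstructed recursions (10)-(13) from r^1 = 1, R^1 = 1."""
    r, R = 1.0, 1.0
    for layer in range(2, layers + 1):            # layer being generated
        theta = theta_o if layer % 2 == 1 else theta_e
        r_new, Q_new, slope = averages(r, math.sqrt(alpha * R), theta)
        R = Q_new + slope * slope * R             # Eq. (11)
        r = r_new                                 # Eq. (10)
    return 1.0 - r

# crude scan for the storage capacity: largest alpha whose asymptotic error stays small
for theta in (float("inf"), 1.515):
    alpha, alpha_c = 0.01, 0.0
    while alpha < 0.5:
        if retrieval_error(alpha, theta, theta) < 0.5:
            alpha_c = alpha
        alpha += 0.001
    print(f"theta_o = theta_e = {theta}: alpha_c about {alpha_c:.3f}")
```

With θ_o = θ_e = ∞ the same iteration reduces to the usual MTF recursions, which provides a convenient consistency check against the value α_C ≈ 0.269 quoted above.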

Fig. 2. Behavior of the retrieval error as a function of the load parameter α for the LNN with the two different NMTFs of θ_o = ∞ and θ_e = 0.8.

Fig. 3. Phase diagram of the storage capacity in the θ_o–θ_e plane for the LNN with the two different NMTFs.

The retrieval error for the NMTFs with θ_o = θ_e = 1.515 is always larger than that for the MTF when α < 0.269. This is due to the fact that the zero state of the NMTF does not match the values ±1 of the binary embedded patterns. We show the retrieval error of the odd layers in the θ_o–α plane in Fig. 5 and that of the even layers in the θ_o–α plane in Fig. 6 for the NMTFs with θ_e = 1.515. It turns out that the value of the retrieval error for the even layers is different from that for the odd layers when the value of the threshold for the even layers is different from that for the odd layers. We investigate the basin of attraction by solving recursion relations (10)–(13) numerically, substituting values r^1 in [0, 1] as the initial value of r^t. We find that there exists a lower bound r_L^1 of r^1; the system retrieves the embedded patterns when r_L^1 ≤ r^1 ≤ 1, but does not when r^1 < r_L^1, for each value of α ≤ α_C. We show the basin of attraction for the NMTFs with θ_o = θ_e = 1.515 and that for the MTF in Fig. 7. As seen in Fig. 7, the basin of attraction for the NMTFs with θ_o = θ_e = 1.515 is larger than that for the MTF.

Fig. 4. Behavior of the retrieval error as a function of the load parameter α for the LNN with the NMTFs of θ_o = θ_e = 1.515 and for the LNN with the MTF.

2.2. Generalization ability

We investigate the generalization ability of the LNN within the framework of Dominguez and Theumann [15]. We assume that all neurons of each layer learn a set of s example patterns η_i^{μγ,t} ∈ {−1, 0, 1} (γ = 1, …, s) for each binary concept pattern ξ_i^{μ,t} ∈ {−1, 1} (μ = 1, …, p) by the Hebbian learning rule, where p = αN is the total number of concept patterns. The connection J_{ij}^t from neuron j in the tth layer to neuron i in the (t + 1)th layer is then defined as follows:

J_{ij}^{t} = \frac{1}{s b^{2} N} \sum_{\mu=1}^{p} \sum_{\gamma=1}^{s} \eta_i^{\mu\gamma,t+1} \eta_j^{\mu\gamma,t} ,   (15)

where η_i^{μγ,t} = λ_i^{μγ,t} ξ_i^{μ,t}. We assume that each component ξ_i^{μ,t} of the binary concept patterns is a quenched and independent random variable, generated according to the following probability distribution:

P(\xi_i^{\mu,t}) = \tfrac{1}{2}\,\delta(\xi_i^{\mu,t} - 1) + \tfrac{1}{2}\,\delta(\xi_i^{\mu,t} + 1) .   (16)

We also assume that each component λ_i^{μγ,t} ∈ {−1, 0, 1} is a quenched and independent random variable, generated according to the following probability distribution:

Fig. 5. Phase diagram of the retrieval error in the θ_o–α plane for the odd layers when the threshold θ_e of the even layers is fixed to 1.515.

P(\lambda_i^{\mu\gamma,t}) = \frac{a+b}{2}\,\delta(\lambda_i^{\mu\gamma,t} - 1) + \frac{a-b}{2}\,\delta(\lambda_i^{\mu\gamma,t} + 1) + (1-a)\,\delta(\lambda_i^{\mu\gamma,t}) .   (17)

The parameter a is the activity of the example patterns and is defined as follows:

\langle (\eta_i^{\mu\gamma,t})^{2} \rangle_{\xi,\lambda} = a \qquad (0 \le a \le 1) ,   (18)

where ⟨Y⟩_{ξ,λ} denotes the pattern average of a quantity Y. The parameter b measures the correlation between the example patterns and the binary concept patterns:

\langle \eta_i^{\mu\gamma,t} \xi_i^{\mu,t} \rangle_{\xi,\lambda} = b \qquad (0 \le b \le 1) ,   (19)

so that the smaller the value of b, the more difficult the extraction of the concepts from the examples. We note that the parameters a and b satisfy the following condition:

a = \langle (\eta_i^{\mu\gamma,t})^{2} \rangle_{\xi,\lambda} = \langle (\lambda_i^{\mu\gamma,t})^{2} \rangle_{\xi,\lambda} \ge \langle \lambda_i^{\mu\gamma,t} \rangle_{\xi,\lambda}^{2} = b^{2} .   (20)
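As a concrete illustration of the statistics (16)–(19), the following short sketch draws concept patterns ξ and example patterns η = λξ and checks the activity a and correlation b empirically. It is a minimal illustration, not part of the original paper; the function name and parameter values are ours, and it assumes a ≥ b so that the probabilities in Eq. (17) are non-negative.

```python
import numpy as np

rng = np.random.default_rng(0)

def example_patterns(N, p, s, a, b):
    """Draw concept patterns xi in {-1,+1} and example patterns eta = lambda * xi,
    with lambda distributed as in Eq. (17): P(+1)=(a+b)/2, P(-1)=(a-b)/2, P(0)=1-a."""
    xi = rng.choice([-1, 1], size=(p, N))
    lam = rng.choice([1, -1, 0], size=(p, s, N), p=[(a + b) / 2, (a - b) / 2, 1 - a])
    eta = lam * xi[:, None, :]
    return xi, eta

xi, eta = example_patterns(N=10000, p=5, s=20, a=0.8, b=0.6)
print("activity    <eta^2>  =", (eta ** 2).mean())              # should be close to a
print("correlation <eta xi> =", (eta * xi[:, None, :]).mean())  # should be close to b
```

For a = 1 the examples are noisy copies of the concepts in which each site is flipped with probability (1 − b)/2.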

The generalization ability is characterized by the generalization overlap, which is defined as the overlap between a binary concept pattern ξ_i^{μ,t} and the state of a neuron σ_i^t:

Fig. 6. Phase diagram of the retrieval error in the θ_o–α plane for the even layers when the threshold θ_e of the even layers is fixed to 1.515.

Fig. 7. Basin of attraction for the LNN with the NMTFs of θ_o = θ_e = 1.515 and for the LNN with the MTF.

r^{\mu,t} = \frac{1}{N} \sum_{i=1}^{N} \xi_i^{\mu,t} \sigma_i^{t} .   (21)

The retrieval overlap is defined as the overlap between an example pattern η_i^{μγ,t} and the state of a neuron σ_i^t:

l^{\mu\gamma,t} = \frac{1}{N} \sum_{i=1}^{N} \eta_i^{\mu\gamma,t} \sigma_i^{t} .   (22)

We assume that all neurons of each layer extract only the concept with μ = 1. Hence, we call the pattern with μ = 1 the condensed pattern and the patterns labeled by μ = 2, …, p the uncondensed patterns in this subsection. Under this assumption, we have r^{1,t} = O(1) and l^{1γ,t} = O(1) for γ = 1, …, s, and r^{μ,t} = O(1/√N) and l^{μγ,t} = O(1/√N) for γ = 1, …, s and μ = 2, …, p. The following order parameters r^t, l^t, L^t and Q^t are defined for the LNN:

r^{t} \equiv \langle r^{1,t} \rangle_{\xi,\lambda} ,   (23)

l^{t} \equiv \frac{1}{s b} \sum_{\gamma=1}^{s} \langle l^{1\gamma,t} \rangle_{\xi,\lambda} ,   (24)

L^{t} \equiv \frac{1}{\alpha N} \sum_{\mu=2}^{\alpha N} \Bigl\langle \Bigl( \frac{1}{s} \sum_{\gamma=1}^{s} L^{\mu\gamma,t} \Bigr)^{2} \Bigr\rangle_{\xi,\lambda} ,   (25)

Q^{t} \equiv \frac{1}{N} \sum_{i=1}^{N} \langle (\sigma_i^{t})^{2} \rangle_{\xi,\lambda} ,   (26)

where R^{μ,t} = √N r^{μ,t} and b L^{μγ,t} = √N l^{μγ,t}. Using the signal-to-noise ratio method, we obtain the following recursion relations for the order parameters in the thermodynamic limit N → ∞:

r^{t+1} = \bigl\langle F_{t+1}\bigl( r^{t} \sqrt{A-1}\, x + \sqrt{\alpha A L^{t}}\, y + r^{t} \bigr) \bigr\rangle_{x,y} ,   (27)

l^{t+1} = r^{t+1} + (A-1)\, \nu^{t+1} l^{t} ,   (28)

L^{t+1} = A Q^{t+1} + A^{2} (\nu^{t+1})^{2} L^{t} ,   (29)

Q^{t+1} = \bigl\langle \bigl\{ F_{t+1}\bigl( r^{t} \sqrt{A-1}\, x + \sqrt{\alpha A L^{t}}\, y + r^{t} \bigr) \bigr\}^{2} \bigr\rangle_{x,y} ,   (30)

where

\nu^{t+1} = \bigl\langle F'_{t+1}\bigl( r^{t} \sqrt{A-1}\, x + \sqrt{\alpha A L^{t}}\, y + r^{t} \bigr) \bigr\rangle_{x,y} ,   (31)

A = 1 + \frac{a - b^{2}}{s b^{2}} .   (32)

Fig. 8. Behavior of the generalization error as a function of the load parameter α for the LNN with the NMTFs of θ_o = θ_e = 1.515 and for the LNN with the MTF in the case of b = 0.2, 0.6 and 1 when a = 1 and s = 20.

Here we define the average ⟨φ(x, y)⟩_{x,y} of a quantity φ(x, y) as follows:

\langle \varphi(x,y) \rangle_{x,y} = \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} \frac{dx\, dy}{2\pi} \exp\Bigl( -\frac{x^{2}+y^{2}}{2} \Bigr)\, \varphi(x,y) .   (33)
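The double Gaussian average of Eq. (33) has to be evaluated numerically at every recursion step. A minimal sketch of one way to do this (our own illustration, not the authors' code) uses Gauss–Hermite quadrature; the helper name gauss2d is ours. Because the transfer functions are piecewise constant, it is in practice better to take the x-average in closed form (as in the sketch of Section 2.1) and apply quadrature only to the remaining smooth y-average, but the generic version below already shows the structure.

```python
import numpy as np

def gauss2d(f, n=60):
    """Two-dimensional Gaussian average of Eq. (33), <f(x, y)>_{x,y},
    via Gauss-Hermite quadrature (change of variables x = sqrt(2) * u)."""
    u, w = np.polynomial.hermite.hermgauss(n)
    x = np.sqrt(2.0) * u
    wn = w / np.sqrt(np.pi)                   # weights for the standard normal measure
    X, Y = np.meshgrid(x, x, indexing="ij")
    W = np.outer(wn, wn)
    return float(np.sum(W * f(X, Y)))

# sanity checks on the measure: <1> = 1 and <x^2> = <y^2> = 1
print(gauss2d(lambda x, y: np.ones_like(x)))
print(gauss2d(lambda x, y: x * x), gauss2d(lambda x, y: y * y))
```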

Now we investigate the performance of the LNN with the NMTFs by solving recursion relations (27)–(32) numerically and iteratively after substituting r^1 = 1 as the initial value of r^t. We show the behavior of the generalization error for the LNN with the NMTFs of θ_o = θ_e = 1.515 and for the LNN with the MTF in the case of b = 0.2, 0.6 and 1 for a = 1 and s = 20 in Fig. 8. Here the generalization error E is defined by E = lim_{t→∞}(1 − r^t), with r^t defined by Eq. (23). We find that the generalization ability for the NMTFs of θ_o = θ_e = 1.515 improves in comparison with that for the MTF when the value of b is close to 1, but decreases when the value of b is small. In order to clarify this point, we show the optimal threshold, at which the generalization ability is maximum, as a function of the correlation b for the NMTFs in the case of a = 1 and s = 20 in Fig. 9. We find that the value of the threshold of the NMTF of the even layers, θ_e, is always the same as that of the odd layers, θ_o, when the generalization ability is maximum for each value of b. It also turns out that the value of the optimal threshold increases monotonically from θ_o = θ_e = 1.515 and diverges to ∞ as the value of b decreases. This means that the generalization ability for the NMTFs of θ_o = θ_e = 1.515 is optimal when the value of b is close to 1, whereas the generalization ability for the MTF becomes optimal when the value of b is small. Hence, we investigate the generalization ability at the optimal threshold for each value of b, and show it along with that for the MTF for each value of b in Fig. 10.

Fig. 9. Behavior of the optimal thresholds as a function of the correlation b for the LNN with the two different NMTFs in the case of a = 1 and s = 20.

Fig. 10. Behavior of the generalization ability as a function of the correlation b for the LNN with the NMTFs of the optimal thresholds and for the LNN with the MTF.

We find an improvement of the generalization ability in the range 0 < b ≤ 1 by using the optimal threshold for each value of b. We show the generalization error in the b–α plane for the MTF in Fig. 11 and that for the NMTFs with the optimal threshold in Fig. 12. The generalization error for the NMTFs with the optimal threshold is always larger than that for the MTF, although the generalization ability of the former is larger than that of the latter for each value of b. This behavior is similar to the relation between the storage capacity and the retrieval error. We also investigate the activity dependence of the generalization ability of the LNN with the NMTFs of the optimal threshold.

Fig. 11. Phase diagram of the generalization error in the b–α plane for the LNN with the MTF.

We show the generalization ability as a function of the activity a for the NMTFs with the optimal threshold and that for the MTF when b = 0.6 and s = 20 in Fig. 13. We note that a ≥ b² holds from Eq. (20). We find an improvement of the generalization ability over the whole range 0.36 ≤ a ≤ 1 by using the optimal threshold for the NMTFs at each value of a. We also investigate the generalization ability as a function of the number of example patterns s and show it in Fig. 14.

3. Layered neural network with intra-layer connections

In this section, a fully connected LNN with intra-layer connections is considered; the NMTF of the inter-layer updates is different from that of the intra-layer updates, and the inter-layer neurons and the intra-layer neurons are updated alternately. We investigate the storage capacity and the generalization ability of the LNN with intra-layer connections with two different NMTFs for the intra- and inter-layer updates. We show a schematic description of the LNN with intra-layer connections in Fig. 15. We assume that there exist N neurons in each layer, that the neurons in each layer are fully connected with those in the same layer, and that each neuron in the (t + 1)th layer is connected with all the neurons in the tth layer. The inter-layer neurons and the intra-layer neurons are assumed to be updated alternately; hence we have the following update rules:

Fig. 12. Phase diagram of the generalization error in the b–α plane for the LNN with the NMTFs of the optimal thresholds.

S_i^{t} = F\Bigl( \sum_{j=1}^{N} W_{ij}^{t}\, \sigma_j^{t} \Bigr)   (34)

for the intra-layer neurons and

\sigma_i^{t+1} = G\Bigl( \sum_{j=1}^{N} J_{ij}^{t}\, S_j^{t} \Bigr)   (35)

for the inter-layer neurons. Here, σ_i^t and S_i^t represent the states of neuron i in the tth layer before and after the intra-layer update, respectively. W_{ij}^t and J_{ij}^t represent the intra-layer connection between neurons i and j in the tth layer and the inter-layer connection from neuron j in the tth layer to neuron i in the (t + 1)th layer, respectively. The NMTFs F(x) and G(x) are defined as follows:

F(x) = \begin{cases} \mathrm{sgn}(x) , & |x| \le \theta_F , \\ 0 , & |x| > \theta_F , \end{cases}   (36)

Fig. 13. Behavior of the generalization ability as a function of the activity a for the LNN with the NMTFs of the optimal thresholds and for the LNN with the MTF when b = 0.6 and s = 20.

Fig. 14. Behavior of the generalization ability as a function of the number of example patterns s for the LNN with the NMTFs of the optimal thresholds and for the LNN with the MTF when a = 1 and b = 0.6.

Fig. 15. A schematic description of a relation among the tth layer, the (t + 1)th layer and the (t + 2)th layer in the LNN with intra-layer connections.

G(x) = \begin{cases} \mathrm{sgn}(x) , & |x| \le \theta_G , \\ 0 , & |x| > \theta_G , \end{cases}   (37)

where θ_F and θ_G are the thresholds of the NMTF for the intra-layer updates and for the inter-layer updates, respectively.

3.1. Storage capacity

We now investigate the storage capacity of the LNN with intra-layer connections defined above. We use the Hebbian learning rule for the intra-layer connection W_{ij}^t between neurons i and j in the tth layer and for the inter-layer connection J_{ij}^t from neuron j in the tth layer to neuron i in the (t + 1)th layer. Hence, we have

W_{ij}^{t} = \frac{1}{N} \sum_{\mu=1}^{p} \xi_i^{\mu,t} \xi_j^{\mu,t} ,   (38)

J_{ij}^{t} = \frac{1}{N} \sum_{\mu=1}^{p} \xi_i^{\mu,t+1} \xi_j^{\mu,t} ,   (39)

where p is the total number of embedded patterns and p = αN as usual. We assume that ξ_i^{μ,t} is a quenched and independent random variable given by Eq. (5). The retrieval overlaps m^{μ,t} and r^{μ,t} are defined as the overlaps of an embedded pattern ξ_i^{μ,t} with the states S_i^t and σ_i^t, respectively:

m^{\mu,t} = \frac{1}{N} \sum_{i=1}^{N} \xi_i^{\mu,t} S_i^{t}   (40)

and

r^{\mu,t} = \frac{1}{N} \sum_{i=1}^{N} \xi_i^{\mu,t} \sigma_i^{t} .   (41)

We assume that the pattern with μ = 1 is the condensed pattern, so that r^{1,t} = O(1) and m^{1,t} = O(1). We also assume that the patterns with μ = 2, …, p are uncondensed patterns, so that r^{μ,t} = O(1/√N) and m^{μ,t} = O(1/√N) for μ = 2, …, p. Hence, the following order parameters r^t, m^t, R^t, M^t, Q^t and U^t are defined for the LNN:

r^{t} \equiv \langle r^{1,t} \rangle_{\xi} ,   (42)

m^{t} \equiv \langle m^{1,t} \rangle_{\xi} ,   (43)

R^{t} \equiv \frac{1}{\alpha N} \sum_{\mu=2}^{\alpha N} \langle (R^{\mu,t})^{2} \rangle_{\xi} ,   (44)

M^{t} \equiv \frac{1}{\alpha N} \sum_{\mu=2}^{\alpha N} \langle (M^{\mu,t})^{2} \rangle_{\xi} ,   (45)

Q^{t} \equiv \frac{1}{N} \sum_{i=1}^{N} \langle (\sigma_i^{t})^{2} \rangle_{\xi} ,   (46)

U^{t} \equiv \frac{1}{N} \sum_{i=1}^{N} \langle (S_i^{t})^{2} \rangle_{\xi} ,   (47)

where R^{μ,t} = √N r^{μ,t} and M^{μ,t} = √N m^{μ,t}. By using the signal-to-noise ratio method, we obtain the following recursion relations for the order parameters in the thermodynamic limit N → ∞:

r^{t+1} = \bigl\langle G\bigl( m^{t} + \sqrt{\alpha M^{t}}\, x \bigr) \bigr\rangle_{x} ,   (48)

m^{t+1} = \bigl\langle F\bigl( r^{t} + \sqrt{\alpha R^{t}}\, x \bigr) \bigr\rangle_{x} ,   (49)

R^{t+1} = Q^{t+1} + (\mu^{t+1})^{2} M^{t} ,   (50)

M^{t+1} = U^{t+1} + 2 r^{t+1} m^{t+1} \nu^{t+1} + (\nu^{t+1})^{2} R^{t+1} ,   (51)

Q^{t+1} = \bigl\langle \bigl\{ G\bigl( m^{t} + \sqrt{\alpha M^{t}}\, x \bigr) \bigr\}^{2} \bigr\rangle_{x} ,   (52)

U^{t+1} = \bigl\langle \bigl\{ F\bigl( r^{t} + \sqrt{\alpha R^{t}}\, x \bigr) \bigr\}^{2} \bigr\rangle_{x} ,   (53)

where

\mu^{t+1} = \bigl\langle G'\bigl( m^{t} + \sqrt{\alpha M^{t}}\, x \bigr) \bigr\rangle_{x} ,   (54)

\nu^{t+1} = \bigl\langle F'\bigl( r^{t} + \sqrt{\alpha R^{t}}\, x \bigr) \bigr\rangle_{x} .   (55)

Fig. 16. Behavior of the retrieval error as a function of the load parameter α for the LNN with the intra-layer connections with the two different NMTFs of θ_F = 0.8 and θ_G = ∞.

Solving the above equations numerically by iteration, starting with r^1 = 1, we calculate the retrieval error, the storage capacity and so on. We first show the behavior of the retrieval error for the NMTFs with θ_F = 0.8 and θ_G = ∞ in Fig. 16. Here the retrieval error after the inter-layer updates is defined by E = lim_{t→∞}(1 − r^t) and the retrieval error after the intra-layer updates by E = lim_{t→∞}(1 − m^t). We find that chaotic behavior appears both in the retrieval error after the inter-layer updates and in that after the intra-layer updates. We show the storage capacity in the θ_F–θ_G plane in Fig. 17; we have put α_C = 0 in Fig. 17 when the chaotic behavior appears. We find that there exists a maximum value of the storage capacity; its value is about 0.275 when the thresholds θ_F and θ_G are equal to 1.220 and 1.330, respectively. We show the retrieval error for the NMTFs of θ_F = 1.220 and θ_G = 1.330 and that for the MTF in Fig. 18. We find that there exists a large difference between the retrieval error for the NMTFs of θ_F = 1.220 and θ_G = 1.330 and that for the MTF after the intra-layer updates. We show a phase diagram of the retrieval error in the θ_F–α plane after the intra-layer updates in Fig. 19 and a phase diagram of the retrieval error in the θ_F–α plane after the inter-layer updates in Fig. 20, when the threshold θ_G is fixed to 1.330. We find that the retrieval error becomes larger in the vicinity of the optimal threshold after the intra-layer updates, but smaller in the vicinity of the optimal threshold after the inter-layer updates. We see in Fig. 21 that the basin of attraction for the LNN with the NMTFs of θ_F = 1.220 and θ_G = 1.330 is larger than that for the LNN with the MTF.
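A minimal sketch of this iteration, analogous to the one given for the LNN in Section 2.1, is shown below. It assumes the recursions (48)–(55) exactly as reconstructed above, takes r^1 = m^1 = 1 and R^1 = M^1 = 1 as initial values (our assumption), and re-defines the same closed-form Gaussian averages so that the snippet is self-contained; all names are illustrative.

```python
import math

def phi(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def Phi(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def averages(m, s, theta):
    """<G>, <G^2>, <G'> of G_theta(m + s*x), x ~ N(0,1), as in the Section 2.1 sketch."""
    g1 = Phi((theta - m) / s) + Phi((-theta - m) / s) - 2.0 * Phi(-m / s)
    g2 = Phi((theta - m) / s) - Phi((-theta - m) / s)
    gp = (2.0 * phi(m / s) - phi((theta - m) / s) - phi((theta + m) / s)) / s
    return g1, g2, gp

def intra_inter_errors(alpha, theta_F, theta_G, steps=200):
    """Iterate the reconstructed recursions (48)-(55) from r = m = 1, R = M = 1."""
    r, m, R, M = 1.0, 1.0, 1.0, 1.0
    for _ in range(steps):
        # inter-layer update with G: Eqs. (48), (50), (52), (54)
        r_new, Q_new, mu = averages(m, math.sqrt(alpha * M), theta_G)
        R_new = Q_new + mu * mu * M
        # intra-layer update with F: Eqs. (49), (51), (53), (55)
        m_new, U_new, nu = averages(r, math.sqrt(alpha * R), theta_F)
        M_new = U_new + 2.0 * r_new * m_new * nu + nu * nu * R_new
        r, m, R, M = r_new, m_new, R_new, M_new
    return 1.0 - r, 1.0 - m        # retrieval errors after inter- and intra-layer updates

print(intra_inter_errors(0.2, 1.220, 1.330))
```

Scanning α with this function, in the same way as in the earlier sketch, gives a storage-capacity estimate for each pair (θ_F, θ_G).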

Fig. 17. Phase diagram of the storage capacity in the θ_F–θ_G plane for the LNN with the intra-layer connections with the two different NMTFs.

Fig. 18. Behavior of the retrieval error as a function of the load parameter α for the LNN with the intra-layer connections with the two different NMTFs of θ_F = 1.220 and θ_G = 1.330 and for the LNN with the intra-layer connections with the MTF.

Fig. 19. Phase diagram of the retrieval error in the θ_F–α plane for the LNN with the intra-layer connections with the two different NMTFs after the intra-layer updates when the threshold θ_G is fixed to 1.330.

3.2. Generalization ability

We investigate the generalization ability of the LNN within the framework of Dominguez and Theumann [15]. We assume that all neurons of each layer learn a set of s example patterns η_i^{μγ,t} ∈ {−1, 0, 1} (γ = 1, …, s) for each binary concept pattern ξ_i^{μ,t} ∈ {−1, 1} (μ = 1, …, p) by the Hebbian learning rule, where p = αN as usual. The intra-layer connection W_{ij}^t between neurons i and j in the tth layer and the inter-layer connection J_{ij}^t from neuron j in the tth layer to neuron i in the (t + 1)th layer are defined as follows:

W_{ij}^{t} = \frac{1}{s b^{2} N} \sum_{\mu=1}^{p} \sum_{\gamma=1}^{s} \eta_i^{\mu\gamma,t} \eta_j^{\mu\gamma,t} ,   (56)

J_{ij}^{t} = \frac{1}{s b^{2} N} \sum_{\mu=1}^{p} \sum_{\gamma=1}^{s} \eta_i^{\mu\gamma,t+1} \eta_j^{\mu\gamma,t} ,   (57)

where η_i^{μγ,t} = λ_i^{μγ,t} ξ_i^{μ,t}. We assume that ξ_i^{μ,t} is a quenched and independent random variable given by Eq. (16) and that λ_i^{μγ,t} ∈ {−1, 0, 1} is a quenched and independent random variable given by Eq. (17). The generalization ability is characterized by the generalization overlaps m^{μ,t} and r^{μ,t}, defined as the overlaps of a binary concept pattern ξ_i^{μ,t} with the states S_i^t and σ_i^t, respectively:

Fig. 20. Phase diagram of the retrieval error in the θ_F–α plane for the LNN with the intra-layer connections with the two different NMTFs after the inter-layer updates when the threshold θ_G is fixed to 1.330.

m^{\mu,t} = \frac{1}{N} \sum_{i=1}^{N} \xi_i^{\mu,t} S_i^{t} ,   (58)

r^{\mu,t} = \frac{1}{N} \sum_{i=1}^{N} \xi_i^{\mu,t} \sigma_i^{t} .   (59)

The retrieval overlaps e^{μγ,t} and l^{μγ,t} are defined as the overlaps of an example pattern η_i^{μγ,t} with the states S_i^t and σ_i^t, respectively:

e^{\mu\gamma,t} = \frac{1}{N} \sum_{i=1}^{N} \eta_i^{\mu\gamma,t} S_i^{t} ,   (60)

l^{\mu\gamma,t} = \frac{1}{N} \sum_{i=1}^{N} \eta_i^{\mu\gamma,t} \sigma_i^{t} .   (61)

Fig. 21. Basin of attraction for the LNN with the intra-layer connections with the two different NMTFs of θ_F = 1.220 and θ_G = 1.330 and for the LNN with the intra-layer connections with the MTF.

We assume that the pattern with μ = 1 is the condensed pattern, so that r^{1,t} = O(1) and m^{1,t} = O(1). We assume that the patterns with μ = 2, …, p are uncondensed patterns; we therefore have l^{1γ,t} = O(1) and e^{1γ,t} = O(1) for γ = 1, …, s, and r^{μ,t} = O(1/√N), m^{μ,t} = O(1/√N), l^{μγ,t} = O(1/√N) and e^{μγ,t} = O(1/√N) for γ = 1, …, s and μ = 2, …, p. We define r^t, m^t, l^t, e^t, L^t, E^t, Q^t and U^t as follows:

r^{t} \equiv \langle r^{1,t} \rangle_{\xi,\lambda} ,   (62)

m^{t} \equiv \langle m^{1,t} \rangle_{\xi,\lambda} ,   (63)

l^{t} \equiv \frac{1}{s b} \sum_{\gamma=1}^{s} \langle l^{1\gamma,t} \rangle_{\xi,\lambda} ,   (64)

e^{t} \equiv \frac{1}{s b} \sum_{\gamma=1}^{s} \langle e^{1\gamma,t} \rangle_{\xi,\lambda} ,   (65)

L^{t} \equiv \frac{1}{\alpha N} \sum_{\mu=2}^{\alpha N} \Bigl\langle \Bigl( \frac{1}{s} \sum_{\gamma=1}^{s} L^{\mu\gamma,t} \Bigr)^{2} \Bigr\rangle_{\xi,\lambda} ,   (66)

Fig. 22. Behavior of the generalization error as a function of the load parameter α for the LNN with the intra-layer connections with the two different NMTFs of θ_F = 1.220 and θ_G = 1.330 and for the LNN with the intra-layer connections with the MTF in the case of b = 0.2, 0.6 and 1 when a = 1 and s = 20.

Fig. 23. Behavior of the optimal thresholds as a function of the correlation b for the LNN with the intra-layer connections with the two different NMTFs in the case of a = 1 and s = 20.

Fig. 24. Behavior of the generalization ability as a function of the correlation b for the LNN with the intra-layer connections with the two different NMTFs of the optimal thresholds and for the LNN with the intra-layer connections with the MTF.

E^{t} \equiv \frac{1}{\alpha N} \sum_{\mu=2}^{\alpha N} \Bigl\langle \Bigl( \frac{1}{s} \sum_{\gamma=1}^{s} E^{\mu\gamma,t} \Bigr)^{2} \Bigr\rangle_{\xi,\lambda} ,   (67)

Q^{t} \equiv \frac{1}{N} \sum_{i=1}^{N} \langle (\sigma_i^{t})^{2} \rangle_{\xi,\lambda} ,   (68)

U^{t} \equiv \frac{1}{N} \sum_{i=1}^{N} \langle (S_i^{t})^{2} \rangle_{\xi,\lambda} ,   (69)

where R^{μ,t} = √N r^{μ,t}, M^{μ,t} = √N m^{μ,t}, b L^{μγ,t} = √N l^{μγ,t} and b E^{μγ,t} = √N e^{μγ,t}. By using the signal-to-noise ratio method, we obtain the following recursion relations for the order parameters in the thermodynamic limit N → ∞:

r^{t+1} = \bigl\langle G\bigl( e^{t} \sqrt{A-1}\, x + \sqrt{\alpha A E^{t}}\, y + e^{t} \bigr) \bigr\rangle_{x,y} ,   (70)

m^{t+1} = \bigl\langle F\bigl( l^{t+1} \sqrt{A-1}\, x + \sqrt{\alpha A L^{t+1}}\, y + r^{t+1} \bigr) \bigr\rangle_{x,y} ,   (71)

l^{t+1} = r^{t+1} + (A-1)\, \mu^{t+1} e^{t} ,   (72)

e^{t+1} = m^{t+1} + (A-1)\, \nu^{t+1} l^{t+1} ,   (73)

L^{t+1} = A Q^{t+1} + A^{2} (\mu^{t+1})^{2} E^{t} ,   (74)

Fig. 25. Phase diagram of the generalization error in the b–α plane for the LNN with the intra-layer connections with the MTF.

E^{t+1} = A U^{t+1} + A^{2} (\nu^{t+1})^{2} L^{t+1} + 2 A^{2} m^{t+1} r^{t+1} \nu^{t+1} ,   (75)

Q^{t+1} = \bigl\langle \bigl\{ G\bigl( e^{t} \sqrt{A-1}\, x + \sqrt{\alpha A E^{t}}\, y + e^{t} \bigr) \bigr\}^{2} \bigr\rangle_{x,y} ,   (76)

U^{t+1} = \bigl\langle \bigl\{ F\bigl( l^{t+1} \sqrt{A-1}\, x + \sqrt{\alpha A L^{t+1}}\, y + r^{t+1} \bigr) \bigr\}^{2} \bigr\rangle_{x,y} ,   (77)

where

\mu^{t+1} = \bigl\langle G'\bigl( e^{t} \sqrt{A-1}\, x + \sqrt{\alpha A E^{t}}\, y + e^{t} \bigr) \bigr\rangle_{x,y} ,   (78)

\nu^{t+1} = \bigl\langle F'\bigl( l^{t+1} \sqrt{A-1}\, x + \sqrt{\alpha A L^{t+1}}\, y + r^{t+1} \bigr) \bigr\rangle_{x,y} ,   (79)

and the parameter A is defined by Eq. (32). We now investigate the generalization ability of the LNN by solving the obtained recursion relations (70)–(79) numerically by means of an iterative method after substituting r^1 = 1. We show the behavior of the generalization error for the NMTFs with θ_F = 1.220 and θ_G = 1.330 and that for the MTF in the case of b = 0.2, 0.6 and 1 for a = 1 and s = 20 in Fig. 22. The generalization error after the inter-layer updates is defined by E = lim_{t→∞}(1 − r^t) and the generalization error after the intra-layer updates by E = lim_{t→∞}(1 − m^t).

Fig. 26. Phase diagram of the generalization error in the b–α plane for the LNN with the intra-layer connections with the two different NMTFs of the optimal thresholds.

We see that the generalization ability for the NMTFs with θ_F = 1.220 and θ_G = 1.330 is improved in comparison with that for the MTF when the value of b is large, but not when the value of b is small. In order to clarify this point, we show the optimal threshold, at which the generalization ability becomes maximum, as a function of the correlation b for the NMTFs in the case of a = 1 and s = 20 in Fig. 23. The value of the optimal threshold increases monotonically from θ_F = 1.220 and θ_G = 1.330 and diverges to ∞ as the value of b decreases. This means that the generalization ability is optimal for the LNN with the NMTFs of θ_F = 1.220 and θ_G = 1.330 when b is large, while the generalization ability for the MTF becomes optimal when b is small. In this way, it turns out that there exists an optimal threshold for which the generalization ability of the LNN with the NMTFs is maximum at each value of b. Hence, we show the generalization ability for the LNN with the NMTFs of the optimal threshold and that for the MTF at each value of b in Fig. 24. We find an improvement of the generalization ability in the range 0 < b ≤ 1 when the optimal threshold is used for the two different NMTFs at each value of b. We show a phase diagram of the generalization error in the b–α plane for the LNN with the MTF in Fig. 25 and that for the LNN with the NMTFs of the optimal threshold in Fig. 26.

Fig. 27. Behavior of the generalization ability as a function of the activity a for the LNN with the intra-layer connections with the two different NMTFs of the optimal thresholds and for the LNN with the intra-layer connections with the MTF when b = 0.6 and s = 20.

The generalization ability for the LNN with the NMTFs of the optimal threshold always becomes larger than that for the LNN with the MTF, although the generalization error of the former is larger than that of the latter for each value of b. This behavior is similar to the relation between the storage capacity and the retrieval error. We show the generalization ability as a function of the activity a for the LNN with the NMTFs of the optimal threshold and for the LNN with the MTF for b = 0.6 and s = 20 in Fig. 27. We find some improvement of the generalization ability over the whole range 0.36 < a ≤ 1 when introducing the optimal threshold for the NMTFs at each value of a. The dependence of the generalization ability on the number of example patterns s is shown in Fig. 28 for the LNN with the NMTFs of the optimal threshold and for the LNN with the MTF at a = 1 and b = 0.6. We find some improvement of the generalization ability over the whole range 20 ≤ s ≤ 1000 when introducing the optimal threshold for the two different NMTFs at each value of s.

4. Concluding remarks

In the present paper, we have investigated the storage capacity and the generalization ability for two types of fully connected LNNs with two different NMTFs, where random patterns are embedded into the networks by the Hebbian learning rule. One of them is the LNN (LNN1) in which the NMTF of the even layers is different from that of the odd layers.

Fig. 28. Behavior of the generalization ability as a function of the number of example patterns s for the LNN with the intra-layer connections with the two different NMTFs of the optimal thresholds and for the LNN with the intra-layer connections with the MTF when a = 1 and b = 0.6.

The other is the LNN (LNN2) with intra-layer connections, in which the NMTF of the inter-layer updates is different from that of the intra-layer updates, and the inter-layer neurons and the intra-layer neurons are updated alternately. We have derived the recursion relations for the order parameters of both LNNs by means of the signal-to-noise ratio method. We have clarified that the storage capacity of both LNNs with the two different NMTFs is enhanced in comparison with that with the MTF when the non-monotonicity of the transfer functions is selected optimally. The storage capacity is enhanced to α_C ≃ 0.299 for the LNN1 and α_C ≃ 0.275 for the LNN2; these values should be compared with α_C ≃ 0.269 and α_C ≃ 0.202 for the conventional MTF. We have also clarified that the generalization ability of both LNNs with the two different NMTFs is enhanced in comparison with that with the MTF when the non-monotonicity of the transfer functions is selected optimally and, in particular, when the value of the correlation b is large. It has also been pointed out that chaotic behavior appears in the order parameters of both LNNs when the non-monotonicity of the transfer functions increases.

Acknowledgements

This work was partially supported by Grant-in-Aid for Scientific Research No. 13680383 from the Ministry of Education, Science and Culture, Japan.

References

[1] D.J. Amit, H. Gutfreund, H. Sompolinsky, Phys. Rev. A 32 (1985) 1007.
[2] D.J. Amit, H. Gutfreund, H. Sompolinsky, Phys. Rev. Lett. 55 (1985) 1530.
[3] D.J. Amit, H. Gutfreund, H. Sompolinsky, Ann. Phys. 173 (1987) 30.
[4] D.J. Amit, Modelling Brain Function, Cambridge University Press, Cambridge, 1989.
[5] T. Geszti, Physical Models of Neural Networks, World Scientific, Singapore, 1990.
[6] J. Hertz, A. Krogh, R. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, Reading, MA, 1991.
[7] J.F. Fontanari, R. Meir, Phys. Rev. A 40 (1989) 2806.
[8] J.F. Fontanari, J. Phys. France 51 (1990) 2421.
[9] P.R. Krebs, W.K. Theumann, J. Phys. A 26 (1993) 3983.
[10] E. Domany, R. Meir, W. Kinzel, Europhys. Lett. 2 (1986) 175.
[11] E. Domany, R. Meir, Phys. Rev. Lett. 59 (1987) 359.
[12] R. Meir, E. Domany, Phys. Rev. A 37 (1988) 608.
[13] E. Domany, W. Kinzel, R. Meir, J. Phys. A 22 (1989) 2081.
[14] D.R.C. Dominguez, W.K. Theumann, J. Phys. A 29 (1996) 749.
[15] D.R.C. Dominguez, W.K. Theumann, J. Phys. A 30 (1997) 1403.
[16] A.C.C. Coolen, L. Viana, J. Phys. A 29 (1996) 7855.
[17] K. Katayama, T. Horiguchi, Physica A 297 (2001) 532.
[18] M. Morita, S. Yoshizawa, K. Nakano, Trans. IEICE J73-D-2 (2) (1990) 242.
[19] S. Yoshizawa, M. Morita, S. Amari, Neural Networks 6 (1993) 167.
[20] M. Shiino, T. Fukai, Phys. Rev. E 48 (1993) 867.
[21] J. Inoue, J. Phys. A 29 (1996) 4815.
[22] H. Nishimori, I. Opris, Neural Networks 6 (1993) 1061.
[23] K. Katayama, T. Horiguchi, J. Phys. Soc. Jpn. 69 (2000) 2816.
[24] K. Katayama, T. Horiguchi, J. Phys. Soc. Jpn. 70 (2001) 1300.
[25] J.J. Hopfield, Proc. Natl. Acad. Sci. USA 79 (1982) 2554.
[26] D. Sherrington, S. Kirkpatrick, Phys. Rev. Lett. 35 (1975) 1792.
[27] S. Kirkpatrick, D. Sherrington, Phys. Rev. B 17 (1978) 4384.
[28] M. Mezard, G. Parisi, M.A. Virasoro, Spin Glass Theory and Beyond, World Scientific, Singapore, 1987.
[29] D.O. Hebb, The Organization of Behavior, Wiley, New York, 1949.