Physica A 339 (2004) 653 – 664
www.elsevier.com/locate/physa
Dynamic programming for optimal packet routing control using two neural networks

Tsuyoshi Horiguchi*, Hideyuki Takahashi, Keisuke Hayashi, Chiaki Yamaguchi

Department of Computer and Mathematical Sciences, Graduate School of Information Sciences, Tohoku University, 04, Aramaki-Aza-Aoba, Aoba-ku, Sendai, Miyagi 980-8579, Japan

Received 26 January 2004
Abstract

We propose a dynamic programming method for optimal packet routing control using two neural networks within the framework of statistical physics. An energy function for each neural network is defined so as to express the competition between the queue length at a node and the shortest path of a packet to its destination node. We set up a dynamics for the thermal average of the neuron states such that the mean-field energy of each neural network decreases as a function of time. By computer simulations with discrete time steps, we show that optimal control of packet flow is possible using the proposed method, in which goal-directed learning is performed for a time-dependent environment for packets.
© 2004 Elsevier B.V. All rights reserved.

PACS: 05.90.+m; 05.50.+q; 07.05.Kf; 84.35.+i

Keywords: Spin model; Computer network; Routing control; Packet flow; Goal-directed learning; Dynamic programming
1. Introduction

Better control of packet flow on large-scale computer networks is needed in today's internet society. Packet flow control includes routing control, traffic control, congestion control, sequence control and so on; among these kinds of control, routing control is considered in the present paper. A computer network is assumed to consist of nodes, links and a process. A node is a host computer, a personal

* Corresponding author. Tel.: +81-22-217-5842; fax: +81-22-217-5851.
E-mail address: [email protected] (T. Horiguchi).
0378-4371/$ - see front matter © 2004 Elsevier B.V. All rights reserved.
doi:10.1016/j.physa.2004.03.064
computer, a workstation or the like. A link is a communication line. A process is a mathematical model for the network layer [1]. There are very many nodes in large-scale computer networks, and hence decentralized, autonomous and adaptive control of packet routing becomes very important in such networks. This situation is completely different from that of small-scale computer networks, in which a central computer controls the packet flow for the whole network. The search for a suitable route (or path) for a packet to be sent from a source node to its destination node is one of the important issues in packet flow control. However, finding the shortest path is not always the best solution: next-shortest paths may be preferable as a trade-off with the queue length, the distance from the present node to the destination node of a packet and so on. There are many papers in which the control of packet flow has been investigated using neural networks [2-5]. It is known that techniques developed in statistical physics are very useful for optimization problems formulated with neural networks [6]. We therefore proposed in a previous paper [7] a neural network model for the routing control of packet flow in large-scale computer networks within the framework of statistical physics. There are situations in which some links have higher reliability than others for sending packets, some links have higher capacity than others, and/or the packet-processing ability of some nodes is higher than that of others. In such situations, it is better to avoid using links with lower reliability and/or lower capacity, and to avoid sending packets to nodes with lower processing ability. Hence, we have also proposed a neural network model for the routing control of packet flow with priority links [8,9].
In large-scale computer networks, there is another situation in which the nodes where many packets concentrate may change as time goes on. Namely, a time-dependent environment for packets has to be taken into account for optimal packet flow control. To this end, we propose in the present paper a dynamic programming method for goal-directed learning using two neural networks. In Section 2, we define two neural networks for the routing control of packet flow within the framework of statistical physics. We use a mean-field approximation for soft control and propose dynamic programming for goal-directed learning for optimal packet routing control. We present some of the results obtained by numerical simulations in Section 3. Concluding remarks are given in Section 4.

2. Two neural networks for optimal packet routing control

We define two neural networks for optimal packet routing control in a decentralized, autonomous and adaptive way by dynamic programming. One of the neural networks is a communication control neural network (CCNN) and the other is an auxiliary neural network (ANN) used for goal-directed learning in the CCNN. When the nodes at which packets concentrate change with time, namely under a time-dependent environment for packets, optimal packet routing control by goal-directed learning is important. Hence, in the present paper, we propose goal-directed learning, which
Fig. 1. An example of a computer network with 25 nodes. Node 13 is assumed to be the node where packets concentrate.
is performed by changing the connection weights in the CCNN, with the assistance of the ANN, according to the time-dependent environment for packets. We first explain a computer network. The computer network is assumed to consist of nodes, links and a process. We consider that there are N nodes. At each node, we put two neural networks, namely the CCNN and the ANN. We use a neural network with Ising neurons for the CCNN, given in Ref. [9], and a neural network with lattice gas neurons for the ANN, given in Refs. [7,8]. We assume that neuron ik of the CCNN at node i controls the sending of a packet from node i to node k, only when neuron ik fires. When there is a neuron ik at node i, there is a neuron ki at node k. We assume that there is a link between node i and node k through neurons ik and ki; each link is assumed to be full duplex. We apply the first-in-first-out (FIFO) rule for packets if one of the neurons fires at a node. An example of a computer network is shown in Fig. 1. We put $N_i$ Ising neurons $\{\sigma_{ik} \in \{\pm 1\} \mid k \in \{k_1^i, k_2^i, \ldots, k_{N_i}^i\},\, k \neq i\}$ at node i for the CCNN and $N_i$ lattice gas neurons $\{s_{ik} \in \{1, 0\} \mid k \in \{k_1^i, k_2^i, \ldots, k_{N_i}^i\},\, k \neq i\}$ at node i for the ANN, where $k_j^i$ is the node number of the jth node connected to node i. In each neural network at a node, every neuron is assumed to be fully connected with the other neurons within the same node. In Fig. 2, we show a schematic illustration of the two neural networks at a node with 8 links for the computer network given in Fig. 1. By considering the trade-off between the queue length at each node and the shortest path of a packet to its destination node, and by taking into account the priority of some links over others, we define an energy function for each neural network.
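The bookkeeping described above — one Ising neuron per incident link for the CCNN, one lattice gas neuron for the ANN, and connections only within a node — can be sketched as follows. This is a hypothetical Python layout, not the authors' code; the 8-neighbour (Moore) grid connectivity is an assumption read off from Fig. 2, where a node carries 8 links.

```python
import numpy as np

def build_grid(side=5):
    """Adjacency of a side x side grid of nodes.

    The 8-neighbour (Moore) connectivity is an assumption read off
    from Fig. 2; only the per-node bookkeeping matters for this sketch."""
    nbrs = {}
    for i in range(side * side):
        r, c = divmod(i, side)
        nbrs[i] = [rr * side + cc
                   for rr in range(r - 1, r + 2)
                   for cc in range(c - 1, c + 2)
                   if 0 <= rr < side and 0 <= cc < side and (rr, cc) != (r, c)]
    return nbrs

nbrs = build_grid()

# One Ising neuron sigma_{ik} (CCNN, values in {-1, +1}) and one lattice
# gas neuron s_{ik} (ANN, values in {0, 1}) per link incident to node i;
# neurons are fully connected within a node and not connected across nodes.
sigma = {i: np.full(len(nbrs[i]), -1, dtype=int) for i in nbrs}
s_gas = {i: np.zeros(len(nbrs[i]), dtype=int) for i in nbrs}
```

With this layout an interior node carries $N_i = 8$ neurons in each network, matching the node shown in Fig. 2, while corner nodes carry 3.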
Fig. 2. Schematic illustration of two neural networks in a node with 8 links in the computer network as shown in Fig. 1.
We define the energy function for the CCNN as follows:
$$
E = -\frac{\eta}{2} \sum_{i=1}^{N} \sum_{k}^{N_i} \sum_{l}^{N_i} J_{ik,il}\, \sigma_{ik} \sigma_{il}
- \alpha \sum_{i=1}^{N} \sum_{l}^{N_i} \left[ 1 - \frac{1}{b_l} \left( q_l + \sum_{j (\neq i)} \frac{1}{2} (\sigma_{jl} + 1) \right) \right] \frac{1}{2} (\sigma_{il} + 1)
$$
$$
\quad - (1 - \alpha) \sum_{i=1}^{N} \sum_{l}^{N_i} \left( 1 - \frac{d_l}{d_c} \right) \frac{1}{2} (\sigma_{il} + 1)
+ \gamma \sum_{i=1}^{N} \left[ \sum_{l}^{N_i} \frac{1}{2} (\sigma_{il} + 1) - 1 \right]^2 ,
\tag{1}
$$
where $\sigma_{ik} \in \{-1, 1\}$ and $J_{ik,il}$ is a connection weight between Ising neurons $\sigma_{ik}$ and $\sigma_{il}$. We assume that $J_{ik,il} \in \{-1, 0, 1\}$, $J_{ik,il} = J_{il,ik}$ and $J_{ik,ik} = 0$ in the present paper. Here $b_l$ is the buffer size at node $l$, $d_c$ a constant related to a characteristic path length of the computer network, $q_l$ the queue length at node $l$, and $d_l$ the shortest distance of a packet, ready to go out from node $l$, to its destination node. The three control parameters are denoted by $\eta$, $\alpha$ and $\gamma$. In order to give priority to a link which has higher reliability and/or higher capacity, and/or to a link which is connected to a node with a higher ability of processing packets, we introduce the concept of a priority link [8]. Namely, we assume that a node has some links with priority over other links for sending packets. We wish to implement this idea in the connection weights between Ising neurons at each node. To do this, we introduce two kinds of links, links with priority and links without priority; $L_p$ is the set of links with priority and $L_q$ the set of links without priority. Then we write the connection weight $J_{ik,il}$ at node $i$ as $J_i^{pq}$, $J_i^{pp}$ or $J_i^{qq}$, according to the case in which one of the Ising neurons $\sigma_{ik}$ and $\sigma_{il}$ is connected to a link belonging to $L_p$ and the other to a link belonging to $L_q$, the case in which both are connected to links belonging to $L_p$, and the case in which
Ising neurons $\sigma_{ik}$ and $\sigma_{il}$ are both connected to links belonging to $L_q$, respectively. Thus we classify the connection weights into three kinds, $J_i^{pq}$, $J_i^{pp}$ and $J_i^{qq}$, as stated above. For simplicity, we assume $J_i^{pq} = 0$, $J_i^{pp} = 1$ and $J_i^{qq} = -1$. The energy function for the ANN is defined as follows:
$$
E^s = -\frac{\eta}{2} \sum_{i=1}^{N} \sum_{k}^{N_i} \sum_{l}^{N_i} J^s_{ik,il}\, s_{ik} s_{il}
- \alpha \sum_{i=1}^{N} \sum_{l}^{N_i} \left[ 1 - \frac{1}{b_l} \left( q^s_l + \sum_{j (\neq i)} s_{jl} \right) \right] s_{il}
$$
$$
\quad - (1 - \alpha) \sum_{i=1}^{N} \sum_{l}^{N_i} \left( 1 - \frac{d^s_l}{d_c} \right) s_{il}
+ \gamma \sum_{i=1}^{N} \left[ \sum_{l}^{N_i} s_{il} - 1 \right]^2 ,
\tag{2}
$$
where $s_{il} \in \{0, 1\}$ and $J^s_{ik,il}$ is a connection weight between lattice gas neurons $s_{ik}$ and $s_{il}$; we assume that $J^s_{ik,il} = 1$ for $k \neq l$, $J^s_{ik,il} = J^s_{il,ik}$ and $J^s_{ik,ik} = 0$ [7]. Here we assume that $b_l$ and $d_c$ in Eq. (2) are the same as those in Eq. (1); $q^s_l$ and $d^s_l$ in Eq. (2) might be different from $q_l$ and $d_l$ in Eq. (1) in order to take into account the time-dependent environment of packets. The two control parameters $\alpha$ and $\gamma$ in Eq. (2) are assumed to take the same values as those in Eq. (1), just for simplicity. In order to introduce soft control of the packet flow, the framework of statistical physics is useful; we use a mean-field approximation in the present paper. The energy given by Eq. (1) for the CCNN then becomes as follows [9] in the mean-field approximation:
$$
E_{\mathrm{mf}} = -\frac{\eta}{2} \sum_{i=1}^{N} \sum_{k}^{N_i} \sum_{l}^{N_i} J_{ik,il}\, v_{ik} v_{il}
- \alpha \sum_{i=1}^{N} \sum_{l}^{N_i} \left[ 1 - \frac{1}{b_l} \left( q_l + \sum_{j (\neq i)} \frac{1}{2} (v_{jl} + 1) \right) \right] \frac{1}{2} (v_{il} + 1)
$$
$$
\quad - (1 - \alpha) \sum_{i=1}^{N} \sum_{l}^{N_i} \left( 1 - \frac{d_l}{d_c} \right) \frac{1}{2} (v_{il} + 1)
+ \gamma \sum_{i=1}^{N} \left[ \sum_{l}^{N_i} \frac{1}{2} (v_{il} + 1) - 1 \right]^2 .
\tag{3}
$$
Here $v_{il}$ is calculated by
$$
v_{il} = \langle \sigma_{il} \rangle = \tanh (\beta h_{il})
\tag{4}
$$
in terms of an internal effective field $h_{il}$, where $\beta = 1/kT$ as usual, $T$ is the absolute temperature in statistical physics, and $\beta$ serves as a parameter for soft control in the present system. The energy, Eq. (2), for the ANN is given as follows by using a
mean-field approximation [8]:
$$
E^{\mathrm{mf}}_s = -\frac{\eta}{2} \sum_{i=1}^{N} \sum_{k}^{N_i} \sum_{l}^{N_i} J^s_{ik,il}\, u_{ik} u_{il}
- \alpha \sum_{i=1}^{N} \sum_{l}^{N_i} \left[ 1 - \frac{1}{b_l} \left( q^s_l + \sum_{j (\neq i)} u_{jl} \right) \right] u_{il}
$$
$$
\quad - (1 - \alpha) \sum_{i=1}^{N} \sum_{l}^{N_i} \left( 1 - \frac{d^s_l}{d_c} \right) u_{il}
+ \gamma \sum_{i=1}^{N} \left[ \sum_{l}^{N_i} u_{il} - 1 \right]^2 .
\tag{5}
$$
Here $u_{il}$ is calculated by
$$
u_{il} = \langle s_{il} \rangle = \frac{1}{1 + \exp(-\beta h^s_{il})}
\tag{6}
$$
in terms of an internal effective field $h^s_{il}$.

3. Numerical simulations by dynamic programming

We assume that the states of the Ising neurons depend on time $t$, and hence $h_{il}$, $v_{il}$ and $E_{\mathrm{mf}}$ are functions of $t$. We derive an equation for the dynamics by requiring that the energy $E_{\mathrm{mf}}(t)$ decrease as a function of time $t$. Then it is reasonable to assume that the time dependence of the internal effective field $h_{il}(t)$ is given as follows [9]:
$$
\frac{d}{dt} h_{il}(t) = \eta \sum_{k}^{N_i} J_{ik,il}\, v_{ik}(t)
+ \alpha \left[ 1 - \frac{1}{b_l} \left( q_l + \sum_{j (\neq i)} \frac{1}{2} (v_{jl}(t) + 1) \right) \right]
+ (1 - \alpha) \left( 1 - \frac{d_l}{d_c} \right)
- 2\gamma \left[ \sum_{k}^{N_i} \frac{1}{2} (v_{ik}(t) + 1) - 1 \right] ,
\tag{7}
$$
and the time dependence of the internal effective field $h^s_{il}(t)$ is given as follows [7]:
$$
\frac{d}{dt} h^s_{il}(t) = \eta \sum_{k}^{N_i} J^s_{ik,il}\, u_{ik}(t)
+ \alpha \left[ 1 - \frac{1}{b_l} \left( q^s_l + \sum_{j (\neq i)} u_{jl}(t) \right) \right]
+ (1 - \alpha) \left( 1 - \frac{d^s_l}{d_c} \right)
- 2\gamma \left[ \sum_{k}^{N_i} u_{ik}(t) - 1 \right] .
\tag{8}
$$
In the numerical simulations, we use a discrete-time renewal process for Eqs. (7) and (8) in the calculation of the internal effective fields; the average values of the neuron states are then obtained from Eqs. (4) and (6) after some iteration steps. Namely, we approximate the left-hand sides of Eqs. (7) and (8) as follows:
$$
\frac{d}{dt} h_{il}(t) \simeq h_{il}(t+1) - h_{il}(t) ,
\tag{9}
$$
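A single discrete-time renewal step, combining Eq. (7) with the approximation of Eq. (9) and the thermal average of Eq. (4), can be sketched for the $N_i$ CCNN neurons of one node as follows. This is a minimal Python sketch, not the authors' code; the array shapes, the way the neighbour term $\sum_{j \neq i} (v_{jl}+1)/2$ is supplied, and the assignment of the Section 3 parameter values to the Greek symbols are assumptions.

```python
import numpy as np

def ccnn_step(h, v, J, q_l, b_l, d_l, sum_v_other,
              eta=0.7, alpha=0.6, gamma=0.1, beta=2.0, d_c=4.0):
    """One discrete renewal step h(t+1) = h(t) + dh/dt (Eqs. (7), (9))
    for the N_i CCNN neurons of a single node i, followed by the
    mean-field average v = tanh(beta * h) of Eq. (4).

    h, v          : length-N_i arrays of fields and thermal averages
    J             : N_i x N_i symmetric weight matrix J_{ik,il}
    q_l, b_l, d_l : queue length, buffer size and shortest distance of
                    the node behind each link (length-N_i arrays)
    sum_v_other   : sum_{j != i} (v_jl(t) + 1)/2 per link, the mean-field
                    count of packets other nodes are sending toward l
    """
    dh = (eta * (J @ v)                                   # coupling term of Eq. (7)
          + alpha * (1.0 - (q_l + sum_v_other) / b_l)     # queue term
          + (1.0 - alpha) * (1.0 - d_l / d_c)             # shortest-path term
          - 2.0 * gamma * (np.sum((v + 1.0) / 2.0) - 1.0))  # one-firing constraint
    h_new = h + dh                       # Eq. (9): h(t+1) = h(t) + dh/dt
    return h_new, np.tanh(beta * h_new)  # Eq. (4)
```

After the iteration, a neuron would fire when its average exceeds the threshold ($v_{il} > \theta$); the ANN update is the analogue with the logistic average of Eq. (6) in place of the tanh.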
Fig. 3. Timetable for running the two neural networks for the dynamic programming of packet flow. T is the unit of running time for the CCNN and T^s that for the ANN.
$$
\frac{d}{dt} h^s_{il}(t) \simeq h^s_{il}(t+1) - h^s_{il}(t) ,
\tag{10}
$$
respectively. We determine whether Ising neuron $\sigma_{il}$ fires by using a threshold $\theta$ for its thermal average $v_{il}$. Namely, Ising neuron $\sigma_{il}$ takes the value 1 if $v_{il} > \theta$ and $-1$ if $v_{il} < \theta$, where we use $\theta = 0.8$. When Ising neuron $\sigma_{il}$ fires, a packet ready to go out at node $i$ is sent to node $l$. We also determine whether lattice gas neuron $s_{il}$ fires by using a threshold $\theta^s$ for its thermal average $u_{il}$. Namely, lattice gas neuron $s_{il}$ takes the value 1 if $u_{il} > \theta^s$ and 0 if $u_{il} < \theta^s$, where we use $\theta^s = 0.9$; the lattice gas neurons do not participate in the sending of packets directly even if they fire, but perform the goal-directed learning for the CCNN. In order to introduce dynamic programming, we assume that $J_{ik,il}$ is a function of $E^{\mathrm{mf}}_s$ in the sense that the value of $J_{ik,il}$ changes due to the information stored in the ANN according to the time-dependent environment for packets. We use a timetable in the dynamic programming for effective usage of the neural networks in order to achieve adaptive optimal packet routing control. We show part of the running timetable for each neural network in Fig. 3. $T$ is the unit of running time for the CCNN and $T^s$ that for the ANN. In order to implement goal-directed learning, we assume that the CCNN receives information from the ANN so as to perform the adaptive optimal routing control of packet flow according to the time-dependent environment for packets. Namely, we change the connection weights in the CCNN according to the data accumulated in the ANN during $T^s$. To do this, we assign a parameter $a_{ik}$ to each neuron $s_{ik}$ at every node $i$ in the ANN. As initial values, we assume $J_{ik,il} = J^s_{ik,il} = 1$ for all neurons $\sigma_{ik}$ and $\sigma_{il}$ in the CCNN and for all neurons $s_{ik}$ and $s_{il}$ in the ANN, $q^s_i \leftarrow q_i$ and $d^s_i \leftarrow d_i$ for all nodes $i$, and $a_{ik} \leftarrow 0$ for all neurons $s_{ik}$. The CCNN sends packets after each $T$ time step.

However, the ANN runs in the same way as the CCNN but does not participate in sending packets. Instead, the ANN just stores information about the packet flow during the time step $T^s$ in order to achieve the goal-directed learning. Namely, we set $a_{il} \leftarrow a_{il} + 1$ for the neuron $s_{il}$ that fires at each time step of $T^s$, where neuron $s_{il}$ is selected according to the neuron number $l = \arg\max_j \{u_{ij} > 0.9\}$. At each time step $T^s$, we choose
the $g_i$ neurons $s_{ij}$ with the largest $g_i$ values in $\{a_{ij}\}$, chosen by starting from the largest value down to the $g_i$th value; these neurons $\{s_{ij}\}$ at node $i$ are labelled by the elements of $\Omega_i = \{j_1, j_2, \ldots, j_{g_i}\}$, namely $\{s_{ik} \mid k \in \Omega_i\}$. If we cannot choose such $g_i$ values, we select them by a random number; this can happen in the first few $T^s$ time steps. We change the values of the connection weights in the CCNN as $J_{ik,il} = J_i^{pp} = 1$ for $k, l \in \Omega_i$, $J_{ik,il} = J_i^{qq} = -1$ for $k, l \notin \Omega_i$, and $J_{ik,il} = J_i^{pq} = 0$ for $k \in \Omega_i,\, l \notin \Omega_i$ or $k \notin \Omega_i,\, l \in \Omega_i$. Then we put $q^s_i \leftarrow q_i$ and $d^s_i \leftarrow d_i$ for all nodes $i$, and repeat the procedure stated above. Namely, we have the following algorithm for the dynamic programming:

(1) Set $J_{ik,il} = 1$ for all neurons $\sigma_{ik}$ and $\sigma_{il}$ in the CCNN and $J^s_{ik,il} = 1$ for all neurons $s_{ik}$ and $s_{il}$ in the ANN; set $q^s_i \leftarrow q_i$ and $d^s_i \leftarrow d_i$ for all nodes $i$, and $a_{ik} \leftarrow 0$ for all neurons $s_{ik}$.
(2) The CCNN sends packets after each $T$ time step. The ANN stores information about the packet flow during the time step $T^s$.
(3) Set $a_{il} \leftarrow a_{il} + 1$ for the neuron $s_{il}$ that fires at each time step of $T^s$, where neuron $s_{il}$ is selected according to the neuron number $l = \arg\max_j \{u_{ij} > 0.9\}$.
(4) At each time step $T^s$, choose the $g_i$ neurons $s_{ij}$ with the largest $g_i$ values in $\{a_{ij}\}$; these neurons $\{s_{ij}\}$ at node $i$ are labelled by the elements of $\Omega_i = \{j_1, j_2, \ldots, j_{g_i}\}$, namely $\{s_{ik} \mid k \in \Omega_i\}$. Change the connection weights of the CCNN to $J_{ik,il} = 1$ for $k, l \in \Omega_i$, $J_{ik,il} = -1$ for $k, l \notin \Omega_i$, and $J_{ik,il} = 0$ for $k \in \Omega_i,\, l \notin \Omega_i$ or $k \notin \Omega_i,\, l \in \Omega_i$.
(5) Put $q^s_i \leftarrow q_i$ and $d^s_i \leftarrow d_i$ for all nodes $i$.
(6) Go to (2).
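The weight update at the heart of steps (3)-(4) — accumulate the ANN firing counters, pick the $g_i$ largest, and rebuild the CCNN weight matrix from the resulting set $\Omega_i$ — can be condensed into a short per-node routine. The following Python sketch is a hypothetical rendering of this rule; tie-breaking and the random fallback used in the first few $T^s$ steps are omitted.

```python
import numpy as np

def update_ccnn_weights(a, g=2):
    """Rebuild the CCNN weight matrix at one node from the ANN firing
    counters a_{ij} (steps (3)-(4) of the algorithm).

    The g largest counters define the set Omega_i; the weights become
    J = +1 if both neurons lie in Omega_i (J^pp), -1 if neither does
    (J^qq), and 0 for mixed pairs (J^pq); the diagonal stays 0."""
    n = len(a)
    omega = set(np.argsort(a)[::-1][:g])   # indices of the g largest counters
    J = np.zeros((n, n))
    for k in range(n):
        for l in range(n):
            if k == l:
                continue                   # no self-coupling: J_{ik,ik} = 0
            if k in omega and l in omega:
                J[k, l] = 1.0
            elif k not in omega and l not in omega:
                J[k, l] = -1.0
    return J, omega
```

For counters $a = [5, 1, 3, 0]$ with $g_i = 2$ this selects $\Omega_i = \{0, 2\}$, so e.g. $J_{i0,i2} = 1$ while $J_{i1,i3} = -1$; resetting $q^s_i \leftarrow q_i$, $d^s_i \leftarrow d_i$ (step (5)) then restarts the cycle.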
The numerical simulations have been done for the computer network with 25 nodes shown in Fig. 1. The parameters are chosen as $N = 25$, $d_c = 4$, $\alpha = 0.6$, $\eta = 0.7$, $\gamma = 0.1$ and $\beta = 2.0$. We assume $b_i = 50$ and $g_i = 2$ for all nodes $i$. We have made numerical simulations for two cases. In one case the destination node of a packet is selected from uniformly distributed random numbers. In the other case node 13 is a special node chosen as the destination node for packets; for this case we have made numerical simulations for several values of the rate $r$ at which packets concentrate on node 13 and give the results for $r = 10\%$, 30%, 50%, 70% and 90%. Note that $r = 4\%$ corresponds to the case of a random selection of destination nodes. We have performed the numerical simulations for $T = 30$, $T^s = 30$ and for $T = 10$, $T^s = 30$. We define the average number of packets, $N_p$, and the average number of packets arrived at their destinations, $A$, as follows:
$$
N_p = \frac{1}{N} \sum_{i}^{N} q_i , \qquad A = \frac{N_a}{N_g} ,
\tag{11}
$$
where $N_a$ is the total number of packets which have arrived at their destinations and $N_g$ the number of packets generated in the computer network. We note that $N_a$ corresponds to the throughput. We give the results obtained for $A$ and $N_a$ using $T = 30$ and $T^s = 30$ in Figs. 4 and 5, and those for $A$ and $N_a$ using $T = 10$ and $T^s = 30$ in Figs. 6 and 7. The total number of iterations is fixed at 1500. We see that the average number of
Fig. 4. The average number of packets arrived at their destinations for the case of T = 30 and T^s = 30. The symbols shown at the top of the figure indicate the rate of packets concentrating on node 13.
Fig. 5. The total number of packets arrived at their destinations for the case of T = 30 and T^s = 30. See the caption of Fig. 4 for the symbols.
Fig. 6. The average number of packets arrived at their destinations for the case of T = 10 and T^s = 30. See the caption of Fig. 4 for the symbols.
packets arrived at their destinations, $A$, in Fig. 6 is much larger than that in Fig. 4. This means that the optimal control of packet flow has been achieved by the present goal-directed learning system. This is even more evident if we compare the results for the throughput, namely $N_a$, given in Figs. 5 and 7. By using the present dynamic programming algorithm, we have obtained values of $N_a$ two times or more larger in the present system than in the previous system, in which the goal-directed learning was not considered. Thus we see that the learning effect works well and increases the throughput. We have also performed numerical simulations for a computer network with an irregular arrangement of nodes and found results similar to those presented above for the computer network given in Fig. 1. We have also calculated the average number of fired nodes, $M_r$, which corresponds to the rate of movement of packets, and the average number of packets discarded without arriving at their destinations, $D$, which are defined as follows:
$$
M_r = \frac{N_f}{N_t} , \qquad D = \frac{N_{dp}}{N_g} ,
\tag{12}
$$
respectively, where $N_f$ is the total number of nodes in which neurons have actually fired in the CCNN, $N_t$ the total number of nodes at which there exists at least one packet, $N_{dp}$ the number of packets discarded without arriving at their destinations and $N_g$ the number of packets generated in the computer network. Although we do not give the results for $M_r$ and $D$ here, they do not differ much from
Fig. 7. The total number of packets arrived at their destinations for the case of T = 10 and T^s = 30. See the caption of Fig. 4 for the symbols.
those calculated in the previous papers [7,8]; in fact, $M_r$ is almost 1 and $D$ almost zero except when $N_p$ is very close to the buffer size.

4. Concluding remarks

In the present paper, we have proposed a system with two neural networks in order to implement goal-directed learning for optimal packet routing control. One of the two neural networks is the communication control neural network with Ising neurons and the other is the auxiliary neural network with lattice gas neurons. We have implemented the goal-directed learning according to the time-dependent environment for packets by using the auxiliary neural network; namely, we have proposed a dynamic programming algorithm for this purpose. We have performed numerical simulations for a computer network with a regular arrangement of nodes on the square lattice. We have found that the proposed goal-directed learning works well and increases the throughput. We have also performed simulations for a computer network with an irregular arrangement of nodes and confirmed that the conclusion does not change. As a future problem, it is possible to use other energy functions, for example one with Potts spins [10]. An investigation of packet flow on a scale-free network is also one of the future problems [11]. As another future problem, we wish to investigate a system with traffic congestion using a setting of packet flow different from that in the present paper.
Acknowledgements

We are grateful to K. Tanaka, K. Katayama and T. Omori for their valuable discussions, and to S. Ishioka for his preliminary contribution to this work. This work was partially supported by Grant-in-Aid for Scientific Research No. 14084202 from MEXT of Japan and also by the Ishida foundation.

References

[1] A.S. Tanenbaum, Computer Networks, 3rd Edition, Prentice-Hall, Englewood Cliffs, NJ, 1998.
[2] H.E. Rauch, T. Winarske, IEEE Control Systems Mag. (1988) 26.
[3] I. Iida, A. Chungo, R. Yatsuboshi, Proceedings of the IEEE International Conference on SMC, 1989, p. 194.
[4] M.K. Mehmet Ali, F. Kamoun, IEEE Trans. Neural Networks 4 (1993) 941.
[5] H. Kurokawa, C.Y. Ho, S. Mori, Neural Networks 11 (1998) 347.
[6] J. Hertz, A. Krogh, R.G. Palmer, Introduction to the Theory of Neural Computation, Addison-Wesley, Redwood City, CA, 1991.
[7] T. Horiguchi, S. Ishioka, Physica A 297 (2001) 521.
[8] T. Horiguchi, H. Takahashi, Proceedings of the 10th International Conference on Neural Information Processing, 2003, p. 358.
[9] T. Horiguchi, H. Takahashi, K. Hayashi, C. Yamaguchi, Proceedings of the 2003 Joint Workshop of Hayashibara Foundation and SMAPIP, 2003, p. 115.
[10] C. Peterson, B. Söderberg, Optimization, in: M.A. Arbib (Ed.), The Handbook of Brain Theory and Neural Networks, 2nd Edition, The MIT Press, Cambridge, MA, 2002, pp. 822-827.
[11] R. Albert, A.-L. Barabási, Rev. Mod. Phys. 74 (2002) 47.