Control Engineering Practice 19 (2011) 1287–1296
Control Engineering Practice journal homepage: www.elsevier.com/locate/conengprac
Network topology design
Tomas Fencl, Pavel Burget, Jan Bilek
Department of Control Engineering, Faculty of Electrical Engineering, Czech Technical University in Prague, Czech Republic
Article info
Article history: Received 31 May 2010; Accepted 2 July 2011; Available online 23 July 2011
Keywords: Networks; Network reliability; Genetic algorithms; Network topologies; Fault tolerance

Abstract
The application of Industrial Ethernet raises questions about the reliability of the networks and their behaviour under the demanded load while the permitted time delay is preserved in the whole network. This paper proposes an algorithm that designs a network topology which meets the required fault tolerance and allows reliable communication between the parts of the control system. It is an iterative genetic algorithm that designs the physical topology and verifies its behaviour under the expected load and the time delay in the whole network. A chromosome representation and the required genetic operators are proposed as well. The results show that the algorithm is especially suitable for the design of control network topologies, which was its main aim. To the authors' knowledge, no other algorithm fulfils the requirements to the same extent as the proposed one. © 2011 Elsevier Ltd. All rights reserved.
Correspondence to: Department of Control Engineering, Faculty of Electrical Engineering, Czech Technical University in Prague, Technická 2, 166 27 Prague 6, Czech Republic. Tel.: +420 728 669 281; fax: +420 224 918 646. E-mail address: [email protected] (T. Fencl).
0967-0661/$ - see front matter © 2011 Elsevier Ltd. All rights reserved. doi:10.1016/j.conengprac.2011.07.001

1. Introduction
The application of distributed control systems brings cost savings and makes it possible to control geographically vast technologies. An important part of a distributed control system is the network and its topology, which has a direct influence on the behaviour of the control system. A poorly designed network can limit the data bandwidth and consequently cause late data delivery, whereas the basic requirement on the network is on-time data delivery. The network may carry safety data (e.g. tank pressure or temperature), and its late delivery can cause a malfunction of the controlled technology because the control system may then decide wrongly. Moreover, a communication link can be interrupted, so the network should tolerate link interruptions. If it does not, a single disrupted communication link can stop the data transfer. In such a case the control system cannot control the technology, which is consequently stopped, bringing financial losses. This paper therefore proposes an algorithm that is able to design a fault-tolerant network with different levels of fault tolerance throughout the network. A network designed in this way allows data transfer within the maximum permitted time.

1.1. Outline and objectives
Since known algorithms are not sufficient for the design of networks in control engineering, this paper presents a new algorithm that is able to design network topologies with different levels of fault tolerance in different parts of the network. The designed network keeps the constraints on the timeliness of data delivery, i.e. it fulfils the real-time requirements of the overlying control application. As the acquisition costs of the technology are increasingly important, the costs of the designed network are kept as small as possible while the demands on fault tolerance and on-time data delivery are fulfilled. This paper is an extension of Fencl, Burget, and Bilek (2008), whose algorithm designs a network with different levels of fault tolerance and ensures that the network has the requested behaviour in the worst-case scenario. Nevertheless, that algorithm contains unnecessary steps in the verification of the worst-case scenario and does not describe the network behaviour in a realistic way. Therefore, a model of the network's common behaviour is proposed and its verification in the OPNET environment is described.
The paper is organised as follows. Related work is described in Section 2. The meaning of the variables and of the network topology is explained in Section 3.1. The representation of the network topology and the design of the fault-tolerant physical topology with the help of a genetic algorithm are described in Section 3.2. Section 3.3.1 shows the calculation of the average bit delay, which is used in Section 3.4 during the design of the logical topology. This design verifies whether the physical topology allows data delivery in the demanded time and thereby reveals whether there will be any bottleneck for the data used and required by the overlying control application. Section 4 puts together the algorithms from the previous sections and depicts the whole algorithm, which designs an inexpensive fault-tolerant network for control engineering. Section 4.2 describes the time complexity of the algorithm.
Sections 5.1 and 5.2 compare the results gained by the proposed algorithm and other existing algorithms. Delay calculation is also verified against the simulation results. Section 6 concludes the paper.
2. Related work
Many algorithms for network topology design are available. Nevertheless, most of the published algorithms address only a part of the network topology design problem. Some design only the physical topology of the network (Altiparmak & Dengiz, 2009) and do not address in-time data delivery. Others solve the scheduling of data frames in real-time networks (Hanzalek, Burget, & Sucha, 2010) or the optimum routing in a product network under constrained costs (Lin & Yeh, 2010). The algorithm described in Lin and Yeh (2010) finds a routing that provides the highest reliability of product delivery (data, electricity, etc.); the network is expected to be unreliable and the transmission budget is constrained. The algorithm by Lin (2010) finds the routing under constrained time and budget. Neither algorithm (Lin, 2010; Lin & Yeh, 2010) is able to design the physical topology; both take an existing topology and find the solution that is optimum with respect to the delivery probability. Another algorithm, by Sun, Sung, Krothapalli, and Rao (2010), evaluates the behaviour of an existing network and proposes an optimisation of VLAN networks that minimises the broadcast messages among the VLANs and consequently decreases the load in the network. It works with an existing network whose physical structure cannot be changed but whose settings can. The algorithm described in Misra, Guoliang, and Yang (2009) solves the important problem of designing traffic routing among multiple paths with bandwidth and delay constraints, a problem that arose with Video on Demand and VoIP (Voice over IP) technology. Other algorithms evaluate the reliability of existing networks (Altiparmak, Dengiz, & Smith, 2009; Hayashi & Abe, 2008).
None of the above-mentioned algorithms solves the network topology design problem completely; each focuses on only one part of it. Even if the solved part is important, such an algorithm cannot take the other parts of the design into account, which puts it at a disadvantage compared with algorithms that solve the network topology design problem completely, such as Szlachcic and Mlynek (2009), Szlachcic (2006), Han, Malan, and Jahanian (2002), Cheng (1998), Kumar, Narang, and Ravikumar (1998), Ko, Tang, Chan, Man, and Kwong (1997), or Jan, Hwang, and Chen (1993), which are aimed at the design of computer networks. Computer networks are used in various areas of human life, but the basic requirements in all these areas are the same: a certain degree of reliability, a constrained average delay of data delivery and, of course, minimum acquisition costs. These demands on reliability, delay and acquisition costs conflict with each other, so it is impossible to fulfil them all at the same time. There are two common ways to design a computer network. The first is to design a highly reliable network regardless of the acquisition costs; the other is to design a network that is as reliable as possible under constrained acquisition costs. However, neither of these possibilities can satisfy the demands on both reliability and costs at the same time. Therefore, multiobjective optimisation is used for the network topology design problem (Wang & Chang, 2008). Multiobjective optimisation finds a Pareto-optimum solution. Unfortunately, such an optimisation cannot always ensure the optimum solution to the network design problem. The result depends mostly on the weight of the optimisation criterion (Coello, 2000), which means that the optimisation could find an inexpensive network that is not reliable enough but still fulfils the optimisation criterion. Therefore, it is necessary to use a different approach. In industrial applications reliability is more important than price. Since the application is expected to be used in industry, the first possibility has been chosen, with the acquisition costs kept as small as possible. At the same time, it was necessary to ensure the maximum permitted delay of data delivery.
Control networks must satisfy much stricter constraints, e.g. a different degree of tolerance to link interruption in different parts of the network (because the probability of an interruption and the importance of the individual parts of the network differ). An average delay of data delivery can hardly be used as a measure of the network's suitability for real-time control systems, because it does not bound the worst-case delay of every data flow. Late delivery of data can cause serious problems (a malfunction of the control system and, as a consequence, of the controlled system). Hence, the design of a control network must employ different approaches from those used in the design of a computer network. Algorithms that ensure the average delay and the same minimum degree of fault tolerance in the whole network were proposed for computer networks, e.g., in Szlachcic and Mlynek (2009), Szlachcic (2006) or Ko et al. (1997). These algorithms were developed to be quite fast and efficient and employ genetic algorithms. One can use them if the same minimum degree of reliability is required in the whole network and the average delay of data delivery is what matters.
Another possibility is to use algorithms that do not focus on reliability but on other attributes of the network, such as price, traffic delay or load maximisation (for example, Dutta & Mitra, 1993; Elbaum & Sidi, 1996; Han et al., 2002; Kumar et al., 1998; Thompson & Gilbro, 1998). However, this is not an option here, since the network is intended for applications in control engineering, where the requirements differ, as mentioned above. The reliability of the network relates to its tolerance of link interruption. The fault tolerance can be described by the network connectivity (edge connectivity). A network is k-edge-connected if there are at least k independent paths between every pair of nodes in the whole network, i.e. it is possible to interrupt $k-1$ edges and the network is still connected (Diestel, 2005). This network attribute is useful for the design of a reliable network. However, care must be taken when using k-connectivity as a measure of network reliability. Some authors use the node degree instead of the edge connectivity, which is an insufficient condition for measuring reliability. For example, Cheng (1998) introduced the application of k-connectivity as a measurement of reliability. It is assumed in Cheng (1998) that the node degree can be substituted for the edge connectivity by the following condition:
$$\deg(V_i) \ge k \quad \text{for } \forall V_i \in V, \tag{1}$$
where V is the node set of the network and $\deg(V_i)$ is the node degree (the number of communication links adjacent to node $V_i$). This is a necessary but not sufficient condition for a reliable network. One might think that (1) ensures k-connectivity, which is not true in certain cases. It is possible to design a network similar to that in Fig. 1. The network in the figure satisfies (1) but definitely does not fulfil the demands for 2-edge connectivity. If the edge between nodes $V_3$ and $V_5$ is interrupted, the original network is split into two components that are not able to communicate with each other. Therefore, the network does not fulfil the demands for edge connectivity and, thus, (1) cannot be used as a measure of fault tolerance.

Fig. 1. Mesh network.

If the node degree is used to verify connectivity, it is necessary to use another condition instead of (1):
$$\deg(V_i) = h \quad \text{for } \forall V_i \in V, \quad \text{where } h \in \{k, k+1, k+2, \ldots\}. \tag{2}$$
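The gap between the degree condition (1) and edge connectivity can be checked on a small example. The sketch below is illustrative only: the exact links of Fig. 1 are not recoverable here, so the edge list `EDGES` is an assumed 8-node network (two 4-node rings joined by a single link standing in for $V_3$-$V_5$), and the helper names are hypothetical.

```python
from collections import deque

def is_connected(n, edges):
    """Breadth-first search over an undirected graph with nodes 0..n-1."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    seen, queue = {0}, deque([0])
    while queue:
        v = queue.popleft()
        for w in adj[v] - seen:
            seen.add(w)
            queue.append(w)
    return len(seen) == n

def min_degree(n, edges):
    """Smallest node degree in the graph."""
    deg = [0] * n
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return min(deg)

def survives_any_single_failure(n, edges):
    """True if the graph stays connected after removing any one edge."""
    return all(is_connected(n, [e for e in edges if e != cut]) for cut in edges)

# Assumed topology (not taken from the paper): indices 0..7 stand for V1..V8.
EDGES = [(0, 1), (1, 3), (3, 2), (2, 0),   # left ring
         (4, 5), (5, 7), (7, 6), (6, 4),   # right ring
         (2, 4)]                           # single connecting link (V3-V5)

print(min_degree(8, EDGES))                   # every node has degree >= 2
print(survives_any_single_failure(8, EDGES))  # yet one link failure can split it
```

In this assumed network every node satisfies (1) with k = 2, but removing the connecting link disconnects the two rings, which is the situation the text describes for Fig. 1.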
Eq. (2) ensures that the network is really k-connected and consequently also $(k-1)$-fault tolerant. Eq. (1) could be used only if the genetic operators produced only permissible solutions. Even though the genetic operators described in Cheng (1998) do a great job of preventing impermissible solutions, they may produce solutions similar to that in Fig. 1. Another way to verify the edge connectivity is the Monte Carlo method, as shown in Liu and Iwamura (2000) or Wang and Chang (2008). This method randomly interrupts edges of the network and verifies whether the network is still connected. These stochastically independent tests allow the probability of network disconnection to be computed. Methods based on the Monte Carlo algorithm are very useful for verifying the reliability of already built networks. Their main drawback is that they need to know the reliability of every communication link in the whole network. Usually, sufficient data is available to compute the link reliability of an existing network, but the link reliability of a network that is yet to be built can only be estimated. The estimate can be based on the knowledge of similar networks, but it is still only a "good" estimate, not real knowledge. On the other hand, a great advantage of the Monte Carlo method is that it is easily applied to really complex networks if the link reliability is known and the network interconnection can be verified. Then it is possible to calculate the overall network reliability. However, these methods cannot provide 100% proof of the fault tolerance of the network, because the results are strictly related to the number of statistically independent tests of the network topology.
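The Monte Carlo idea described above can be sketched in a few lines. This is a minimal illustration, not the procedure of Liu and Iwamura (2000) or Wang and Chang (2008): it assumes that every link fails independently and has the same reliability `link_reliability`, which is exactly the quantity the text says can only be estimated for a network that is not yet built. The function names are hypothetical.

```python
import random

def connected(n, edges):
    """Union-find connectivity test for nodes 0..n-1."""
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    for a, b in edges:
        parent[find(a)] = find(b)
    return len({find(v) for v in range(n)}) == 1

def estimate_disconnection(n, edges, link_reliability, trials, rng):
    """Fraction of random trials in which the surviving links
    leave the network disconnected."""
    failures = 0
    for _ in range(trials):
        # A link survives a trial with probability link_reliability (assumed).
        alive = [e for e in edges if rng.random() < link_reliability]
        if not connected(n, alive):
            failures += 1
    return failures / trials

ring = [(0, 1), (1, 2), (2, 3), (3, 0)]
estimate = estimate_disconnection(4, ring, 0.9, 10_000, random.Random(1))
print(f"estimated disconnection probability: {estimate:.3f}")
```

The estimate converges only statistically as `trials` grows, which mirrors the limitation noted in the text: a finite number of random tests cannot prove that no disconnecting combination of link failures exists.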
Even if the results say that the network is fault tolerant to a certain number of link interruptions, this only means that the results were obtained with a certain statistical significance. They cannot ensure real fault tolerance, because there could still be a combination of link interruptions that disconnects the network. There are also other ways to verify the fault tolerance than the node degree or the Monte Carlo method. It is possible to verify the fault tolerance using the number of independent spanning trees, as described in Zili, Nenghai, and Zheng (2008). The algorithms described there are able to ensure the same fault tolerance in the whole network. Nevertheless, with the evaluation function used, the network may not have sufficient capacity, since there is no limit on the maximum load of a link (the designed load can be bigger than the link capacity). Thus, the designed network cannot ensure that the transferred data is delivered on time. It is also possible to find the number of paths between nodes by applying methods of graph theory (Diestel, 2005), in particular the adjacency matrix. If P is the adjacency matrix, then the elements of matrix $P^k$ correspond to the number of paths of length k between nodes i and j. Unfortunately, this method is not able to verify whether the paths are independent and, therefore, algorithms that use it cannot ensure fault tolerance of the network at all.
In a real situation, the network can be expected to split into several parts of different importance. The network can provide data for public access as well as data that is important for safety reasons or real-time data. It is then possible to request different levels of fault tolerance in different parts of the network. Algorithms that are not able to design a network with different levels of fault tolerance, such as Szlachcic and Mlynek (2009), Szlachcic (2006), Cheng (1998), and Wang and Chang (2008), must design the whole network with the highest demanded fault tolerance. The presented algorithm, which is able to design a network with different levels of fault tolerance, therefore designs a less expensive network than the others do. It is very difficult to satisfy the demands for a different number of independent paths between the nodes, the maximum delay in each communication path and the minimum acquisition costs at the same time. Current algorithms such as Szlachcic and Mlynek (2009), Szlachcic (2006), Cheng (1998), Wang and Chang (2008) or Liu and Iwamura (2000) cannot ensure that all these demands are fulfilled at the same time and thus are not satisfactory for applications in control engineering. This task belongs to the NP-complete problems and, therefore, a topological iterative genetic algorithm is used, which ensures the design of a reliable network (redundant communication paths between the nodes) with the lowest possible acquisition costs.
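The remark about powers of the adjacency matrix can be illustrated numerically. The sketch below uses an assumed 4-node chain, chosen because it has only one route between its end nodes; the helper names are hypothetical. Strictly speaking, the entries of $P^k$ count routes of length k that may reuse links, which is why they say nothing about independence.

```python
def mat_mul(a, b):
    """Product of two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][t] * b[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

def mat_pow(p, k):
    """k-th power of a square matrix, k >= 1."""
    result = p
    for _ in range(k - 1):
        result = mat_mul(result, p)
    return result

# Assumed example: chain 0 - 1 - 2 - 3, a network that a single
# link interruption always disconnects.
P = [[0, 1, 0, 0],
     [1, 0, 1, 0],
     [0, 1, 0, 1],
     [0, 0, 1, 0]]

print(mat_pow(P, 3)[0][3])  # routes of length 3 between the end nodes
print(mat_pow(P, 5)[0][3])  # routes of length 5 between the end nodes
```

For this chain the count for length 5 is larger than the count for length 3, although there is still only one physical route between the end nodes; the extra routes merely traverse some link back and forth.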
3. Network topology design
3.1. Problem formulation
Let G = (V, E) be the network to be designed, with V being the set of nodes and E being the set of communication links. To design the physical topology, it is necessary to know the matrix of redundancy M, which describes how many independent paths are demanded between each pair of nodes, and the matrix of prices C, which contains the acquisition costs. The adjacency matrix P is the result of the physical topology design; in effect it represents the set E of communication links. It implies that
$$c_{i,j} = c_{j,i}; \quad \forall i,j \in \{1, \ldots, N\}, \tag{3}$$
where N is the number of nodes. The following parameters are taken as the input for the logical topology design and are used to verify that the maximum permitted delay is not exceeded in any part of the network:
P – the adjacency matrix;
F – the matrix of data flows (describing the amount of data between the nodes);
D – the matrix of the maximum permitted delays (the sensitivity of the control algorithm in every node to late data delivery is known, which makes it possible to estimate the maximum permissible delay of data delivery); if there are no demands on the delay, the corresponding element is infinity.
The matrix of data flows is obtained from an estimate of the network traffic made during the design of the control algorithm. The data that every node needs to obtain from the other nodes and the data update rate are requirements of the overlying control application. The capacity of the network links is given at the start of the design and is the same for every communication link in the network. Such an assumption is realistic and puts no constraints on the applicability of the results.
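How a candidate adjacency matrix P might be checked against the matrix of redundancy M can be sketched with a unit-capacity maximum flow, which counts the independent (edge-disjoint) paths between two nodes. This is an illustrative sketch, not the authors' implementation: it uses breadth-first augmenting paths, one common variant of the Ford–Fulkerson method, and the function names and the small test matrices are assumptions.

```python
from collections import deque

def edge_disjoint_paths(P, s, t):
    """Number of edge-disjoint s-t paths (s != t) in an undirected graph
    given by adjacency matrix P, computed as a unit-capacity max flow."""
    n = len(P)
    cap = [row[:] for row in P]          # each link: capacity 1 in both directions
    flow = 0
    while True:
        parent = [-1] * n
        parent[s] = s
        queue = deque([s])
        while queue and parent[t] == -1:  # breadth-first search for an augmenting path
            u = queue.popleft()
            for v in range(n):
                if cap[u][v] > 0 and parent[v] == -1:
                    parent[v] = u
                    queue.append(v)
        if parent[t] == -1:
            return flow                   # no augmenting path is left
        v = t
        while v != s:                     # push one unit of flow along the path
            u = parent[v]
            cap[u][v] -= 1
            cap[v][u] += 1
            v = u
        flow += 1

def meets_redundancy(P, M):
    """True if every node pair has at least M[i][j] independent paths."""
    n = len(P)
    return all(edge_disjoint_paths(P, i, j) >= M[i][j]
               for i in range(n) for j in range(i + 1, n))

triangle = [[0, 1, 1], [1, 0, 1], [1, 1, 0]]
chain = [[0, 1, 0], [1, 0, 1], [0, 1, 0]]
M = [[0, 2, 2], [2, 0, 2], [2, 2, 0]]     # demand: two independent paths everywhere
print(meets_redundancy(triangle, M), meets_redundancy(chain, M))
```

Because M is specified per node pair, the same check covers networks in which different parts demand different levels of fault tolerance.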
3.2. Design of physical topology
The design of the network topology with respect to different fault tolerance and minimum network cost belongs to the NP-complete problems. Therefore, a genetic algorithm is used for this purpose. The genetic algorithm does not ensure that the optimum solution is found (it mostly finds "only" a near-optimum solution), but it has the great advantage of searching quickly through a vast number of solutions and of applying several constraints easily. The speed of the genetic algorithm is given by the genetic operators: mutation and crossover. The genetic algorithm works like evolution in nature. Mutation and crossover change the current population, and selection determines which offspring or individuals of the current generation will be in the next generation. Selection utilises a chromosome evaluation that describes the quality of the chromosomes. The evaluation of the chromosomes is obtained from a fitness function.
3.2.1. Chromosome representation
The chromosome that describes the network topology is a vector i with l elements:
$$i = \{ i_j : i_j \in \{0,1\},\ j = 1, \ldots, l \}. \tag{4}$$
The length l is given by
$$l = \frac{N(N-1)}{2}, \tag{5}$$
where N is the number of nodes in the network. The vector characterises the upper triangular part of the adjacency matrix P, which is symmetrical (Fig. 2). The symmetry of P arises from the assumption of full-duplex communication, where one communication link permits communication in both directions.

Fig. 2. Adjacency matrix.

3.2.2. Genetic functions
The fitness function for the design of the physical topology is the price of the whole network. The price includes all acquisition costs and possible penalties. The price of the network is the sum of all elements of matrix $C^P$:
$$C^P = C \wedge P, \tag{6}$$
where $\wedge$ is the "logical AND" (meaning $c^P_{i,j} = c_{i,j}$ if $p_{i,j} = 1$, and $c^P_{i,j} = 0$ if $p_{i,j} = 0$). The final price of the physical topology is given by
$$c = \sum_{i,j=1}^{N} c^P_{i,j} + Pen, \tag{7}$$
where N is the number of nodes and $c^P_{i,j}$ are the elements of $C^P$. If the demands for the number of independent paths are not satisfied for some node pairs, then the penalty is
$$Pen = 1 + N^2 c_{\max}, \tag{8}$$
where $c_{\max}$ is the maximum element of the matrix of acquisition costs C. In other cases Pen = 0. Eq. (8) ensures that a penalised chromosome has a bigger price than the permissible chromosomes. The chromosomes are penalised if they do not provide the demanded number of redundant communication paths for every node pair or if they are the same as one of the former inappropriate solutions. The former inappropriate solutions, also called inappropriate topologies, are stored in an accumulator of inappropriate topologies. Inappropriate topologies are topologies in which the delay is higher than the permitted value. They are obtained in the previous iterations of the algorithm (the whole algorithm is described in Section 4). Moreover, the accumulator may also contain topologies that are inappropriate for other reasons, for example due to a limitation of the environment, and that are entered by the operator at the beginning of the computations. It is necessary to verify the network interconnection and the number of independent paths between the nodes before applying the penalty. Eq. (9) is used as a fast approximate test of the network interconnection. It speeds up the evaluation, because there is no need to search for more independent paths if some node is not connected to the network at all:
$$\deg(v) \ge 1; \quad \forall v \in V. \tag{9}$$
If the network satisfies (9), the Ford–Fulkerson algorithm (Cormen, Leiserson, Rivest, & Stein, 2001) is used to verify the number of independent paths and the network interconnection.
One-bit mutation (the mutated chromosome is chosen at random according to the mutation probability, and the mutated bit is also chosen at random) and one-point crossover (two chromosomes are crossed at the same place and the offspring consists of parts of both parents, as depicted in Fig. 3) are applied. The new offspring replaces the parents only if it has better quality. A tournament selection is used (two chromosomes selected at random are compared and the better one is chosen for the next generation). The genetic algorithm is stopped after 100 generations. The limit of 100 iterations was chosen empirically based on a number of various network topologies. To be able to compare the results of the individual computations, especially computation time and optimality, the number of iterations has always been the same when performing experiments with different networks.

Fig. 3. Crossover.

3.3. Logical topology design
3.3.1. Average bit delay
The attributes of a network in control engineering are the same as the attributes of a computer or communication network. The inter-arrival time and the service time in the network are assumed to follow the Poisson distribution. The average delay of the next sent bit is then
$$D_p = \frac{L_p}{\mathrm{Cap} - L_p}, \tag{10}$$
where Cap is the link capacity and $L_p$ is the load of the link (Gerla & Kleinrock, 1977).
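Eq. (10) is simple enough to evaluate directly, and a short sketch shows how sharply the delay grows as the load approaches the link capacity. The function name and the handling of a saturated link (returning infinity) are assumptions made for illustration; load and capacity are taken to be in the same units.

```python
import math

def average_bit_delay(load, capacity):
    """Eq. (10): D_p = L_p / (Cap - L_p).

    Returning infinity for load >= capacity is an assumption of this
    sketch; the formula itself is only meaningful below saturation.
    """
    if load < 0 or capacity <= 0:
        raise ValueError("load must be non-negative and capacity positive")
    if load >= capacity:
        return math.inf
    return load / (capacity - load)

for load in (10, 50, 90, 99):
    print(load, average_bit_delay(load, 100))
```

With a capacity of 100, the delay term at a load of 90 is nine times the value at a load of 50, which is why the logical topology design has to watch individual links rather than only the average.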
3.4. Design of the logical topology
Fig. 4 shows a possible physical topology that meets the demand for two independent paths between the node pairs $V_1, V_2$ and $V_1, V_3$. The network consists of two different parts: the branches of the network ($V_1, V_6$; $V_3, V_4, V_5$) and the "core" of the network ($V_1, V_2, V_3$). The logical topology in the branches is strictly given by the physical topology. Thus, it is not possible to change the logical topology there without modifying the physical topology. The branches of the network can be found from the node degrees. The start node of a branch has $\deg(n) = 1$. Every node that is connected to the start node and has a degree of less than 3 is then a component of the branch. The end node of the branch has a degree $\deg(n) \ge 3$. In this way it is possible to find all branches and exclude them from the logical topology design. Before continuing the design, it is appropriate to verify the delays in the branches (they must be smaller than the permitted delays for all data flows, including the data flows that continue through the "core" of the network), because the physical topology may not allow the permitted delay to be kept in the branches, in which case the design of the physical topology must be started again. Excluding the branches decreases the number of possible logical topologies and increases the speed of the algorithm. This can be seen in the following example. The original matrix of the data flows for the network looked like this:
$$F = \begin{pmatrix} 0 & f_{1,2} & f_{1,3} & f_{1,4} & 0 & 0 \\ f_{2,1} & 0 & f_{2,3} & 0 & f_{2,5} & f_{2,6} \\ f_{3,1} & f_{3,2} & 0 & f_{3,4} & 0 & 0 \\ f_{4,1} & 0 & 0 & 0 & f_{4,5} & 0 \\ 0 & f_{5,2} & 0 & f_{5,4} & 0 & f_{5,6} \\ 0 & f_{6,2} & 0 & 0 & f_{6,5} & 0 \end{pmatrix}. \tag{11}$$
The original matrix of permitted delays looked like this:
$$D = \begin{pmatrix} 0 & d_{1,2} & d_{1,3} & d_{1,4} & 0 & 0 \\ d_{2,1} & 0 & d_{2,3} & 0 & d_{2,5} & d_{2,6} \\ d_{3,1} & d_{3,2} & 0 & d_{3,4} & 0 & 0 \\ d_{4,1} & 0 & 0 & 0 & d_{4,5} & 0 \\ 0 & d_{5,2} & 0 & d_{5,4} & 0 & d_{5,6} \\ 0 & d_{6,2} & 0 & 0 & d_{6,5} & 0 \end{pmatrix}. \tag{12}$$
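The degree-based rule for finding branches can be sketched as follows. The link list `FIG4_LINKS` is an assumption inferred from the text (a three-link core $V_1, V_2, V_3$ with the branches $V_1$-$V_6$ and $V_3$-$V_4$-$V_5$), since the drawing of Fig. 4 itself is not reproduced here; `find_branches` is a hypothetical helper name.

```python
def find_branches(adj):
    """Split an undirected graph into branches and a core using node degrees.

    adj maps each node to the set of its neighbours. Each returned branch
    lists the nodes from a degree-1 start node up to and including the
    end node of degree >= 3; that end node remains part of the core.
    """
    branches, branch_nodes = [], set()
    for start in sorted(adj):
        if len(adj[start]) != 1:
            continue                      # a branch starts at a node of degree 1
        path, prev, node = [start], None, start
        while True:
            nxt = [w for w in adj[node] if w != prev]
            if not nxt:
                break                     # ran into another degree-1 node
            prev, node = node, nxt[0]
            path.append(node)
            if len(adj[node]) >= 3:
                break                     # end node of the branch
        branches.append(path)
        branch_nodes.update(path[:-1])
    core = sorted(set(adj) - branch_nodes)
    return branches, core

# Assumed links of the example network (node k stands for V_k).
FIG4_LINKS = [(1, 2), (2, 3), (3, 1), (1, 6), (3, 4), (4, 5)]
adj = {}
for a, b in FIG4_LINKS:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)

branches, core = find_branches(adj)
print(branches, core)
```

For the assumed links this yields the two branches and the three-node core named in the text, so the logical topology search only has to consider the three core links.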
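Matrices (11) and (12) are changed into matrices (13) and (14) after removing the branches:
$$F^{*} = \begin{pmatrix} 0 & f^{*}_{1,2} & f^{*}_{1,3} \\ f_{2,1} & 0 & f_{2,3} \\ f^{*}_{3,1} & f^{*}_{3,2} & 0 \end{pmatrix}, \tag{13}$$
$$D^{*} = \begin{pmatrix} 0 & d_{1,2} & d_{1,3} & d^{*}_{1,4} & 0 & 0 \\ d_{2,1} & 0 & d_{2,3} & 0 & d^{*}_{2,5} & d^{*}_{2,6} \\ d_{3,1} & d_{3,2} & 0 & d_{3,4} & 0 & 0 \\ d^{*}_{4,1} & 0 & 0 & 0 & d_{4,5} & 0 \\ 0 & d^{*}_{5,2} & 0 & d_{5,4} & 0 & d^{*}_{5,6} \\ 0 & d^{*}_{6,2} & 0 & 0 & d^{*}_{6,5} & 0 \end{pmatrix}, \tag{14}$$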
where the elements of matrix $F^{*}$ comprise the following data flows. The new matrix of data flows contains not only data flows whose input and output nodes are in the "core" but also data flows whose input or output nodes are outside the "core". A data flow that goes through the "core" has new input and output nodes, which are the first and the last node in the "core" for this data flow:
$$f^{*}_{1,2} = f_{1,2} + f_{6,2}, \quad f^{*}_{1,3} = f_{1,3} + f_{6,5}, \quad f^{*}_{3,1} = f_{3,1} + f_{4,1} + f_{5,6}, \quad f^{*}_{3,2} = f_{3,2} + f_{5,2}. \tag{15}$$
The elements of matrix $D^{*}$ are
$$d^{*}_{i,j} = d_{i,j} - d^{B}_{i,j}. \tag{16}$$
In (16), $d^{B}_{i,j}$ is the sum of all delays in the branches for data flow $f_{i,j}$; the delay in the branches is calculated according to (10). Delay $d^{*}_{i,j}$ is the new maximum permitted delay for a data flow that is transmitted or received in a branch and goes through the "core" of the network. A smaller capacity of the communication links must be assumed, because if some communication link is interrupted, its load will be sent through the other links. If the substitute links are fully utilised, the data will be delivered too late. Therefore, one can only count on a smaller link capacity, which is given by the factor
$$k = 1 - \frac{1}{n}, \tag{17}$$
where n is the number of independent paths; the new capacity in the "core" of the network is then
$$C_{PN} = k\,\mathrm{Cap}. \tag{18}$$
Then, $C_{PN}$ is used instead of Cap in (10) to calculate the delay in the network "core".
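Eqs. (16)–(18) amount to two small adjustments made before the core is evaluated: the delay budget is reduced by the delay already spent in the branches, and the usable link capacity is scaled down to leave room for rerouted traffic. The sketch below uses hypothetical function names and purely illustrative numbers.

```python
def reduced_capacity(capacity, n_paths):
    """Eqs. (17)-(18): C_PN = (1 - 1/n) * Cap for n independent paths."""
    if n_paths < 1:
        raise ValueError("n_paths must be at least 1")
    return (1.0 - 1.0 / n_paths) * capacity

def core_delay_budget(permitted_delay, branch_delay):
    """Eq. (16): d*_ij = d_ij - dB_ij, the delay left for the core."""
    return permitted_delay - branch_delay

cap = 100.0                                  # illustrative link capacity
c_pn = reduced_capacity(cap, n_paths=2)      # capacity usable in the core
load = 25.0                                  # illustrative core link load
core_delay = load / (c_pn - load)            # Eq. (10) with C_PN in place of Cap
print(c_pn, core_delay, core_delay_budget(10.0, 3.0))
```

With two independent paths only half of the nominal capacity may be planned for, so that the surviving path can carry the whole load if the other one is interrupted.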
Fig. 4. Network.

3.4.1. Logical topology representation
The chromosome for the logical topology description codes every possible path for every data flow in the "core" of the network. The length of the chromosome is given by
$$l = j\,m, \tag{19}$$
where j is the number of data flows that go through the network "core" and m is the number of communication links in the "core" of the network. In the described example, j = 11 (the flows $f^{*}_{i,j}$ go through the "core" of the network and are composed of the flows $f_{i,j}$ of matrix F as shown in (15)) and m = 3 (see (13), (15) and Fig. 4); therefore, the length of the chromosome is l = 33.
3.4.2. Genetic operators
It is necessary to modify the genetic operators for the chromosome configuration described above. The crossover function can be modified very easily: a chromosome comprises groups of bits, each of which represents the logical path of one data flow. Therefore, the one-point crossover function must cross the chromosomes exactly at the borders of these groups. If the chromosomes are crossed somewhere else, the resultant logical topology can be impermissible. A possible crossing point is depicted in Fig. 5. In this case the logical paths $P_x$, which describe each data flow, are composed of 3 bits, because the "core" in Fig. 4 contains 3 links. The LSB (Least Significant Bit) corresponds to the link between nodes $V_1$ and $V_2$ and the MSB (Most Significant Bit) corresponds to the link between nodes $V_3$ and $V_1$ (see Fig. 4). The mutation operator must also be modified for the chosen description of the logical topology. If the common one-bit mutation were applied, its results would be completely wrong, because they would represent an impermissible logical topology. The application of the mutation is shown for data flow $f_{1,2}$. Firstly, it is necessary to create a list of all possible logical paths for each data flow. For $f_{1,2}$ the possible paths are represented by only two binary chains, 001 and 110 (the coding of the path is the same as for the crossover). If chain 001 is mutated, its value is transformed into chain 110 and vice versa. In general, there can be more than two possible binary chains. In this case, the mutation result is chosen at random from the set of the remaining paths. The lists of the possible paths are also created during the initialisation of the chromosomes. Initialisation is completed by setting a randomly chosen logical path for each data flow.
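The two modified operators can be sketched on a chromosome stored as a list of 3-bit path groups, one group per data flow. This is a minimal illustration with hypothetical names. The path list for $f_{1,2}$ (001 and 110) is taken from the text; the list for $f_{1,3}$ is derived under the assumption that the middle bit stands for the link between $V_2$ and $V_3$.

```python
import random

def mutate(chromosome, flow_index, allowed_paths, rng):
    """Replace the path of one data flow by a different permissible path."""
    current = chromosome[flow_index]
    others = [p for p in allowed_paths[flow_index] if p != current]
    if not others:
        return list(chromosome)           # only one permissible path: no change
    child = list(chromosome)
    child[flow_index] = rng.choice(others)
    return child

def crossover(parent_a, parent_b, point):
    """One-point crossover; `point` counts whole path groups, never single
    bits, so every offspring group is a path taken from one of the parents."""
    return (parent_a[:point] + parent_b[point:],
            parent_b[:point] + parent_a[point:])

# Permissible paths per flow; bit order is MSB..LSB = (V3-V1, V2-V3, V1-V2).
ALLOWED = [
    ["001", "110"],   # f_{1,2}: direct link, or via V3 (from the text)
    ["100", "011"],   # f_{1,3}: direct link, or via V2 (assumed coding)
]

a = ["001", "100"]
b = ["110", "011"]
print(mutate(a, 0, ALLOWED, random.Random(0)))
print(crossover(a, b, 1))
```

Storing each path as one list element makes a cut inside a group impossible by construction, which is one simple way to respect the group borders that the crossover must observe.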
Fig. 5. Crossover function.

3.4.3. Fitness function

An important part of the genetic algorithm is the fitness function, which must be defined for the network topology design problem. The delay in the network can be calculated very easily because each chromosome exactly defines the whole logical topology for the “core” and the rest of the logical topology (the logical topology in branches) is already known. The average delay or the sum of delays in every link can be used as the fitness function. If the average delay decreases, the quality of the logical topology increases. It is possible that the average delay is satisfactory but the delay in one path is bigger than the permitted delay and, therefore, the whole logical topology is wrong. Thus, a certain level of inappropriate designs must be recognised. To accomplish this, a penalty is added to the fitness function. A different penalty is needed for each rank of inappropriate chromosomes according to the number of delays that are bigger than the permitted delays:

\[ \mathrm{Penalty} = k \cdot \max(D), \tag{20} \]

\[ \mathrm{Fit} = \sum_{i=1}^{j} d_i + \mathrm{Penalty}. \tag{21} \]

Eq. (21) is the fitness function that describes the quality of a chromosome; $d_i$ is the delay for the ith data flow, which is gained from (10) for each data flow in the “core”. Eq. (20) describes the penalty function; $k$ is the number of delays that are bigger than the permitted delay.

3.5. Selection and end condition

A tournament selection is used because of its efficiency. Note that, for the chosen fitness function, the best chromosome is the one with the smallest delay (the best quality). The end condition is the number of algorithm iterations, which is set to 100 as described earlier in Section 3.2.2.

4. Algorithm

4.1. Algorithm description

As mentioned above, the whole algorithm is an iterative task based on the genetic algorithm. The structure of the whole algorithm is depicted in Fig. 6. Every part of the algorithm is connected to the other parts only through inputs and outputs. Therefore, it is easy to replace parts of the algorithm if needed. Thus, if a more accurate model of the network behaviour is needed, just the block “logical topology design” is updated with the new model and the other parts of the algorithm remain unaffected. The new model may also represent a different type of network such as an electrical grid, pipelines, etc.

Fig. 6. Whole algorithm. (Flowchart blocks: Set input parameters; Design of physical topology; Accumulator of unsuitable solutions; Physical topology; Unsuitable physical topology; Design of logical topology; Correct structure? (yes/no); Save data; Write output parameters.)

4.1.1. Block “Set input parameters”

The matrices of data flow, permitted delays, acquisition costs and redundancy are set in this step. The accumulator of inappropriate solutions is filled with the unsuitable topologies, which do not satisfy the requirements, mostly because of limitations of the environment. The structure of unsuitable topologies is based on the knowledge of application engineers and their experience with the environment in which the network will be used, or it can simply reflect the wish of the plant manager.

4.1.2. Block “Physical topology”

The physical topology design is obtained by the application of the genetic algorithm in this step. It is a common genetic algorithm that uses the representation, genetic operators and fitness function as described in Section 3.2. The designed topology is compared with the topologies stored in the accumulator of the inappropriate topologies and, if appropriate, the resultant topology is used as the input for the design of the logical topology.

4.1.3. Block “Logical topology”

In this step, it is verified whether the physical topology allows the maximum permitted delay to be preserved for each data flow. For this purpose, a logical topology is designed. If the logical topology does not meet the demands for the permitted delay, the physical topology is stored in the accumulator of the inappropriate topologies and the algorithm starts designing the physical topology again. If the network allows the maximum permitted delay to be preserved, the algorithm ends and the results are written to the output. Because unrealistic demands could make the algorithm iterate indefinitely, it is necessary to set the permitted number of iterations of the whole algorithm. If this limit is reached, the algorithm is stopped and the best result from the accumulator of the inappropriate solutions is written to the output. The maximum number of algorithm iterations is set empirically according to the network complexity.
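The iteration between the two design blocks, together with the penalised fitness of Eqs. (20) and (21), can be sketched in Python as follows. This is only an illustration under stated assumptions: `design_physical` and `design_logical` are hypothetical callables standing in for the two genetic algorithms, the set D of Eq. (20) is passed in explicitly because it is defined outside this section, and keeping a fitness value for each rejected topology is one possible way to pick "the best result" when the iteration limit is reached.

```python
def fitness(delays, permitted, D):
    """Eq. (21): sum of the delays plus the penalty k * max(D) of Eq. (20)."""
    k = sum(1 for d, limit in zip(delays, permitted) if d > limit)
    return sum(delays) + k * max(D)


def design_network(design_physical, design_logical, max_iterations, initial_rejected=()):
    """Iterate physical and logical design until the permitted delays are met.

    design_physical(rejected) -> physical topology not contained in `rejected`
    design_logical(physical)  -> (logical topology, fitness value, delays_ok)
    Both callables are hypothetical stand-ins for the two genetic algorithms.
    """
    rejected = list(initial_rejected)  # accumulator of unsuitable topologies
    attempts = []                      # (fitness, physical, logical) per failure
    for _ in range(max_iterations):
        physical = design_physical(rejected)
        logical, fit, delays_ok = design_logical(physical)
        if delays_ok:
            return physical, logical, True
        rejected.append(physical)
        attempts.append((fit, physical, logical))
    if not attempts:
        return None, None, False
    # Iteration limit reached: fall back to the rejected design with the lowest fitness.
    _, physical, logical = min(attempts, key=lambda a: a[0])
    return physical, logical, False


# Toy stand-ins: topologies are plain integers; only topology 2 meets the delays.
def toy_physical(rejected):
    return len(rejected)


def toy_logical(physical):
    return f"routes-{physical}", 10.0 - physical, physical == 2


result = design_network(toy_physical, toy_logical, max_iterations=5)
```

Passing the two design steps as callables mirrors the modularity described in Section 4.1: a different delay model only requires a different `design_logical`.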
4.2. Time complexity

Design of the physical topology:

\[ \Theta_P = N_{ch}\,\frac{N(N-1)}{2}\,(1 + |E|\,n) + \left( N_{ch}\,\frac{N(N-1)}{2}\,|E|\,n\,(p_c + p_m) + 4 \right) N_P, \tag{22} \]

where the first part belongs to the initialisation of the genetic algorithm, $|E|$ is the number of communication links, $N$ is the number of nodes, $n$ is the number of independent communication paths and $N_{ch}$ is the number of chromosomes. The second part belongs to the iterations of the genetic algorithm; $p_c$ and $p_m$ are the probabilities of the crossover and the mutation for the design of the physical topology, the additive term accounts for the operations necessary for the tournament selection, and $N_P$ is the number of iterations of the algorithm for the physical topology design.

Design of the logical topology:

\[ \Theta_L = |E|\,n\,N_F + (N_{Lch}\,N_F)\,4|E|\,N_F + \left( N_{Lch}\,(p_{Lc} + p_{Lm})\,4|E|\,N_F + 4N_{Lch} \right) N_L, \tag{23} \]

where $N_F$ is the number of data flows, $N_{Lch}$ is the number of chromosomes, $p_{Lc}$ and $p_{Lm}$ are the probabilities of the crossover and the mutation, respectively, and $N_L$ is the number of iterations.

Complexity of the whole algorithm: the worst-case time complexity of the whole algorithm is

\[ \Theta = N_A\,(\Theta_L + \Theta_P), \tag{24} \]

where $N_A$ is the number of iterations of the whole algorithm after which the algorithm is stopped without finding an appropriate solution. This situation arises from unrealistic demands for the permitted delays, which are impossible to reach because of the other demands on the network.

It is possible to find a more compact version of (24). The part $|E|n$ belongs to the complexity of the Ford–Fulkerson algorithm (Diestel, 2005). The Ford–Fulkerson algorithm has this complexity if the capacity of the edge/link is an integer and the breadth-first search algorithm is used to find the augmenting path. In the case of the proposed algorithm, the capacity of the communication link is set to 1 during the verification of the number of independent paths; then the maximum found capacity corresponds to the number of independent paths between the nodes, and the breadth-first search is used to find the augmenting path. Thus, the requirement for the complexity of the Ford–Fulkerson algorithm to be $|E|n$ is fulfilled. The number of independent paths for every pair of nodes can vary: $n \in \langle 2; N-1 \rangle$, where 2 corresponds to the ring topology and $N-1$ corresponds to the fully connected network where every node is connected with all other nodes. The number of communication links is $|E| \in \langle N; N(N-1)/2 \rangle$, where $N$ corresponds to the number of communication links for the ring topology and $N(N-1)/2$ corresponds to the number of links of the fully connected network. Then the complexity of the physical topology design is

\[ \Theta_P(N^3; N^5). \tag{25} \]

A similar substitution for the logical topology is as follows. The number of data flows in the worst case corresponds to $N_F = N(N-1)$ since every node can communicate with all other nodes. Then the complexity of the logical topology is

\[ \Theta_L(N^4). \tag{26} \]

The complexity of the whole algorithm is

\[ \Theta(N^4; N^5). \tag{27} \]

5. Results

5.1. Numerical results

The tests of the physical topology design were made for medium-sized networks with different numbers of nodes (N = 10, 12, 14, 16, 18, 20). The experiments were performed with the following parameters: population size 200, crossover probability $p_c$ = 0.25, mutation probability $p_m$ = 0.05, number of iterations 100. The PC was a Pentium 4 (3 GHz) with 512 MB RAM running Windows XP SP2. The algorithm was implemented in C# (Table 1).

Table 1
Time consumption for k-connectivity.

N     k = 3, T (ms)   k = 4, T (ms)   k = 5, T (ms)
10    118             120             121
12    170             172             172
14    224             223             224
16    288             288             287
18    356             357             356
20    439             440             440

The gained network is expected to be applied in control engineering and hence the resultant network must be (k−1) fault tolerant if k-connectivity is used as a reliability measure. Therefore, the algorithm applies Eq. (2) instead of (1). This ensures that every resultant topology is really (k−1) fault tolerant, contrary to algorithms using (1) such as Cheng (1998) and Szlachcic (2006). One can see that the algorithm is not sensitive to the requested k-connectivity, but is sensitive to the number of nodes. This is expected because when the number of nodes increases, the length of the chromosome rises as well, and a longer chromosome needs more time to be created.

In Tables 2 and 4, j represents the number of node pairs for which three independent interconnecting paths are requested. For the remaining pairs of nodes only two independent paths are requested. It is possible to see that the algorithm needs a longer solution time, but this behaviour is natural because the algorithm has to verify the number of independent paths.

Table 2
Time consumption for the different number of redundant paths.

N     j     T (s)
10    4     2.86
12    5     4.41
14    6     6.49
16    7     9.39
18    8     13.42
20    9     18.62

In Tables 3 and 4, min is the minimum value of the acquisition costs found, μ is the average acquisition cost and σ is the variance. Tables 3 and 4 contain the results for a network which has at least two independent paths between the nodes outside of the “core” and three or more independent paths between the node pairs in the “core”. The “core” contains the minimum number of node pairs according to column j in Table 4. Table 3 describes the results gained by the algorithm based on Eq. (1) for 3-connectivity in the whole network. This setting should ensure that there are three independent paths for each pair in the core. Table 4 describes the results gained by the proposed algorithm for the requested number of independent paths. The results in Tables 3 and 4 were obtained from a hundred runs for each set of input parameters. The acquisition costs for each link were set to 1, so that the acquisition costs of the resultant networks can be compared directly. The average costs and the minimum costs in Table 4 are in every case smaller than in Table 3. According to these results, one can conclude that the algorithm designs a better physical topology with the requested number of independent paths in a reasonable time.

Table 3
k-connectivity, statistical results.

N     μ       σ      min
10    15.56   0.05   15
12    18.96   0.06   18
14    22.07   0.06   21
16    25.55   0.09   24
18    28.85   0.09   27
20    32.66   0.1    31

Table 4
Physical topology design, statistical results.

N     j     μ       σ      min
10    4     13.41   0.05   13
12    5     16.42   0.08   15
14    6     19.38   0.08   18
16    7     22.11   0.09   20
18    8     25.27   0.09   23
20    9     28.63   0.11   26

An important part is the verification of the end-to-end delay for each data flow in the worst-case scenario, which can be done according to (10). Nevertheless, the results are too pessimistic regarding network behaviour since the worst-case scenario cannot happen for each data flow. Thus, it is useful to have knowledge about the ordinary network behaviour, which can be obtained, for example, from a network model simulation as shown in Section 5.2.

5.2. Simulation results
The end-to-end delay calculation used for the logical topology design does not describe the behaviour of the network exactly because it assumes that, in every link, all the other load leaves the node before the load for which the delay is calculated. Moreover, it assumes that the load is sent in one burst and that this kind of behaviour occurs in the whole network. Such behaviour can occur in some part of the network but not in the whole network at once, so the calculation represents the worst-case scenario and its results are unrealistically pessimistic. Nevertheless, it is also useful to estimate the real network behaviour under the demanded load, as shown in the following text.

A fixed packet size of 50 bytes is expected for this calculation. For packet generation, every node uses an exponential distribution with a mean of 0.005 s (200 packets per second). In a typical industrial application the length of the packets is constant and the data is sent in cycles which can vary a little. It can be modelled with a uniform distribution in constant time. Packet routing follows the routing table gained by the proposed algorithm in its second step, i.e. during the design of the logical topology. The end-to-end delay can be estimated from the knowledge of the data load at every link and the capacity of the links. These variables allow the calculation of all the data flows which affect the data flow for which the end-to-end delay is estimated:

\[ F_{\mathrm{inf},L} = \sum_{j=0}^{n} f_{Lj} - \sum_{k=0}^{n} f_{Lk}\,(\mathrm{link}_k - 1), \tag{28} \]

where $L$ is the path index, $\sum_{j=0}^{n} f_{Lj}$ is the sum of all the data flows going through path $L$, and $j$ is the index of the data flow going through path $L$. $\sum_{k=0}^{n} f_{Lk}(\mathrm{link}_k - 1)$ is the sum of all the data flows that do not have an influence on the packet for which the end-to-end delay is calculated; $\mathrm{link}_k$ is the number of links of path $L$ that are used by the data flow $f_{Lk}$. A data flow can delay a packet at one communication link and the corresponding queue, not at two or more links and queues, since the packet is already postponed at the first queue and the average outcome from the queue is the same. The average number of packets that can delay a packet at path $L$ is

\[ \mathrm{Pack}_L = \frac{F_{\mathrm{inf},L}}{\mathrm{Pack}_{len}}, \tag{29} \]

where $\mathrm{Pack}_{len}$ is the length of the packet. From the long-term point of view, a uniform distribution of the packets in the limited time is expected (every node sends them at different times; thus packets from other sources are distributed uniformly among the links). Therefore, each packet is delayed by the same number of packets on average (the number of packets is different for each data flow and depends on the logical path and the other data flows that use this logical path). From this it is possible to estimate the end-to-end delay as

\[ \mathrm{ETE}_L = (\mathrm{Pack}_L + \mathrm{link}) \cdot T_p, \tag{30} \]

where link is the number of links on the path of the packet and $T_p$ is the delay which the packet needs to pass through a link and the FIFO queue, as computed according to (10). If there is no other load, the end-to-end delay is only

\[ \mathrm{ETE}_k = \mathrm{link} \cdot T_p. \tag{31} \]
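Eqs. (28)–(31) translate directly into a few lines of Python. The sketch below reads Eq. (28) as the total load on the path minus the load that does not influence the packet, as the accompanying text describes. The function names, the argument layout and the way the first sum is enumerated (one term per link and flow) are assumptions made only for this illustration; the units of the loads and of T_p are left to the caller.

```python
def estimate_end_to_end_delay(path_loads, shared_flows, packet_len, n_links, t_p):
    """Average end-to-end delay on a path following Eqs. (28)-(30).

    path_loads   : terms f_Lj of the first sum in Eq. (28)
    shared_flows : pairs (f_Lk, link_k) for the second sum in Eq. (28)
    packet_len   : Pack_len, in the same unit as the loads
    n_links      : number of links on the path of the packet
    t_p          : delay of one link and its FIFO queue, from Eq. (10)
    """
    total = sum(path_loads)
    not_influencing = sum(f * (links - 1) for f, links in shared_flows)
    f_inf = total - not_influencing   # Eq. (28)
    pack = f_inf / packet_len         # Eq. (29)
    return (pack + n_links) * t_p     # Eq. (30)


def unloaded_delay(n_links, t_p):
    """Eq. (31): end-to-end delay when no other load is present."""
    return n_links * t_p


# Hypothetical two-link path: flow a (100) on link 1, flow c (200) on link 2,
# flow b (50) on both links, so b appears twice in the per-link loads.
loaded = estimate_end_to_end_delay(
    path_loads=[100, 50, 50, 200],
    shared_flows=[(100, 1), (50, 2), (200, 1)],
    packet_len=50,
    n_links=2,
    t_p=0.5,
)
unloaded = unloaded_delay(n_links=2, t_p=0.5)
```

In this toy case the flow that crosses both links is counted only once after the subtraction, which matches the statement that a flow can delay a packet at one link and queue only.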
The estimation of the end-to-end delay was verified in the OPNET software, which allows network modelling and the gathering of network statistics. An example of the OPNET network is shown in Fig. 7. The network contains nodes Nod_i, each of which includes the main part Net0 and the block Switch0. Net0 is the source of the data frames, which are created in block src. The source and the destination addresses are filled into the packets according to the data flow matrix F in block proc. Then the packets continue to block Switch0, which sends each packet to the appropriate communication port according to the routing table gained by the proposed algorithm in the second step, during the logical topology design. Then, the packet is received by the switch connected to the other end of the communication link and is stored in the FIFO queue, which ensures loss-free communication. The packet leaves the FIFO queue once all previous packets have been retransmitted. Then the main block of the switch (called Switch) accepts the packet and transfers it to the main node or to another communication link (according to the routing table). Every main block proc of the node stores the statistics of the incoming packets (end-to-end delay). Every block of the OPNET network (e.g. proc, switch) is created as a state machine (the basic concept of the OPNET environment) that reacts to events, e.g. incoming and outgoing packets. Matrix F of the data flows and the routing tables (gained during the logical topology design) were imported as header files into the OPNET environment. The model of the FIFO queue belongs to the OPNET modeller.

Fig. 7. Opnet model.

The results can be compared in Fig. 8, where the data in light blue (without marker) were gained in the OPNET software. The other lines correspond to the data gained by the estimation. The black line (without marker) is the end-to-end delay for the data flow if no other flow is present in the network. There should be 63% of the end-to-end delays under the blue line (with box marker), which corresponds to Eq. (30). There should be 86% under the green line (circle marker) and 95% of the end-to-end delays under the red line (diamond marker). One can see in Fig. 8 that the estimation does not correspond exactly to the simulated results gained by the OPNET software.

Fig. 8. End-to-end delay (Opnet). Panel ETE32; horizontal axis: simulated time [s]; vertical axis: end-to-end delay [ms]; series: Opnet, No Delay, 63%, 86%, 95%. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)

The results gained by Opnet and by the estimation of the end-to-end delay can be compared in Table 5. Table 5 was gained for a network composed of 8 nodes and 52 data flows (the network in OPNET has, of course, the same structure). The shares of packets which should reach the destination sooner than a certain time are in row Software. The shares of packets which really reach the destination before the estimated time are in row Opnet. It is possible to conclude that one is able to predict the network behaviour before actually building a real network, since the estimation describes the common behaviour of the network and the end-to-end delay calculation according to (10) describes the behaviour in the worst-case scenario.

Table 5
End-to-end delay.

Software (%)   63     86.4   95
Opnet (%)      95     98     99
6. Conclusion

A new algorithm for the fault-tolerant network topology design has been described in this paper. The algorithm is able to ensure that the level of fault tolerance in different parts of the network may vary according to the application requirements. The resultant network keeps the constraints of the data delivery time, i.e. the deadlines for data delivery, and also keeps the network acquisition costs as small as possible. This is a huge improvement with respect to other existing algorithms for the network topology design. The numerical results, which have been described in Section 5.1, show that the topology gained by the algorithm has smaller acquisition costs than the topology gained by other existing algorithms (algorithms that use only k-connectivity as the test of fault tolerance, provided that they are able to design a truly fault-tolerant network); the variance of the results for both algorithms is quite similar. Networks designed by the existing algorithm always have bigger acquisition costs because of the applied principle. The existing algorithms, which use k-connectivity, design a more expensive network than the proposed iterative algorithm when a network topology with different degrees of fault tolerance is needed. The cost savings are more than 10% and these results show that the described algorithm designs networks with smaller acquisition costs. The obtained physical topology is used as the input to the second part of the algorithm, which was described in Section 3.4. The results that were gained in the second part of the described algorithm show that it is possible to use the method of network behaviour estimation and predict the behaviour before actually building the network. Due to the nonlinearity of (8) and the expectation of communication in bursts one can be sure that the designed network allows reliable real-time communication in all situations.

In the case when it is not possible to fulfil the demands for low acquisition costs and fault tolerance, one can use the estimation of the real behaviour, which is not so pessimistic and can predict the network behaviour more accurately, provided that it is not necessary to know the behaviour in the worst-case scenario. The division of the algorithm into two parts allows effortless adaptation of the algorithm to different types of communication technology and communication channels if needed. Moreover, it is intended to incorporate another way of calculating the end-to-end delay in the second part of the algorithm to be able to use it for the design of hard real-time networks such as Profinet IRT. In such a case, the way of calculating the delay must take the message schedule into account. The whole algorithm provides a design of a fault-tolerant inexpensive network that can be used in control engineering because the network satisfies the fault tolerance demands. Furthermore, it can be guaranteed that the data is delivered before the deadline expires. Additionally, the price of the resultant network is as small as possible.

Acknowledgements

This paper has been co-financed by the Grant Agency of the Czech Republic, project number 102/08/1429 “Safety and Security of Networked Embedded System Applications”, and by the Czech Ministry of Education, Youth, and Sports under the EuSophos project No. 2C06010.

References

Altiparmak, F., & Dengiz, B. (2009). A cross entropy approach to design of reliable networks. European Journal of Operational Research, 199(2), 542–552.
Altiparmak, F., Dengiz, B., & Smith, A. (2009). A general neural network model for estimating telecommunications network reliability. IEEE Transactions on Reliability, 58(1), 2–9.
Cheng, S.-T. (1998). Topological optimization of a reliable communication network. IEEE Transactions on Reliability, 47(3), 225–233.
Coello, C. A. (2000). An updated survey of GA-based multiobjective optimization techniques. ACM Computing Surveys, 32(2), 109–143.
Cormen, T., Leiserson, C., Rivest, R., & Stein, C. (2001). Introduction to algorithms (2nd ed.). New York: McGraw Hill.
Diestel, R. (2005). Graph theory, electronic edition (3rd ed.). Heidelberg, New York: Springer-Verlag.
Dutta, A., & Mitra, S. (1993). Integrating heuristic knowledge and optimization models for communication network design. IEEE Transactions on Knowledge and Data Engineering, 5(6), 999–1017.
Elbaum, R., & Sidi, M. (1996). Topological design of local-area networks using genetic algorithms. IEEE/ACM Transactions on Networking, 4(5), 766–778.
Fencl, T., Burget, P., & Bilek, J. (2008). Network topology design. Preprints of the 17th IFAC World Congress (vol. 17 (1)).
Gerla, M., & Kleinrock, L. (1977). On the topological design of distributed computer networks. IEEE Transactions on Communications, 25(1), 48–60.
Han, J., Malan, G., & Jahanian, F. (2002). Fault-tolerant virtual private networks within an autonomous system. Proceedings of 21st IEEE symposium on reliable distributed systems (pp. 41–50).
Hanzalek, Z., Burget, P., & Sucha, P. (2010). Profinet IO IRT message scheduling. IEEE Transactions on Industrial Informatics, 6(3), 369–380.
Hayashi, M., & Abe, T. (2008). Evaluating reliability of telecommunications networks using traffic path information. IEEE Transactions on Reliability, 57(2), 283–294.
Jan, R.-H., Hwang, F.-J., & Chen, S.-T. (1993). Topological optimization of a communication network subject to a reliability constraint. IEEE Transactions on Reliability, 42(1), 63–70.
Ko, K.-T., Tang, K.-S., Chan, C.-Y., Man, K.-F., & Kwong, S. (1997). Using genetic algorithms to design mesh networks. Computer, 30(8), 56–61.
Kumar, G., Narang, N., & Ravikumar, C. (1998). Efficient algorithms for delay-bounded minimum cost path problem in communication networks. HIPC ’98. Fifth international conference on high performance computing (pp. 141–146), December.
Lin, Y.-K. (2010). Reliability of separate minimal paths under both time and budget constraints. IEEE Transactions on Reliability, 59(1), 183–190.
Lin, Y.-K., & Yeh, C.-T. (2010). Evaluation of optimal network reliability under components-assignments subject to a transmission budget. IEEE Transactions on Reliability, 59(3), 539–550.
Liu, B., & Iwamura, K. (2000). Topological optimization models for communication network with multiple reliability goals. Computers & Mathematics with Applications, 39(7–8), 59–69.
Misra, S., Guoliang, X., & Yang, D. (2009). Polynomial time approximations for multi-path routing with bandwidth and delay constraints. In INFOCOM 2009. IEEE (pp. 558–566), April.
Sun, X., Sung, Y.-W., Krothapalli, S.-D., & Rao, S.-G. (2010). A systematic approach for evolving VLAN designs. In INFOCOM, 2010 proceedings. IEEE (pp. 1–9), March.
Szlachcic, E. (2006). Fault-tolerant topological design for computer networks. DepCos-RELCOMEX ’06. International conference on dependability of computer systems (pp. 150–159), May.
Szlachcic, E., & Mlynek, J. (2009). Efficiency analysis in communication networks topology design. DepCos-RELCOMEX ’09. Fourth international conference on dependability of computer systems (pp. 184–191), July.
Thompson, D., & Gilbro, G. (1998). Comparison of two swap heuristics with a genetic algorithm for the design of an ATM network. Proceedings of 7th international conference on computer communications and networks (pp. 833–837), October.
Wang, C.-S., & Chang, C.-T. (2008). Integrated genetic algorithm and goal programming for network topology design problem with multiple objectives and multiple criteria. IEEE/ACM Transactions on Networking, 16(3), 680–690.
Zili, D., Nenghai, Y., & Zheng, L. (2008). Designing fault tolerant networks topologies based on greedy algorithm. DepCos-RELCOMEX ’08. Third international conference on dependability of computer systems (pp. 227–234), June.