Simulation of learning in communication networks

F. Lehmann, R. Seising and E. Walther-Klaus
Fakultät für Informatik, Universität der Bundeswehr München, 8014 Neubiberg, Germany

Simulation Practice and Theory 1 (1993) 41-48

Received 29 March 1993
Abstract

Lehmann, F., R. Seising and E. Walther-Klaus, Simulation of learning in communication networks, Simulation Practice and Theory 1 (1993) 41-48.

A theory with subjective and objective probabilities is described and studied by simulation. Subjective probabilities are used to quantify the knowledge of a person about objective probabilities. This is shown to be useful for estimation from small samples. The theory contains a procedure for learning from observations, which leads with probability 1 to the true distribution, provided that distribution has not been excluded by the person at the very beginning. The application of the theory in causal networks, which describe the propagation of the consequences of a fault in a communication network, is discussed.

Keywords: Uncertainty; causal networks; communication networks; fault management; objective probability; subjective probability; learning from observations.

1. Introduction

Computer communication networks are systems of very high complexity. Local area networks (LANs), metropolitan area networks (MANs) and wide area networks (WANs) may contain many thousands of components, e.g. computers, workstations, bridges, routers, communication lines, etc. Networks of different types might be connected via bridges or high-speed backbone networks. During operation, different protocols for medium access, routing, error control and other communication-related tasks are employed. Unfortunately not all of the hardware and software components of the net work correctly, and faults may occur. The management of the network has to deal with these faults [1]. One of the tasks of fault management is to detect and to locate faults. A human operator employs many ways of reasoning for this diagnosis. His work is considerably complicated by the fact that the network itself has built-in fault tolerance and that the routing algorithm might avoid faulty stations for some time. To assist the diagnostician, expert systems and other techniques of artificial intelligence are considered [2, 11]. Learning is one area of artificial intelligence which might be useful in fault management systems.

In the next section some aspects of learning under uncertainty in communication systems are discussed. We are interested especially in learning from observations of random events. The third section gives a short introduction to a learning theory which combines two concepts of probability according to the different types of uncertainty met in real systems. In communication networks a fault is normally not detected at the place where it occurred, but it might cause errors at some other node in the network and be detected there. The correlations between such events at different places can be described to some extent by causal networks. In the fourth section causal networks are sketched and the learning theory is extended to them. Some aspects of the application of these nets to communication systems are discussed. To study the behaviour of the learning process, simulations and analytical investigations have been employed. The fifth section contains results.

2. Uncertainty in communication networks

The manager of a communication network faces several types of uncertainty. Hosts, nodes and other components of the network need not act correctly all the time. Damage to hardware components, disturbance of transmissions by external physical influences, malfunction of software components or misbehaviour of users may cause faults. The time and location of the occurrence of faults are unpredictable, i.e. uncertain. This uncertainty is caused by the randomness of physical processes in the network and it is independent of any observer. The classical tool to deal with this type of uncertainty is the concept of objective probability. The properties of this probability can be captured in axioms departing from simple properties of random experiments. Based on these axioms one can show that objective probabilities fulfill the calculus of Kolmogorov [3, 9].

A fault of a component of the network may have consequences at other components. It may happen that its occurrence is not noticed at the place where it occurred, but that the management observes the consequences at some other spot of the network. The diagnostic task then consists in inferring from the observed malfunction which component of the network is the ultimate cause. Fault manifestations at different nodes of the network are correlated. These correlations can in many cases be described by conditional probabilities in a causal network. This knowledge may be used to infer the faulty component by probabilistic reasoning (see e.g. [8]). This type of reasoning assumes that the probabilities are known and that the use of more and more information leads to conditional probabilities for each component to be the faulty one. Ideally one obtains at the end a conditional probability of "1" for one component and "0" for every other. Normally the same probabilities are used for the next diagnostic task.

Having accepted objective probabilities governing the random processes in the system, one observes another type of uncertainty: the manager might not know the objective probabilities exactly, but might be uncertain about their values. This type of uncertainty is caused by his lack of knowledge. A tool for modelling this type of uncertainty is subjective probability. As opposed to objective probabilities, subjective probabilities depend on the person considered and his knowledge. The person uses the subjective probabilities for decisions. Therefore the subjective probabilities of a person can be inferred from his behaviour in certain situations, e.g. in betting. From simple assumptions about fair bets a system of axioms for subjective probabilities can be obtained (see e.g. [10]). They lead again to the calculus of Kolmogorov, which therefore is valid for objective and subjective probabilities. If the manager has good knowledge about the objective probability, he should have an advantage in his decisions over somebody having no preknowledge and therefore using observed frequencies only.

It has been emphasized that subjective probabilities depend on the knowledge of a person, for example the manager of the network. During the operation of the network the manager observes the occurrence or non-occurrence of faults. In this way his knowledge changes, and accordingly his subjective probabilities have to be adjusted to his new state of knowledge. This adjustment of the subjective probability is a way to learn from observations. In this paper we deal with this type of learning. Other learning processes are possible in communication nets, e.g. learning of fault patterns from diagnostics [7].

Simulation

of learning

in communication

networks

43

3. A learning theory

In this section a learning theory with objective and subjective probabilities is sketched. Consider a random experiment with a finite number of outcomes that is governed by an objective probability distribution; this distribution is unknown to the manager (observer). For him the true distribution is an element of a set M of possible objective distributions. The manager need not give the same weight to each distribution in M, because he may already have some information about the experiment. This knowledge is quantified by a subjective distribution over M which is assumed to have a density φ. In decisions concerning an event E the person will use his subjectively expected probability

P_s(E) = ∫ P(E|p) φ(p) dp,    (1)

where P(E|p) is the objective probability of E if p is the true objective probability distribution, i.e. P_s(E) is composed of all the possible objective probabilities, each weighted according to the knowledge of the person. If the experiment has been executed and the event A has been observed, the subjective density φ has to be changed to a new density φ^A that incorporates this new knowledge. Departing from weak assumptions about the rational behaviour of the person, one obtains the following learning formula [10]:

φ^A(p) = c · P(A|p) · φ(p),    (2)

where φ(p) is the old subjective density, P(A|p) is as explained above and c is a normalizing constant. The new subjectively expected probability of an event E is then

P_s^A(E) = ∫ P(E|p) φ^A(p) dp.    (3)

A more detailed description of this learning theory is contained in [4, 5]. From a statistical point of view the subjectively expected probability of an event E can be considered as an estimate of the unknown objective probability of E. The classical way to estimate probabilities is to use relative frequencies. Some simulation results comparing the two methods are given below.
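The update rule can be made concrete with a small numerical sketch. The following Python fragment is not from the paper; the grid discretization, the function names and the use of NumPy are our own assumptions. It approximates the integrals in (1)-(3) on a grid of candidate probabilities for a two-outcome experiment:

```python
import numpy as np

# Candidate objective probabilities p of the event E are discretized on a
# grid; phi holds the subjective density over that grid.

def subjective_expectation(grid, phi):
    """Formula (1): P_s(E) = integral of P(E|p) * phi(p) dp, with P(E|p) = p."""
    return np.trapz(grid * phi, grid)

def learn(grid, phi, observed_event):
    """Formula (2): phi^A(p) = c * P(A|p) * phi(p), renormalized."""
    likelihood = grid if observed_event else (1.0 - grid)
    updated = likelihood * phi
    return updated / np.trapz(updated, grid)

# Prior knowledge "the true probability is at most 0.25", modelled as a
# uniform density on [0, 0.25] (the setting used in the simulations below).
grid = np.linspace(0.0, 1.0, 1001)
phi = np.where(grid <= 0.25, 4.0, 0.0)

rng = np.random.default_rng(0)
true_p = 0.2                                  # unknown to the observer
for _ in range(50):
    phi = learn(grid, phi, rng.random() < true_p)

print(subjective_expectation(grid, phi))      # formula (3), close to 0.2
```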

4. Causal networks

To deal with correlated manifestations of faults in communication systems, causal networks might be used. A causal network in our sense is a directed graph with tree structure. To each node of the graph a random experiment is associated. Two nodes are connected by an arc if the experiments in the nodes are not independent. The direction of the arc indicates "causal dependencies". The causal influence is quantified by objective conditional probabilities which are associated with the arcs. If the experiments at the nodes and the conditional probabilities at the arcs are to belong to a common probability space, they have to fulfill certain consistency conditions that are given by the well-known formula of total probability.

In this study of communication networks it is assumed that the conditional probabilities at the arcs are fixed and known to the manager, but that he is uncertain about the objective probability distributions governing the experiments at the nodes. This subjective uncertainty is quantified by a subjective density at each node. In the simplest case of a probability space with the events A, A' (complement of A) at node a, the events B, B' at node b and the conditional probabilities P(B|A), P(B|A') at the arc, the formula of total probability gives q = λp + β for p = P(A) and q = P(B), and hence the consistency condition for the subjective densities φ_a and φ_b at the nodes a and b:

φ_b(q) = φ_a((q - β)/λ) / |λ|,    (4)

with

λ = P(B|A) - P(B|A')    and    β = P(B|A').

Now the manager observes an event at node a. According to the learning formula (2) he has to adjust the subjective density φ_a at node a. By this the consistency of the causal net is destroyed. Using (4) or its inverse (which exists in the case of dependence, i.e. λ ≠ 0), he has to change the subjective densities in the neighbouring nodes and so within the whole network, if further nodes are connected to them by weighted arcs. For the application of causal nets in the management of communication nets the following aspect should be taken into account: the causal nets describe the logical relations between events and their consequences. For fault management this means that the structure of the causal net need not be isomorphic to the structure of the communication net. Therefore the tree structure assumed in this section might be useful even if the communication net is more complicated. However, if feedback cannot be ignored, the theory must be extended.
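Formula (4) can be applied numerically with a short sketch; this is our own illustration (the function name, the grid discretization and NumPy are assumptions, not part of the paper):

```python
import numpy as np

# Push the subjective density at node a through the linear map
# q = lambda*p + beta given by the law of total probability (formula (4)).
# Assumes the nodes are dependent, i.e. lambda != 0.

def propagate(grid, phi_a, p_b_given_a, p_b_given_not_a):
    lam = p_b_given_a - p_b_given_not_a          # lambda in (4)
    beta = p_b_given_not_a                       # beta in (4)
    phi_b = np.zeros_like(phi_a)
    lo, hi = min(beta, beta + lam), max(beta, beta + lam)
    inside = (grid >= lo) & (grid <= hi)
    # change of variables: phi_b(q) = phi_a((q - beta)/lambda) / |lambda|
    phi_b[inside] = np.interp((grid[inside] - beta) / lam, grid, phi_a) / abs(lam)
    return phi_b

grid = np.linspace(0.0, 1.0, 1001)
phi_a = np.where(grid <= 0.25, 4.0, 0.0)         # subjective density at node a
phi_b = propagate(grid, phi_a, 0.6, 0.4)         # P(B|A) = 0.6, P(B|A') = 0.4
print(np.trapz(phi_b, grid))                     # approx. 1: phi_b is a density again
```

The inverse map mentioned in the text corresponds to exchanging the roles of p and q in the same change of variables.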

5. Simulation results

To study the usefulness of the given concepts, simulation and analytical investigations have been employed. A causal network has been defined. In the simulation, random events with a given objective probability at one node have been generated. This objective probability is not known to a simulated observer. He has a subjective density for each node; the densities fulfill the consistency conditions. He gets to know the events at a certain node and now has to evaluate the subjective density φ and the subjectively expected distribution P_s(E) according to formulas (2) and (3), and to adjust the subjective densities and the subjectively expected distributions at every other node by using (4). This procedure has been reproduced in the simulation.

First, typical results from observations at one node are discussed. Especially the statistical aspects of the procedure are of interest. Figure 1 shows the estimated variances of the relative frequencies (×) and of the subjectively expected probabilities (+) of an event E with true probability 0.2. Each symbol is obtained from 30 samples of the learning process. The subjective density at the beginning has been the uniform distribution on the interval [0, 0.25]. The results show the advantage of the knowledge that the true probability is not greater than 0.25 for a small number of observations: the variances of the subjectively expected probabilities are much smaller than the variances of the relative frequencies. For larger numbers of observations this advantage diminishes, because the knowledge accumulated in the relative frequencies approximates the knowledge of the observer with preknowledge.

Figure 2 shows the subjectively expected probability under the same conditions as the variances in Fig. 1 have been obtained. One observes a disadvantage of this method: the subjectively expected probability is not an unbiased estimator of the probability of E. However, if the bias can be computed analytically, the estimation procedure can be adjusted accordingly; e.g. if the subjective distribution at the beginning is the uniform distribution on the interval [0, 1], the objective expectation of the subjectively expected probability is equal to

E[P_s(E)] = (Np + 1)/(N + 2),    (5)

where p is the true probability and N the number of learning steps [7].
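The kind of experiment behind Figs. 1 and 2 can be sketched as follows; this is our reconstruction, not the original simulation program, and the sample sizes, the seed and the NumPy usage are assumptions:

```python
import numpy as np

# Compare the relative frequency with the subjectively expected probability
# for a small number of observations, over several runs of the learning process.

rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 1001)
true_p, runs, n_obs = 0.2, 30, 20

rel_freq, subj_exp = [], []
for _ in range(runs):
    obs = rng.random(n_obs) < true_p
    rel_freq.append(obs.mean())

    phi = np.where(grid <= 0.25, 4.0, 0.0)        # prior: uniform on [0, 0.25]
    for x in obs:                                  # learning formula (2)
        phi = (grid if x else 1.0 - grid) * phi
        phi /= np.trapz(phi, grid)
    subj_exp.append(np.trapz(grid * phi, grid))    # formulas (1)/(3)

print(np.var(rel_freq), np.var(subj_exp))          # variances: (x) vs (+) in Fig. 1
print(np.mean(subj_exp) - true_p)                  # bias, cf. Fig. 2
# For a prior uniform on [0, 1] instead, formula (5) predicts a mean of
# (n_obs * true_p + 1) / (n_obs + 2).
```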

Fig. 1. Variances of the relative frequencies (×) and of the subjectively expected probabilities (+); true probability 0.2.

Fig. 2. Bias of the subjectively expected probabilities; true probability 0.2.

Obviously this expectation converges to the true value if the number of observations tends to infinity. Figure 3 shows the subjective densities after N = 200, 400, 600 and 1000 observations. The true probability is 0.2 and the subjective distribution at the beginning was the uniform distribution on the interval [0, 0.25]. The simulations show that with a growing number of observations the subjective distributions concentrate around the true probability. From theory it can be shown that the subjective distributions converge, with probability 1, to the distribution concentrated on the true objective distribution governing the random experiment [10]. If the true objective distribution was not considered possible at the beginning, that is, if it was not in the support of φ, the subjective distributions concentrate at the boundary of M (Fig. 4). This observation makes clear that it is important for the manager not to exclude the true distribution from the beginning.

Fig. 3. Subjective densities with N = 200, 400, 600, 1000.

Fig. 4. Subjective densities with an inappropriate support.
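The effect shown in Fig. 4 can be reproduced with a few lines of code. The following sketch is our own illustration (not the original simulation program; the grid, the seed and NumPy are assumptions): it starts from a density whose support excludes the true probability 0.2 and shows the mass piling up at the boundary of the support.

```python
import numpy as np

# Inappropriate support: the initial density excludes the true probability 0.2,
# so the learned density concentrates at the boundary of its support.

rng = np.random.default_rng(2)
grid = np.linspace(0.0, 1.0, 1001)
phi = np.where(grid >= 0.3, 1.0 / 0.7, 0.0)    # uniform on [0.3, 1], excludes 0.2

for _ in range(1000):
    x = rng.random() < 0.2                      # observation with true probability 0.2
    phi = (grid if x else 1.0 - grid) * phi     # learning formula (2)
    phi /= np.trapz(phi, grid)

print(grid[np.argmax(phi)])                     # close to 0.3, the boundary of the support
```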

Figure 5 shows subjective densities at two adjacent nodes: on the right-hand side the density at node b, on the left-hand side the density at node a. The conditional probabilities were P(B|A) = 0.6 and P(B|A') = 0.4; the densities are related by formula (4). Variations of the conditional probabilities on an arc show that the variance of the subjective distribution at the neighbouring node is the smaller, the smaller the difference P(B|A) - P(B|A') is. This does not mean that learning at nodes far away from the node causing the fault would be easier; it is merely a consequence of the consistency conditions. Learning at a distant node does not improve learning at the causative node. This can also be seen theoretically by looking carefully at the consistency formulas.

Fig. 5. Subjective densities at two adjacent nodes.

6. Conclusions and extensions

A theory has been described containing both subjective and objective probabilities. Subjective distributions are used to incorporate knowledge about unknown probability distributions.


Simulations and theoretical studies show that this knowledge can improve the precision of estimates from a small number of observations. However, the subjectively expected probabilities are biased; if the bias can be computed, as is possible in many cases, it can be removed. The combined theory contains a learning theory, which shows how subjective distributions are to be adapted to new knowledge from observations. This type of learning could be useful in many applications (e.g. fault management, expert systems). The learning theory is successful in the sense that an increasing sequence of independent observations in an experiment will lead to the true probability distribution if it had not been excluded from the beginning.

The learning theory can be used in causal networks which model the propagation of consequences of a fault in communication networks, if conditional probabilities for the communication links are known. They lead to consistency conditions, which allow the computation of the subjective densities at every node of the network from the density at any one node. As a consequence, observations for learning can be made anywhere in the network.

There are several restrictions on the applicability of the theory as presented in this paper. Firstly, it has been assumed that at each node only occurrence or non-occurrence of an event is considered. In practice there might be more than just two possible manifestations of a fault at a node. This can be handled in principle in the same manner as described: at each node there is a vector of probabilities of disjoint events, the subjective densities are densities in a space of appropriate dimension, and the arcs are to be weighted by matrices of conditional probabilities. The consistency equations become vector equations (see the sketch at the end of this section); to avoid singularities, one has to look carefully at dependency relations.

The second assumption is that the causal net describes the consequences of one type of fault at one node. In practice faults may occur at different places in the network. If they are independent and if at each node it is possible to distinguish between the different causes, one can use a separate causal net for each of the generic faults.


If they are not independent, or if it is not possible at a node to tell from which cause the observed consequence stems, the situation is more complicated and deserves further study.

A third assumption is that the conditional probabilities on the arcs are known. This may not be the case in practice. The theory can be used to learn the conditional probabilities on the arcs, too, by considering the joint distributions of the events at all of the nodes. This leads to subjective distributions in higher dimensions, which can be adapted to new observations in exactly the same way as described above. By learning dependencies or independencies the structure of the causal network might change.
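As an illustration of the vector form mentioned above (our notation, not taken from the paper): let p be the vector of probabilities of the disjoint events A_1, ..., A_m at node a and let M be the matrix of conditional probabilities on the arc, with entries M_ji = P(B_j|A_i). The formula of total probability then gives the consistency condition

q = M p,    i.e.    q_j = Σ_i P(B_j|A_i) p_i,

and the subjective density at node b follows from the density at node a by the corresponding change of variables, which requires the map p → M p to be invertible; this is where the dependency relations mentioned above enter.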

References

[1] R.E. Caruso, Network management: A tutorial overview, IEEE Comm. Mag. 4 (3) (1990) 20-25.
[2] C. Joseph, J. Kindrick, K. Muralidhar, C. So and T. Toth-Fejel, MAP fault management expert system, in: B. Meandzija and J. Westcott, eds., Integrated Network Management, I (North-Holland, Amsterdam, 1989) 627-636.
[3] A.N. Kolmogorov, Grundbegriffe der Wahrscheinlichkeitsrechnung (Springer, Berlin, 1933).
[4] F. Lehmann, R. Seising and E. Walther-Klaus, Objektive und subjektive Wahrscheinlichkeiten in Bayes-Netzen, Forschungsbericht, Universität der Bundeswehr München, Fakultät für Informatik, Nr. 9006, 1990.
[5] F. Lehmann and E. Walther-Klaus, Combination of different concepts of probability, in: Proceedings of the 6th UK Computer and Telecommunications Performance Engineering Workshop, Bradford, UK, 1990.
[6] F. Lehmann, R. Seising and E. Walther-Klaus, Analysis of learning in Bayesian networks for fault management, Forschungsbericht, Universität der Bundeswehr München, Fakultät für Informatik, Nr. 9106, 1991.
[7] B. Pagurek, N. Daves and R. Kaye, A multiple paradigm diagnostic system for wide area communication networks, in: F. Belli and F.J. Radermacher, eds., Industrial and Engineering Applications of Artificial Intelligence and Expert Systems, 5th International Conference, IEA/AIE-92, Paderborn, Germany (1992) 256-265.
[8] J. Pearl, Fusion, propagation, and structuring in belief networks, Artificial Intelligence 29 (1986) 241-288.
[9] H. Richter, Zur Grundlegung der Wahrscheinlichkeitstheorie, Teil I, Math. Ann. 125 (1953) 129-139; Teil II, Math. Ann. 125 (1953) 223-234; Teil III, Math. Ann. 125 (1953) 335-343; Teil IV, Math. Ann. 126 (1953) 362-374; Teil V, Math. Ann. 128 (1954) 305-339.
[10] H. Richter, Eine einfache Axiomatik der subjektiven Wahrscheinlichkeit, Ist. Nazionale di Alta Matematica, Symp. Mathem. IX (1972) 59-77.
[11] T. Yamakira, Y. Kirilia and S. Sakota, Unified fault management scheme for network troubleshooting expert system, in: B. Meandzija and J. Westcott, eds., Integrated Network Management, I (North-Holland, Amsterdam, 1989) 637-646.