International Journal of Approximate Reasoning 55 (2014) 2109–2125
Contents lists available at ScienceDirect
International Journal of Approximate Reasoning www.elsevier.com/locate/ijar
Combining dependent evidential bodies that share common knowledge Takehiko Nakama ∗ , Enrique Ruspini European Center for Soft Computing, c/Gonzalo Gutiérrez Quirós, s/n, 33600 Mieres, Spain
a r t i c l e
i n f o
Article history: Received 25 March 2013 Received in revised form 24 April 2014 Accepted 30 May 2014 Available online 12 June 2014 Keywords: Theory of evidence Dependent evidential body Conditional independence Dempster–Shafer formula Transferable belief model Probability theory
a b s t r a c t We establish a formula for combining dependent evidential bodies that are conditionally independent given their shared knowledge. Markov examples are provided to illustrate various aspects of our combination formula, including its practicality. We also show that the Dempster–Shafer formula and the conjunctive rule of the Transferable Belief Model can be recovered as special cases of our combination formula. © 2014 Elsevier Inc. All rights reserved.
1. Introduction and summary Many studies have been conducted to examine how to combine dependent evidential bodies, to which the classical Dempster–Shafer formula (Shafer [35]) or the Transferable Belief Model (TBM) conjunctive rule (e.g., Smets and Kennes [39], Smets [38]) is not applicable (e.g., Ling and Rudd [21], Elouedi and Mellouli [13], Cattaneo [5,4], Dubois and Yager [12], Denœux [8], Destercke and Dubois [9]). A general characterization of dependent evidential bodies is that they are based on “overlapping experiences” (Dempster [7], Denœux [8]), which lead to some shared knowledge. (Also see, for instance, Liu and Hong [22] for a probability-theoretic characterization of dependence.) In this paper, we establish a formula for combining dependent evidential bodies that are conditionally independent given their overlapping knowledge. Conditional independence underlies various mathematical frameworks such as Markov processes (e.g., Rust [34], Sutton and Barto [40], Puterman [28]), Kalman filters (e.g., Meinhold and Singpurwalla [24], Harvey [19], Thrun et al. [41]), and queuing theory (e.g., Feller [14,15], Gross and Harris [17], Ross [31]), which have been applied highly successfully to a wide variety of fields—statistical mechanics (e.g., Metropolis et al. [25], Hastings [20]), text generation and speech recognition (e.g., Fine et al. [16], Rabiner [29]), telephone networks (e.g., Viterbi [42]), Internet search engines (e.g., Brin and Page [3]), asset pricing (e.g., Lux [23]), and Bayesian networks (e.g., Pearl [26]), for instance. Therefore, there are many practical cases to which our combination formula can be applied (Section 4 provides concrete examples). To our knowledge, our study is the first to develop a combination formula for this important form of dependence; although other studies have examined how to combine evidential bodies assuming several specific forms of dependence among them (for instance, assuming complete positive or negative correlation among the confidence levels in the correctness of sources;
*
Corresponding author.
http://dx.doi.org/10.1016/j.ijar.2014.05.010 0888-613X/© 2014 Elsevier Inc. All rights reserved.
2110
T. Nakama, E. Ruspini / International Journal of Approximate Reasoning 55 (2014) 2109–2125
see Dubois and Prade [10], Dubois and Yager [12], and Smets [36]), none of them has examined how to combine dependent evidential bodies that satisfy the conditional independence. It is important to note that our approach is fundamentally different from those taken by the previous studies that have investigated how to combine dependent evidential bodies without assuming any specific form of dependence (e.g., Cattaneo [5,4], Denœux [8], Destercke and Dubois [9]). These studies take cautious approaches based on the least commitment principle, which suggests that given a set of evidential bodies that satisfy requirements, the most appropriate is the least informative (specific) one (e.g., Smets [37], Denœux [8], Destercke and Dubois [9]). For instance, Cattaneo [5] proposes a procedure for combining dependent evidential bodies so that the resulting combined evidential body is the least specific one among those that minimize conflict (defined as a total belief mass assigned to the empty set), and Denœux [8] develops a conjunctive combination rule with which the resulting combined evidential body is the least committed one among those that are more informative than each of the dependent evidential bodies. (He also develops a disjunctive combination rule, which is applicable to cases where not all information sources are assumed to be reliable.) Thus, these studies make no assumption about the form of dependence among evidential bodies, and the results of their combination rules can be regarded as the most cautious approximations or descriptions of the combined information (Cattaneo [4]). In our study, on the other hand, we take advantage of conditional independence and establish a combination formula based on an exact expression that characterizes an a priori evidential body from which the dependent evidential bodies derive. See Sections 3–4. Also, we can recover the Dempster–Shafer formula and the TBM conjunctive rule from our combination formula. Thus, our formulation generalizes the Dempster–Shafer formulation and TBM for independent evidential bodies. See Section 6. We derive our combination formula using probability theory. As in the previous studies that utilized probability theory to examine evidence fusion (e.g., Dempster [7], Ruspini [32,33], Voorbraak [43], Fagin and Halpen [18], Liu and Hong [22]), we represent evidential bodies by probability spaces and define their conditional independence as a rigorous probabilistic concept. We describe evidence fusion in terms of the probability mass functions of the probability spaces. The remainder of this paper is organized as follows. We explain our probability-theoretic characterization of evidential bodies in Section 2. In Section 3, we provide an overview of our formulation. In Section 4, we present concrete examples of combining dependent bodies of evidence about a Markov process to illustrate various aspects of our combination formula, including its practicality. In Section 5, we establish our combination formula using a probability-theoretic framework. In Section 6, we recover the Dempster–Shafer formula and the TBM conjunctive rule from our combination formula. 2. Probability-theoretic characterization of evidential bodies Several previous studies have represented evidential bodies by probability spaces (e.g., Dempster [7], Ruspini [32,33], Voorbraak [43], Fagin and Halpen [18], Liu and Hong [22]). One way to understand this probability-theoretic formulation in terms of the classical Dempster–Shafer formulation is as follows. 
Let Θ denote a frame of discernment, which is usually asΘ sumed to be finite and described as a set of possible answers to a given question that must be answered. Let m : 2 → [0, 1] denote a basic probability assignment; m satisfies m(∅) = 0 and A ∈2Θ m( A ) = 1. We can also disregard the requirement m(∅) = 0 and consider a basic belief assignment in TBM (e.g., Smets and Kennes [39], Smets [38]). For each A ∈ 2Θ , m( A ) is understood to be the measure of belief committed exactly to A. The ordered pair (m, 2Θ ) can be characterized as an eviΘ dential body, which Dubois and Prade [11]). supplies information about the degrees of beliefs committed to sets in 2 (e.g., Since m satisfies m ( A ) = 1, it can be treated as a probability mass function defined on 2Θ . In our framework, we apΘ A ∈2 ply probability theory to establish a mathematically rigorous formulation of evidence fusion, so we treat each evidential body not as an ordered pair of the form (m, 2Θ ) but as a probability space, which is denoted by a triple (Ω, σ (Ω), P ) consisting of a sample space (denoted by Ω ), a σ -field in the sample space (denoted by σ (Ω)), and a probability measure defined on the σ -field (denoted by P ). In evidence theory, Ω has been typically set to 2Θ and σ (Ω) to the power set of Ω . Hence m corresponds to the probability mass function uniquely determined by P ; m is defined by m(ω) = P ({ω}) for each ω ∈ Ω . Here note that, since Ω is assumed to be countable (usually it is assumed to be finite, but it does not have to be finite in our formulation), a probability mass function on Ω is uniquely derived from P , and the probability mass function also uniquely determines the probability measure on σ (Ω); see, for instance, Billingsley [2] and Chung [6]. Therefore, if we represent an evidential body by (Ω, σ (Ω), P ), we say that the evidential body is formed on Ω and that it provides evidence or knowledge about Ω , and vice versa. 3. Overview We will use Fig. 1 to provide an overview of our framework. Consider establishing evidence on a sample space Ω . In our formulation, we express Ω as a direct product Ω1 × Ω2 × · · · × Ωn × Ωc . Note that there is no loss of generality in expressing a sample space as a direct product; even if the original sample space is not a direct product, we can embed one in it (see, for instance, Billingsley [2] and Chung [6]). It will become clear that the direct-product representation is suitable for characterizing dependent evidential bodies that are conditionally independent given their shared knowledge. Our results are valid for any n, but for concreteness, we consider n = 2; thus Ω = Ω1 × Ω2 × Ωc . See Fig. 1. The probability space (Ω, σ (Ω), P ), which is described as an a priori probability space in Fig. 1, represents a source of evidential bodies; each evidential body provides partial information about it. (This characterization will be made precise in Sections 4–5.) In his probability-theoretic formulation of evidence fusion, Voorbraak [43] also supposes that evidential
T. Nakama, E. Ruspini / International Journal of Approximate Reasoning 55 (2014) 2109–2125
2111
Fig. 1. Formulation overview. We form evidence on the sample space Ω = Ω1 × Ω2 × Ωc by combining two evidential bodies: evidential body 1 formed on Ω1 × Ωc and evidential body 2 formed on Ω2 × Ωc . Knowledge on Ωc is assumed to be shared by the two evidential bodies, and it introduces dependence among them. We establish a procedure for combing the dependent evidential bodies by assuming their conditional independence given their shared knowledge so that the resulting combined evidence faithfully reflects the a priori evidence (Ω, σ (Ω), P ).
bodies result from a common a priori probability space (see also Ruspini [32]). The a priori probability space can also be regarded as the most refined evidential body that can be formed on Ω . We do not assume that the a priori probability space is known, as it is unknown in most real-world problems. In our study, we assume that reliable agents form evidential bodies; thus, in the terminology of TBM, we develop a conjunctive combination of evidential bodies (see, for instance, Smets [37] and Denœux [8]). Consider establishing evidence on Ω by combining two evidential bodies shown in Fig. 1: evidential body 1 (represented by (Ω1 × Ωc , σ (Ω1 × Ωc ), P 1 )) formed on Ω1 × Ωc , and evidential body 2 (represented by (Ω2 × Ωc , σ (Ω2 × Ωc ), P 2 )) formed on Ω2 × Ωc . (Here we can have Ω1 = Ω2 = Ωc or not; in Sections 4 and 6, we examine cases where Ω1 = Ω2 = Ωc .) This depicts a rather realistic situation in forming evidence by employing multiple agents; in practice, a sample space can be too large for any one agent to process, and when multiple agents are available, each of them will be responsible for gaining knowledge on a portion of the entire space. It is also reasonable to assume in many practical situations that there is some overlap among the assigned portions. (In Section 4, we will present cases where overlapping knowledge is necessary to establish proper evidence on a whole sample space.) Evidence formed on Ωc is assumed to be shared by the two evidential bodies, and it introduces dependence among them. How should we combine these dependent evidential bodies in order to establish evidence on the whole sample space Ω ? The resulting combined evidence (Ω, σ (Ω), P ), which is shown at the bottom of Fig. 1, is considered ideal if it faithfully reflects the a priori probability space (Ω, σ (Ω), P ), from which evidential bodies 1 and 2 derive; thus, we want P to coincide with P . This idea has also been suggested by Voorbraak [43]. It is also informative to compare our figure to Figs. 1 and 2 of Liu and Hong [22], who also examined evidence fusion in a probability-theoretic framework (however, note that in our framework, we do not discuss the multivalued mappings described in their study). Due to the shared knowledge on Ωc , independence between the two evidential bodies cannot be assumed, so it is not appropriate to combine them using the Dempster–Shafer formula or the TBM conjunctive rule. In this study, we formulate a mathematically rigorous procedure for combining the dependent evidential bodies by assuming their conditional independence given their shared knowledge. It is important to keep in mind that we do not assume that the a priori probability space is known; we merely assume the conditional independence of the evidential bodies, and this assumption suffices to recover the knowledge represented by the a priori probability space from the dependent evidential bodies by our combination formula. (There are many real-world problems in which the conditional-independence assumption can be verified easily; see Section 4.) 4. Markov examples We will formally derive our combination formula in Section 5, but in this section, we provide simple, concrete examples of combining dependent bodies of evidence about a Markov chain to illustrate various aspects of our combination formula, including its practicality. Markov chains have been successfully used in a wide variety of fields—statistical mechanics (e.g., Metropolis et al. [25], Hastings [20]), text generation (e.g., Fine et al. 
[16]), speech recognition (e.g., Rabiner [29]), telephone networks (e.g., Viterbi [42]), Internet search engines (e.g., Brin and Page [3]), asset pricing (e.g., Lux [23]), Bayesian networks
2112
T. Nakama, E. Ruspini / International Journal of Approximate Reasoning 55 (2014) 2109–2125
Fig. 2. Markov example. At each time t (t is a nonnegative integer), a robot is in one of two states—s1 or s2 . In this figure, the robot is in s1 at time 1, s2 at time 2, and s1 at time 3.
(e.g., Pearl [26]), reinforcement learning (e.g., Sutton and Barto [40]), robotics (e.g., Thrun et al. [41]), to name a very few. Our combination formula can be very useful for combining evidential bodies about Markov processes or models, so it can be applied to many real-world problems. We will keep our examples very simple so that the reader can easily understand how to use our combination formula. We consider a robot that at each time t (t is assumed to be a nonnegative integer) is in one of two states—s1 or s2 . See Fig. 2. For each t, the robot in state si at time t will be in state s j at time t + 1 with probability p i j (1 ≤ i ≤ 2, 1 ≤ j ≤ 2). For instance, if p 1 2 = 34 , then for each t, the robot in state s1 at time t will be in state s2 at time t + 1 with probability 34 . The four probabilities p 1 1 , p 1 2 , p 2 1 , and p 2 2 are called transition probabilities. Let X t denote a random variable that represents the state of the robot at time t. Then ( X t )t ≥0 forms a time-homogeneous discrete-time Markov chain with transition probability matrix Q defined by
Q :=
p1 1 p2 1
p1 2 p2 2
(see, for instance, Ross [30]). Notice that ( X t )t ≥0 satisfies the Markovian property: For all states si 0 , si 1 , ..., sit −1 , si , s j , and all t ≥ 0,
P { X t +1 = s j | X t = si , X t −1 = sit −1 , ..., X 1 = si 1 , X 0 = si 0 } = P { X t +1 = s j | X t = si } = p i j .
(1)
The set S := {s1 , s2 } is called the state space of the chain. In Sections 4.1–4.3, we examine various cases of establishing knowledge about the robot’s behavior. To facilitate our exposition, we consider establishing a Bayesian basic probability assignment as evidence.1 Note that our formula is not limited to establishing a Bayesian basic probability assignment; we limit ourselves to this case in order to keep our examples simple. 4.1. Combining two dependent evidential bodies that are independent given their shared knowledge First, we consider establishing knowledge about the behavior of the robot from time 1 to time 3, i.e., knowledge about the robot’s state at times 1, 2, and 3. Thus, we establish a Bayesian basic probability assignment on Ω := S × S × S as evidence. In practice, such knowledge can be obtained by running the chain repeatedly and observing the behavior of the robot. Some information about the chain may be available to the agents who develop the knowledge. For instance, they may be told that the behavior of the robot can be characterized as a Markov process, although its transition probabilities and initial distribution may remain unknown to them.2 They may also obtain partial information about the transition probabilities or the initial distribution of the chain. In this section, we examine how to combine the two dependent evidential bodies shown in Fig. 3. For convenience, we will express Ω = S × S × S as S 1 × S 2 × S 3 (S 1 = S 2 = S 3 = S), where S t (1 ≤ t ≤ 3) is the set of possible states at time t. As shown in this figure, to establish evidence on Ω , we consider combining two evidential bodies: evidential body A (EB A) formed by agent A observing the behavior of the robot only from time 1 to time 2, and evidential body B (EB B) formed by agent B observing the behavior of the robot only from time 2 to time 3. This depicts a rather realistic scenario. In real-world applications of Markov decision processes, the state space S can be large. If S consists of 100 states, for instance, then Ω consists of 106 ordered triples. Thus, even if we limit ourselves to establishing a Bayesian basic probability assignment, determining the 106 values of the function may be infeasible for any one agent to handle. On the other hand, S × S consists of 104 ordered pairs, so each of EB A and EB B requires determining 104 values of a probability mass function; this may be feasible.
1
More generally, we establish a Cartesian belief assignment described in Section 5.1 as evidence. Probabilistically, the behavior of the robot can be fully characterized by the transition probabilities and the initial distribution of the Markov chain; see, for instance, Ross [30]. 2
T. Nakama, E. Ruspini / International Journal of Approximate Reasoning 55 (2014) 2109–2125
2113
Fig. 3. Evidential bodies about the behavior (state) of the robot from time 1 to time 3. We form evidence on Ω = S 1 × S 2 × S 3 by combining two evidential bodies: evidential body A formed by agent A observing the behavior of the robot only from time 1 to time 2 and evidential body B formed by agent B observing the behavior of the robot only from time 2 to time 3. Knowledge about the robot’s state at time 2 is shared by the two evidential bodies.
Table 1 EB A (empirical probability mass function f A on S 1 × S 2 ) and EB B (empirical probability mass function f B on S 2 × S 3 ). (a) EB A
(b) EB B
st 1 ∈ S 1
st 2 ∈ S 2
f A (st 1 , st 2 )
st 2 ∈ S 2
st 3 ∈ S 3
f B (st 2 , st 3 )
s1
s1
.152
s1
s1
.111
s1
s2
.439
s1
s2
.346
s2
s1
.305
s2
s1
.406
s2
s2
.104
s2
s2
.137
Thus, we represent EB A and EB B by ( S 1 × S 2 , σ ( S 1 × S 2 ), P A ), and ( S 2 × S 3 , σ ( S 2 × S 3 ), P B ), respectively. Since we establish Bayesian basic probability assignments as evidence, we characterize these evidential bodies using their corresponding empirical probability mass functions; agent A develops an empirical probability mass function on S 1 × S 2 , whereas agent B develops an empirical probability mass function on S 2 × S 3 . As described in Section 3, the two agents are assumed to be reliable. Notice that they both form knowledge about the robot’s state at time 2; this results in their shared knowledge. For concreteness, we let the agents observe 1000 runs of the robot resulting from the following initial distribution π0 and transition probability Q :
π0 = Q =
1 3 1 4 3 4
2 3 3 4 1 4
(2)
, .
(3)
Here, we do not assume that any information about π0 or Q is available to the two agents; they merely observe 1000 runs of the chain. In each run, agent A observes the robot’s state only at times 1 and 2, whereas agent B observes the robot’s state only at times 2 and 3. Table 1 shows the resulting empirical probability mass functions f A and f B established by agents A and B, respectively. (We have actually simulated the 1000 runs of the chain to obtain f A and f B .) We can compare these empirical probability mass functions with the actual probability mass functions on S 1 × S 2 and S 2 × S 3 . Their values can be computed analytically as follows. Let πt denote the distribution of the chain at time t:
πt := π0 Q t .
(4)
Thus, for each state si , πt (si ) denotes the probability that the robot is in state si at time t. Let f A∗ denote the actual probability mass function on S 1 × S 2 . Then the following are the four values of f A∗ :
f A∗ (s1 , s1 ) = π1 (s1 ) p 1 1 , f A∗ (s1 , s2 ) = π1 (s1 ) p 1 2 , f A∗ (s2 , s1 ) = π1 (s2 ) p 2 1 , f A∗ (s2 , s2 ) = π1 (s2 ) p 2 2 . Similarly, we can compute the values of the actual probability mass function f B∗ on S 2 × S 3 analytically. Table 2 shows f A∗ and f B∗ . Comparing Tables 1 and 2, we can see that with 1000 observations, the empirical probability mass functions are
2114
T. Nakama, E. Ruspini / International Journal of Approximate Reasoning 55 (2014) 2109–2125
Table 2 Probability mass function f A∗ on S 1 × S 2 and probability mass function f B∗ on S 2 × S 3 . (a) f A∗
(b) f B∗
st 1 ∈ S 1
st 2 ∈ S 2
f A∗ (st 1 , st 2 )
st 2 ∈ S 2
st 3 ∈ S 3
f B∗ (st 2 , st 3 )
s1
s1
≈ .146
s1
s1
s1
s2
≈ .438
s1
s2
s2
s1
≈ .313
s2
s1
s2
s2
7 48 7 16 5 16 5 48
≈ .104
s2
s2
11 96 11 32 13 32 13 96
≈ .115 ≈ .344 ≈ .406 ≈ .135
Table 3 Empirical ( f c ) and actual ( f c∗ ) probability mass functions on S 2 . st 2 ∈ S 2
f c (st 2 )
s1
.457
s2
.543
f c∗ (st 2 ) 11 24 13 24
≈ .458 ≈ .542
Table 4 ∗ ) on Ω . Combined evidence (m), empirical probability mass function ( f Ω ), and actual probability mass function ( f Ω st 1 ∈ S 1
st 2 ∈ S 2
st 3 ∈ S 3
m(st 1 , st 2 , st 3 )
f Ω (st 1 , st 2 , st 3 )
s1
s1
s1
.037
.038
s1
s1
s2
.115
.114
s1
s2
s1
.328
.328
s1
s2
s2
.111
.111
s2
s1
s1
.074
.073
s2
s1
s2
.231
.232
s2
s2
s1
.078
.078
s2
s2
s2
.026
.026
∗ (s , s , s ) fΩ t1 t2 t3 7 ≈ .036 192 7 ≈ .109 64 21 ≈ .328 64 7 ≈ .109 64 5 ≈ .078 64 15 ≈ .234 64 5 ≈ .078 64 5 ≈ .026 192
fairly close to the actual ones.3 Here, note that agents A and B cannot compute the values of f A∗ and f B∗ analytically because they do not know π0 and Q in (2)–(3). Using this simple Markov example, we will clarify the intuitive but rather vague descriptions in Section 3 about an a priori probability space and how it is related to evidential bodies. As described above, f A is an approximation of f A∗ whereas ∗ denote the actual probability mass function on Ω . (We can compute the values f B is an approximation of f B∗ . Now, let f Ω ∗ analytically; see (9) and Table 4.) Then, notice that f ∗ is the marginal mass function of f ∗ on S × S whereas f ∗ of f Ω 1 2 Ω B A ∗ on S × S . Therefore, when agents A and B establish f and f , respectively, they is the marginal mass function of f Ω 2 3 A B ∗ both make observations that derive from (the marginal mass functions of) f Ω . To characterize this, we say that EB A and EB B derive from a common a priori probability space (Ω, σ (Ω), P ), where σ (Ω) in this case is the power set of Ω and P ∗ (see Section 2). Clearly, (Ω, σ (Ω), P ) is the probability measure uniquely determined by the probability mass function f Ω characterizes the Markov chain ( X t )t ≥0 from time 1 to time 3. In this Markov example, the only thing that we will assume to know about this a priori probability space when we apply our combination formula is that ( X t )t ≥0 is a Markov chain, and we do not need to assume to know anything else about it; for instance, we do not need to assume to have any ∗ is the probability mass function that we want to obtain or approximate by knowledge about π0 or Q . Also, notice that f Ω ∗ as possible. In this regard, combining f A and f B ; we want the outcome of our combination formula to be as close to f Ω we can describe (Ω, σ (Ω), P ) as the most refined knowledge that can be established on Ω by combining the evidential bodies. Let f c denote the empirical probability mass function on S 2 . This function represents the knowledge shared by the two agents, and it can be derived from either EB A or EB B; f c equals the marginal mass function of f A on S 2 , which also equals the marginal mass function of f B on S 2 . The two marginal mass functions exactly coincide because agents A and B observe the same runs of the chain; the number of times the robot is found in state s1 (or state s2 ) at time 2 is the same for both agents. In fact, we have 3 We computed the mean of f A∗ − f A and the mean of f B∗ − f B over 1000 simulations, each consisting of 1000 runs of the chain; both means were less than .024. Table 1 shows the outcome of one of the 1000 simulations. Note that by the strong law of large numbers, for all st 1 ∈ S 1 and st 2 ∈ S 2 , f A (st 1 , st 2 ) converges to f A∗ (st 1 , st 2 ) almost everywhere as the number of observations increases to infinity, and, similarly, for all st 2 ∈ S 2 and st 3 ∈ S 3 , f B (st 2 , st 3 ) converges to f B∗ (st 2 , st 3 ) almost everywhere. A stronger form of convergence can be proved for the empirical distribution function; see the Glivenko–Cantelli Theorem (e.g., Billingsley [2]).
T. Nakama, E. Ruspini / International Journal of Approximate Reasoning 55 (2014) 2109–2125
2115
f c (s1 ) = f A (s1 , s1 ) + f A (s2 , s1 )
= f B (s1 , s1 ) + f B (s1 , s2 ) = .457, f c (s2 ) = f A (s1 , s2 ) + f A (s2 , s2 )
= f B (s2 , s1 ) + f B (s2 , s2 ) = .543. Again, the values of the actual probability mass function f c∗ on S 2 (i.e., the distribution of the chain at time 2) can be computed analytically; we have
f c∗ (s1 )
f c∗ (s2 ) = π2 = π0 Q 2 .
Table 3 compares f c and f c∗ . Note that f c∗ can be obtained from either f A∗ or f B∗ , just as f c can be obtained from either f A or f B . In practice, there may be cases in which two agents observe different sets of runs; for instance, agent A may observe the first N a runs of the chain, and agent B may observe the next N b runs of the chain. Even in such cases, their marginal mass functions on S 2 will be very close to one another as long as N a and N b are sufficiently large. Also, the two agents can combine their knowledge to establish the shared knowledge. For instance, suppose that agent A estimates f c (s1 ) to be na / N a , where na denotes the number of times agent A finds the robot in state s1 at time 2. Similarly, suppose that agent B estimates f c (s1 ) to be nb / N b , where nb denotes the number of times agent B finds the robot in state s1 at time 2. Then they can merge their knowledge and set the value of f c (s1 ) to (na + nb )/( N a + N b ). By the strong low of large numbers, the value converges with probability 1 to the actual probability as the number of observations goes to infinity. When they merge their knowledge on the overlap, the other values of their mass functions must be adjusted accordingly. See Appendix A. We want to combine EB A and EB B to establish knowledge about the robot’s behavior from time 1 to time 3; more precisely, we want to combine f A and f B to establish a probability mass function on S 1 × S 2 × S 3 . If the two evidential bodies are conditionally independent given their shared knowledge (we will mathematically formalize conditional independence of evidential bodies in Section 5; see Definition 5.2), then we can apply our combination formula to combine them. We can show that in this case the conditional independence holds if, for all states si , s j , sk such that P { X 2 = s j } = 0, we have
P { X 1 = si , X 3 = sk | X 2 = s j } = P { X 1 = si | X 2 = s j } P { X 3 = sk | X 2 = s j }.
(5)
In words, the conditional independence of EB A and EB B given their shared knowledge holds if the robot’s states at times 1 and 3 are conditionally independent given the robot’s state at time 2. This condition can be verified as follows. We have
P { X 1 = si , X 3 = sk | X 2 = s j } =
= =
P { X 1 = si , X 2 = s j , X 3 = sk } P { X2 = s j } P { X 3 = sk | X 1 = si , X 2 = s j } P { X 1 = si , X 2 = s j } P { X2 = s j } P { X 3 = sk | X 2 = s j } P { X 1 = si , X 2 = s j } P { X2 = s j }
,
(6)
where the last equality follows from the Markov property (1). Also, we have
P { X 1 = si , X 2 = s j } P { X2 = s j }
= P { X 1 = si | X 2 = s j }.
(7)
From (6)–(7), we obtain (5). Therefore, EB A and EB B are conditionally independent given their shared knowledge about the state of the robot at time 2. Notice that this condition can be verified by simply knowing that ( X t )t ≥0 forms a Markov chain; we do not need any information about its transition matrix Q or initial distribution π0 . In fact, any Markov process satisfies the condition (5), so it suffices to only know that EB A and EB B derive from a common Markov process. Upon verifying the conditional independence of the dependent evidential bodies, we can apply our combination formula, which will be established in Theorem 5.1. Let m denote the resulting probability mass function on Ω of the combined evidence. Then, for each (si , s j , sk ) ∈ Ω , our formula gives
m(si , s j , sk ) =
f A (si , s j ) f B (s j , sk ) f c (s j )
(8)
.
For instance, we have
m(s 1 , s 1 , s 1 ) = m(s 1 , s 1 , s 2 ) =
f A (s1 , s1 ) f B (s1 , s1 ) f c (s1 ) f A (s1 , s1 ) f B (s1 , s2 ) f c (s1 )
= .037, = .115,
2116
T. Nakama, E. Ruspini / International Journal of Approximate Reasoning 55 (2014) 2109–2125
m(s 1 , s 2 , s 1 ) =
f A (s1 , s2 ) f B (s2 , s1 ) f c (s2 )
= .328.
Notice that the functions on the right-hand side of (8) all come from EB A and EB B: f A from EB A, f B from EB B, and f c derived from either EB A or EB B. Table 4 shows the resulting m as well as the empirical probability mass function f Ω on Ω . Here note that neither agent A nor agent B alone can establish f Ω ; establishing f Ω requires observing the behavior ∗ of the robot from time 1 to time 3 in each of the 1000 runs. The table also shows the actual probability mass function f Ω on Ω , the values of which can be computed analytically; for instance, we have
f Ω ( s 1 , s 1 , s 1 ) = π1 ( s 1 ) p 1 1 p 1 1 =
7 11 12 4 4
7
=
192
(9)
.
We can see that the combined evidence m obtained by our combination formula is very close to the empirical probability ∗. mass function f Ω , which in turn is very close to the actual probability mass function f Ω Thus, once EB A and EB B are established, we can obtain the combined evidence in two steps; first derive their shared knowledge ( f c ) from either EB A ( f A ) or EB B ( f B ), and then apply our combination formula. It is important to note that if we apply our combination formula to combine f A∗ and f B∗ (the actual probability mass functions on S 1 × S 2 and S 2 × S 3 , respectively), then the outcome m∗ of the combination formula exactly coincides with the actual probability mass ∗ on Ω , as stated by the following theorem: function f Ω Theorem 4.1. For each (si , s j , sk ) ∈ Ω , we have ∗ fΩ (si , s j , sk ) = m∗ (si , s j , sk ) =
f A∗ (si , s j ) f B∗ (s j , sk ) f c∗ (s j )
(10)
.
Proof. We have ∗ fΩ (si , s j , sk ) = P { X 1 = si , X 2 = s j , X 3 = sk }
= P { X 1 = si , X 3 = sk | X 2 = s j } P { X 2 = s j } = P { X 1 = si | X 2 = s j } P { X 3 = sk | X 2 = s j } P { X 2 = s j },
(11)
where the last equality follows from (5). Here we have
P { X 1 = si | X 2 = s j } = P { X 3 = sk | X 2 = s j } =
P { X 1 = si , X 2 = s j } P { X2 = s j }
P { X 2 = s j , X 3 = sk } P { X2 = s j }
(12)
,
(13)
.
Thus, it follows from (11)–(13) that ∗ fΩ (si , s j , sk ) =
= =
P { X 1 = si , X 2 = s j } P { X 2 = s j , X 3 = sk } P { X2 = s j }
P { X2 = s j }
P { X 1 = si , X 2 = s j } P { X 2 = s j , X 3 = sk } P { X2 = s j } f ∗ (s A
∗ i , s j ) f B (s j , sk ) ∗ f (s ) c
as desired.
P { X2 = s j }
j
= m∗ (si , s j , sk ),
2
Theorem 4.1 is the justification for (8); the formula is based on the exact expression (10). 4.2. Combining non-overlapping evidential bodies It is actually important that the two evidential bodies share knowledge about the robot’s state at time 2 in forming knowledge about Ω . To demonstrate this point, consider establishing knowledge on Ω using the three evidential bodies shown in Fig. 4: EB D formed by agent D observing the robot’s state only at time 1; EB E formed by agent E observing the robot’s state only at time 2; and EB F formed by agent F observing the robot’s state only at time 3. Thus, there is no overlapping knowledge among these evidential bodies. To compare this case with the case of EB A and EB B described in Section 4.1, we let agents D, E, and F observe the same 1000 runs of the chain that agents A and B observed in establishing EB A and EB B. Tables 5–7 show the empirical probability mass functions f D , f E , and f F established by agents D, E, and F, respectively. These tables also show the actual probability mass functions f D∗ , f E∗ , and f F∗ on S 1 , S 2 , and S 3 , respectively.
T. Nakama, E. Ruspini / International Journal of Approximate Reasoning 55 (2014) 2109–2125
2117
Fig. 4. Non-overlapping evidential bodies. In this case, evidence on Ω = S 1 × S 2 × S 3 is formed by combining three evidential bodies: EB D formed by agent D observing the robot’s state only at time 1, EB E formed by agent E observing the robot’s state only at time 2, and EB F formed by agent F observing the robot’s state only at time 3.
Table 5 Empirical ( f D ) and actual ( f D∗ ) probability mass functions on S 1 . st 1 ∈ S 1
f D (st 1 )
s1
.591
s2
.409
f D∗ (st 1 ) 7 12 5 12
≈ .583 ≈ .417
Table 6 Empirical ( f E ) and actual ( f E∗ ) probability mass functions on S 2 . st 2 ∈ S 2
f E (st 2 )
s1
.457
s2
.543
f E∗ (st 2 ) 11 24 13 24
≈ .458 ≈ .542
Table 7 Empirical ( f F ) and actual ( f F∗ ) probability mass functions on S 3 . st 3 ∈ S 3
f F (st 3 )
s1
.517
s2
.483
f F∗ (st 3 ) 25 48 23 48
≈ .521 ≈ .479
Table 8 Comparisons of m (combined evidence obtained by our combination formula), m (combined evidence obtained by the Dempster–Shafer formula), and f Ω (empirical probability mass function on Ω ). st 1 ∈ S 1
st 2 ∈ S 2
st 3 ∈ S 3
m(st 1 , st 2 , st 3 )
f Ω (st 1 , st 2 , st 3 )
m (st 1 , st 2 , st 3 )
s1 s1 s1 s1 s2 s2 s2 s2
s1 s1 s2 s2 s1 s1 s2 s2
s1 s2 s1 s2 s1 s2 s1 s2
.037 .115 .328 .111 .074 .231 .078 .026
.038 .114 .328 .111 .073 .232 .078 .026
.139 .130 .166 .155 .097 .090 .115 .107
Note that f D∗ = π1 = π0 Q , f E∗ = π2 = π0 Q 2 , and f F∗ = π3 = π0 Q 3 . (See (4).) Also, notice that f E in Table 6 coincides with f c in Table 3. We use the Dempster–Shafer formula to combine EB D, EB E, and EB F. Let m denote the resulting probability mass function on Ω . Then for each (si , s j , sk ) ∈ Ω ,
m (si , s j , sk ) = f D (si ) f E (s j ) f F (sk ).
(14)
Table 8 compares m, m and f Ω (the empirical probability mass function on Ω ). We can see that the difference between m and f Ω is large compared to the difference between m and f Ω . The mean of m − f Ω and the mean of m − f Ω
computed over 1000 simulations, each consisting of 1000 runs of the chain, were .010 and .262, respectively. It is important ∗ on Ω when it combines to note that outcome of the formula (14) does not equal the actual probability mass function f Ω ∗ ∗ ∗ the actual probability mass functions f D , f E , and f F on S 1 , S 2 , and S 3 , respectively; if we let m ∗ denote the resulting probability mass function, then for each (si , s j , sk ) ∈ Ω , we have ∗ m ∗ (si , s j , sk ) = f D∗ (si ) f E∗ (s j ) f F∗ (sk ) = f Ω (si , s j , sk ).
(15)
2118
T. Nakama, E. Ruspini / International Journal of Approximate Reasoning 55 (2014) 2109–2125
Fig. 5. Combining three evidential bodies. Evidence on Ω = S 1 × S 2 × S 3 × S 4 is formed by combining three evidential bodies; EB α formed by agent α observing the robot’s state at times 1 and 2; EB β formed by agent β observing the robot’s state at times 2 and 3; EB γ formed by agent γ observing the robot’s state at times 3 and 4.
Compare (15) and Theorem 4.1. Equality does not hold in (15) because the evidential bodies are probabilistically dependent, although there is no overlapping knowledge among them. There is no general formula to obtain f ∗ (si , s j , sk ) from f D∗ , f E∗ , and f F∗ . 4.3. Combining more than two dependent evidential bodies Our combination formula can be applied repeatedly to establish a formula for combining more than two evidential bodies. Consider combining the three evidential bodies shown in Fig. 5: EB α formed by agent α observing the robot’s state at times 1 and 2, EB β formed by agent β observing the robot’s state at times 2 and 3, and EB γ formed by agent γ observing the robot’s state at times 3 and 4. Notice that EB α and EB β are conditionally independent given their shared knowledge about the robot’s state at time 2 and that EB β and EB γ are conditionally independent given their shared knowledge about the robot’s state at time 3. Let f α , f β , and f γ denote the empirical probability mass functions obtained by agents α , β , and γ , respectively. To establish a probability mass function on Ω , we first combine EB α and EB β using our combination formula. Let f α β denote the outcome of the combination formula. Then, for each (si , s j , sk ) ∈ S 1 × S 2 × S 3 , we have
f α (si , s j ) f β (s j , sk )
f α β (si , s j , sk ) =
f c (s j )
(16)
,
where f c is the empirical probability mass function on S 2 derived from either f α or f β . We describe the resulting combined evidential body as EB α β . Next, we combine EB α β and EB γ to obtain a probability mass function on Ω . We can show that EB α β and EB γ are conditionally independent given their shared knowledge about the robot’s state at time 3, so again we apply our combination formula to combine them. Let μ denote the resulting probability mass function on Ω . Then for each (si , s j , sk , sl ) ∈ Ω , we have
μ(si , s j , sk , sl ) =
f α β (si , s j , sk ) f γ (sk , sl ) f d (sk )
(17)
,
where f d is the empirical probability mass function on S 3 , which can be derived from either f β or f γ . From (16)–(17), we have
μ(si , s j , sk , sl ) =
f α (si , s j ) f β (s j , sk ) f γ (sk , sl )
(18)
f c (s j ) f d (sk )
for each (si , s j , sk , sl ) ∈ Ω . This is the combination formula for combining the three evidential bodies in Fig. 5. It is easy to see that we can also obtain (18) by first combining EB β and EB γ and then combining the resulting evidential body with EB α . ∗ denote the actual probability mass functions on Now we provide a justification for (18). Let f α∗ , f β∗ , f γ∗ , f c∗ , f d∗ , and f Ω S 1 × S 2 , S 2 × S 3 , S 3 × S 4 , S 2 , S 3 , and Ω , respectively. Then we have the following theorem: Theorem 4.2. For each (si , s j , sk , sl ) ∈ Ω , we have ∗ fΩ (si , s j , sk , sl ) =
f α∗ (si , s j ) f β∗ (s j , sk ) f γ∗ (sk , sl ) f c∗ (s j ) f d∗ (sk )
.
(19)
Proof. This proof is analogous to the proof of Theorem 4.1, so we will omit details. We have ∗ fΩ (si , s j , sk , sl ) = P { X 1 = si , X 2 = s j , X 3 = sk , X 4 = sl }
= P { X 1 = si , X 3 = sk , X 4 = sl | X 2 = s j } P { X 2 = s j } = P { X 1 = si | X 2 = s j } P { X 3 = sk , X 4 = sl | X 2 = s j } P { X 2 = s j } f ∗ (si , s j ) P { X 2 = s j , X 3 = sk , X 4 = sl } . = α f c∗ (s j )
(20)
T. Nakama, E. Ruspini / International Journal of Approximate Reasoning 55 (2014) 2109–2125
2119
Here
P { X 2 = s j , X 3 = sk , X 4 = sl } = Hence (19) follows from (20)–(21).
f β∗ (s j , sk ) f γ∗ (sk , sl ) f d∗ (sk )
(21)
.
2
Just as Theorem 4.1 is the justification for (8), Theorem 4.2 is the justification for (18). We can easily extend (18) to combining more than three dependent evidential bodies. 5. Derivation of combination formula In this section, we establish our formula for combining evidential bodies that are conditionally independent given their shared knowledge. This section is organized as follows. In Section 5.1, we explain the concept of Cartesian belief assignment and describe how a basic belief assignment can be derived from a Cartesian belief assignment. In Section 5.2, we explain how evidential bodies can be represented by probability spaces. In Section 5.3, we describe the concept of a priori probability space. In Section 5.4, we describe how evidential bodies derive from an a priori probability space. In Section 5.5, we formally define conditional independence of evidential bodies given their shared knowledge. Finally, in Section 5.6, we establish a formula for combining evidential bodies under the conditional-independence assumption. 5.1. Cartesian belief assignment As described in Sections 3–4, we consider evidential bodies that are formed on Cartesian products—in Fig. 1, for instance, evidential bodies are formed on Ω1 × Ωc and Ω2 × Ωc . As illustrated in Section 4, Cartesian-product representations are effective for describing conditioning events and conditionally independent events. In this section, we characterize such evidential bodies in detail. It is important to recognize that our formulation is an extension of the Dempster–Shafer framework, not a specialization of it in forming evidence on Cartesian products. In fact, we can fully recover the Dempster–Shafer formula or the TBM conjunctive rule from our formulation (see Section 6), clearly demonstrating that our formulation is not limited to Cartesian products. n In our formulation, when we consider an evidential body formed on χ := χ1 × χ2 × · · · × χn = i =1 χi , each factor χi of Θ i the Cartesian product is typically considered to be a power set 2 of some frame of discernment Θi . Here we do not have to assume that the frame of discernment is finite; we just assume that it is countable. In the Dempster–Shafer framework and TBM, a basic probability or belief assignment is defined on the power set of a frame of discernment (e.g., Shafer [35], Smets and Kennes [39], Smets [38]). In our formulation, we define an analogous function, which we call a Cartesian belief assignment:
n
Definition 5.1. The Cartesian belief assignment of a body of evidence formed on a Cartesian product χ = i=1 χi , where χi = 2Θi and Θi is a frame of discernment, is defined to be a function m : χ → [0, 1] such that x∈χ m(x) = 1. Here, for each x ∈ χ , m(x) represents the degree of belief assigned to x by the evidential body. If a Cartesian product consists of only one factor, then a Cartesian belief assignment defined on it becomes a basic belief assignment. In investigating how to combine dependent evidential bodies that are conditionally independent given their shared knowledge, it is rather natural to consider evidential bodies formed on Cartesian products, and we will deal with Cartesian belief assignments when we combine evidential bodies.4 It is possible to derive a basic belief assignment from a Cartesian belief assignment. See the following example: Example 5.1. Suppose that we want to identify a car that was involved in an accident. It is believed to be one of three cars (cars 1, 2, and 3). Let Θ := {car1, car2, car3} denote the frame of discernment of this problem. Consider establishing a basic belief assignment defined on 2Θ . The three cars in Θ have the following characteristics: (i) car 1 is red and has a big engine; (ii) car 2 is red and has a small engine; (iii) car 3 is green and has a big engine. See Table 9. Thus, we assign descriptive labels (red, big), (red, small), and (green, big) to cars 1, 2, and 3, respectively. Let Θ1 := {red, green} and Θ2 := {big, small}; Θ1 denotes the frame of discernment regarding the car color, and Θ2 denotes the frame of discernment regarding the engine size. Instead of directly establishing a basic belief assignment m on 2Θ , an agent first establishes a Cartesian belief assignment f on 2Θ1 × 2Θ2 . (It may be the case that evidence can be collected only on the car color and the engine size.) We can derive m from f using the function ρ : 2Θ1 × 2Θ2 → 2Θ shown in Table 10. Here any belief assigned to ∅ is interpreted 4 In Section 4, we treat the probability mass functions (e.g., f A and f B in Section 4.1) as Bayesian basic probability assignments in order to simplify our exposition, but we need Cartesian belief assignments to deal with more general evidential bodies. For instance, in Section 4.1, if we treat S 1 , S 2 , and S 3 as frames of discernment and reformulate EB A and EB B as evidential bodies on 2 S 1 × 2 S 2 and 2 S 2 × 2 S 3 , respectively, then agents A and B establish Cartesian belief assignments that represent their knowledge. We can still use our combination formula to combine them.
2120
T. Nakama, E. Ruspini / International Journal of Approximate Reasoning 55 (2014) 2109–2125
Table 9 Characteristics of three cars in Θ . Car
Color
Engine
car 1 car 2 car 3
red red green
big small big
Table 10 Function
ρ : 2Θ1 × 2Θ2 → 2Θ .
x ∈ 2Θ1 × 2Θ2
ρ (x)
(∅, ∅) (∅, {big}) (∅, {small}) (∅, {big, small}) ({red}, ∅) ({red}, {big}) ({red}, {small}) ({red}, {big, small}) ({green}, ∅) ({green}, {big}) ({green}, {small}) ({green}, {big, small}) ({red, green}, ∅) ({red, green}, {big}) ({red, green}, {small}) ({red, green}, {big, small})
∅ ∅ ∅ ∅ ∅ {car1 (= (red, big))} {car2 (= (red, small))} {car1, car2} ∅ {car3 (= (green, big))} ∅ {car3} ∅ {car1, car3} {car2} {car1, car2, car3}
as expressing support for an element that is not in the frame of discernment (e.g., Smets and Kennes [39], Destercke and Dubois [9]). For instance, ρ (∅, {big}) = ∅ because if the color is neither red nor green, then the car is not one of the three cars in Θ . Also, ρ ({green}, {big, small}) = {car3} because the singleton {green} identifies car 3 (only car 3 is green) while {big, small} applies to all the cars (the engine is either big or small for the three cars). For each θ ∈ Θ , we set
m(θ) =
f (x).
(22)
x∈2Θ1 ×2Θ2 :ρ (x)=θ 1 Table 11 shows the values of m when f (x) = 16 for all x ∈ 2Θ1 × 2Θ2 . Hence, once the agent establishes the Cartesian belief Θ Θ assignment f on 2 1 × 2 2 , we can derive a basic belief assignment m on 2Θ through (22). 2
An important form of ρ is described in Section 6, where we recover the Dempster–Shafer formula and the TBM conjunctive rule from our combination formula. As stated earlier, our formulation is an extension of the Dempster–Shafer formulation, not a specialization of it in forming evidence on Cartesian products. In addition to being able to handle cases where evidential bodies are dependent but conditionally independent given common knowledge, our formulation can also handle conventional cases where evidential bodies are assumed to be independent and formed on the power set of a frame of discernment. The function ρ can be used to induce a basic belief assignment from a Cartesian belief assignment, and it specifies how one frame of discernment can be obtained from another. It is also quite flexible. For instance, in Example 5.1, Θ1 and Θ2 can be considered the defining attributes of elements in Θ , and ρ indicates how the attributes correspond to the elements in Θ . In Section 6, on the other hand, ρ represents the operation of intersection in a Boolean algebra used in the Dempster–Shafer formulation. In both Example 5.1 and Section 6, ρ is an onto function, but it may not. As can be inferred from Example 5.1, ρ can also be used to identify an evidential structure (Besnard, Jaouen, and Perin [1]).5 See [1] for details. n Thus, a Cartesian belief assignment defined on i =1 χi represents the knowledge established by an evidential body formed on the Cartesian product. To simplify notation, we will avoid expressing χi as 2Θi unless it is necessary. Also, our formulation does not require that each χi is a power set. (In Section 4.1, for instance, 2 S 1 × 2 S 2 and 2 S 2 × 2 S 3 are reduced 5 An evidential structure refers to a triple that consists of a countable set Ω , a distributive lattice with a greatest element and a least element generated from Ω , and a set of pairs of elements of Ω such that the meet of the elements of each pair is equivalent to a least element (Besnard, Jaouen, and Perin [1]). This structure was introduced to examine the nature of contradiction resulting from evidence fusion. One of the examples described in [1] is as follows. Consider forming evidence about an object whose form is either “liquid” or “solid” while its color is either “white” or “black” and that one source describes the object as “white liquid” while another source describes it as “black liquid.” If this is formalized in the classical Dempster–Shafer framework, then the two descriptions will be simply considered contradictory, ignoring the fact that they agree on “liquid”. By analyzing ρ ’s pre-image of the empty set, we can see how a contradiction occurs.
T. Nakama, E. Ruspini / International Journal of Approximate Reasoning 55 (2014) 2109–2125
Table 11 Values of m derived from (22) when f (x) = x ∈ 2Θ1 × 2Θ2 .
θ ∈Θ ∅ {car1} {car2} {car3} {car1, car2} {car1, car3}
1 16
2121
for all
m(θ) 1 2 1 16 1 8 1 8 1 16 1 16
{car2, car3}
0
{car1, car2, car3}
1 16
to S 1 × S 2 and S 2 × S 3 , respectively, and evidential bodies are established on the reduced sample spaces. Our combination formula can still be applied to combine such evidential bodies properly.) 5.2. Probability spaces that represent evidential bodies As described in Section 2, we represent each evidential body by a probability space. Let (χ , σ (χ ), P ) denote an evidential body. As explained in Section 5.1, χ is a Cartesian product, and the knowledge established by this evidential body, which is a Cartesian belief assignment on χ , is the probability mass function of (χ , σ (χ ), P ). As χ is countable, σ (χ ) will be set to 2χ . The probability measure P is uniquely determined by the Cartesian belief assignment, because it is the probability mass function of this probability space. Since we establish our combination formula using probability spaces, we will also refer to the Cartesian belief assignment of each evidential body as the probability mass function of the corresponding probability space. 5.3. A priori probability space Let Ω := Ω1 × Ω2 × Ωc . We let (Ω, σ (Ω), P ) denote an a priori probability space. As described in Section 4.1, evidential bodies are assumed to derive from this probability space; (Ω, σ (Ω), P ) is the source of observations that agents make in forming evidential bodies. The probability mass function of (Ω, σ (Ω), P ) represents the Cartesian belief assignment that we want to attain by combining the Cartesian belief assignments of dependent evidential bodies. 5.4. Evidential bodies We consider two evidential bodies—evidential body 1 (EB 1) formed by agent 1 and evidential body 2 (EB 2) formed by agent 2—that share knowledge on Ωc . We let (Ω1 × Ωc , σ (Ω1 × Ωc ), P 1 ) and (Ω2 × Ωc , σ (Ω2 × Ωc ), P 2 ) represent EB 1 and EB 2, respectively. Thus, agents 1 and 2 form evidence on Ω1 × Ωc and Ω2 × Ωc , respectively. These two probability spaces are related to (Ω, σ (Ω), P ) as follows. Let m1 , m2 , and μ denote the probability mass functions of EB 1, EB 2, and the a priori probability space, respectively. Then m1 is defined to be the marginal probability mass function of μ on Ω1 × Ωc : For each (ω1 , ωc ) ∈ Ω1 × Ωc ,
m1 (ω1 , ωc ) =
μ(ω1 , ω2 , ωc ).
(23)
ω2 ∈Ω2
Similarly, m2 is defined to be the marginal probability mass function of m on Ω2 × Ωc : For each (ω2 , ωc ) ∈ Ω2 × Ωc ,
m2 (ω2 , ωc ) =
μ(ω1 , ω2 , ωc ).
(24)
ω1 ∈Ω1
Let mc denote the marginal probability mass function of
mc (ωc ) =
μ(ω1 , ω2 , ωc ).
μ on Ωc ; for each ωc ∈ Ωc , (25)
ω1 ∈Ω1 ω2 ∈Ω2
Notice that the marginal probability mass function of m1 on Ωc and the marginal probability mass function of m2 on Ωc both coincide with mc ; for each ωc ∈ Ωc ,
mc (ωc ) =
ω1 ∈Ω1
m1 (ω1 , ωc ) =
m2 (ω2 , ωc ).
(26)
ω2 ∈Ω2
Hence, mc can be derived from either EB 1 or EB 2. We let (Ωc , σ (Ωc ), P c ) denote the probability space that represents the shared knowledge. Thus, P c is the probability measure uniquely determined by the probability mass function mc on Ωc .
2122
T. Nakama, E. Ruspini / International Journal of Approximate Reasoning 55 (2014) 2109–2125
5.5. Conditional independence of evidential bodies given their shared knowledge To characterize conditional independence of evidential bodies given their shared knowledge, we first define the following sets for each ω1 ∈ Ω1 , ω2 ∈ Ω2 , and ωc ∈ Ωc :
A ω1 :=
(ω1 , x2 , xc ) ,
x2 ∈Ω2 , xc ∈Ωc
B ω2 :=
x1 ∈Ω1 , xc ∈Ωc
C ωc :=
(x1 , ω2 , xc ) ,
(x1 , x2 , ωc ) .
x1 ∈Ω1 , x2 ∈Ω2
Note that A ω1 , B ω2 , and C ωc are in we have
P ( A ω1 ) =
σ (Ω). These sets can be understood intuitively as follows. Notice that for each ω1 ∈ Ω1 ,
μ(ω1 , ω2 , ωc ),
ω2 ∈Ω2 ωc ∈Ωc
i.e., P ( A ω1 ) is the marginal probability mass assigned to
P ( A ω1 ) =
ω1 ∈ Ω1 . Hence, it can also be derived from EB 1 alone;
m1 (ω1 , ωc ).
ωc ∈Ωc
Therefore, each A ω1 can be regarded as an event that provides knowledge on Ω1 formed by EB 1. Similarly, each B ω2 can be regarded as an event that provides knowledge on Ω2 formed by EB 2. Likewise, each C ωc can be regarded as an event that provides knowledge on Ωc that is shared by EB 1 and EB 2;
P (C ωc ) = mc (ωc ) =
m1 (ω1 , ωc ) =
ω1 ∈Ω1
m2 (ω2 , ωc ).
ω2 ∈Ω2
Note that the intersection of A ω1 , B ω2 , and C ωc is a singleton of Ω :
A ω1 ∩ B ω2 ∩ C ωc = (ω1 , ω2 , ωc ) . Now we are ready to formally characterize conditional independence of EB 1 and EB 2 given their shared knowledge. Definition 5.2. EB 1 and EB 2 are said to be conditionally independent given their shared knowledge if for all
ω2 ∈ Ω2 and for all ωc ∈ Ωc such that mc (ωc ) = 0, we have P ( A ω1 B ω2 |C ωc ) = P ( A ω1 |C ωc ) P ( B ω2 |C ωc ).
ω1 ∈ Ω1 and (27)
In words, the expression (27) means that the knowledge established by EB 1 on Ω1 and the knowledge established by EB 2 on Ω2 are conditionally independent given their shared knowledge on Ωc .6 The condition (27) can be verified easily in many real-world problems. For instance, the conditional independence holds for the evidential bodies described in Sections 4.1 and 4.3, provided that ( X t )t ≥0 forms a Markov chain. (No knowledge about the chain’s initial distribution or transition matrix is necessary.) In the case of these chains, we can be assured that condition (5), which is (27) for the Markov examples, is satisfied. 5.6. Combination formula Finally, via the following theorem, we establish a formula for combining EB 1 and EB 2 when they are conditionally independent given their shared knowledge. Under the conditional-independence assumption, this theorem shows how to derive, from EB 1 and EB 2, an exact expression for μ, the probability mass function (i.e., Cartesian belief assignment) of the a priori probability space. We want the probability mass function of the combined evidence to equal μ; thus, the expression m from the theorem serves as our combination formula.
6
This statement can be made mathematically precise. Consider the following classes of sets:
A := { A ω1 | ω1 ∈ Ω1 } ∪ {∅},
B := { B ω2 | ω2 ∈ Ω2 } ∪ {∅},
C := { A ωc | ωc ∈ Ωc } ∪ {∅}.
Let σ (A ), σ (B), and σ (C ) denote the σ -fields generated by A , B, and C , respectively. (Thus, for instance, σ (A ) is the intersection of all σ -fields containing A .) Notice that they are all sub-σ -fields of σ (Ω). Then, (27) of Definition 5.2 is equivalent to the conditional independence of σ (A ) and σ (B) given σ (C ). See, for instance, Billingsley [2] and Pollard [27].
T. Nakama, E. Ruspini / International Journal of Approximate Reasoning 55 (2014) 2109–2125
2123
Theorem 5.1. Suppose that EB 1 and EB 2 are conditionally independent given their shared knowledge. Define $m : \Omega \to [0, 1]$ by
\[
m(\omega_1, \omega_2, \omega_c) =
\begin{cases}
\dfrac{m_1(\omega_1, \omega_c)\, m_2(\omega_2, \omega_c)}{m_c(\omega_c)} & \text{if } m_c(\omega_c) \neq 0, \\[1ex]
0 & \text{otherwise}
\end{cases}
\tag{28}
\]
for each $\omega_1 \in \Omega_1$, $\omega_2 \in \Omega_2$, and $\omega_c \in \Omega_c$. Then
\[
\mu(\omega_1, \omega_2, \omega_c) = m(\omega_1, \omega_2, \omega_c)
\]
for each $\omega_1 \in \Omega_1$, $\omega_2 \in \Omega_2$, and $\omega_c \in \Omega_c$.

Proof. Since $\mu(\omega_1, \omega_2, \omega_c) \leq m_c(\omega_c)$, we have $\mu(\omega_1, \omega_2, \omega_c) = 0$ if $m_c(\omega_c) = 0$. Suppose that $m_c(\omega_c) \neq 0$. Then
\[
\mu(\omega_1, \omega_2, \omega_c) = P(A_{\omega_1} B_{\omega_2} C_{\omega_c}) = P(A_{\omega_1} B_{\omega_2} \mid C_{\omega_c})\, m_c(\omega_c) = P(A_{\omega_1} \mid C_{\omega_c})\, P(B_{\omega_2} \mid C_{\omega_c})\, m_c(\omega_c), \tag{29}
\]
where the last equality follows from (27). Here
\[
P(A_{\omega_1} \mid C_{\omega_c}) = \frac{P(A_{\omega_1} C_{\omega_c})}{m_c(\omega_c)} = \frac{m_1(\omega_1, \omega_c)}{m_c(\omega_c)}, \tag{30}
\]
\[
P(B_{\omega_2} \mid C_{\omega_c}) = \frac{P(B_{\omega_2} C_{\omega_c})}{m_c(\omega_c)} = \frac{m_2(\omega_2, \omega_c)}{m_c(\omega_c)}. \tag{31}
\]
Therefore, from (29)–(31), we obtain
\[
\mu(\omega_1, \omega_2, \omega_c) = \frac{m_1(\omega_1, \omega_c)}{m_c(\omega_c)} \frac{m_2(\omega_2, \omega_c)}{m_c(\omega_c)}\, m_c(\omega_c) = \frac{m_1(\omega_1, \omega_c)\, m_2(\omega_2, \omega_c)}{m_c(\omega_c)},
\]
as desired. □
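To make the computation in (28) concrete, the following is a minimal Python sketch; it is not part of the original presentation, and the dictionaries, names, and toy numbers are illustrative assumptions. It builds $m_c$ by marginalizing $m_1$, applies (28), and checks that the result sums to one and marginalizes back to $m_1$.

```python
# Minimal sketch (illustrative, not from the paper): combining two Cartesian
# belief assignments m1 (on Omega1 x Omega_c) and m2 (on Omega2 x Omega_c)
# via formula (28).

def combine(m1, m2):
    """Return m on Omega1 x Omega2 x Omega_c as defined in (28)."""
    # The shared knowledge mc is obtained by marginalizing either m1 or m2.
    mc = {}
    for (w1, wc), p in m1.items():
        mc[wc] = mc.get(wc, 0.0) + p

    m = {}
    for (w1, wc), p1 in m1.items():
        for (w2, wc2), p2 in m2.items():
            if wc2 != wc:
                continue
            m[(w1, w2, wc)] = (p1 * p2 / mc[wc]) if mc[wc] != 0 else 0.0
    return m

# Toy example: Omega1 = {a, b}, Omega2 = {x, y}, Omega_c = {u, v}.
m1 = {("a", "u"): 0.3, ("b", "u"): 0.2, ("a", "v"): 0.1, ("b", "v"): 0.4}
m2 = {("x", "u"): 0.25, ("y", "u"): 0.25, ("x", "v"): 0.4, ("y", "v"): 0.1}

m = combine(m1, m2)
# Sanity checks: m sums to 1 and marginalizes back to m1.
assert abs(sum(m.values()) - 1.0) < 1e-12
assert abs(sum(p for (w1, w2, wc), p in m.items()
               if (w1, wc) == ("a", "u")) - 0.3) < 1e-12
```

Under the conditional-independence assumption of Theorem 5.1, the output coincides with the a priori Cartesian belief assignment $\mu$.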
Notice that all the terms on the right-hand side of (28) can be derived from EB 1 and EB 2: $m_1$ from EB 1, $m_2$ from EB 2, and $m_c$ from either EB 1 or EB 2. Expression (28) is applied to combine evidential bodies in Sections 4.1 and 4.3.

6. Obtaining Dempster–Shafer formula and TBM conjunctive rule from our combination formula

In our formulation, the Dempster–Shafer formula and the TBM conjunctive rule can each be considered a special case of our combination formula (28). In this section, we explain how to recover them from (28). In the Dempster–Shafer formulation and TBM, agents form evidence on a common sample space, call it $\chi := 2^\Theta$, where $\Theta$ denotes a frame of discernment. Thus we set $\chi_1 = \chi_2 = \chi_c = \chi$ and suppose that agent 1 forms evidence EB 1 on $\chi_1 \times \chi_c$ whereas agent 2 forms EB 2 on $\chi_2 \times \chi_c$.⁷ Let $m_1$ and $m_2$ denote the Cartesian belief assignments established by agents 1 and 2, respectively. We also let $m_c$ denote the Cartesian belief assignment (basic belief assignment) on $\chi_c$ that represents the knowledge shared by EB 1 and EB 2. The independence of evidential bodies assumed in the Dempster–Shafer framework and TBM translates into vacuousness of the shared knowledge in our framework; vacuous common knowledge does not cause dependence between the evidential bodies. In this case, EB 1 can be considered to provide evidence only on $\chi_1$, although it is formed on $\chi_1 \times \chi_c$. Mathematically, this can be characterized by
\[
m_1(\omega_1, \omega_c) = 0 \quad \text{if } \omega_c \neq \Theta \tag{32}
\]
for each $\omega_1 \in \chi_1$. Therefore, the basic belief assignment $m_1^* : \chi_1 \to [0, 1]$ defined by
\[
m_1^*(\omega_1) = m_1(\omega_1, \Theta) \quad \forall \omega_1 \in \chi_1
\]
characterizes the evidence provided by $m_1$. Similarly, for each $\omega_2 \in \chi_2$, we have
\[
m_2(\omega_2, \omega_c) = 0 \quad \text{if } \omega_c \neq \Theta, \tag{33}
\]
and the basic belief assignment $m_2^* : \chi_2 \to [0, 1]$ defined by
\[
m_2^*(\omega_2) = m_2(\omega_2, \Theta) \quad \forall \omega_2 \in \chi_2
\]
characterizes the evidence provided by $m_2$. Notice that from either (32) or (33),
\[
m_c(\omega_c) =
\begin{cases}
1 & \text{if } \omega_c = \Theta, \\
0 & \text{otherwise},
\end{cases}
\]
clearly representing vacuous common knowledge. Note that with this $m_c$, the right-hand side of (28) becomes either 0 or $m_1(\omega_1, \omega_c)\, m_2(\omega_2, \omega_c)$. Let $f$ denote the Cartesian belief assignment of the combined evidence obtained by our combination formula. Then, using the $m_1$, $m_2$, and $m_c$ described above, (28) can be reexpressed as follows: for each $\omega_1 \in \chi_1$, $\omega_2 \in \chi_2$, and $\omega_c \in \chi_c$,
\[
f(\omega_1, \omega_2, \omega_c) =
\begin{cases}
m_1(\omega_1, \Theta)\, m_2(\omega_2, \Theta) = m_1^*(\omega_1)\, m_2^*(\omega_2) & \text{if } \omega_c = \Theta, \\
0 & \text{otherwise}.
\end{cases}
\tag{34}
\]
Since the knowledge on $\chi_c$ is vacuous, we eliminate it from (34) to obtain the following Cartesian belief assignment $f$ on $\chi_1 \times \chi_2$: for each $\omega_1 \in \chi_1$ and $\omega_2 \in \chi_2$,
\[
f(\omega_1, \omega_2) = m_1^*(\omega_1)\, m_2^*(\omega_2). \tag{35}
\]

⁷ Since we later assume that their common knowledge on $\chi_c$ is vacuous, $\chi_c$ can be any nonempty set.
To obtain a basic belief assignment $m$ on $\chi = 2^\Theta$ from the Cartesian belief assignment $f$, we consider the function $\rho : \chi_1 \times \chi_2 \to \chi$ defined by $\rho(\omega_1, \omega_2) = \omega_1 \cap \omega_2$ for each $\omega_1 \in \chi_1$ and $\omega_2 \in \chi_2$. Recall that $\chi_1 = \chi_2 = 2^\Theta$. Then, as shown in (22) of Example 5.1, we can derive $m$ from $f$ as follows: for each $\theta \in 2^\Theta$,
\[
m(\theta)
= \sum_{\substack{(\omega_1, \omega_2) \in 2^\Theta \times 2^\Theta \\ \rho(\omega_1, \omega_2) = \theta}} f(\omega_1, \omega_2)
= \sum_{\substack{\omega_1, \omega_2 \in 2^\Theta \\ \omega_1 \cap \omega_2 = \theta}} f(\omega_1, \omega_2)
= \sum_{\substack{\omega_1, \omega_2 \in 2^\Theta \\ \omega_1 \cap \omega_2 = \theta}} m_1^*(\omega_1)\, m_2^*(\omega_2). \tag{36}
\]
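Formula (36), and the normalized variant discussed just below, can be computed directly from $m_1^*$ and $m_2^*$. The following minimal Python sketch is our own illustration, not part of the paper; focal sets are represented as frozensets, and the example masses are arbitrary.

```python
# Minimal sketch (illustrative): the conjunctive combination (36) of two
# basic belief assignments on 2^Theta, plus Dempster normalization
# (discard the mass on the empty set and rescale the rest).

def conjunctive(m1_star, m2_star):
    """Unnormalized combination: m(theta) = sum over w1 ∩ w2 = theta of m1*(w1) m2*(w2)."""
    m = {}
    for w1, p1 in m1_star.items():
        for w2, p2 in m2_star.items():
            theta = w1 & w2
            m[theta] = m.get(theta, 0.0) + p1 * p2
    return m

def dempster(m1_star, m2_star):
    """Normalized combination; assumes the total conflict is strictly less than 1."""
    m = conjunctive(m1_star, m2_star)
    conflict = m.pop(frozenset(), 0.0)
    return {theta: p / (1.0 - conflict) for theta, p in m.items()}

# Toy frame Theta = {r, s}; the masses below are arbitrary illustrative numbers.
m1_star = {frozenset({"r"}): 0.6, frozenset({"r", "s"}): 0.4}
m2_star = {frozenset({"s"}): 0.5, frozenset({"r", "s"}): 0.5}

print(conjunctive(m1_star, m2_star))  # assigns mass 0.3 to the empty set
print(dempster(m1_star, m2_star))     # renormalized over the nonempty sets
```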
Expression (36) is the TBM conjunctive rule. Notice that $m_1^*$ and $m_2^*$ are basic belief assignments on $2^\Theta$. Clearly, the right-hand side of (36) can be normalized to obtain the Dempster–Shafer formula. Thus, we can recover the Dempster–Shafer formula and the TBM conjunctive rule from our combination formula.

7. Conclusion

To our knowledge, our study is the first to rigorously formulate the process of combining dependent evidential bodies that are conditionally independent given their shared knowledge. As described in Sections 1 and 4, such dependent evidential bodies can be found in a variety of real-world problems, and as demonstrated in Section 4, applying our combination formula to them is quite straightforward. As shown in Section 6, we can recover the Dempster–Shafer formula and the TBM conjunctive rule as special cases of our combination formula.

Appendix A. Adjusting a Cartesian belief assignment after establishing common knowledge

We will explain the procedure using a simple, concrete example. Let $\Omega_1 := \{a, b, c\}$ and $\Omega_2 := \{x, y\}$, and suppose that an agent establishes the Cartesian belief assignment $g$ on $\Omega_1 \times \Omega_2$ shown in Table 12. The marginal belief assignments of $g$ on $\Omega_1$ and $\Omega_2$ are denoted by $g_{\Omega_1}$ and $g_{\Omega_2}$, respectively. Suppose further that this agent establishes common knowledge on $\Omega_1$ with another agent and that they agree to use the $g'_{\Omega_1}$ shown in Table 13 as their shared marginal belief assignment on $\Omega_1$. (See Section 4.1 for the discussion of how to establish common knowledge from different observations.) Then the original Cartesian belief assignment $g$ must be adjusted accordingly: the values of $g(a, x)$ and $g(a, y)$ will be multiplied by $g'_{\Omega_1}(a)/g_{\Omega_1}(a)$, the values of $g(b, x)$ and $g(b, y)$ will be multiplied by $g'_{\Omega_1}(b)/g_{\Omega_1}(b)$, and the values of $g(c, x)$ and $g(c, y)$ will be multiplied by $g'_{\Omega_1}(c)/g_{\Omega_1}(c)$. Table 14 shows the resulting adjusted Cartesian belief assignment $g'$; a short computational sketch reproducing this adjustment is given after Table 14.
Table 12
Cartesian belief assignment g on Ω1 × Ω2 (Ω1 := {a, b, c}, Ω2 := {x, y}).

  g(a, x) = 1/4     g(a, y) = 1/8     g_Ω1(a) = 3/8 (will be changed to 1/3)
  g(b, x) = 1/6     g(b, y) = 1/8     g_Ω1(b) = 7/24 (will be changed to 1/3)
  g(c, x) = 1/12    g(c, y) = 1/4     g_Ω1(c) = 1/3
  g_Ω2(x) = 1/2     g_Ω2(y) = 1/2
Table 13
Marginal belief assignment g′_Ω1 on Ω1 established as common knowledge.

  g′_Ω1(a) = 1/3    g′_Ω1(b) = 1/3    g′_Ω1(c) = 1/3
Table 14
Adjusted Cartesian belief assignment g′ obtained from g.

  g′(a, x) = 2/9     g′(a, y) = 1/9     g′_Ω1(a) = 1/3
  g′(b, x) = 4/21    g′(b, y) = 1/7     g′_Ω1(b) = 1/3
  g′(c, x) = 1/12    g′(c, y) = 1/4     g′_Ω1(c) = 1/3
  g′_Ω2(x) = 125/252    g′_Ω2(y) = 127/252
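As a check on the adjustment procedure, the following minimal Python sketch (our own illustration; the variable names are arbitrary) reproduces Table 14 from the values in Tables 12 and 13 by rescaling each row of g by the ratio of the shared marginal to the original marginal.

```python
from fractions import Fraction as F

# Cartesian belief assignment g of Table 12 and the shared marginal of Table 13.
g = {("a", "x"): F(1, 4), ("a", "y"): F(1, 8),
     ("b", "x"): F(1, 6), ("b", "y"): F(1, 8),
     ("c", "x"): F(1, 12), ("c", "y"): F(1, 4)}
shared = {"a": F(1, 3), "b": F(1, 3), "c": F(1, 3)}  # g'_Omega1

# Original marginal g_Omega1, obtained by summing over Omega2.
marginal = {}
for (w1, w2), p in g.items():
    marginal[w1] = marginal.get(w1, F(0)) + p

# Adjust: multiply each entry by shared(w1) / marginal(w1).
g_adj = {(w1, w2): p * shared[w1] / marginal[w1] for (w1, w2), p in g.items()}

print(g_adj)
# Matches Table 14: g'(a,x)=2/9, g'(a,y)=1/9, g'(b,x)=4/21, g'(b,y)=1/7,
# g'(c,x)=1/12, g'(c,y)=1/4.
```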
References

[1] P. Besnard, P. Jaouen, P.J. Perin, Extending the transferable belief model for inconsistency handling, in: Proceedings of the International Conference on Information Processing and Management of Uncertainty in Knowledge Based Systems, IPMU, 1996, pp. 143–148.
[2] P. Billingsley, Probability and Measure, 3rd edition, Wiley Interscience, New York, 1995.
[3] S. Brin, L. Page, The anatomy of a large-scale hypertextual Web search engine, Comput. Netw. ISDN Syst. 30 (1) (1998) 107–117.
[4] M.E.G.V. Cattaneo, Belief functions combination without the assumption of independence of the information sources, Int. J. Approx. Reason. 52 (3) (2011) 299–315.
[5] M.E.G.V. Cattaneo, Combining belief functions issued from dependent sources, in: Proceedings of the International Symposium on Imprecise Probabilities and Their Applications, ISIPTA, vol. 3, 2003, pp. 133–147.
[6] K.L. Chung, A Course in Probability Theory, 3rd edition, Academic Press, London, 2001.
[7] A.P. Dempster, Upper and lower probabilities induced by a multivalued mapping, Ann. Math. Stat. 38 (2) (1967) 325–339.
[8] T. Denœux, Conjunctive and disjunctive combination of belief functions induced by nondistinct bodies of evidence, Artif. Intell. 172 (2) (2008) 234–264.
[9] S. Destercke, D. Dubois, Idempotent conjunctive combination of belief functions: extending the minimum rule of possibility theory, Inf. Sci. 181 (18) (2011) 3925–3945.
[10] D. Dubois, H. Prade, On the unicity of Dempster rule of combination, Int. J. Intell. Syst. 1 (2) (1986) 133–142.
[11] D. Dubois, H. Prade, A set-theoretic view of belief functions, in: Classic Works of the Dempster–Shafer Theory of Belief Functions, 2008, pp. 375–410.
[12] D. Dubois, R.R. Yager, Fuzzy set connectives as combinations of belief structures, Inf. Sci. 66 (3) (1992) 245–276.
[13] Z. Elouedi, K. Mellouli, Pooling dependent expert opinions using the theory of evidence, in: Traitement d'information et gestion d'incertitudes dans les systèmes à base de connaissances. Conférence internationale, 1998, pp. 32–39.
[14] W. Feller, An Introduction to Probability Theory and Its Applications, vol. 1, 3rd edition, Wiley, New York, 1968.
[15] W. Feller, An Introduction to Probability Theory and Its Applications, vol. 2, 2nd edition, Wiley, New York, 1971.
[16] S. Fine, Y. Singer, N. Tishby, The hierarchical hidden Markov model: analysis and applications, Mach. Learn. 32 (1) (1998) 41–62.
[17] D. Gross, C.M. Harris, Fundamentals of Queuing Theory, Wiley, New York, 1998.
[18] J.Y. Halpern, R. Fagin, Two views of belief: belief as generalized probability and belief as evidence, Artif. Intell. 54 (3) (1992) 275–317.
[19] A.C. Harvey, Forecasting, Structural Time Series Models and the Kalman Filter, Cambridge University Press, 1990.
[20] W.K. Hastings, Monte Carlo sampling methods using Markov chains and their applications, Biometrika 57 (1) (1970) 97–109.
[21] X. Ling, W.G. Rudd, Combining opinions from several experts, Appl. Artif. Intell. 3 (4) (1989) 439–452.
[22] W. Liu, J. Hong, Reinvestigating Dempster's idea on evidence combination, Knowl. Inf. Syst. 2 (2) (2000) 223–241.
[23] T. Lux, The Markov-switching multifractal model of asset returns: GMM estimation and linear forecasting of volatility, J. Bus. Econ. Stat. 26 (2) (2008) 194–210.
[24] R.J. Meinhold, N.D. Singpurwalla, Understanding the Kalman filter, Am. Stat. 37 (2) (1983) 123–127.
[25] N. Metropolis, A.W. Rosenbluth, M.N. Rosenbluth, A.H. Teller, E. Teller, Equation of state calculations by fast computing machines, J. Chem. Phys. 21 (6) (1953) 1087–1092.
[26] J. Pearl, Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference, Morgan Kaufmann, 1988.
[27] D. Pollard, A User's Guide to Measure Theoretic Probability, Cambridge University Press, New York, 2002.
[28] M.L. Puterman, Markov Decision Processes: Discrete Stochastic Dynamic Programming, Wiley Series in Probability and Statistics, vol. 414, Wiley, 2009.
[29] L. Rabiner, A tutorial on hidden Markov models and selected applications in speech recognition, Proc. IEEE 77 (2) (1989) 257–286.
[30] S.M. Ross, Stochastic Processes, 2nd edition, Wiley, New York, 1996.
[31] S.M. Ross, Introduction to Probability Models, 9th edition, Academic Press, New York, 2006.
[32] E.H. Ruspini, Logical foundations of evidential reasoning, Technical Note 408, SRI International, Menlo Park, CA, 1986.
[33] E.H. Ruspini, Epistemic logics, probability, and the calculus of evidence, in: Classic Works of the Dempster–Shafer Theory of Belief Functions, Springer, 2008, pp. 435–448.
[34] J. Rust, Structural estimation of Markov decision processes, Handb. Econom. 4 (4) (1994).
[35] G. Shafer, A Mathematical Theory of Evidence, Princeton University Press, Princeton, NJ, 1976.
[36] P. Smets, Combining non-distinct evidences, in: International Conference of the North American Fuzzy Information Processing Society, NAFIPS 1986, 1986, pp. 544–549.
[37] P. Smets, Belief functions: the disjunctive rule of combination and the generalized Bayesian theorem, Int. J. Approx. Reason. 9 (1) (1993) 1–35.
[38] P. Smets, The Transferable Belief Model for quantified belief representation, in: Handbook of Defeasible Reasoning and Uncertainty Management Systems, vol. 1, 1998, pp. 267–301.
[39] P. Smets, R. Kennes, The transferable belief model, Artif. Intell. 66 (2) (1994) 191–234.
[40] R.S. Sutton, A.G. Barto, Reinforcement Learning, MIT Press, Cambridge, MA, 1998.
[41] S. Thrun, W. Burgard, D. Fox, Probabilistic Robotics, MIT Press, Cambridge, MA, 2005.
[42] A.J. Viterbi, Error bounds for convolutional codes and an asymptotically optimum decoding algorithm, IEEE Trans. Inf. Theory 13 (2) (1967) 260–269.
[43] F. Voorbraak, On the justification of Dempster's rule of combination, Artif. Intell. 48 (2) (1991) 171–197.