A comfort measure for diagnostic problem solving

A comfort measure for diagnostic problem solving

149 A Comfort Measure for Diagnostic Problem Solving* YUN PENG and JAMES A. REGGIAt Departmenl of Computer Science, University of Masyland, Co...

2MB Sizes 10 Downloads 130 Views

149

A Comfort Measure for Diagnostic Problem Solving* YUN

PENG

and JAMES

A. REGGIAt

Departmenl

of Computer

Science, University of Masyland,

College Park, Maqhxnd

20742

ABSTRACT

In order to apply Bayes’ theorem for diagnostic problem solving when multiple disorders can occur simult~eo~ly, several previous proposals have suggested using the ratio of posterior probabilities to rank diagnostic hypotheses. Approaches using such relative likelihoods lose the measure of absolute strengths of hypotheses, and thus are incapable of evaluating the “quality” of a problem solution. In this paper, we propose to impose a quantity called a “comfort measure” on the solution: a solution of a diagnostic problem is a minimal-size set of hypotheses such that the sum of their posterior probabilities exceeds a given comfort measure. Based on a probabilistic causal model developed previously, a problem-solving strategy is presented which does not require the m~festation independence assumption required with direct Bayesian classification, and which is applicable to multimembership classification problems. This strategy selectively generates diagnostic hypotheses and calculates both their relative likelihood and the lower and upper bounds of their posterior probabilities. These bounds are successively refined as more hypotheses are generated. Using these bounds, not the real posterior probabilities, the problem-solving strategy identifies a solution satisfying the given comfort measure, usually after only a small portion of all possible hypotheses have been generated.

1.

INTRODUCTION

During the last several years, a great deal of research in the AI comm~ty has gone into developing “abductive” inference mechanisms that construct plausible hypotheses during general diagnostic problem solving where multiple *This work is supported in part by ONR award NO001485K0390 and by NSF Award DCR-8451430 with matching funds from Software A&E, AT&T Information Systems, Control Data, and Allied Corporation Foundation. ‘Also with Department of Neurology, UMAB, and with University of Maryland Institute for Advanced Computer Studies. OElsevier Science Publishing 655 Avenue of the Americas,

Co., Inc. 1989 New York, NY 10010

0020-0255/89/$03.SO

YUN PENG AND JAMES A. REGGIA

150

disorders can occur simultaneously [14, 16, 6, 15, 3, lo]. In such applications, we can let D denote the set of all possible disorders, and A4 the set of all possible manifestations (symptoms, findings). A subset M+ c M can be used to represent a specific case where all manifestations in M+ are known to be present and all other manifestations absent; and a subset DI c D can be used to represent a “hypothesis” for the given case that all disorders in DI are occurring and all others are not. If any collection of disorders can occur simultaneously, then there are a total of 21D1possible hypotheses, and they are mutually exclusive and exhaustive. Among them, some hypotheses are more plausible or more likely to be true than others given the presence of M+, and it is an open research question in AI today how best to judge such plausibility. One measure for plausibility or likelihood of a hypothesis DI given M+ is its posterior probability P( D,IM+ ). By Bayes’ theorem,

P(D,(M+)

=

P(M+ID,)P(D,) P(M+)

(1)

Thus we can obtain P( DI IM+ ) if probabilities P( M+ 1DT), P( D,), and P( M+ ) either are given a priori or can be computed from other given probabilities. Among the several fundamental difficulties involved in such a direct application of Bayesian classification is how to obtain the value of P( M+ ). It is impractical to require that P( M’ ) be available prior to the diagnostic process, because, among other things, this would require 21”I probabilities for all combinations of occurring manifestations in M, which is astronomically large for a large set M. On the other hand, because manifestations are not independent of each other (e.g., manifestations that may be caused by the same disorder are interrelated more closely than those that may not), it is very difficult to compute P( M+ ) from P( mj), the prior probabilities of individual manifestations mj. A number of proposals, including those of the present authors, have been made in the past to overcome this problem [l, 2,13, 8, 9, 31. The central idea of many of these proposals is to use the ratio of posterior probabilities as relative likelihood measures of hypotheses. Since P( M+ ) is a constant for all hypotheses D, for a given specific problem represented by M+, it can be factored out from the relative likelihood measure, thus avoiding the computation of P( M+ ). In other words, we can compare the relative merit of DI and DJ using

P(D,tM+) P(M+lDr)P(D,) P(D,IM+)= f’(M+ Io,)P(D,) without

consideration

of P( M+ ). This widely advocated

approach

of using

COMFORT

MEASURE

FOR PROBLEM

SOLVING

151

relative likelihood measures is valid if the goal of diagnostic problem solving is to identify only the single mosr probable hypothesis. In many real-world diagnostic applications, however, identifying the single most probable hypothesis does not really solve the problem. For example, if D,_ c D is identified to be the most probable hypothesis of a given case, and it has relative likelihood measure P( Mf )D,,)P( D,,,,) = 0.006, then how comfortable or confident are we in concluding that D_ alone is a reasonable solution of the given problem? If the relative likelihood of all other hypotheses is much much smaller than 0.006, then this conclusion is probably acceptable. If, however, there are some other hypotheses with relative likelihood smaller but comparable to 0.006, then D max alone may not be acceptable as a problem solution (its posterior probability may be quite small even though its relative likelihood is the largest). In the latter situation, a human diagnostician might include more than one hypothesis in his conclusion as alternative possibilities, or might pursue the diagnostic process further by asking appropriate questions to disambiguate among the more probable hypotheses. The relative likelihood of the most probable hypothesis alone does not directly tell us whether we should accept the single hypothesis as the problem solution, or whether we should include more hypotheses and how many more should be included. That is, it provides us no sense of the “quality” of a problem solution based on a single hypothesis. The main reason for this difficulty is that by using the relative likelihood measure, we lose the measure of the absolute strength or belief in a hypothesis in a given case. In other words, by replacing the posterior probability with the relative likelihood to measure hypothesis plausibility, we get rid of the problem of computing P( M+ ), but face a new problem of controlling the “quality” of the diagnostic problem solution. The goal of this paper is to develop a diagnostic problem-solving strategy which still avoids the direct computation of P( D,(M+ ), yet also controls the quality of the problem solution. We propose that a quantity between 0 and 1, called a comfort measure (abbreviated as CM), can be used to measure the quality of the problem solution. A plausible solution is defined to be a minimal-size set of hypotheses (sets of disorders) such that the sum of their posterior probabilities is guaranteed to exceed the given CM value, even if the exact values of these posterior probabilities are not known. The problem-solving strategy is based on a formal probabilistic causal model developed previously, which integrates a symbolic causal relation and probability theory in a formal fashion [12, 131. This earlier strategy is a heuristically guided search process which identifies the single most likely hypothesis (or a set of most likely hypotheses in case of a tie) in a provably correct manner. In contrast, the methods presented in the current paper are not oriented towards finding the best hypothesis (the one with the highest likelihood), but towards finding a set

152

YUN PENG AND JAMES A. REGGIA

of most probable hypotheses satisfying the given comfort measure. To do this, techniques are developed to calculate lower and upper bounds of the posterior probabilities of existing hypotheses, and whenever new hypotheses are generated during the search, these bounds are refined. It is the integration of best-first search and these techniques for bounding posterior probabilities that enables us to find a solution satisfying the given CM criterion. The rest of this paper is arranged as follows. Section 2 gives a brief review and summary of the probabilistic causal model’and the formal definition of diagnostic problems introduced earlier by the authors. Section 3 then extends this earlier work by introducing the notion of a “comfort measure” and by deriving the formulae for the lower and upper bounds of the posterior probability for a diagnostic hypothesis. Based on these results, Section 4 presents a search algorithm for problem solving and proves its correctness. Some issues concerning the efficiency of the proposed strategy are also discussed in this section. Finally, Section 5 compares this work with previous work of others and with other possible measures for controlling the “quality” of a diagnostic solution, and offers suggestions for further research. 2.

PROBLEM

FORMULATION

The probabilistic causal model used in this paper is based on a formalization of the causal associative knowledge underlying diagnostic problem solving referred to as “parsimonious covering theory” [16,17, lo]. The simplest type of diagnostic problems formulated in this model, and the one we will use in this paper, is defined as follows. DEFINITION1. A diagnostic problem is a 4-tuple P = (D, M, C, M+ ) where D= M=

{4,..., {m,,...,

d,} is a finite nonempty set of disorders; mk} is a finite nonempty set of manifestations;

C G D x M IS a relation with domain(C) = D and range(C) subset of M.

= M; and

M+ G M is a distinguished

relation C captures the intuitive notion of causal associations in a form, where (d,, mj) E C iff “disorder di may cause manifestation mj.” D, M, and C together correspond to the knowledge base in a diagnostic expert system, while M+ represents the features (manifestations) which are present in a specific problem. For simplicity, all m, P M+ are assumed to be absent.’ Two functions, “causes” and “effects,” are then defined: for all The

symbolic

‘For more general situations where each manifestation m, may be known to be present, or to be absent, or its presence/absence is unknown, see discussion in Section 5.

COMFORT

MEASURE

FOR PROBLEM

SOLVING

153

m, E M, causes(mj) = {d;)(d,, mj) E C }, representing all possible causes of manifestation m,; for all di E D, effects( di) = { m, 1(d,, m,) E C }, representing all manifestations which may be caused by d,. A set of disorders DI c D is called a hypothesis. A hypothesis DI is said to be a cover of a set of manifestations MJ c M if MJ c effects( DI), where by definition effects( 4) = Ud,E ,,effects( d,). Also, we define causes( MJ) = U,, E ,,causes( m,).

EXAMPLE. Figure 1 gives an example of a very simple diagnostic problem formulated according to definition 1. This problem has six disorders d, through d,, and five manifestations ml through m5 (real-world diagnostic problems are generally much larger). Here M+ = {ml, m3, m,}, indicating that ml, m3, and m4 are present, and m2 and ms are absent. In this example, causes(m,) = {d,, d,, d4}, effects(d,) = {m,, m2, m,}, and {dZ, d3} is a cover of { ml, m3, m4 >. The probabilistic causal model is an integration of the symbolic causal knowledge represented in the above formalization and formal probability theory. In what follows we give a very brief description of the important results derived from this model and some informal justification of them. For the formal derivations and proofs, readers are referred to [12]. In this model, the knowledge base is extended by attaching a prior probability 0 < p, < 1 to each d, E D, and a causal strength 0 x c,] G 1 to each causal association (d, , m, ) E C representing how frequently d, causes mj. For any (d,, m,) E C, c,, is assumed to be zero. In Figure 1, prior probabilities are shown below each d, E D and causal strengths are shown adjacent to each link (d,, m,) E C. A very impor-

D

d, .Ol

d,

$3

.l

2

d,

M+ = { WL ,,m Fig. 1.

d,

.2

An abstract

.oF)

3,m

4)

diagnostic

d, .02

CM = problem

prior

0.95

probabilities

YUN PENG AND JAMES A. REGGIA

154

tant point here is that c,~ 6CP(m,ld,). The probability cj, = P(d, causes mild,) represents how frequently d, causes mj; the probability P( m, 1d, ), which is what has generally been used in previous statistical diagnostic systems, represents how frequently mj occurs when d, is present. Since typically more than one disorder can occur simultaneously, P( mj Idi) > ci,. For example, if di cannot cause m, at all, c,j = 0, but P( mj Id,) > 0 because some other disorder present simultaneously with d, may cause mj. Also, we make the following assumptions: 2 (1) (2) cause (3)

disorders d, are independent of each other; causal strengths are invariant, i.e., whenever di occurs, it may always mj with the same strength or tendency c,,; and no manifestation can be present without being caused by some disorder.

Based on the assumptions presented above, and recalling that the set notation DI denotes the event that all disorders in DI are occurring and no disorders not in D, occur, and that M’ denotes that all manifestations in M+ are present and all other manifestations are absent, it can be proven [12] that for any m, E M and D, c D

P(mjlDr) =l- ,FD (l-c,,> I

(34

,

and

P(510,)=dvD (l-ci,); 1

and that manifestations

I

under a given DI, so

are independent

II P(KIQ). P(M’ID,)= t?l,GM' II P(mjlQ) m,cM-M+ Also, by the disorder independence

f’(Dr) =d,D I

Pi’ I

I-I

dk E D - D,

assumption (l-P,)

Since ll dkE o(l - pk) Z 0 in Equation

2Assumptions similar evidence aggregation [9].

(4

=~,~~,&$gl-Pk,.

(5) t

(5) is a constant

to these are sometimes

called

“noisy

for a given knowledge

OR gate”

assumptions

for

COMFORT

MEASURE

FOR PROBLEM

155

SOLVING

base, both this value and P( M+ ) can be factored out from Bayes’s formula if only the relative likelihood of hypotheses D, is of interest. Put otherwise, can calculate the ratio P( D,IM+ )/P( DJI M’ ), using Equations (2) to based solely on pi and cij stored in a knowledge base. This leads to following definition (from [13]): DEFINITION2.

(1) we (5) the

The relative likelihood measure of DI 5 D given M+ among

all subsets of D is defined as

(6) L( DI, M+ ) differs from the posterior probability P( D, 1M+ ) by a constant factor [l-l, E D(l- p,)]/P(M+) for any given problem [see Equation (l)]. Therefore, a hypothesis D, with the largest L( D,, M+ ) is the most probable (has the largest posterior probability) one among all hypotheses. It follows that L( D,, Mi ) is the product of three components: L(D,,M+)

=L,(D,,M+)L,(D,,M+)L,(D,,M+),

(7a)

where the first product

L,(DI,M’) = II

m,EM+

f’(mjlD,)= II

m,EM+

(1~4 II DI (l-c,,)) (7b) E

informally can be thought of as a weight reflecting how likely DI is to cause the presence of manifestations in the given M+; the second product

L,(D,,M+) = d, E D,

ml E

I-I

effects(d,)-

M+

(I-

c,,)

(74

can be viewed as a weight based on manifestations expected with DI but actually absent; and the third product represents a weight based on prior probabilities of DI:

L,(Q,M+) =

n

(7d)

d,GD,$ki

Note that both L, and & are products over di in Or; they can be combined into a single product over DI. For brevity, let (Yi=

I-I

ml E effects(d,)

forall - M+

diED.

156

YUN PENG AND JAMES A. REGGIA

Then from Equations

(7a-d)

we have

(8) Equations 7(a-d) and (8) enable us to calculate the relative likelihood of any hypothesis DI using probabilities provided in the knowledge base (p,‘s and ci, ‘s), and thus to rank hypotheses. Also note that all of these equations only use probabilities relevant to DI and M+, not the whole knowledge base, thus making computation of the relative likelihood tractable. It is clear that if DI is not a cover of M+, there exists some m, E MC not covered by 0,.Then, from Equation (3a), p(m,ID,)=O, and in turn L,(D,,M+), L(D,,M+), and P( D,IM+ ) equal zero. This is consistent with the idea that only covers are possible hypotheses [16, 17, 10, 3, 151. EXAMPLE. Using the probabilities in Figure 1, for M’ = { m,, m3, m4 }, we have cr, = 0.00404, a2 = 0.088889, (us = 0.175, (Ye= 0.015, 0~~= 0.010526, and LYE = 0.004082. The relative likelihood of cover { d,, d, } of M+ is computed as

=[l-(l-0.9)(1-O)][l-(l-O)(l-0.5)] .[l-(l-0.3)(1-0.8)]x0.088889x0.175=0.00602.

The relative likelihood L,({d,,d,},M+)

of { d,, d, } (which is not a cover of M’ ) is zero, because =[l-(l-0.2)(1-O)][l-(l-0.8)(1-0.2)] .[l-(1-0)(1-O)]

=o.

Another quantity associated with each hypothesis DI which is very useful during problem solving is the upper bound UB of the relative likelihood of all hypotheses DJ a Or, i.e., all supersets of Dr [13]. Since L,(D,, M+) ~1 for all hypotheses DJ, then for any DJ 2 Dr

COMFORT

MEASURE

FOR PROBLEM

157

SOLVING

Thus we define UB(D,,M+)

=

n a; 4 EQ

n d, E D -

ak.

(9a)

D,

ak >l

In cases where 0~~~1 for all d, E D (true for many if not most real-world diagnostic problems because the prior probabilities of disorders are usually very small and n m, E effects(d,) - M + (1 - c,,) are less than l), the second factor becomes 1 and can be dropped from the equation. In these situations, we have UB(D,,M+)=

n (Y,. 4 6Q

LEMMA1. For any DJ 3 D,, UB(D,,

M+)
(9b) M+).

Proof. Let d, E D - DI. If 0~~> 1, then, from Equation 9(a), ak is a factor ofUB(D,,M’);therefore,UB(D,U{d,},M+)=UB(D,,M’).Ifa,~l,then UB(D, u{dk}, M+)=UB(D,, M+). ak < UB( DI, M+ ). That is, the function UB is monotonically nonincreasing when D, is enlarged by including one more disorder. The lemma then follows immediately. n For brevity, in the text that follows L( D,, M’ ) and UB( D,, M+ ) are sometimes called the L value and UB value of hypothesis D,, respectively.

3. COMFORT MEASURES AND ESTIMATING POSTERIOR PROBABILITIES The relative likelihood measure described in the previous section enables one to rank two diagnostic hypotheses D, and DJ relative to one another, and can serve as the basis for a heuristically guided search procedure guaranteed to identify a hypothesis D,, with the largest posterior probability [13]. However, although Dmaxwith P( D,,,, 1M+ ) > P( D, (M+ ) for all DI in 2O can be identified using L( D,,,, , M+ ) in this fashion, L( D,,,,, M+ ) tells us nothing about the absolute value of P( Dmax1M+ ), which might actually be quite small. What would be preferable in most real-world diagnostic applications is the determination of a set of DI including Dmaxthat is very likely to include the actual set of disorders D+ that are present. The probabilistic causal model developed earlier and described in the previous section is now extended to address this issue. Let a comfort measure CM be given with 0 < CM < 1, representing how certain we wish to be that a collection of diagnostic hypotheses { D,, 4,. , DK } includes the actual set of causative disorders D+ that are present. For example,

158

YUN PENG AND JAMES A. REGGIA

CM = 0.95 means that we want a collection of diagnostic hypotheses that include the actually occurring set of disorders 11’ with a probability of 0.95. Then the solution of a diagnostic problem can be defined as follows. DEFINITION 3. For a given diagnostic problem comfortmeasureO
P = (D, M, C, M’ ) and a issaid tobea

a CM, and

The existence of a solution for any diagnostic problem as defined here immediately follows from the facts that the hypothesis space 2O is a mutually and that O be a solution of problem P, and let DK have the least posterior probability among all hypotheses in Sol(P). If, however, another hypothesis DR +ZSol(P) has the same posterior probability as DK, then the set {D,, D2,.._, DK_l, DR } is also a solution of P. In general two or more distinct hypotheses are unlikely to have exactly the same likelihood value, and even if so, the alternative solutions only differ on hypotheses with the lowest likelihood, i.e., on the least significant hypotheses. Therefore, the problem-solving strategy presented in this paper is only concerned with finding ~1 solution for a given problem. Equations (6)-(8) in the last section, taken from previous work on the probabilistic causal model, are for computing the relative likelihood of hypotheses. However, deriving the problem solution defined above would appear to require calculation of posterior probabilities, thus leading to prohibitive calculations as explained earlier. That this is not the case will be demonstrated with the problem-solving algorithm presented in Section 4, which, instead of using posterior probabilities, uses their lower and upper bounds to determine a solution. This algorithm searches a portion of the hypothesis space 2”. It starts with an initial hypothesis 0, and gradually generates or makes explicit more and more hypotheses by expanding hypotheses D, chosen from those already generated. An expansion of DI generates a set of hypotheses each of which is created by adding one or more disorder to Dr. Thus, at any moment during the problem-solving process, among all hypotheses in 2O, some will have been made

COMFORT

MEASURE

FOR PROBLEM

159

SOLVING

explicit, and others not. Among the explicit hypotheses, some will be covers of M+ and others will be noncovers (with zero L value). In the context of such a problem-solving strategy, it is now shown that it is possible to estimate effectively the lower and upper bounds of posterior probabilities of hypotheses without actually calculating the posterior probabilities involved. This is done by using L values and UB values of just those hypotheses already made explicit at any given moment during the problem solving. Since all D, in 2’ are mutually exclusive and exhaustive, P( M+ ) = c DIE pP( M+ 1D,)P( DJ). Then, from Equations (5) and (6) and by Bayes’s theorem (l), we can represent the posterior probability of Dr given M+ in terms of relative likelihoods:

P(D,IM+)

=

f’(M+ c

LtD13f+)~~~tl-~d

lD#‘(D,)

f’(M+ID,)J’(D,)

c

=

=

L(D,,M+)

c

L(4,M+k).d~U(l-~d k

D,E~~

D,E~~

L(D,,M+)

L(D,,M+) =

c

(10)

L(D,,M+).

DJ E 2D Dr covers M+

DJ ~2~

The last equality comes from the fact that L( DJ, M+ ) = 0 if DJ does not cover M+, as explained earlier. The normalization equation (10) above cannot be used directly in a large problem (one with a large disorder set D), because it requires the generation of all members of 2’ (or at least all covers of M+ ) and calculation of their relative likelihood values. However, it can be used to derive lower and upper bounds for P( 0,) M+ ) when only some hypotheses have been generated or made explicit. At any moment during problem solving, let Lknown= EL( D,, M+ ) for all covers D, which have been generated or made explict so far. Assume somehow we have derived a value L,,, 2 I3L( 4, M+ ) for all covers DJ which have not been generated yet. Then we have

L known

6

and thus from Equation

c

L(

D? 7 M+

)

G Lknown

(10) we have the lower bound

+

Lest

9

inf( 0,) and the upper

160 bound

YUN PENG AND JAMES A. REGGIA sup( D,) on P( D, IM+ ) given by

irW4) =

L(D,M+) Lknown+

L

~P(DAM+)G

L(D,N+) L

es1

known

= sup( 4).

(11)

It is these bounds, not the true posterior probabilities of hypotheses, that are used during problem solving to determine if a solution according to Definition 3 (i.e., satisfying the given comfort measure) can be formed. Note that by Equation (ll), if P(D,IM+) 2 P(D,IM+) then sup(D,) > sup(D,) and inf(D,) 2 inf( DJ) for any DI and DJ E 2D. The upper bound sup( DI) on P( DI IM+ ) is easy to derive, because Lknown can be obtained by accumulating L values of each newly generated cover during the problem-solving process. Thus the real problem here is how to define and obtain L,,,, the estimated upper bound on the sum of L values of all unknown covers, without actually generating these covers and calculating their L values. To derive Lest, we first develop an upper bound on the sum of L values of all proper supersets of any D1, i.e., an upper bound on C, , D,L( DJ, M+ ). It will be seen later that L,,, can be defined in terms of these upper bounds for some already generated hypotheses 4. LEMMA 2. For any DJ 3 D,, UB(D,, M+)
From Equation

UB(DJ,M+)= n

(9a)

n

&,

d, E DJ

&ED-4 ak

>

ak=dvDa, t

1

n

a,dE;_Dak

d, ED~- 4

I

k

J ak

aJdrcG?-, elk >

where the inequality the last product.

M+)rId,EDJ_D,~k.

1

(Y~=UB(D,,M+)

>

1

n d, E DJ ~ D,

I

o(/’

comes from the facts that D - DI 13 D - DJ and 01~> 1 in W

Lemma 2 leads immediately paper, Theorem 3.

to one of the most important

results in this

3. Let D, be a cover of M+. Then

THEOREM

DJ =Q c

L(D,,M+)


I I

d gD

(1+5-l

1. (12)

Proof. Consider any DJx Dr. If ~D,~=(D,l+l, then DJ =D,U{d,} for some {d, } c D - DI. From Lemma 2 and the definition of UB, L( DJ, M+ ) <

COMFORT

MEASURE

FOR PROBLEM

UB( DJ,M+ )Q UB( D,, Mi )aj. Therefore, L(D,,M+)

c 4 = 0, ID/l

= ID,I

<

161

SOLVING for all such DJ UB( Of, M+).a,

c (d,)sD-D,

+l

=UB(D,,M+). [ Similarly,

if )DJ) = 1D, I+ 2, then L(D,,M+)


for some {d, , d, } c D - DI. Therefore,

lb

{~,)s?-D,~~].

c DJ 3 4

L(D,,M+)

<

Q_JB(D,,M+)a,,a, for all such DJ UB( D,, M+)

c (d,.d,)

c Dp

yak

D,

= ID,I+z

c

=UB(D,,M+).

a;ak

{d,,d,)GD-D,

1

Then for all proper supersets of D,, we have

c

L(Q,M+)dJB(D,,M+).

c

a,+

(d,)cD-D,

DJ = D/

c

a,’

+

...

+ d, t

=UB( D,,M+).

I ,

1+a,)-1

dE!pD(

EXAMPLE. all d, E D,

c

(yk

(d,.d,)cDpD,

l-I D

a/ D,

1

n

Let DI = {d,, d, } in the example of Figure 1. Because (Y,< 1 for

DJ= (d,.d,)

L( oJ, M+)

G(Y,(Y,.

m+4(1+

%x1+ 4(1-t 4 - 11

= 0.0155556 x0.0340318

= 0.000529.

n

162

YUN PENG AND JAMES A. REGGIA

For notational convenience, let lest( Or, M+ ) denote the upper bound of the sum of all DJ I 0,. Then we have

L,(D,,M+) =UB(D,,M+)For any given hypothesis

1+a,)-1 2D’ I I

.

(13)

I

DI, we can now readily compute its relative likelihood

L( DI,M+ )by Equations (7) and (8); the upper bound of the relative likelihood of all its supersets UB( Or,M+ )by Equation (9a) or (9b); and the upper bound of the sum of the relative likelihoods of all its proper supersets, I,,, (D,,M+ ), by Equation (13). These three values associated with each hypothesis play important roles in the problem-solving process which is presented in the next section.

4.

PROBL
STRATEGY

An algorithm for solving diagnostic problems formulated according to Definition 3 is now presented together with an illustrative example. The correctness proof of the algorithm is then given, and the issue of its efficiency is discussed. 4.1

AN ALGORITHM

FOR PROBLEM

SOLUTION

A heuristically guided best-first search process can be employed to find a solution of a diagnostic problem as defined in Section 3. As briefly mentioned earlier, this process is also a process of hypothesis generation. It starts at the initial hypothesis 0 E 2’ and generates new hypotheses from existing ones until K most probable hypotheses (those with largest L values) are generated satisfying the CM criterion. In doing so, the search algorithm only generates (searches, makes explicit) a small portion of the large hypothesis space 2’. Before presenting the details of the algorithm search, we first describe three functions it uses (expand, update, and testsol), and present the underlying concepts in an informal fashion. It may be helpful to examine the example given at the end of this subsection together with the illustration in Figure 6 and relevant data in Table 1 while reading the algorithm descriptions given below. In the algorithm search, to be presented shortly (in Figure 5), new hypotheses are generated by expanding an existing hypothesis so that each new hypothesis contains one more disorder. The function expand for hypothesis expansion is given in Figure 2. The then part of this function expands a noncover hypothesis by selecting an arbitrary present manifestation m, which

COMFORT 1 function

MEASURE

variable

3.

begin

d, disorder,

D, hypothesis;

if Di is itot a cover of M’

5.

then return {D,u {dk } 1 dbE causes(n~~)

6

else return (D, u (d, } / dl. E D - D, }

7.

163

SOLVING

expand(D,)

‘2.

4.

FOR PROBLEM

for at,

arhitkuy

w, E

?,I‘-

rffrcis(I3,

)}

end. Fig. 2.

Function expand.

is not covered by 4, and then returning a set of new hypotheses where each possible cause d, of rnj has been added into D, to form a new hypothesis (note that a new hypothesis D, U {d, } thus covers more of M” than D, does). The else part of this function expands a cover where each disorder not in D, is added into D, to form a new hypothesis. As the example in Figure 6 shows, the initial hypothesis 0 is a noncover, so we arbitrarily select m3 from Mt for expanding 0. Since mn3has three causes, 4, d,, and d4, three new hypotheses are thus generated. On the other hand, the expansion of hypothesis (d,, d, } generates four new hypotheses because there are four disorders not in this hypothesis (i.e., 4, d,, d,, and d6). The reason for separating the two cases is to reduce the total number of noncover hypotheses generated [causes(mj) is usually a smalI subset of D - D,]. For example, although the diagnostic problem given in Figure 1 has a total of 22 noncover hypotheses, it will be seen in a later example that we will only generate 5 of them (including the initial hypothesis 0) before a solution is reached. During problem solving, the algorithm search maintains two sets: Frontier and Candidates. Frontier is the set containing all hypotheses which have been generated but not expanded yet (may contain covers as well as noncovers of M+ ), and is initialized to be { 0 } (i.e., Frontier contains only the initial hypothesis 0). Candidates is the set of all hypotheses which are covers of M’ and have been expanded already. Candidares is initialized to be empty. (It will be seen later that hypotheses composing the solution are chosen from the set Cu~id~tes). Thus, at any moment during the search process, any cover L), of M’ that already has been generated by previous expansions must either be in Frontier or Candidates. Therefore, we have

L known =

c

D, E Frontier U Candidates D, cwers M+

L(4d-q.

(14

YUN PENG AND JAMES A. REGGIA

164

It is shown later (Lemma 6) that at any moment during the search process, any cover of M+ which has not been generated is a proper subset of some hypothesis DJ in Frontier. Since I,,,( DI, M’ ) is an upper bound on the sum of L values of all proper subsets of DI (by Theorem 3), we define

=

c

UB(D,,M+)+

(15)

D, E Frontrer

It is clear that L,,, &?cL( 4, M’) for all DJ not generated. Thus, at any moment during the search process we can compute Lknown and Lest by Equations (14) and (15), respectively, from the current Frontier and Cundid~te~, and in turn compute inf and sup of any hypothesis by Equation (11). Since Frontier and Candidates are initialized to { 0 } and 0, respectively, for a given problem we have Lknown = L( 0, M+ ) and L,,, = I,,,( 0, M” ) initially, derived from Equations (14) and (15). [In the case where M+ f 0, then 0 does not cover M+; therefore Lkno,,,,,= L( 0, M+ ) = 0.1 When the function expand is called to expand some hypothesis Dr in Frontier, the sets Frontier and candidates are then changed, as are L,,,,_ and L est. The procedure update which expands a hypothesis DI chosen from Frontier and updates Frontier, Candidates, Lhownr and L,,, is given in Figure 3. Since DI is expanded at line 4, it is removed from Frontier at line 5. If DI is a cover of M+, it is added into Candidates at line 7. For each DJ E expand( D,), if it was not in Frontier or Candidates before, it is added into Frontier at line 9. Thus, after expand( IIf),,>, Frontier continues to contain all hypotheses generated but not expanded, while ~~ndid~te~ continues to be the set of all covers already expanded. Correspondingly, since Df is removed from Frontier, I,,,( 4, M+ ) is subtracted from L,., at line 6. For each DJ added into Frontier, lest( DJ, M+ ) is added to Lest at line 10, so that L,,, is updated to be ED, E Fronrrerlest(DJ, M+ ) for the updated new Frontier. At line 12, for each DJ added into Frontier, if DJ is also a cover of M’, then L(D,, M+ ) is added to Lknown (such D,‘s are new covers of M+ generated by the current expansion of DI). Thus, Lknown is updated to be

c

D, E Frontier U Candidates D, covers M+

for the updated

Frontier and Candidates.

L(%M+)

COMFORT

MEASURE

1. procedure 2.

variable

3.

begin

FOR PROBLEM

update(D,

)

S set of hypotheses;

4.

S := expand(DI);

5.

Frontier := Frontier

6.

Lest := Lest - leet(Dr i W;

7.

if D, is a cover of M+ then

8.

for each DJc

- {Df };

Candidates

S and DJ$ Frontier

Frontier

10.

Lest := Lest + lest(D, t M+);

11.

if DJ covers M+ then

Frontier

L known..__ ‘--- L known

IL?.

14.

:=

u

$-

u

:=

Candidates

Candidates

u

{DI $

do

(Dj >;

9.

13.

165

SOLVING

L(D, .hP)

endfor end. Fig. 3. Procedure update.

After each expansion and update, algorithm search calls a function testsol to test if the solution for the given problem can be formed by selecting hypotheses from Candidates. Among all hypotheses RI in Candidates, a set of those with their L values greater than or equal to the largest UB value of all hypotheses in F&tier is selected (it is shown later by Corollary 7 that these hypotheses Q are more probable than those in Frontier and those which have not been generated yet; thus they are the most probable covers of all) and passed to testsol. Algorithm tests01 then successively selects hypotheses from S in descending order of their L values. If K hypotheses L),, 4,. , . , D, are selected in this way such that t I-1

inf(D,)

>CM>

Kilsup(D,), I=1

(16)

YUN PENC AND JAMES A. REGGIA

166

then the set of these K hypotheses is returned as a solution of the given problem. Note that in Equation (16), instead of the posterior probabilities of hypotheses, the lower and upper bounds are used to determine if a solution can be formed according to Definition 3. The appropriateness of this test criterion can be ~fo~~ly justified as follows. Recall that inf(B,) 4 P(tt,, M+ > Q sup(I+} [see Equation (ll)J, so Equation (16) implies that Zf_lP(Df[M’ ) a CM > Cf:;P( D, Ii@ ). Since these K hypotheses are most probable ones, and K of them together satisfy the CM but K - 1 of them do not, they compose a smallest set of hypotheses satisfying the given CM, thus meeting the two conditions set in Definition 3. The formal proof of this criterion (Lemma 8) and an illustrative example are given later. If no such set of hypotheses can be found (either because not enough most probable hypotheses can be identified and selected from Candidates, or because the inf and sup values are not refined enough), then the procedure returns an empty set of hypotheses indicating the need for further expansions. Function testsol is given in Figure 4. The variable Sol is used to store the partially formed solution, and is initially empty. During each pass of the while loop (lines 5 to lo), the 1. function

testsol

2.

variable

3.

begin

Sol set of hypotheses;

4.

Sol := 0;

5.

while S # 0 and

C

sup(D[)

< CM do

D,&ol G.

select

7.

s:=s-

8.

if

D, from S with the largest {D,}; C

inf(D,)

+ inf(D1) > CM then

D ,&A 9.

else

10.

endwhile;

11.

return

12.

L(D,,M+);

Sol := Sol u { DI }

(0)

end. Fig. 4.

Function testsol.

return

(Sol u {Dr })

COMFORT

MEASURE

FOR PROBLEM

SOLVING

167

hypotheses Dr with the largest L value is selected and deleted from S (line 7). The test condition of line 8 is the same as the first inequality of Equation (16). At line 5, the beginning of the while loop, if S = 0, then there are no more hypotheses left for further selection; if CM Q IED,Eso,sup( DJ>, then the second inequality of Equation (16) will be violated. In either case, no solution satisfying Equation (16) can be found; the loop terminates and returns 0. We now turn to algorithm search. Whenever function testsol fails to find a solution, algorithm search always selects the hypothesis DI E Frontier with the largest UB value to expand next. A new expansion generally reduces the largest UB value of the current Frontier, and thus leads to identifying more hypotheses in Candidates which may be selected to form a solution. Also, by generating more hypotheses, L,,,,, and L,., are refined, leading to the refinement of inf and sup vaIues of ~di~du~ hypotheses (narrowing the bounds on their posterior probabilities). Thus, each expansion leads the search closer to a solution. Algorithm search is given in Figure 5. In procedure search, lines 4 to 7 initialize Frontier, Candidates, Lknown, and L est. The while loop (lines 8 to 15) successively expands a hypothesis selected from the current Frontier (line 9) with the largest UB value. Expansion and subsequent updates of Frontier, Candidates, Lknownrand Le,, are then done by procedure update (line 10) as described earlier. After each hypothesis expansion, a subset S of ~andi~te~ is identified at line 12 which contains all hypotheses having L values greater than or equal to the largest UB value of all members of Frontier. If S is not empty, then function tests&(S) is called at line 13 to test if a solution can be constructed from hypotheses in S. If tests01 returns a nonempty set of hypotheses, this set is then returned by search as a solution for the given problem (line 14). Otherwise, the while loop continues to select new nodes from Frontier for expansion. A simple example should clarify the problem-soling strategy adopted by algorithm search. EXAMPLE. Figure 6 illustrates a few steps of hypothesis expansions for the problem,in Figure 1, while Table 1 gives L( DI), UB( Of), and lcst( D,, M+ ) for some generated hypotheses 0,. After expanding hypotheses 0 and { d, } in that order, { d,, d, } is the hypothesis in Frontier with largest UB value and it is a cover of M+. It is then expanded and moved from Frontier to Candidates, and its four descender&s are added to Frontier. Next, (d4} is selected and expanded, resulting in Frontier = {{d,}, (4, d3}, (d,, d,, d,}, {d,, d,, d4}, (d,,d3,d5) (dz,d3,d6} {dl,d4}+ {d,,d,)}, and Candidates= {(dZ,d3}}. Note that d, is not a cover of NC; it is not added into Candidates. Now, L({ d,, d, }, M+ ) = 0.00602 is found to be greater than the UB values of all hypotheses in Frontier (see Table 1); thus {d,, d, } is identified to be the most probable hypothesis for the given M+. If identifying the single most probable diagnostic hypothesis were the goal of problem solving, the algorithm could

168 I. 2

YUN PENG AND JAMES A, REGGIA procedure

searchj

variable

D,M,C,M+,CM)

S Solution

begin

4.

Fronlier

5.

Candidates

6.

L-l

7.

I‘?# :-= I,,, (0, M+);

8.

while

:=

‘=

(0);

:= 0; L(I,M+);

Frontier

# 0 do

{* search

9.

wlect Di from Frontier

10.

update(D,

11.

upl]w := max{UR(D,

1”

‘3 :L {D, E Caw~idntes 1L(D, ,M+) 2 upper};

13.

solution

14

if solution

Ii. IO.

D, hypothesis,

upper real,

L lROumL,

3,

Gandidntes set of hypotheses,

Frontier

part of the space zl) *}

with thr largest, UH vdue;

);

{* hypot.hesis ,M’)

*f

(* t,cst for solution

*}

is /‘alitd and r&uritrd

*}

1D, E Frontier);

:T testsol( # 0 then

expansion

return(solu

tion)

{* a soliition

endwhile end. Fig. 5.

Algorithm

search.

terminate. However, since the goal is to identify a smallest set of hypotheses whose posterior probabilities sum to more than CM, further work is necessary. Considering all hypotheses in the current Frontier, { d, } and { 4, d, } are not covers; their relative likelihood is zero. Then from Equation (14) and the table with Figure 6,

L known= 0.0001018 + 0.0000448 + 0.0001084

+ 0.~654 = 0.0064378;

-I-0.~254

+ 0.~72

+ 0.~602

COMFORT

Fig. 6. expanded:

MEASURE

A portion 0,

(d,},

of space (d2,d31,

FOR PROBLEM SOLVING

2O generated (d4).

by algorithm

Q 0

search. The sequence

of hypotheses

(dl),....

TABLE Important

169

Values Associated

W D,, M+ ) 1 0.00404 0.175 0.015 0.ooo7071 0.0155556 0.0000629 0.0002333 0.0001637 O.OON635 O.OQOo606 0.0013333 0.0003591 0.0000425 0.0000165

1

with Each Generated

LCD,,

M+)

0 0 0 0

0.0001018 0.00602 O.oOOO448 0.0001084 0.0000654 0.0000254 0 0.000072 0.0000793 0.0000014 o.OQOQOO5

Hypothesis

Lt(D,. bf+ ) 0.32299 0.0012835 0.0220405 0.0045515 0.0000858 0.0005294 0.0000019 0.0000044 0.0000038 o.OOOOO19 0.0000181 0.0002627 0.0000755 0.0000129 0.0000052

170

YUN PENG AND JAMES A. REGGIA

and from Equation

(15) and the table with Figure 6,

L,,, = 0.0012835 + 0.0000858 + 0.0000019 + 0.0000044 +0.0000038+0.OOOOO19+0.0000181+0.0002627 = 0.0016621. Therefore,

for the most probable

inf((d,,d,})

cover {d,, d3}, we have from Equation

(11)

0.00602 = 0.0064378 + 0.0016621 ” o’743

and

suP( { d, 3d, >) = (3;&;;&3

= 0.935.

Since P({ d,, d, } 1M+ ) < sup({ d,, d, }) < CM = 0.95, the single most probable hypothesis is not sufficient as a problem solution according to Definition 3, so problem solving continues. To identify the second most probable cover of Mt in this example, additional expansions are performed until more hypotheses are expanded and added into Candidates, and the hypothesis in Candidates with the second largest L value exceeds the largest UB value among all members of Frontier. After successively expanding hypotheses { dl }, { d,, d, }, { d,, d, }, { d,, d, }, {d,, d,, d4 }, and { d,, d,, d, }, the hypothesis {d,, d,, d4} in Candidates is found to be the second most likely cover, with relative likelihood 0.0001084. (Note that at this time we have identified that { 4, d, }, { d,, d, } { d,, d4 }, and { d,, d,, d, } also have their L values greater than the largest UB value of hypotheses in the current Frontier, and that they are thus the third through sixth most probable covers of M+.) At this moment, Lknown is calculated to be 0.0065296 and L,,, to be 0.000048. As occurs after each expansion, the algorithm now tries to form a solution using function testsol. If a solution cannot be formed, further expansions may be needed either to identify more hypotheses, or to achieve a better approximation of inf and sup values of hypotheses by reducing L,,, and increasing L known.Among the six most probable covers of M+ identified so far, the sum of inf values of the first four of them is given by

= 0.91523 + 0.01648 +0.01548 +0.01206 = 0.959,

COMFORT

MEASURE:

FOR PROBLEM

171

SOLVING

and the sum of sup values of the first three most probable

ones is given by

= 0.92196 +0.01660 + 0.01559 = 0.954. As calculated above, both Sum, and Sum, are greater than the given CM value. Sum, > CM implies that the sum of posterior probabilities of the first four most hypotheses exceeds the given CM, and thus they are sufficient to form a solution [see condition (1) of Definition 31. However, Sum, > CM implies, or at least does not preclude, the possibility that the sum of posterior probabilities of the first three most probable hypotheses might still exceed CM. In other words, the first four most probable hypotheses might not satisfy the ~~rn~ty condition set for a solution [see condition (2) of Definition 31. The reason for such uncertainty is that the current inf and sup values form bounds which are too coarse concerning the involved posterior probabilities. Further expansions are needed to narrow these bounds. After three more expansions (expanding hypotheses {d,, d,, d6}, {d,, d,, d3}, and { 4, d4}), Lknown is increased to 0.0065298, and L,,, reduced to 0.0000264. Now,

= 0.91821+ 0.01653 + 0.01553 = 0.9503,

and

sup({d,,d,})+sup({d,,d,,d,})=0.92193+0.0166=0.9385. Since 0.9385 < CM < 0.9503, we now can conclude that ( ( d,, d, >, ( d,, d,, d, }, ( d,, d, }} is a solution of the given problem as specified in Definition 3. For the small knowledge base used in the above example, there are a total of 26 = 64 hypotheses, among which 42 are covers and 22 are noncovers. When a solution is calculated by algorithm search as illustrated above, 31 hypotheses are generated, among which 5 are noncovers and 26 are covers. The apparent inefficiency (about haIf of all possible hypotheses are generated) is due to the facts that the problem size is ~e~stic~y small, that the prior probabilities of some disorders are unusually high (0.2 for p3 and p4), and that more than half of all manifestations (3 out of 5) are taken to be present, something which was done to facilitate the numerical illustration given above. In most real-world applications, where in general only a small fraction of all possible manifestations are present, only a small fraction of all possible hypotheses would be

172

YUN PENG AND JAMES A. REGGIA

generated before a solution is reached. (For more about this, see the discussions of experiments in Section 5.) 4.2.

CORRECTNESS

PROOF

FOR ALGORITHM

search

Before the formal proof of the correctness of algorithm search is given, some useful lemmas are established. Lemma 4 first gives an important property of function expand.

30,

LEMMA4. Let DJ I DI be a cover of Mt E expand( D, ) such that DJ 2 DK.

in a diagnostic problem.

Then

Proof. Then Mi -effects(D,) #0 (see the Case 1: DI is not a cover of M’. definition of cover in Section 2). Let an arbitrary manifestation mj E M’ effects( DI) be chosen to expand DI. Then any DJ 1 D, covering M’ must cover mj, i.e., ( DJ - Or) n causes( mj) f 0. Therefore, DJ 2 DI U {d, } for some dk E causes( mj). Case 2: DI is a cover of M+. Then any DJ I) DI is also a cover and D,-D,cD-D,#O.Therefore,DJ~D,U{dk}forsomedkED-D,. n An immediate consequence of Lemma 4 is that any proper superset of Dr which covers M+ can be generated by successive expansions starting with D,. Therefore, any cover of M+ can be generated by a series of successive expansions starting with hypothesis 0, the only hypothesis contained in Frontier when algorithm search begins problem solving. Next, we prove that procedure update correctly updates Frontier, Candidates, Lknown, and L,,,. THEOREM

5

of update). If at fine 10 of algorithm search the

(Correctness

statements (1) Frontier contains all hypotheses which have been generated but not yet expanded, (2) Candidates is the set of all covers which have been expanded, L(D,, M+ ), and (3) Lknown = c D, E Frontier U Candidates D, covers

(4) L,,, =

c

M+

1,,,(4,

M+ )

D, E Frontier

are all true before the execution of update( DI) where D, E Frontier, then they are also true after the execution of update( DI). Proof. (1): Since DI is expanded (line 4 of update), its deletion from Frontier (line 5) does not invalidate statement (1) concerning Frontier. For each DJ

COMFORT

MEASURE

FOR PROBLEM

SOLVING

173

generated by expanding 4, if it has not been expanded before, then it is not in Candidates; thus, adding such a DJ to Frontier (line 9) does not invalidate statement (1). Neither does omitting L$ from Frontier if DJ is already in Candidates, since in this case it was previo~ly expanded. Since these are the only places Frontier is altered, the statement (1) concerning Frontier maintains true after update( 4). (2): Obvious by line 7, the only alteration to candidates in update, (3): tet X = ( DK f Frontier U candidates 1DK covers MC ). Showing that update preserves statement (3) is equivalent to showing that the changes of L known exactly correspond to the changes of X. The only places Frontier or Candidates can be modified in update are lines 5,7, and 9. However, lines 5 and 7, which only involve 4, do not change X, since if 4 is not a cover of M ’ then Df Z X before and after the execution of updatef1),), while if D, covers M”., what lines 5 and 7 do is merely move D[ from Frostier to Candidates. Thus we only need to consider the changes of X by line 9. For all DJ added into Frontier at line 9, only those which are also covers will be added into X. Precisely for these DJ, their L values are added to Lknown (lines 11 and 12); thus statement (3) is preserved. (4): By reasoning similar to that for (3), where lines 6 and 10 for changing L,,, correspond to changes to Frontier at lines 5 to 9. 8 Having established that procedure update correctly updates Frontier, Cundiproperty concerning dates, Lknownr and &St, we now prove an important unknown hypotheses during the execution of algorithm search. LEMMA 6. At the beginning of each pass through the while loop (line 8) in algorithm search, for any cover DR of M’ which has not been generated by previous hy~othesi.~ ex~a~io~ (i.e., for any couer DR B Frontier u Ca~ld~~tes f, there exists some Ds E Frontier such that DR 3 Ds. Proof. By induction on the number of hypothesis expansions (number of passes of the while loop) performed. Base Case: At the begiinning of the first pass of the while loop, no expansions have been performed. Then Frontier = { G?}. The lemma holds trivially, since DR 10 for all covers of MC. Induction step: Assume the lemma holds for Frontier’“), which contains all hypotheses generated but not yet expanded when some #I > 0 hypothesis expansions have been performed. Now show that if any Dl E Frontier(“) is selected for expansion during the (n + 1)th pass, the lemma holds for the resulting Frontier (n+Z). Let DR be a cover of M’ which has not been generated after Dr is expanded. Then by the inductive hypothesis, DR 3 DJ for some DJ E Frontier(“). If we can find such a DJ which is not equal to D,, then DJ E Frontier(““’ because only Dr is removed from Front&(” by procedure

174

YUN PENG AND JAMES A. REGGIA

5). Otherwise, DR I D, and there does not exist other DJ E such that DR 3 DJ. Then by Lemma 4 and the fact that DR is not generated by expanding DI, we have DR 2 Ds for some Ds E expand( DI ). It remains to be shown that such a Ds cannot have been expanded before, so Ds 4 Candidates(“), and thus by lines 8 and 9 of update, Ds E Frontier(“+l’. Assume the contrary, that Ds had been expanded before during the kth pass of the while loop, where k 6 n. Then by Lemma 4, there exists some D, E expand( Ds) such that DR 3 D, 3 Ds. Repeating the above argument, if D, has also been expanded before, etc., then since there are only n hypothesis expansions before D, is expanded, there must exist some D,, such that DR EJ D,_ 3 Ds which has been generated but not expanded after the first n expansions, thus belonging to Frontier(“). Since D, is a descendant of D,, D,” # 0,. This contradicts the assumption that D, is”a proper superset of only W DI among all members of Frontier(“). update

(line

Frontier(“)

COROLLARY 7. Following each execution of update at line 10 of algorithm for any cover D, of M+,

search, Frontier},

then L(D,,

M’)

2 L(D,,

if L(D,,M+)>~~x{UB(D,,M+)]D,E M+)

f or any DJ which either is in Frontier

or has not been generated yet. Proof. Frontier

By Lemma 6, for any such DJ, there exists some DL in the current such that DJ 1 DL. Then by Equation (9a) and Lemma 1, we have

L(D,,M+)


6m={UB(DK,Mf)lDK

EFrontier}


W

By Theorem 5, any cover D, of M+ either is in Candidates (has been expanded) or is in Frontier (has been generated but not expanded) or has not been generated. Then by Corollary 7, covers in Candidates with their L values greater than or equal to the largest UB value of all members of Frontier are not less probable than any other covers, and thus compose a set of the most probable hypotheses. Among them, the one with the largest L value is the most probable hypothesis, the one with the second largest L value is the second most probable hypothesis, etc. It is for this reason that algorithm search constructs this set S as it does at line 12 and passes it to function testsol to see if a solution can be formed from hypotheses in S. Before we prove the correctness of function testsol, we first show that Equation (16) is appropriate as a test condition for a solution. LEMMA 8. Let D,, 4,. problem

P = (D, M,C,

. , DK be K most probable hypotheses of M+ for a

M’ > ordered in descending order of their L values (i.e.,

COMFORT

MEASURE

FOR PROBLEM

SOLVING

175

DK is the least probable one among the K hypotheses),

comfort measure. Then S = ( D,, 4,.

and let CM be the given . . , DK } is a solution for problem P if the

inequality given in Equation (16) is met, i.e., if

f inf(D,) I=1

>CM>Kclsup(DI).

(16)

I=1

Proof. By Equation (ll), since P(D,IM+) > inf(D,), XF=‘=lP(D,lM’) > CF=iinf( Dt) > CM, so condition (1) given in Definition 3 is met. Suppose that another set of hypotheses T also satisfies condition (1) of the problem solution, and 1TI = 1SI - n, where n > 0. Since D, through DK are a set of K most probable hypotheses and if P( D,IM+ ) > P( DJIM’ ) then sup( Dt>z+ sup( DJ), then 1 ~,,rsup(D,)
N)

of them,

4,

D2 ,...,

DK, satisfy the inequalities of Equation

(16),

then testsol( S) = { D1,D2,. . . , DK } is a solution of P according to Definition 3. Otherwise, testsol( S) = 0 , indicating failure to identil5, a solution. Proof. If the K most probable hypotheses in S satisfy Euation (16) it is obvious that the test at line 8 will be satisfied during the Kth pass of the while loop [S f 0 and C, E so,sup( DJ) < CM are always true before the Kth pass of the loop]. Then by Lemma 8, the set of hypotheses Sol returned at line 8 is a solution of the given problem. If there do not exist such K hypotheses, among the N hypotheses given in S, satisfying Equation (16) then either Cy=,inf( DJ) < CM or X:J”=isup( DJ) >, CM > XJ”=,inf( DJ) for some K Q N. The test at line 8 will always fail. In the first case, S = 0 at line 5 after N iterations. In the second case, C o, E so,sup( DJ) >, CM after K iterations. In either case, the while loop terminates and returns 0. n

Finally,

we prove the correctness

of algorithm

search.

THEOREM 10 (Correctness of search). For any diagnostic problem P = will terminate and return a solution (D, M,C, M+ ), search(D, M,C, M+,CM) of problem P. Proof. M+ # 0);

Note that Lknown= L( 0, M+ ) = 0 if 0 does not cover M+ (when therefore, regardless of whether 0 covers M+ or not, by initializa-

176

YUN PENG AND JAMES A. REGGIA

tions of lines 4 to 7, the statements in Theorem 5 about Frontier, Candidates, L known, and L,,, are true before the while loop of procedure search (lines 8 to 15) starts. Then by Theorem 5, they will always be true after each call of update in line 10. Thus, any cover of Mt either is in Frontier U Candidates or has not been generated. Then, by Corollary 7, all hypotheses 4 E S (S is formed at line 12) are the most probable hypotheses. If the most probable K of them satisfy Equation (16), then by Theorem 9, solution = testsoI(S) (line 13) is a solution of the given problem. Since for any problem there exists a solution, it only needs to be shown that such S can be obtained in a finite amount of time of the execution of algorithm search. Assume the contrary, that such an .S can never be found. Then solution = @ at line 14 is always true (by theorem 9), and the while loop continues forever. On the other hand, since the hypothesis space 2” is finite, and no hypothesis which has been generated can be generated again by any of its supersets (descendants), each time after the execution of update( 4) at line 10, the current Frontier is different from all previous ones. Then after a finite amount of time, the current Frontier becomes 0, and the current Candidates becomes the set of all covers of M+ (by Theorem 5). At this time upper = max{UB( DK, M+ ) 1DK E 0 } = 0 by definition at line 11, thus S = Candidates. Then the K most probable hypotheses D,, 4,. . , DK in S for some KG 21D1must be a solution of the given problem. To show that they satisfy Equation (16), notice that now L known

=

L(D,,M+) c Dt E Candidates

=

c

L(Df,

M+)

D,E~~

and L,,, =

t:

UB( 4,

M+ )( eA,f - 1) = 0;

D,EfZ

therefore, DI E 2O. 4.3.

by Equations

EFFICIENCY

(10) and (ll),

inf( Df ) = sup( D,) = P( DIi h4+ ) for all n

CONSIDERATIONS

FOR ALGORITHM

search

In presenting algorithm search, we have chosen to ignore a number of efficiency issues to keep our presentation as str~~tfo~~d as possible. We now turn to these issues, which would especially be of concern to someone interested in implementing the algorithm in this paper as computer programs. Algorithm search, like all best-first heuristic search procedures, is in general quite efficient. However, some statements in search and in the subroutines it calls may be implemented more efficiently than as shown. For example, if we


keep a record of $UB(D_I, M^+)$ and of the product $\prod_{d_k \in D_I}(1+\gamma_k)$ for each known hypothesis $D_I$, then for the hypothesis $D_J = D_I \cup \{d_i\}$ which is generated by expand($D_I$), we have $UB(D_J, M^+) = UB(D_I, M^+) \cdot \alpha_i$ if $\alpha_i > 1$, or $UB(D_J, M^+) = UB(D_I, M^+)$ otherwise [see Equations (9a, b)], and the cached product allows $L(D_J, M^+)$ to be updated from $L(D_I, M^+)$ just as cheaply. Thus the calculations for $L(D_J, M^+)$ and $L_{\max}$ in procedure update are quite tractable. As another example, in line 8 of function testsol, the summation can be accumulated in each iteration of the while loop, not recalculated over and over again as given in Figure 4. Lines 9, 11, and 12 of algorithm search can also be implemented more efficiently if Frontier and Candidates are implemented as ordered lists (in descending order of UB and L values, respectively) rather than sets.
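As a sketch of this constant-time update, one might cache the bounds as below; the factors alpha[i] stand for the $\alpha_i$ of Equations (9a, b), whose definitions are not reproduced here, so this is an interface illustration rather than the paper's code.

    class BoundCache:
        # Memoizes UB(D_I, M+) so that each call to expand costs O(1) extra
        # work per child; a parallel cache of the products over (1 + gamma_k)
        # would support the analogous incremental update of L(D_J, M+).
        def __init__(self, alpha, ub_empty):
            self.alpha = alpha                  # alpha[i] for each disorder d_i
            self.ub = {frozenset(): ub_empty}   # cached UB(D_I, M+)

        def ub_of_expansion(self, DI, i):
            # Bound for D_J = D_I union {d_i}, from the cached bound of D_I.
            a = self.alpha[i]
            # Multiply only when the factor can raise the bound; otherwise the
            # parent's bound remains valid for the child [Equations (9a, b)].
            ub_j = self.ub[DI] * a if a > 1 else self.ub[DI]
            self.ub[DI | {i}] = ub_j
            return ub_j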

In this presentation we consider the hypothesis space to be the space of all subsets of D, i.e., $2^D$, which is usually very large when the total number of possible disorders is large. However, in most real-world problems we may only be interested in disorders that are capable of causing some of the given present manifestations, i.e., in our notation, disorders in causes($M^+$). Therefore, the scope of problem solving can be restricted to causes($M^+$), which is usually a very small subset of D. The whole search space is then substantially reduced from $2^D$ to $2^{causes(M^+)}$. The underlying reason for this is that the prior probability of a disorder $d_i$ is usually very small (e.g., in medicine, $p_i < 10^{-1}$ even for very common disorders in the general population, such as a cold or the flu, and is orders of magnitude smaller for rare disorders); thus $d_i$ is very unlikely to be in a plausible hypothesis if it is not supported by any of the present manifestations. Interested readers are referred to [11].
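For illustration, the restricted scope is simply the union of the causes of the present manifestations; the mapping name causes below is hypothetical.

    def restricted_scope(M_plus, causes):
        # causes: mapping from each manifestation m_j to the set of disorders
        # that can cause it (i.e., the causal links of the two-layer network).
        scope = set()
        for m in M_plus:
            scope |= causes[m]
        return scope

    # The hypothesis space shrinks from 2**|D| to 2**|causes(M+)|: e.g., with
    # |D| = 100 but |causes(M+)| = 8, from 2**100 hypotheses to only 256.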

Another efficiency issue is more subtle, and requires careful consideration. Requiring a minimal-size set of hypotheses for a solution [condition (2) of Definition 3] might sometimes lead to excessive effort on the part of algorithm search. Consider the following situation. Let $D_1, D_2, \ldots, D_K$ be the K most probable hypotheses for a given $M^+$, where $\sum_{I=1}^{K} P(D_I \mid M^+) = CM + \delta$ for a positive $\delta$ very close to zero; then $\sum_{I=1}^{K-1} P(D_I \mid M^+) < CM$ would always be true. Thus, the test condition (line 7) of testsol would not be satisfied until $\sum_{I=1}^{K} P(D_I \mid M^+) - \sum_{I=1}^{K} \inf(D_I) < \delta$, which, since $\delta \approx 0$, implies that $L_{\max}$ is close to zero and $L_{\text{known}}$ is close to the sum of the L values of all covers of $M^+$ (i.e., until almost all covers are generated and expanded). One way for an implementer to cope with this potential problem is to loosen the definition of a problem solution slightly to allow it, in some cases, to contain one more than the minimal number of hypotheses. Thus, a problem solution might be formally redefined as follows.

DEFINITION 4. A set $S = \{D_1, D_2, \ldots, D_K\}$ is said to be an almost minimal solution of the problem $P = (D, M, C, M^+)$ if

(1) $P(D_1 \vee D_2 \vee \cdots \vee D_K \mid M^+) = \sum_{I=1}^{K} P(D_I \mid M^+) \geq CM$, and
(2) among all $T \subseteq 2^D$ satisfying $\sum_{D_I \in T} P(D_I \mid M^+) \geq CM$, $|S| - |T| \leq 1$.

If one is willing to accept almost minimal solutions, the test condition for identifying a problem solution can then be relaxed as in the following theorem.

THEOREM 11. Let $D_1, D_2, \ldots, D_K$ be the K most probable covers for a problem $P = (D, M, C, M^+)$, ordered in descending order of $L(D_I, M^+)$ (i.e., $D_K$ is least probable among the K covers), and let CM be the given comfort measure. Then $S = \{D_1, D_2, \ldots, D_K\}$ is an almost minimal solution for problem P if

$$\sum_{I=1}^{K-1} \inf(D_I) < CM \leq \sum_{I=1}^{K} \inf(D_I) \qquad (17a)$$

and

$$\sum_{I=1}^{K} \inf(D_I) > \sum_{I=1}^{K-1} \sup(D_I). \qquad (17b)$$

Proof. By Equation (11), since $P(D_I \mid M^+) \geq \inf(D_I)$, we have $\sum_{I=1}^{K} P(D_I \mid M^+) \geq \sum_{I=1}^{K} \inf(D_I) \geq CM$. Condition (1) given in Definition 4 is then met if Equation (17a) is met.

To establish condition (2) of Definition 4 from Equations (17a, b), suppose that another set of hypotheses T also satisfies condition (1) of Definition 4, and $|S| - |T| = n$, where $n > 1$. Since $D_1$ through $D_{K-n}$ are the $K-n$ most probable hypotheses and $|T| = K-n$, we have $\sum_{I=1}^{K-n} P(D_I \mid M^+) \geq \sum_{D_J \in T} P(D_J \mid M^+)$, and in turn $\sum_{I=1}^{K-n} \sup(D_I) \geq \sum_{D_J \in T} \sup(D_J)$. Since T satisfies condition (1) of Definition 4, we then have

$$\sum_{I=1}^{K-2} \sup(D_I) \geq \sum_{I=1}^{K-n} \sup(D_I) \geq \sum_{D_J \in T} \sup(D_J) \geq \sum_{D_J \in T} P(D_J \mid M^+) \geq CM.$$

On the other hand, since $\inf(D_K) \leq \inf(D_{K-1}) \leq \sup(D_{K-1})$, we have from Equation (17b) that

$$\sum_{I=1}^{K-2} \sup(D_I) + \sup(D_{K-1}) = \sum_{I=1}^{K-1} \sup(D_I) < \sum_{I=1}^{K} \inf(D_I) \leq \sum_{I=1}^{K-1} \inf(D_I) + \sup(D_{K-1}),$$

i.e., $\sum_{I=1}^{K-2} \sup(D_I) < \sum_{I=1}^{K-1} \inf(D_I)$. Then by Equation (17a), $\sum_{I=1}^{K-2} \sup(D_I) < CM$. This contradicts the previous derivation that $\sum_{I=1}^{K-2} \sup(D_I) \geq CM$. Therefore, condition (2) is met if Equations (17a, b) are met. ∎

Equations (17a, b) are more relaxed than Equation (16) in that they are implied by Equation (16) but not vice versa. Most importantly, some computation may be saved in worst-case scenarios, as was discussed at the beginning of this section and as is illustrated by the example below.

EXAMPLE. Reconsider the example in Section 3.1, where the last three hypothesis expansions were performed only in order to satisfy Equation (16). These expansions would not have been necessary if an almost minimal solution had been acceptable. Before these expansions, as shown in Section 3.1, the three most probable covers satisfied

$$\inf(D_1) + \inf(D_2) + \inf(D_3) = 0.91523 + 0.01648 + 0.01548 = 0.947,$$
$$\sup(D_1) + \sup(D_2) + \sup(D_3) \approx 0.954.$$

Thus, the conditions set by Equations (17a, b) are met by the four most probable hypotheses, and they are returned as an almost minimal solution of the problem. As shown in Section 3.1, the first three of these hypotheses are enough to form a solution (of minimal size), but by allowing one extra hypothesis to be included, fewer hypotheses are generated and expanded during problem solving.
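A minimal sketch of the relaxed test of Theorem 11, reusing the (hypothesis, inf, sup) representation of the earlier testsol sketch; as before, the names are illustrative only.

    def testsol_almost_minimal(S, CM):
        # Accept the K most probable covers as an almost minimal solution as
        # soon as Equations (17a, b) hold for some K.
        sol = []
        inf_sum = 0.0    # sum of inf(D_1 .. D_K)
        inf_prev = 0.0   # sum of inf(D_1 .. D_{K-1})
        sup_prev = 0.0   # sum of sup(D_1 .. D_{K-1})
        for hyp, inf_b, sup_b in S:
            sol.append(hyp)
            inf_sum += inf_b
            # (17a): the inf-sum crosses CM exactly at this K;
            # (17b): it also exceeds the previous sup-sum.
            if inf_prev < CM <= inf_sum and inf_sum > sup_prev:
                return sol
            inf_prev = inf_sum
            sup_prev += sup_b
        return None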

5. DISCUSSION

The diagnostic strategy presented in this paper does not require the manifestation-independence assumption, and thus avoids a major obstacle to applying Bayes’ theorem to diagnostic problem solving when multiple simultaneous disorders are present. The comfort measure CM provides a criterion for


controlling the "quality" of a problem solution. By computing and refining the $\inf(D_I)$ and $\sup(D_I)$ values for each known hypothesis, not only the relative likelihoods but also the ranges of the posterior probabilities of hypotheses $D_I$ can be obtained and made progressively more precise during problem solving. This allows identification of a solution such that the probability that the correct diagnosis is among the hypotheses in the solution is at least as high as the given CM value. Moreover, the size of the resultant solution set provides a meaningful measure of the specificity of the solution. Smaller-size solutions indicate that a higher degree of specificity is achieved, while larger-size solutions indicate that the solution is unspecific (too many alternatives are offered). Such unspecificity is largely due to the fact that the given $M^+$ is not "specific," or "focused." By a specific $M^+$ we mean one which closely matches the pattern of manifestations expected by some hypothesis $D_I$, and thus gives strong support to that hypothesis. A nonspecific $M^+$, on the other hand, does not closely match a pattern of manifestations produced by any possible $D_I$, and thus tends to render weak support to many hypotheses. Therefore a large solution size may be used as an indicator that further inquiry for more manifestations (making $M^+$ more specific) or consultation with a human expert might be needed.

There are other possible criteria that might be adopted for "quality control" of a problem solution. For example, a solution of a problem could be defined as the set of the n most probable hypotheses for a desired number n. There are several problems with this n-most-likely criterion. First, it is difficult to specify the desired number n from a user's viewpoint, at least in as natural and meaningful a fashion as the comfort measure we propose. Second, a fixed n may be too large for some cases, thus causing unnecessary search, yet too small for other cases, thus making the search end too early. In other words, determining a reasonable value for n a priori would require the user of a problem-solving system to know in advance how many hypotheses are reasonable for a specific $M^+$. Also, by using an n-most-likely criterion, the specificity inherent in the problem, as discussed earlier, is not revealed by the solution of a problem-solving system.

As another alternative, a solution could be defined as a set of most probable hypotheses such that the ratio of the likelihoods of the hypotheses in the solution is within a given range. This seems more meaningful than the fixed-number criterion and can be considered an approximation of the comfort measure. However, neither of these alternative approaches quantifies the risk that the actually occurring hypothesis is missing from the solution, unless some kind of bounds on the posterior probabilities of existing hypotheses can be developed; this in turn raises the question of how tight those bounds should be. It is based on these considerations that we propose to use a comfort measure as a coherent means to control the quality of problem solving and to drive a search process to its completion.


The issue of "quality control" for diagnostic problem solving using relative likelihoods has, to the authors' knowledge, not been raised in most related previous work except Cooper's [2]. In his work, Cooper not only developed an algorithm capable of successively generating the n most probable hypotheses for a given number n, but also realized the importance of estimating the posterior probability of a hypothesis in order to gauge the absolute confidence in that hypothesis. To calculate these bounds, he also used the sum of the likelihood values of all existing hypotheses and an upper bound on the sum of the likelihood values of all unknown hypotheses. After normalization and simplification, his formula for the first sum is exactly our $L_{\text{known}}$, but the second sum is different. He essentially used the prior probability $P(D_J)$ as the upper bound of the relative likelihood of an unknown hypothesis $D_J$, and $1 - \sum_{D_J \text{ known}} P(D_J)$ as the upper bound of the sum of the likelihoods of all unknown hypotheses. After normalization, and using the notation of this paper, this amounts to

$$L^*_{\max} = \frac{1}{P(\emptyset)} - \sum_{D_J \text{ known}} L_p(D_J, M^+), \quad \text{where} \quad L_p(D_J, M^+) = \prod_{d_i \in D_J} \frac{p_i}{1 - p_i}.$$

In the following, we use $L^*_{\max}$ to refer to the approach introduced by Cooper. Our work presented in this paper represents a significant advance over Cooper's approach, as follows. First, the comfort measure is a more natural and meaningful measure for solution quality control than the n-most-likely criterion, as discussed earlier in this section. Second, using $L^*_{\max}$ to provide a tight bound on posterior probability usually leads to generation of the majority of potential hypotheses, something which is practically intractable in real-world problems. For example, using $L^*_{\max}$ on the example given in Section 3, 60 of the 64 possible hypotheses would be generated before the given CM was satisfied, while using our formula only 31 of them are generated. The reason for this performance difference is that we used $UB(D_J, M^+)$, which multiplies the prior-based factor $L_p(D_J, M^+)$ by a manifestation-based likelihood factor, as the upper bound for an unknown hypothesis $D_J$; this measure is much smaller than the $L_p(D_J, M^+)$ used in $L^*_{\max}$, because the manifestation-based factor is usually much smaller than 1.³ To further investigate this matter, two experiments were conducted. Experiment 1 used all possible $M^+$ for the diagnostic problem given in Figure 1 (a total of 31 different $M^+$ are possible, excluding $M^+ = \emptyset$). Experiment 2 used 30 randomly generated $M^+$, 10 each with $|M^+| = 3$, 4, and 5, respectively, on a randomly created diagnostic problem with $|D| = 10$, $|M| = 10$, and $p_i < 0.1$ for all $d_i \in D$.
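As a numeric illustration of why the prior-based bound is loose (the prior probabilities below are invented for the example):

    from math import prod

    def L_p(DJ, p):
        # Normalized prior likelihood: product of the odds p_i/(1 - p_i)
        # over the disorders in the hypothesis DJ.
        return prod(p[i] / (1.0 - p[i]) for i in DJ)

    p = {1: 0.05, 2: 0.02, 3: 0.01}
    print(L_p({1, 3}, p))   # ~5.3e-4: a Cooper-style per-hypothesis bound
    # Our UB multiplies this prior factor by a manifestation-based likelihood
    # factor that is typically far below 1, so L_max shrinks much faster and
    # far fewer hypotheses must be generated before testsol can succeed.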

%ooper’s model was intended to apply to general causal networks which allow intermediate entities and causal chaining, and thus are more complicated than the two-layer networks we use in this paper, Since it is difficult to calculate L, values for hypotheses-in this more general setting, prior probabilities were used as upper bounds for posterior probabilities for hypotheses.

TABLE 2
Experimental Results: Ratios of Hypotheses Generated to Hypothesis Space Size

                         Experiment 1                       Experiment 2
Model              Mean    Std. dev.  Coef. var.    Mean    Std. dev.  Coef. var.
Peng and Reggia    0.608   0.175      0.288         0.288   0.117      0.406
Cooper             0.909   0.097      0.107         0.884   0.126      0.143

The efficiency of problem solving was measured as the ratio of the total number of hypotheses generated to the size of the hypothesis space, averaged over all the different $M^+$. For the hypothesis space, we used $2^{causes(M^+)}$ instead of $2^D$. The results of the experiments are given in Table 2.

Three points can be made in considering the results of these experiments. First, in both experiments, the model developed in this paper has significantly better efficiency (lower average ratios of 0.608 and 0.288) than the model using $L^*_{\max}$ (higher average ratios of 0.909 and 0.884). Second, experiment 2 is closer to real-world problems than experiment 1 in terms of problem size, prior probability range, and percentage of all manifestations known to be present. As anticipated (see the discussion at the end of Section 4.1), our model shows a significant efficiency improvement in experiment 2 over experiment 1, while the model using $L^*_{\max}$ improves only slightly. Third, the degrees of variation of the ratios over all cases, measured by coefficients of variation (standard deviation divided by the mean), were also different for the two models. The fairly high degree of variation of our model (0.288 and 0.406) indicates that its efficiency varies considerably for different inputs (different $M^+$). Our conjecture is that more specific, or focused, $M^+$'s tend to generate fewer hypotheses before a solution is reached than less specific $M^+$'s. This should make our model more efficient in real-world applications than on randomly generated examples such as those studied here, because for many real-world problems the given findings are usually quite specific (being caused by a set of disorders which are actually occurring). In contrast, the degree of variation using $L^*_{\max}$ is considerably lower (0.107 and 0.143), which implies that the specificity of $M^+$ can make only a very small difference from the average behavior. Combining the above arguments, it is fair to predict that our model will be practically much more tractable than the earlier approach developed by Cooper.
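The summary statistics reported in Table 2 are the mean, standard deviation, and coefficient of variation of the per-case ratios; a sketch of the measurement, with invented placeholder counts, is:

    from statistics import mean, stdev

    def ratio(generated, scope_size):
        # hypotheses generated for one case / size of 2**causes(M+)
        return generated / 2 ** scope_size

    ratios = [ratio(39, 6), ratio(20, 6), ratio(44, 6)]   # placeholder data
    m, s = mean(ratios), stdev(ratios)
    print(m, s, s / m)   # mean, standard deviation, coefficient of variation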


In conclusion, it should be noted that the strategy presented here requires that all manifestations be given in $M^+$ before the problem-solving process starts, and assumes that all $m_j \notin M^+$ are absent. That is, it works for "closed-world" problems. In real-world problems, however, the presence or absence of the manifestations of a case usually becomes known incrementally during the diagnostic process through a sequence of questions. Therefore, at any moment a manifestation is in one of three states: known to be present, known to be absent, or presence/absence unknown. In other words, diagnostic problems are "open," and problem solving is a hypothesize-and-test process [14, 5, 6, 4, 7, 18]. Since, in algorithm search, all covers of the known present manifestations are either kept in Frontier ∪ Candidates or are proper supersets of some members of Frontier, it is not difficult to modify the present search algorithm to solve "open" problems in a fashion analogous to that used with an "open" diagnostic strategy identifying only the single most probable hypothesis [10, 13]. Algorithm search can also easily be modified to comply with alternative solution criteria such as those discussed above, because it can successively generate and identify the most probable hypothesis, the second most probable hypothesis, and so on. Perhaps an ideal approach would be to incorporate multiple different measures of solution quality, and to work in an open mode so that an optimal solution can be found through questions about selected unknown manifestations during a sequence of hypothesize-and-test cycles. This is an important direction for future research.

REFERENCES

1. E. Charniak, The Bayesian basis of common sense medical diagnosis, in Proceedings of the National Conference on Artificial Intelligence, AAAI, 1983, pp. 70-73.
2. G. Cooper, NESTOR: A Computer-Based Medical Diagnostic Aid That Integrates Causal and Probabilistic Knowledge, STAN-CS-84-1031, Ph.D. Dissertation, Dept. of Computer Science, Stanford Univ., Nov. 1984.
3. J. de Kleer and B. Williams, Reasoning about multiple faults, in Proceedings of the 5th National Conference on Artificial Intelligence, AAAI, Philadelphia, 1986, pp. 132-139.
4. A. Elstein, L. Shulman, and S. Sprafka, Medical Problem Solving: An Analysis of Clinical Reasoning, Harvard U.P., 1978.
5. J. Josephson, Explanation and Induction, Ph.D. Thesis, Dept. of Philosophy, Ohio State Univ., 1982.
6. J. Josephson, B. Chandrasekaran, and J. Smith, Assembling the best explanation, presented at the IEEE Workshop on Principles of Knowledge-Based Systems, Denver, Dec. 1984.
7. J. Kassirer and G. Gorry, Clinical problem solving: A behavioral analysis, Ann. Intern. Med. 89:245-255 (1978).
8. J. Pearl, Distributed revision of belief commitment in multi-hypothesis interpretation, in Proceedings of the 2nd AAAI Workshop on Uncertainty in Artificial Intelligence, Philadelphia, Aug. 1986.
9. J. Pearl, Fusion, propagation, and structuring in belief networks, Artificial Intelligence 29(3):241-288 (Sept. 1986).
10. Y. Peng, A Formalization of Parsimonious Covering and Probabilistic Reasoning in Abductive Diagnostic Inference, Ph.D. Dissertation, Dept. of Computer Science, Univ. of Maryland, 1986.

11. Y. Peng and J. Reggia, Plausibility of diagnostic hypotheses: The nature of simplicity, in Proceedings of the National Conference on Artificial Intelligence, AAAI, 1986, pp. 140-145.
12. Y. Peng and J. Reggia, A probabilistic causal model for diagnostic problem solving. Part one: Integrating symbolic causal inference with numeric probabilistic inference, IEEE Trans. Systems Man Cybernet. SMC-17(2):146-162 (Mar. 1987).

13. Y. Peng and J. Reggia, A probabilistic causal model for diagnostic problem solving. Part two: Diagnostic strategy, IEEE Trans. Systems Man Cybernet., special issue on diagnosis, SMC-17(3):395-406 (May 1987).
14. H. Pople, Heuristic methods for imposing structure on ill-structured problems: The structuring of medical diagnostics, in Artificial Intelligence in Medicine (P. Szolovits, Ed.), 1982, pp. 119-190.
15. R. Reiter, A Theory of Diagnosis from First Principles, TR-187/86, Dept. of Computer Science, Univ. of Toronto, 1986.
16. J. Reggia, D. Nau, and P. Wang, Diagnostic expert systems based on a set covering model, Internat. J. Man-Machine Stud., Nov. 1983, pp. 437-460.
17. J. Reggia, D. Nau, P. Wang, and Y. Peng, A formal model of diagnostic inference, Inform. Sci. 37:227-285 (1985).
18. A. Rubin, The role of hypotheses in medical diagnosis, in Proceedings of the International Joint Conference on Artificial Intelligence, IJCAI, 1975, pp. 856-862.

Received 12 August 1987; revised 2 November 1987