ORGANIZATIONAL BEHAVIOR AND HUMAN PERFORMANCE 10, 388-403 (1973)

A General Bayesian Model for Hierarchical Inference¹

CLINTON W. KELLY III AND SCOTT BARCLAY

Decisions and Designs, Inc.

An inductive inference problem will often be structured so that the target variables (hypotheses) are logically distant from the observable events (data). In this situation it may be difficult or impossible to assess the probabilistic connection between them, but it may be possible to decompose the problem through the use of intermediate or explanatory variables. That is, it will often be possible to assess the likelihood of the observed data given some intermediate variable, and the likelihood of that intermediate variable given another, and so on, until the hypotheses of interest are reached. Inferences which incorporate one or more intermediate variables are called hierarchical, cascaded, or multistage inferences. The present paper presents a normative model for the solution of the general hierarchical inference problem. The formulation begins with a formal description of the hierarchical inference tree, including a discussion of various simplifying conditional independence assumptions. The solution is first derived for three special-case models of differing structure, and then the algorithm for the general solution is given for two cases: one in which the conditional independence assumptions have been made and one in which they have not.

The state of knowledge S required to solve an inductive inference problem will often be partitioned so that it is difficult or impossible to directly estimate likelihoods linking observable events and target variables, i.e., data and hypotheses. For example, one state of knowledge S1 might describe distributions involving a random variable D^{ijk...}, and a second state of knowledge S2 might describe distributions for a random variable h, but neither state alone is sufficient to calculate the joint distribution {D^{ijk...}, h}. In such situations, states of knowledge can be combined to yield {D^{ijk...}, h} if one or more variables e^{pqr...}, called intermediate or explanatory variables, can be found such that

{D^{ijk...}, e^{pqr...}} ∈ S1

and

{e^{pqr...}, h} ∈ S2.

¹ This paper was supported in part by the Advanced Research Projects Agency and the Office of Naval Research, Grant Nos. NONR-N-00014-73C-0149 and NR-197-023.

Copyright © 1973 by Academic Press, Inc. All rights of reproduction in any form reserved.


FIG. 1. Inference tree.

The partitioning of the state of knowledge describing the relationship between predictor variables D^{ijk...} and a target variable h leads to a class of inference problems variously called hierarchical, cascaded, or multistage inferences (Kelly, 1972). The sequential nature of a hierarchical inference can be described schematically by a diagram called an inference tree, in which the nodes of the tree correspond to predictor variables, intermediate variables, and target variables and where one node is shown below another only if the state of knowledge is partitioned such that an inference about the lower variable is required in order to make an inference about the higher variable. Figure 1 illustrates a generalized inference tree. It implies, for example, that D^{111} is diagnostic of e^{11}, which in turn is diagnostic of e^1, which impacts on h.¹,² Thus, an inference about the state of any variable is based on inferences about all the variables subordinate to it. For example, an inference about the true state of h is based on inferences about the state of e^1, e^2, ..., e^m, which in turn are based on inferences about variables subordinate to them, and so forth.

¹ D_{ijk...} denotes a particular state of the random variable D^{ijk...}. This is the event which is unambiguously observed.

² The convention will be adopted of identifying a particular random variable by using as a superscript the superscripts of all the variables directly connected to it from above, proceeding from the top down. Thus, the superscript "ijk" designates a variable that is reached by starting at the top of the tree, taking the ith branch (counting from the left) emanating from the first node, the jth branch from the second node, and the kth branch from the third node.

FORMAL DESCRIPTION OF THE INFERENCE TREE

Formally, the general inference tree is a graph in which the vertices represent hypotheses, intermediate variables, and data, and the edges


between vertices imply that knowledge of one vertex is diagnostic in inferring the state of a variable at another vertex. To construct this graph, it is necessary to define a vertex set and a relation on the vertex set.

• Definition 1-1. V = {v^i | there exists a one-to-one correspondence between the elements v^i ∈ V and the elements of the set {h, e^{1...}, ..., e^{...}, D^{1...}, ..., D^{...}}}.

Let L be a relation on V.

• Definition 1-2. L = {(v^i, v^j) | v^i R v^j iff the state of knowledge describing the inference problem is decomposed such that an inference about v^i is necessary in order to make an inference about v^j}.

The relation L is reflexive, antisymmetric, and transitive; it is called a partial ordering of the set V. The partial ordering will be denoted by v^i ≤ v^j if (v^i, v^j) ∈ L, and the pair consisting of the set V and the partial ordering of V will be referred to as a partially ordered set and denoted by (V, ≤). The relation v^i ≤ v^j implies that there are at least as many variables in the inferential sequence (v^i, v^j), (v^j, v^{j-1}), ..., (v^2, h) linking v^i and h as in the sequence linking v^j and h. A general inference tree will be defined as the graph that arises when the set L' of admissible relations is restricted to those relations ⊂ L that yield an algebraic structure called an acyclic upper semilattice, or an inference tree for short. Some terminology is required to describe this structure.

• Definition 1-3. v^i < v^j if v^i ≤ v^j and i ≠ j. This implies that v^j is closer to h than v^i, or higher in a hierarchy in which h is the top.

• Definition 1-4. v^j covers v^i if v^i < v^j, and for no v^k ∈ V does it hold that v^i < v^k < v^j.

• Definition 1-5. A partially ordered set (V, ≤) is an upper semilattice if every pair of elements of V has a least upper bound in V. Its Hasse diagram is the graph in which v^i and v^j are joined by an edge whenever v^j covers v^i.

FIG. 2. A tree and an upper semilattice.

• Definition 1-6. A partially ordered set (V, ≤) is an inference tree if it is an upper semilattice and its Hasse diagram is acyclic, i.e., it contains no closed paths.

A tree is the simplest instance of an upper semilattice. The Hasse diagrams in Fig. 2 are both upper semilattices, but only Fig. 2(a) is a tree. The diagram in Fig. 2(b), exhibiting multiple paths, shows why it was necessary to further restrict the structure of the partially ordered set (V, ≤) beyond that of a general upper semilattice. Restricting the set L' to those relations that yield an acyclic upper semilattice has two important interpretations. First, it means that a predictor variable or intermediate variable has one and only one immediate successor. For example, a datum can only impact directly on one intermediate variable. In practice, this has not proven to be a constraint. As an illustration, a datum that appears to impact on more than one intermediate variable typically is composed of several data (Kelly & Peterson, 1973). Second, a general tree permits data to relate to the target variable through any number of intermediate variables, and permits any given intermediate variable to link several data that enter the hierarchy at different levels to h. For example, Fig. 3 shows that e^1 mediates the diagnostic impact of D^{12} and D^{111} on h. The normative Bayesian model for hierarchical inference described by Gettys and Willke (1969) is applicable only for the special case of the general inference tree shown in Fig. 4. The purpose of this paper is to extend the Gettys-Willke model and develop a model for the general inference tree.
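The single-successor restriction just described is easy to state operationally. The sketch below, which is not from the original paper and uses hypothetical node names following the top-down superscript convention of the footnote above, stores an inference tree as a map from each vertex to the single vertex that covers it and checks that every chain of successors terminates at h without revisiting a vertex.

# Minimal sketch (not from the paper): an inference tree stored as a map from each
# vertex to the vertex that covers it (its unique immediate successor).  Node names
# follow the top-down superscript convention, e.g. "111" is reached via the first
# branch at each of three levels; "h" is the target variable at the top.
covers = {
    "1": "h", "2": "h",          # intermediate variables under h
    "11": "1", "12": "1",        # intermediate variables under e^1
    "111": "11", "112": "11",    # data under e^11
    "121": "12",                 # datum under e^12
}

def is_inference_tree(covers):
    # The dict itself enforces "one and only one immediate successor";
    # this loop checks acyclicity and that every chain reaches h.
    for node in covers:
        seen, v = set(), node
        while v != "h":
            if v in seen:
                return False
            seen.add(v)
            v = covers[v]
    return True

print(is_inference_tree(covers))      # True for the tree of Fig. 2(a)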

FIG. 3. Intermediate variable.

FIG. 4. Multiple-input MBT model (Gettys-Willke model).

The development will begin by formally interpreting the general inference tree of Fig. 1 in terms of conditional independence conditions. Interpreted in this manner, a hierarchical inference describes a specific subset of the possible dependence relationships that can exist between n random variables.³

CONDITIONAL INDEPENDENCE ASSUMPTIONS

The inference tree (V, ≤) depicts dependencies between the random variables v^i. However, to develop a model of the inferential process represented by the tree, it is necessary to identify explicitly which of the possible dependency relations between n random variables are allowable. The choice of these allowable dependencies is dictated largely by empirical properties of real-world hierarchical inference systems. Figure 2(a) will be used to illustrate the allowable dependencies. Let a path consist of a sequence of coverings (D^{...ijk}, e^{...ij}, e^{...i}, ..., h) such that D^{...ijk} is covered by e^{...ij}, e^{...ij} by e^{...i}, and so forth. A subsequence (D^{...ijk}, e^{...ij}, ...) of a path will be called a partial path. Then, the first set of assumptions requires that if v^i < v^j then

{v^i | v^j, (...), [...], h} = {v^i | v^j, (...), h}

and

{v^j | v^i, (...), [...], h} = {v^j | (...), h},

where (...) is the partial path linking v^j and h and [...] represents all other variables in the tree. This implies, for example, that the reliability of an information source located in one branch of the tree is not affected by activity in another branch, and that the probability of a given intermediate variable is affected only by knowledge of the variables above it, not by knowledge of those below it. In terms of Fig. 2(a), this has the following implications:

³ Tribus (1970) shows that there are 12 possible dependencies between three random variables and estimates that there are approximately 3,000 for four random variables.


{e^1 | e^{11}, D^{111}, D^{112}, e^{12}, D^{121}, D^2, h} = {e^1 | h},

{e^{11} | e^1, D^{111}, D^{112}, e^{12}, D^{121}, D^2, h} = {e^{11} | e^1, h},

{e^{12} | e^1, e^{11}, D^{111}, D^{112}, D^{121}, D^2, h} = {e^{12} | e^1, h},

{D^{111}D^{112} | e^{11}, e^1, h, e^{12}, D^{121}, D^2} = {D^{111}D^{112} | e^{11}, e^1, h},

{D^{121} | e^{12}, e^1, h, D^{111}, D^{112}, e^{11}, D^2} = {D^{121} | e^{12}, e^1, h},

and

{D^2 | h, D^{111}, D^{112}, e^{11}, e^1, e^{12}, D^{121}} = {D^2 | h}.

The second set of assumptions pertains to probability distributions of the form

{e^{...ij} | e^{...i}, ..., h, S}

and

{D^{...ijk} | e^{...ij}, e^{...i}, ..., h, S}

relating the variables of any particular path. Typically, it will be assumed that a form of Markov independence holds so that, for example,

{D^{111} | e^{11}, e^1, h, S}

can be simplified to {D^{111} | e^{11}}. The third set of independence considerations determines how terms of the form

{D^{...i...} | e^{...}, (...), h, S}

and

{D^{...j...} | e^{...}, (...), h, S}

are combined when D^{...i...} and D^{...j...} are related to node e^{...} through different partial paths. It will be assumed in the model that D^{...i...} and D^{...j...} are conditionally independent given the variables (e^{...}, (...), h, S) and that the joint distribution is simply the product of the individual distributions. For example,

{D^{111}D^{121} | e^1, h, S} = {D^{111} | e^1, h, S} × {D^{121} | e^1, h, S}.


Last, a set of independence assumptions is required to determine how successive data directly linked to the same node are combined. Customarily,


it will be assumed that they are conditionally independent given that node and its successors. For example, if two sources independently report about the same event,

{D^{111}D^{112} | e^{11}, e^1, h, S} = {D^{111} | e^{11}, e^1, h, S} × {D^{112} | e^{11}, e^1, h, S}.
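To make the effect of these four sets of assumptions concrete, the short sketch below computes the joint likelihood {D^{111}D^{112}D^{121}D^2 | h} for the tree of Fig. 2(a) by summing over the states of e^1, e^{11}, and e^{12}, using only the factored terms that the assumptions leave in place (the Markov simplifications of the second set are also applied). It is not part of the original paper, and all probability tables are invented for illustration.

# Minimal sketch (illustrative numbers only): likelihood {D111 D112 D121 D2 | h}
# for the tree of Fig. 2(a).  All variables are two-state, indexed 0/1.
p_e1_h   = [[0.8, 0.2], [0.3, 0.7]]   # {e1 | h}: row = state of h
p_e11_e1 = [[0.9, 0.1], [0.4, 0.6]]   # {e11 | e1}
p_e12_e1 = [[0.7, 0.3], [0.2, 0.8]]   # {e12 | e1}
l_D111_e11 = [0.6, 0.1]               # {D111 | e11} for the observed D111
l_D112_e11 = [0.5, 0.2]               # {D112 | e11}
l_D121_e12 = [0.7, 0.3]               # {D121 | e12}
l_D2_h     = [0.4, 0.9]               # {D2 | h}

def likelihood(h):
    total = 0.0
    for e1 in (0, 1):
        # data on the e11 branch combine as a product given e11 (fourth set)
        branch11 = sum(p_e11_e1[e1][e11] * l_D111_e11[e11] * l_D112_e11[e11]
                       for e11 in (0, 1))
        branch12 = sum(p_e12_e1[e1][e12] * l_D121_e12[e12] for e12 in (0, 1))
        # branches below e1 combine as a product given e1 (third set)
        total += p_e1_h[h][e1] * branch11 * branch12
    return l_D2_h[h] * total           # D2 attaches directly to h

print([likelihood(h) for h in (0, 1)])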

HIERARCHICAL MODELS

The objective of a hierarchical model is to compute the posterior distribution {h | D^{1...}, ..., D^{...}, S}.

The basic strategy in developing a model for any particular inference tree is, first, to use Bayes' theorem,

{h | D^{1...}, ..., D^{...}, S} = {h | S}{D^{1...}, ..., D^{...} | h, S} / ∫ {D^{1...}, ..., D^{...} | λ, S}{λ | S} dλ;   (1)

second, to use the partition theorem to obtain the likelihood

{D^{1...}, ..., D^{...} | h, S};

and, third, to use the dependency relations described by the inference tree together with the assumptions given above (or some other set) to decompose the computation of the likelihood into a sequence of intermediate inferences, consistent with the partition of the state of knowledge S. The derivations that follow will consider only how the likelihood is computed (the second and third steps), since the subsequent application of Bayes' theorem is identical in all cases.

A SINGLE-PATH TREE

The development of a general hierarchical model will be preceded by the derivation of a few special-case models to illustrate the general approach. Consider first the single-path inference tree shown in Fig. 5. By the partition theorem,

{D^{1...1n1} | h, S} = ∫_{e^1} ... ∫_{e^{1...1n}} {D^{1...1n1} | e^{1...1n}, ..., e^1, h, S}{e^{1...1n}, ..., e^1 | h, S} de^1 ... de^{1...1n},   (2)

where ∫ is the general summation operator.⁴ However, making use of the conditional dependency relations represented by the tree and specified in the first set of assumptions above, Equation (2) can be rewritten to capture the sequential property that is essential to hierarchical inferences:

{D^{1...1n1} | h, S} = ∫_{e^1} {e^1 | h, S} × ... × ∫_{e^{1...1n-1}} {e^{1...1n-1} | e^{1...1n-2}, ..., e^1, h, S} × ∫_{e^{1...1n}} {D^{1...1n1} | e^{1...1n}, ..., e^1, h, S}{e^{1...1n} | e^{1...1n-1}, ..., e^1, h, S} de^{1...1n} de^{1...1n-1} ... de^1.   (3)


FIG. 5. A single-path tree.

Making use of the second set of assumptions, the Markov-like independence assumptions, Eq. (3) may be written

{D^{1...1n1} | h, S} = ∫_{e^1} {e^1 | h, S} × ... × ∫_{e^{1...1n-1}} {e^{1...1n-1} | e^{1...1n-2}} × ∫_{e^{1...1n}} {D^{1...1n1} | e^{1...1n}}{e^{1...1n} | e^{1...1n-1}} de^{1...1n} de^{1...1n-1} ... de^1.   (4)

Note that to use Eq. (4), estimates of {e^{1...1i} | e^{1...1i-1}} for all i must be made, along with an estimate of {D^{1...1n1} | e^{1...1n}}.
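As an illustration of how Eq. (4) and the subsequent application of Bayes' theorem (Eq. 1) fit together, the sketch below evaluates the likelihood for a two-level single path and then normalizes it against a prior on h. It is not from the original paper; the two-state tables and variable names are hypothetical.

# Minimal sketch (illustrative numbers): Eq. (4) for a single path h -> e1 -> e2 -> D,
# followed by the normalization of Eq. (1).  All variables have two states.
p_e1_h  = [[0.8, 0.2], [0.3, 0.7]]   # {e1 | h}
p_e2_e1 = [[0.9, 0.1], [0.4, 0.6]]   # {e2 | e1}
l_D_e2  = [0.7, 0.2]                 # {D | e2} for the observed datum D
prior_h = [0.5, 0.5]                 # {h | S}

# Eq. (4): {D | h} = sum over e1 of {e1 | h} * sum over e2 of {e2 | e1}{D | e2}
lik_h = [sum(p_e1_h[h][e1] *
             sum(p_e2_e1[e1][e2] * l_D_e2[e2] for e2 in (0, 1))
             for e1 in (0, 1))
         for h in (0, 1)]

# Eq. (1): posterior is prior times likelihood, normalized over the states of h
joint = [prior_h[h] * lik_h[h] for h in (0, 1)]
posterior_h = [j / sum(joint) for j in joint]
print(lik_h, posterior_h)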

MULTIPLE DATA (EACH DATUM IS DIAGNOSTIC OF A DIFFERENT INTERMEDIATE VARIABLE)

Although Eqs. (3) and (4) describe explicitly how one datum can be linked to h in successive steps, they do not allow for the inclusion of multiple data. Consider first the case shown in Fig. 6. Using the third set of independence assumptions, the model for this case is an extension of Eq. (3):

{D^{1...1n1}, ..., D^{112}, D^{12}, D^2 | h, S} = {D^2 | h, S} ∫_{e^1} {e^1 | h, S}{D^{12} | e^1, h, S} × ... × ∫_{e^{1...1n-1}} {e^{1...1n-1} | e^{1...1n-2}, ..., e^1, h, S}{D^{1...1n-1 2} | e^{1...1n-1}, e^{1...1n-2}, ..., e^1, h, S} × ∫_{e^{1...1n}} {D^{1...1n1} | e^{1...1n}, e^{1...1n-1}, ..., e^1, h, S}{e^{1...1n} | e^{1...1n-1}, ..., e^1, h, S} de^{1...1n} ... de^1.   (5)

Again, it may be possible to invoke the second set of independence assumptions to simplify Eq. (5).

⁴ For simplicity, it is assumed that the variables h and e^{ij...} are discrete random variables. However, the subscripts describing the particular state of a variable will be suppressed until the section in which the general model is derived.


FIG. 6. Multiple data relevant to different variables.
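Equation (5) simply inserts, at each level of the nested summation, the likelihood of whatever datum attaches directly to that level. The sketch below, which is not from the paper, does this for a hypothetical two-level path with one datum at each level and one datum bearing directly on h; all numbers are invented.

# Minimal sketch (illustrative numbers): Eq. (5) for h -> e1 -> e11 with data
# D2 (on h), D12 (on e1), and D111 (on e11).  Two states throughout.
p_e1_h    = [[0.8, 0.2], [0.3, 0.7]]   # {e1 | h}
p_e11_e1  = [[0.9, 0.1], [0.4, 0.6]]   # {e11 | e1}
l_D111_e11 = [0.7, 0.2]                # {D111 | e11}
l_D12_e1   = [0.6, 0.3]                # {D12 | e1}
l_D2_h     = [0.4, 0.9]                # {D2 | h}

def likelihood(h):
    inner = lambda e1: sum(p_e11_e1[e1][e11] * l_D111_e11[e11] for e11 in (0, 1))
    outer = sum(p_e1_h[h][e1] * l_D12_e1[e1] * inner(e1) for e1 in (0, 1))
    return l_D2_h[h] * outer           # the datum attached directly to h multiplies last

print([likelihood(h) for h in (0, 1)])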

MULTIPLE DATA DIAGNOSTIC OF THE SAME INTERMEDIATE VARIABLE

Figure 7 illustrates a situation in which multiple data are relevant to the same variable. By the fourth set of assumptions, the model for this case is given by Eq. (3) or (4) with the term

{D^{1...1n1} | e^{1...1n}, e^{1...1n-1}, ..., e^1, h, S}

replaced by

∏_{j=1}^{m} {D^{1...1nj} | e^{1...1n}, e^{1...1n-1}, ..., e^1, h, S}.

If these assumptions cannot be made, then conditional likelihoods of the form

{D^{1...1n2} | D^{1...1n1}, e^{1...1n}, ..., e^1, h, S}, etc.,

are required to compute

{D^{1...1n1}D^{1...1n2} ... D^{1...1nm} | e^{1...1n}, ..., e^1, h, S}.

FIG. 7. Multiple data relevant to the same variable.
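The difference between the two situations just described can be seen in a few lines: under the fourth set of assumptions the likelihoods of data on the same node multiply term by term, while otherwise each successive datum must be conditioned on those already processed. The numbers below are invented and the node is hypothetical; this is an illustrative sketch, not part of the original paper.

# Minimal sketch (illustrative numbers): two data D1, D2 bearing on the same node e.
l_D1_e = [0.7, 0.2]                   # {D1 | e}
l_D2_e = [0.6, 0.3]                   # {D2 | e}, used when conditional independence holds
l_D2_given_D1_e = [0.5, 0.4]          # {D2 | D1, e}, needed when it does not

# Conditionally independent case: product of the individual likelihoods
joint_indep = [l_D1_e[e] * l_D2_e[e] for e in (0, 1)]

# Dependent case: chain rule, {D1 D2 | e} = {D1 | e}{D2 | D1, e}
joint_dep = [l_D1_e[e] * l_D2_given_D1_e[e] for e in (0, 1)]

print(joint_indep, joint_dep)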


A MODEL FOR THE GENERAL TREE

The preceding special cases have served to illustrate that hierarchical models are primarily decomposed forms of the partition theorem. However, because this decomposition can become very complex for the general inference tree, the general model will be derived using matrix notation. For this reason, subscripts will be given the h and e variables denoting particular states of these variables. It will be assumed that the random variable e^{ijk...} has states 1, 2, ..., N_{ijk...} and h has states 1, 2, ..., N_0.

SINGLE-PATH TREE

A matrix procedure will first be developed for a single-path tree under various conditional independence assumptions. It will then be extended to more complicated cases. Consider the tree in Fig. 7. Since each variable e in a single-path tree is the first branch from the next higher e node, the nth e node is denoted e^{1...1n}. Because the derivation of a model for the general case becomes very complicated with respect to notation, for simplicity e^{1...1n} will be denoted as e^n for the single-path tree. Similarly, since all data impact on the last or nth e node, D^{1...1ni} will be designated as D^i. Figure 7 can now be represented as illustrated in Fig. 8.

Case 1. Assume that

(1) P(D | e_k^n, e_l^{n-1}, ..., e_q^1, h_j) = P(D | e_k^n)   for all k, l, ..., q, j;

(2) P(D | e_k^n) = ∏_{i=1}^{m} P(D^i | e_k^n)   for all k;

(3) P(e_k^i | e_l^{i-1}, ..., h_j) = P(e_k^i | e_l^{i-1})   for all i, k, l, ..., j,

where D = D^1, D^2, ..., D^m.

FIG. 8. Revision of Fig. 7.


Define the following matrices:

𝒟 = [P(D | e_1^n), P(D | e_2^n), ..., P(D | e_{N_n}^n)];

E^{i,i-1}, the N_i × N_{i-1} matrix with entries P(e_k^i | e_l^{i-1}):

[ P(e_1^i | e_1^{i-1})     P(e_1^i | e_2^{i-1})   ...   P(e_1^i | e_{N_{i-1}}^{i-1}) ]
[ P(e_2^i | e_1^{i-1})                                                               ]
[ ...                                                                                ]
[ P(e_{N_i}^i | e_1^{i-1})   ...                        P(e_{N_i}^i | e_{N_{i-1}}^{i-1}) ];

H, the N_1 × N_0 matrix with entries P(e_k^1 | h_j):

[ P(e_1^1 | h_1)     P(e_1^1 | h_2)   ...   P(e_1^1 | h_{N_0}) ]
[ ...                                                          ]
[ P(e_{N_1}^1 | h_1)   ...                  P(e_{N_1}^1 | h_{N_0}) ].

Then the vector [P(D | h_1), P(D | h_2), ..., P(D | h_{N_0})] is equal to

𝒟 E^{n,n-1} E^{n-1,n-2} ... E^{2,1} H.

This formulation does not, however, readily provide for an iterative calculation of P(D | h_j). Therefore, making use of the fact that [AB]^T = B^T A^T,

[P(D | h_1), ..., P(D | h_{N_0})] = [H^T E^{2,1 T} ... E^{n-1,n-2 T} E^{n,n-1 T} 𝒟*[1]]^T,

where 𝒟* is the diagonal matrix

𝒟* = diag[P(D | e_1^n), ..., P(D | e_{N_n}^n)]

and [1] is the unit column vector. Iteration is obtained by multiplying the diagonal matrix by the column vector

[P(D^{m+1} | e_1^n), ..., P(D^{m+1} | e_{N_n}^n)]^T


and recomputing the output for each successive D^i ∈ D. The entries in each of the above matrices must be supplied by the inference system designer. Those in E^{n,n-1}, ..., H are estimated directly as given. The diagonal matrix 𝒟* is formed by first estimating probabilities for the matrix

                 D^1                ...    D^m
e_1^n       [ P(D^1 | e_1^n)        ...    P(D^m | e_1^n)     ]
...         [                                                 ]
e_{N_n}^n   [ P(D^1 | e_{N_n}^n)    ...    P(D^m | e_{N_n}^n) ],

called the data matrix, and then taking the product of the entries in each row to form the diagonal entries.
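A numpy rendering of the Case 1 procedure may help fix the bookkeeping: the likelihood vector over h is a chain of matrix products, the transposed form allows the computation to start at the data end, and a new datum only rescales the diagonal matrix before the same chain is reapplied. This is a sketch under invented dimensions and tables, not the authors' code.

import numpy as np

# Minimal sketch (invented tables): Case 1 with n = 2 intermediate levels,
# N0 = 2 states of h, N1 = 2 states of e^1, N2 = 3 states of e^2 (= e^n).
H   = np.array([[0.8, 0.3],            # H[k, j]   = P(e_k^1 | h_j),   N1 x N0
                [0.2, 0.7]])
E21 = np.array([[0.5, 0.1],            # E21[k, l] = P(e_k^2 | e_l^1), N2 x N1
                [0.3, 0.3],
                [0.2, 0.6]])

# Data matrix: rows are e^2 states, columns are the data D^1, D^2 observed so far
data = np.array([[0.7, 0.6],
                 [0.4, 0.5],
                 [0.1, 0.2]])           # entry [k, i] = P(D^i | e_k^2)
d_vec = data.prod(axis=1)               # P(D | e_k^2): product across each row

lik_h = d_vec @ E21 @ H                 # row vector [P(D | h_1), P(D | h_2)]

# Iterative (transposed) form: start from D*[1] and push up through the chain
D_star = np.diag(d_vec)
lik_h_iter = H.T @ E21.T @ D_star @ np.ones(3)
assert np.allclose(lik_h, lik_h_iter)

# A new datum D^3 only rescales the diagonal before the same chain is reapplied
new_datum = np.array([0.9, 0.5, 0.2])   # P(D^3 | e_k^2)
lik_h_updated = H.T @ E21.T @ np.diag(d_vec * new_datum) @ np.ones(3)
print(lik_h, lik_h_updated)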

Case 2. Assume next that

(1) P(D | e_k^n, e_l^{n-1}, ..., h_j) ≠ P(D | e_k^n)   for all k, l, ..., j;

(2) P(D | e_k^n, e_l^{n-1}, ..., h_j) ≠ ∏_{i=1}^{m} P(D^i | e_k^n, e_l^{n-1}, ..., h_j)   for all k, l, ..., j;

(3) P(e_k^i | e_l^{i-1}, ..., h_j) ≠ P(e_k^i | e_l^{i-1})   for all i, k, l, ..., j.

The first matrices that must be estimated under these assumptions are of the form

E^i_{(e_p^{i-2}, ..., e_q^1, h_j)}, with rows indexed by e_1^i, ..., e_{N_i}^i and columns indexed by e_1^{i-1}, ..., e_{N_{i-1}}^{i-1},

where the subscript (e_p^{i-2}, ..., e_q^1, h_j) denotes that the estimate P(e_k^i | e_l^{i-1}) is actually P(e_k^i | e_l^{i-1}, e_p^{i-2}, ..., e_q^1, h_j). It is necessary to estimate N_0 × N_1 × ... × N_{i-2} matrices for i = 1, 2, ..., n, or a total of N_0 + (N_0 × N_1) + ... + (N_0 × N_1 × ... × N_{n-2}) matrices in all.
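The growth in the number of required matrices is easy to tabulate; the sketch below merely evaluates the sum N_0 + (N_0 × N_1) + ... + (N_0 × N_1 × ... × N_{n-2}) for hypothetical state counts. It is an illustration, not part of the original paper.

# Minimal sketch: number of E^i matrices to estimate under the Case 2 assumptions,
# for hypothetical state counts N = [N0, N1, ..., N_{n-1}] along a single path.
N = [2, 3, 3, 4]                      # N0 states of h, then N1, N2, N3

def matrices_required(N):
    total, prod = 0, 1
    for size in N[:-1]:               # terms N0, N0*N1, ..., N0*...*N_{n-2}
        prod *= size
        total += prod
    return total

print(matrices_required(N))           # 2 + 2*3 + 2*3*3 = 26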


FIG. 9. Matrix array.

Because of the relaxed conditional independence assumptions⁵ for the data sample, the data matrices which must be estimated are now of the form

𝒟_{(l, p, ..., q, j)} =

                 D^1                    D^2                        ...    D^m
e_1^n       [ P(D^1 | e_1^n, ·)      P(D^2 | e_1^n, D^1, ·)        ...    P(D^m | e_1^n, D^1, ..., D^{m-1}, ·)     ]
...         [                                                                                                      ]
e_{N_n}^n   [ P(D^1 | e_{N_n}^n, ·)  P(D^2 | e_{N_n}^n, D^1, ·)    ...    P(D^m | e_{N_n}^n, D^1, ..., D^{m-1}, ·) ],

where (·) abbreviates the conditioning states (e_l^{n-1}, e_p^{n-2}, ..., e_q^1, h_j). There are N_0 × N_1 × ... × N_{n-1} of these matrices. The complete array of estimates is given in Fig. 9.

⁵ It may be the case that the D^i are conditionally independent given e^n for fixed (e^{n-1}, ..., e_q^1, h_j).


FIG. 10. D^{N*} and E^{N*} matrices.

To begin the computation, the data matrices are collapsed into column vectors, denoted D̂_{(l, p, ..., q, j)}, by multiplying the row entries together. These column vectors then become entries in a (N_0 × N_1 × ... × N_{n-1} × N_n) by (N_0 × N_1 × ... × N_{n-1}) matrix, D^{N*}, in which the column vector D̂_{(1,1,...,1)} from 𝒟_{(1,1,...,1)} occupies column 1, rows 1 to N_n; the vector D̂_{(2,1,...,1)} from 𝒟_{(2,1,...,1)} occupies column 2, rows (N_n + 1) to 2N_n; and in general the vector D̂_{(l,p,...,q,j)} occupies column (l × p × ... × q × j), rows (l × p × ... × q × j - 1)N_n + 1 to (l × p × ... × q × j)N_n. Next a (N_0 × N_1 × ... × N_{n-1}) by (N_0 × N_1 × ... × N_{n-1} × N_n) matrix E^{N*} is formed from the E^n matrices. The first row of E^n_{(1,1,...,1)} occupies row 1, columns 1 to N_n; the second row of E^n_{(1,1,...,1)} occupies row 2, columns N_n + 1 to 2N_n; and so on. The last row of E^n_{(N_{n-2}, ..., N_1, N_0)} occupies row (N_0 × N_1 × ... × N_{n-1}). The two matrices D^{N*} and E^{N*} are shown in Fig. 10. A new matrix T^{N-1} is now computed by taking the product of E^{N*} and D^{N*}:

T^{N-1} = E^{N*} D^{N*}.


This is the (N_0 × N_1 × ... × N_{n-1}) by (N_0 × N_1 × ... × N_{n-1}) diagonal matrix shown below:

T^{N-1} = diag[P(D | e_1^{n-1}, e_1^{n-2}, ..., e_1^1, h_1), ..., P(D | e_{N_{n-1}}^{n-1}, e_{N_{n-2}}^{n-2}, ..., e_{N_1}^1, h_{N_0})].

By shifting the first N_{n-1} entries of T^{N-1} left into column 1, the next N_{n-1} entries into column 2, and so forth, the matrix D^{N-1*} is formed. The above process now repeats by calculating

T^{N-2} = E^{N-1*} D^{N-1*},

continuing until D^{0*} has been generated. This is the desired output. Iteration is achieved in this model by estimating

P(D^{m+1} | e_k^n, D^1, ..., D^m, ...)   for all k,

then recomputing the D^{N*} matrix and recomputing D^{0*}. It is apparent from the preceding discussion that a relaxation of all the conditional independence assumptions results in a substantial increase in the number of estimates which are required, along with an increase in computational complexity. For these reasons considerable effort should be expended to formulate a problem so that the conditional independencies given in the previous case hold. In this regard, it may be possible to find an intermediate variable, given two conditionally dependent data, such that they can be treated as conditionally independent. This conjecture is based on a result from factor analysis, which states that a point can always be found from which two items appear unrelated. A situation favoring conditional independence will exist most frequently when causal relations are used to construct the inference tree.
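The matrix arrays above are one way to organize the Case 2 computation; the quantity they produce is the fully conditioned nested sum, which the sketch below evaluates directly for a short path with two data. It is a brute-force illustration with invented tables, not the authors' matrix algorithm, and it shows why the number of required estimates grows: every conditional table now carries all of the higher variables.

from itertools import product

# Minimal sketch (invented tables): fully conditioned likelihood {D1 D2 | h} for a
# path h -> e1 -> e2 with two data on e2, keeping every Case 2 dependency.  Each
# table is conditioned on all of the variables above it (and, for D2, on D1).
# All variables are two-state; tables are indexed as table[(conditioning states)][state].
p_e1 = {0: [0.8, 0.2], 1: [0.3, 0.7]}                       # {e1 | h}
p_e2 = {(e1, h): [0.6 + 0.1*e1 - 0.2*h,
                  0.4 - 0.1*e1 + 0.2*h]
        for e1, h in product((0, 1), repeat=2)}             # {e2 | e1, h}
l_d1 = {(e2, e1, h): 0.2 + 0.5*e2 + 0.1*e1 - 0.1*h
        for e2, e1, h in product((0, 1), repeat=3)}         # {D1 | e2, e1, h}
l_d2 = {(e2, e1, h): 0.3 + 0.4*e2 - 0.1*e1 + 0.1*h
        for e2, e1, h in product((0, 1), repeat=3)}         # {D2 | D1, e2, e1, h}

def likelihood(h):
    return sum(p_e1[h][e1] *
               sum(p_e2[(e1, h)][e2] * l_d1[(e2, e1, h)] * l_d2[(e2, e1, h)]
                   for e2 in (0, 1))
               for e1 in (0, 1))

print([likelihood(h) for h in (0, 1)])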


GENERAL TREE

The final case to be studied is that of the general tree with the conditional independence assumptions of Case 2 holding for each path in the tree. The algorithm will be developed for computing {D^1, D^2, ..., D^m | e_q^i, ...} at a single vertex in the tree at which several branches coalesce; the generalization will be obvious. Assume that the particular partial tree shown in Fig. 11 is given and that a matrix array has been generated for each path up to the point of duplication. One of two conditions must hold at a branch point. Either there exists a direct link with one or more data samples, for example (D^{n1}, e_q^i), or there does not. If the latter condition holds, the problem of multiple branches is solved by multiplying together the T^N matrices associated with each branch incident to e_q^i. If the former condition holds, the D^{N*} matrices associated with the branches directly linking e_q^i to the data are diagonalized, multiplied by the T^N matrices associated with the remaining branches, and the resultant product treated as T^N. Iteration is accomplished by recomputation of all matrices affected by the introduction of a new datum D^{...m+1...}.

FIG. 11. General tree.
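The combination step at a coalescing vertex can be illustrated in a few lines: with no datum attached directly to the vertex, the branch likelihoods multiply elementwise over its states; a directly attached data sample enters as an additional diagonal factor. The numbers below are invented, and the sketch shows only this single step, not the full recursion.

import numpy as np

# Minimal sketch (invented numbers): combining two branches that coalesce at a
# node e with three states.  t_a[k] and t_b[k] are the likelihoods of the data in
# each branch given state e_k, i.e. the diagonal of that branch's T matrix.
t_a = np.array([0.42, 0.15, 0.08])        # branch A: {data in A | e_k}
t_b = np.array([0.30, 0.25, 0.10])        # branch B: {data in B | e_k}
combined = t_a * t_b                      # no direct datum on e: multiply the T's

d_direct = np.array([0.9, 0.4, 0.2])      # a data sample linked directly to e
combined_with_direct = np.diag(d_direct) @ combined   # diagonalize and multiply

print(combined, combined_with_direct)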

REFERENCES

Gettys, C. F., & Willke, T. A. The application of Bayes' theorem when the true data state is uncertain. Organizational Behavior and Human Performance, 1969, 4, 125-141.

Kelly, C. W. Further investigation of hierarchical Bayesian procedures. Technical Report B/XG-3382, IBM Corporation, Federal Systems Division, 1972.

Kelly, C. W., & Peterson, C. R. Probability estimates and probabilistic procedures in current intelligence analysis. Technical Report II/DAHC15-72-C-0136, IBM Corporation, Federal Systems Division, 1973.

Tribus, M. Rational descriptions, decisions, and designs. New York: Pergamon Press, 1970. p. 235.