Lattice theory approach to metastatic disease patterns in autopsied human patients: Application to metastatic neuroblastoma


Pattern Recognition Vol. 18, No. 2, pp. 91-102, 1985.

0031-3203/85 $3.00 + .00 Pergamon Press Ltd. 1985 Pattern Recognition Society

Printed in Great Britain.

LATTICE THEORY APPROACH TO METASTATIC DISEASE PATTERNS IN AUTOPSIED HUMAN PATIENTS: APPLICATION TO METASTATIC NEUROBLASTOMA*

G. WILLIAM MOORE,† GROVER M. HUTCHINS and SUZANNE M. DE LA MONTE

Department of Pathology, The Johns Hopkins Medical Institutions, Baltimore, MD 21205, U.S.A.

(Received 21 December 1983; in revised form 5 April 1984; received for publication 2 May 1984)

Abstract--It is recognized that subsets of some morphologically homogeneous tumors may have different metastatic distribution patterns. Lattice theory is a multidimensional generalization of the cube. If each autopsied patient is regarded as a 'snapshot' (lattice corner), then each lattice edge represents an event of metastatic spread. A 'minimum entry' construction program was applied to the 40 neuroblastoma patients with complete autopsies at The Johns Hopkins Hospital. This lattice has the inductive property that reconnecting a single pathway does not decrease its length. Subset patterns in patients identified by this method may have diagnostic and therapeutic significance.

Lattice theory

Cancer metastases

Neuroblastoma

INTRODUCTION

Clinicians and pathologists have long recognized that different types of malignant tumors have different patterns of metastases,(1,2) and even some morphologically homogeneous tumors may separate into important subsets based upon metastatic distribution.(3-5) Analysis of the progression of metastatic disease is made difficult because: (i) properties of both the primary tumor and the target metastatic sites may influence the patterns of metastatic spread; (ii) patients may be examined at different stages of disease progression; (iii) available data are qualitative or at best semiquantitative in nature. If each autopsied patient is regarded as a 'snapshot' of the metastatic process, then it may be possible to infer the progression of such snapshots. The starting point is the frame with no metastases, and successive frames contain more inclusive sets of metastatic sites. Branching might occur, so that an early frame might eventually separate into two or more distinct sequences of metastatic spread. This paradigm of metastatic spread can be represented using the mathematical theory of lattices. Lattice theory is a mathematical generalization of the cube measuring one unit in each dimension.(6,7) In the three-dimensional case we can think of each dimension of the cube as corresponding to a metastatic site; in Fig. 1, for example, (x, y, z) = (lung, liver, brain). Each corner (ordered triple) on the cube is then a snapshot of

* Supported by NIH Grant LM 03651 from the National Library of Medicine.
† Address correspondence and reprint requests to Dr. G. William Moore, Department of Pathology, The Johns Hopkins Hospital, Baltimore, MD 21205, U.S.A.

Partially ordered set

Tree diagram

metastatic disease in a single patient at a single point of time; for example, (1, 1, 0) denotes a patient with

[Fig. 1. (A) Unit cube with labeled corners, including (0,0,0), (0,1,1) and (1,1,1), and axis x = lung. (B) Tree diagram with terminal corners (0,1,1) and (1,1,0).]

Fig. 1. (A) A three-dimensional lattice diagram with dimensions for the metastatic sites x = lung, y = liver and z = brain. Corners represent single 'snapshots' of metastatic involvement and edges represent events of progressive metastatic spread. The origin, (x, y, z) = (0, 0, 0) at the left lower forward corner, represents the 'primary only' corner or patients with no metastatic disease. The ultimate destination, (x, y, z) = (1, 1, 1) at the right upper rear corner, represents the patient with metastases in all three sites. The boldface arrows (→) depict a progression of metastatic spread beginning in the liver and going into the brain, i.e. (0, 0, 0) → (0, 1, 0) → (0, 1, 1), and a second progression beginning in the liver and going into the lung, i.e. (0, 0, 0) → (1, 0, 0). (B) The boldface arrows on the unit cube in (A) are represented as a two-dimensional tree diagram.


G. W. MOORE, G. M. HUTCHINS and S. M. DE LA MONTE

metastases to the lung and liver but not brain. Each edge of the cube represents an event of progressive metastatic spread; for example, edge (0, 1, 0) → (1, 1, 0) represents the event in which a patient with existing metastases to the liver acquires metastases to the lung. A system of n metastatic sites is represented analogously as the n-dimensional unit cube. It is reasonable to view progressive metastatic spread as a unidirectional process, i.e. (0, 1, 0) → (0, 1, 1), but never (0, 1, 1) → (0, 1, 0), and furthermore as a process which takes place with the fewest number of intermediate steps. Thus, if patients (0, 0, 0), (0, 1, 0) and (0, 1, 1) are known, then it is more reasonable to postulate the sequence of edges (0, 0, 0) → (0, 1, 0) → (0, 1, 1) (two events of progression) than to postulate the distinct sequences, (0, 0, 0) → (0, 1, 0) and (0, 0, 0) → (0, 0, 1) → (0, 1, 1) (total of three events of progression). A sequence of progression on a multidimensional unit cube (Fig. 1(A)) can be represented in two dimensions as a tree diagram (Fig. 1(B)). Reasonably large, finite lattices can be calculated using hierarchical computing languages and represented diagrammatically, ( 7 - ~ and there is already a substantial experience with applications of these methods in evolutionary biology. (~-~9~ We present herewith a lattice theory model for metastatic disease in autopsied patients, with an application to the series of patients with metastatic neuroblastoma in the autopsy files of The Johns Hopkins Hospital. This lattice theory method may be able to identify subsets of patients with diagnostic and therapeutic significance.
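The cube encoding and the 'fewest events' argument of the Introduction can be illustrated with a short Python sketch. It is only an illustration under stated assumptions: the names SITES, is_progression and events are introduced here and do not come from the original program.

```python
from typing import Tuple

# Assumed site order, matching (x, y, z) = (lung, liver, brain) in Fig. 1.
SITES = ("lung", "liver", "brain")
Corner = Tuple[int, ...]


def is_progression(a: Corner, b: Corner) -> bool:
    """True if b can follow a: no site is lost and at least one is gained."""
    return a != b and all(ai <= bi for ai, bi in zip(a, b))


def events(a: Corner, b: Corner) -> int:
    """Number of newly involved sites when moving from corner a to corner b."""
    if not is_progression(a, b):
        raise ValueError("metastatic spread is treated as unidirectional")
    return sum(bi - ai for ai, bi in zip(a, b))


origin, liver, brain, liver_brain = (0, 0, 0), (0, 1, 0), (0, 0, 1), (0, 1, 1)

# One chained sequence: (0,0,0) -> (0,1,0) -> (0,1,1)
chained = events(origin, liver) + events(liver, liver_brain)

# Two distinct sequences: (0,0,0) -> (0,1,0) and (0,0,0) -> (0,0,1) -> (0,1,1)
separate = (events(origin, liver)
            + events(origin, brain)
            + events(brain, liver_brain))
```

Here chained counts two events and separate counts three, which is the comparison made in the text.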

LATTICE THEORY

It is convenient to construct our theory of progressive metastatic spread from elementary (Zermelo-Fraenkel) set theory and from a finite set of autopsy observations, X. We employ the usual set theoretic conventions: ∅ for the empty set, ∈ for set membership, ⊆ for subset, ∪ for union, ∩ for intersection and − for set subtraction. It is useful to have the following additional notation: ℘ for powerset, i.e. ℘x is the set of subsets of x, and ¢ for cardinality, i.e. ¢x is the number of members in x. The set X is the set of sets of metastatic sites observed in one or more patients. It is assumed that one of these sets must be empty, i.e. there are some patients who die with no metastases, so that ∅ ∈ X. Every other member of X is a non-empty set of metastatic sites. For example, it might be the case that X = {∅, x1, x2, x3}, x1 = {liver}, x2 = {liver, pancreas, gallbladder}, and x3 = {lung, brain}. We assume that the number of patients is finite, so that ¢X < ∞, and that the number of metastatic sites is finite, so that for every x ∈ X, ¢x < ∞. We observe the convention that x0 = ∅ and the other members of X are listed in ascending order of cardinality, i.e. X = {x0, ..., xn} and ¢xi ≤ ¢xj if 0 ≤ i < j ≤ n. For every subset Y ⊆ X, we define the lattice on Y, denoted LY, as the powerset of the union of Y, i.e. ℘(∪Y), or more compactly ℘∪Y. Thus, if Y = {∅, x1, x3}, then ∪Y = x0 ∪ x1 ∪ x3 = {liver, lung, brain} and ℘∪Y = {∅, {liver}, {lung}, {brain}, {liver, lung}, {liver, brain}, {lung, brain}, ∪Y}.

Definition 1 (lattice on Y). Let Y ⊆ X. Then the lattice on Y, denoted LY, is LY = ℘∪Y.

From elementary set theory, it is a property of the powerset operation that for every r, s ∈ ℘∪Y, r ∩ s ∈ ℘∪Y and r ∪ s ∈ ℘∪Y, where r ∩ s is called the meet of r and s (most recent common ancestor) and r ∪ s is called the join of r and s (most immediate common descendant). This satisfies the usual definition of lattice.(6) It is convenient to define the pathset of Y, denoted PY, as the set of ordered pairs (r, s) such that r, s ∈ LY and r ⊂ s.

Definition 2 (pathset of Y). Let Y ⊆ X. Then the pathset of Y, denoted PY, is the set PY = {(r, s) | s ∈ LY and r ⊂ s}.
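Definitions 1 and 2 can be made concrete with a small Python sketch. It is an illustration only: Python frozensets stand in for sets of metastatic sites, and the function names lattice and pathset are assumptions introduced here.

```python
from itertools import combinations
from typing import FrozenSet, Iterable, Set, Tuple

Corner = FrozenSet[str]


def lattice(Y: Iterable[Corner]) -> Set[Corner]:
    """LY: every subset of the union of the members of Y."""
    sites = sorted(frozenset().union(*Y))
    return {frozenset(c)
            for k in range(len(sites) + 1)
            for c in combinations(sites, k)}


def pathset(Y: Iterable[Corner]) -> Set[Tuple[Corner, Corner]]:
    """PY: ordered pairs (r, s) of lattice members with r a proper subset of s."""
    L = lattice(Y)
    return {(r, s) for r in L for s in L if r < s}


# The worked example from the text: Y = {x0, x1, x3}.
x0, x1, x3 = frozenset(), frozenset({"liver"}), frozenset({"lung", "brain"})
Y = [x0, x1, x3]
LY = lattice(Y)
PY = pathset(Y)

# Meet and join of two corners are intersection and union.
r, s = frozenset({"liver", "lung"}), frozenset({"lung", "brain"})
meet, join = r & s, r | s
```

For this Y the lattice has the eight corners listed in the text, and both meet and join are again members of LY.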

The lattice has the property that if (r, s) and (s, t) are paths on Y, i.e. members of PY, then so is (r, t). This transitivity property of the pathset is proved in Theorem 1. In speaking of path (r, s) we call r its origin and s its destination.

Theorem 1. Let Y ⊆ X and (r, s), (s, t) ∈ PY. Then (r, t) ∈ PY.
Proof. By Definition 2 of pathset, r ⊂ s ⊆ ∪Y and s ⊂ t ⊆ ∪Y. By elementary set theory, r ⊂ t ⊆ ∪Y. By Definition 2, (r, t) ∈ PY. End of Proof.

Theorem 2 demonstrates that for every path in the lattice, the subpath formed by removing the leading edge also belongs to the pathset.

Theorem 2. Let Y ⊆ X. Then for every (r, s) ∈ PY such that a ∈ (s − r) and ¢(s − r) ≥ 2, (r, s − {a}) ∈ PY.
Proof. By Definition 2 of pathset, r ⊂ s ⊆ ∪Y. By elementary set theory r ⊂ s − {a} ⊆ ∪Y if a ∈ (s − r) and ¢(s − r) ≥ 2. By Definition 2, (r, s − {a}) ∈ PY. End of Proof.

A tree is a set of paths on a multidimensional lattice which can be represented as a two-dimensional diagram. Any two lattice points on the tree have a meet, but do not have a join. In other words, any progression of metastases must begin at a common origin of no metastases, but once the progression divides into two separate pathways, it is not permitted to rejoin. We define a tree on Y as a set of non-overlapping paths on the lattice Y such that two distinct origins never meet at the same destination.

Definition 3 (tree on Y). Let Y ⊆ X. Then T ⊆ PY is a tree on Y if and only if: (i) for every (s, t) ∈ T such that s ≠ ∅, there exists an (r, s) ∈ T; (ii) for distinct (r, s), (r, t) ∈ T there exists no u such that r ⊂ u ⊆ s and r ⊂ u ⊆ t; (iii) for every y ∈ Y − {∅} there exists an (r, y) ∈ T.

Theorem 3 demonstrates that every tree on Y has an ultimate origin at ∅, the primary only corner. Theorem 4 states that a tree on Y is also a tree on Y ∪ {x} if (p, x) or (x, q) already belongs to the tree. Theorem 5 states

Lattice theory for metastatic disease patterns

[Fig. 2. Panels (A) and (B); edge labels in (A) include a = ¢(t − s) and e = ¢(w − u).]

Fig. 2. (A) Consider edges (s, t) and (u, w) on any tree (Definition 5, Theorem 7). The local parsimony concept considers any modification of the tree in which edge (s, t) is unhooked and corner t is reattached to corner v to form edge (v, t). (B) A sample tree in which this modification is performed.

that a tree on Y becomes a tree on Y ∪ {x} if (p, x) and (x, q) are added and (p, q) is removed. Theorem 6 states that a tree on Y becomes a tree on Y ∪ {x} if (p, x ∩ q), (x ∩ q, q) and (x ∩ q, x) are added and (p, q) is removed.

Theorem 3. Let Y ⊆ X and T be a tree on Y. Then there exists a t ⊆ ∪Y such that (∅, t) ∈ T.
Proof. Consider a minimal u such that (t, u) ∈ T and t ≠ ∅. By Definition 3(i) there exists an (s, t) ∈ T. By Definition 2 of pathset, s ⊂ t, so that u is not minimal unless s = ∅. End of Proof.

Theorem 4. Let x ∈ X − {∅}, Y ⊆ X − {x} and T be a tree on Y. If (p, q) ∈ T such that p = x or q = x, then T is a tree on Y ∪ {x}.
Proof. Definition 3(i)-(ii) are satisfied as before and Definition 3(iii) is satisfied for y ∈ (Y ∪ {x}) − {∅} if y ≠ x. Otherwise y = x. If (p, x) ∈ T, then Definition 3(iii) is satisfied. If (x, q) ∈ T, then since x ≠ ∅ by hypothesis, by Definition 3(i) there exists an (r, x) ∈ T. End of Proof.

Theorem 5. Let x ∈ X − {∅}, Y ⊆ X − {x} and T be a tree on Y. If (p, q) ∈ T such that p ⊂ x ⊂ q, then V = [(T − {(p, q)}) ∪ {(p, x), (x, q)}] is a tree on Y ∪ {x}.
Proof. Part (i). Consider (p, x) ∈ V. If p ≠ ∅, then by Definition 3(i) there exists an (r, p) ∈ T ∩ V. Consider (x, q) ∈ V. Then (p, x) ∈ V. Part (ii). Consider distinct (p, x) ∈ V and (p, t) ∈ T and suppose that p ⊂ u ⊆ x ⊂ q and p ⊂ u ⊆ t; contradiction of Definition 3(ii). Consider distinct (x, q) ∈ V and (x, t) ∈ T and suppose that p ⊂ x ⊂ u ⊆ q and p ⊂ x ⊂ u ⊆ t; contradiction of Definition 3(ii). Part (iii). Consider any y ∈ (Y ∪ {x}) − {∅}. If y ≠ x, then by Definition 3(iii) there exists an r such that (r, y) ∈ T ∩ V; otherwise y = x and (p, x) ∈ V. End of Proof.

Theorem 6. Let x ∈ X − {∅}, Y ⊆ X − {x} and T be a tree on Y and let ¢(x − q) be minimized over all (p, q) ∈ T such that p ⊆ x. Then V = [(T − {(p, q)}) ∪ {(p, x ∩ q), (x ∩ q, q), (x ∩ q, x)}] is a tree on Y ∪ {x}.


Proof. If ¢(x − q) = 0, then x ⊆ q and Theorems 4 and 5 complete the proof. Part (i). Consider (p, x ∩ q) ∈ V. If p ≠ ∅, then by Definition 3(i) there exists an (s, p) ∈ T ∩ V. Consider (x ∩ q, q) ∈ V or (x ∩ q, x) ∈ V. Then (p, x ∩ q) ∈ V. Part (ii). Consider any (p, t) ∈ T and suppose that p ⊂ u ⊆ (x ∩ q) ⊆ q and p ⊂ u ⊆ t; contradiction of Definition 3(ii). Consider any (x ∩ q, t) ∈ T and suppose that p ⊆ (x ∩ q) ⊂ u ⊆ q and p ⊆ (x ∩ q) ⊂ u ⊆ t; contradiction of Definition 3(ii). Consider any (x ∩ q, t) ∈ T and suppose that p ⊆ (x ∩ q) ⊂ u ⊆ x and p ⊆ (x ∩ q) ⊂ u ⊆ t. By elementary set theory (x − t) ⊆ (x − u), (x − u) ⊂ (x − (x ∩ q)) and (x − (x ∩ q)) = (x − q). Therefore, ¢(x − t) ≤ ¢(x − u) < ¢(x − (x ∩ q)) = ¢(x − q) and ¢(x − q) is not minimal. Contradiction of hypothesis. Part (iii). Consider any y ∈ (Y ∪ {x}) − {∅}. If y ≠ x, then by Definition 3(iii) there exists an r such that (r, y) ∈ T ∩ V; otherwise y = x and (x ∩ q, x) ∈ V. End of Proof.

Tree parsimony

In this paper we use a tree to describe the presumed pathways of metastatic spread. This is done by constructing a tree whose final destination points are all members of X, and such that every member of X is traversed by some pathway on the tree. We further seek to link together autopsy observations in a compact fashion, where no unnecessarily long pathways are present on the tree. The cost of a tree, denoted CT, is the sum of path lengths it traverses on the lattice. We say that a tree on X is totally parsimonious if no other tree on X has a lower cost.(10-16) However, solutions for totally parsimonious trees in general require an astronomical number of calculations.(19) The more limited property of local parsimony requires that unhooking a single pathway and reconnecting it somewhere else does not decrease the cost of the tree.(15)

Definition 4 (cost). Let Y ⊆ X and T be a tree on Y. Then the cost of T, denoted CT, is

$$CT = \sum_{(t,u) \in T} \text{¢}(u - t).$$
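Definition 4 translates directly into code. The following Python sketch is an illustration only; the function name cost and the frozenset representation of corners are assumptions made here, and the two small trees are the ones compared in the Introduction.

```python
from typing import FrozenSet, Iterable, Tuple

Corner = FrozenSet[str]
Edge = Tuple[Corner, Corner]


def cost(tree: Iterable[Edge]) -> int:
    """CT: sum over the edges (t, u) of the number of sites in u but not in t."""
    return sum(len(u - t) for (t, u) in tree)


none = frozenset()
liver = frozenset({"liver"})
brain = frozenset({"brain"})
liver_brain = frozenset({"liver", "brain"})

# Chained progression: no metastases -> liver -> liver and brain.
chained_tree = {(none, liver), (liver, liver_brain)}

# Two separate progressions reaching the same observed patients.
separate_tree = {(none, liver), (none, brain), (brain, liver_brain)}

chained_cost = cost(chained_tree)
separate_cost = cost(separate_tree)
```

The chained tree costs two events and the separate tree costs three, so the chained tree is the more parsimonious of the two.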

For the local parsimony condition, we consider the distinct edges (s, t) and (u, w) shown in Fig. 2, where s ≠ u ≠ t. If there exists a v, u ⊆ v ⊆ w and v ⊆ t, such that removing (s, t), (u, w) and adding (u, v), (v, t), (v, w) to the tree decreases its cost, then the tree is not locally parsimonious. If for every such v the modified tree has no less cost than the original tree, then the original tree is locally parsimonious (Definition 5). Since ¢(v − u) + ¢(w − v) = ¢(w − u), it is apparent that local parsimony is equivalent to the condition that a ≤ c for every v (Theorem 7), where a = ¢(t − s) and c = ¢(t − v). Theorems 8 and 9 demonstrate that a minimum entry tree is locally parsimonious if every step of the minimum entry construction procedure satisfies the local parsimony condition. The proofs for Theorems 8 and 9 are easier to follow if one draws diagrams like Fig. 2.

Definition 5 (locally parsimonious). Tree T is locally parsimonious if and only if for (s, t), (u, w) ∈ T and v ⊆ t


such that s ≠ u ≠ t and u ⊆ v ⊆ w, and S = [(T − {(u, w), (s, t)}) ∪ {(u, v), (v, w), (v, t)}] if u ≠ v ≠ w, otherwise S = [(T − {(s, t)}) ∪ {(v, t)}]; it is the case that CS ≥ CT.

Theorem 7. Let Y ⊆ X and T be a tree on Y with s, t, u, v, w as given by Definition 5. Then T is locally parsimonious if and only if ¢(t − s) ≤ ¢(t − v).
Proof. Consider any s, t, u, v, w as given in Definition 5. Let a = ¢(t − s), b = ¢(v − u), c = ¢(t − v), d = ¢(w − v) and e = ¢(w − u). Since u ⊆ v ⊆ w (Definition 5), by elementary set theory e = ¢(w − u) = ¢(v − u) + ¢(w − v) = b + d. Then CS + a + e = CT + c + b + d, CS = CT − a − e + c + b + d = CT − a + c and CS < CT if and only if a > c, i.e. ¢(t − s) > ¢(t − v). By Definition 5, T is locally parsimonious if and only if ¢(t − s) ≤ ¢(t − v). End of Proof.

Theorem 8. Let x ∈ X − {∅}, Y ⊆ X − {x} and T be a locally parsimonious tree on Y. If (p, q) ∈ T such that p ⊂ x ⊂ q, then V = [(T − {(p, q)}) ∪ {(p, x), (x, q)}] is locally parsimonious.
Proof. Consider any (s, t), (u, w) ∈ V and v ⊆ t such that s ≠ u ≠ t and u ⊆ v ⊆ w, as given by Definition 5, and suppose that V is not locally parsimonious.
Case 1: (s, t), (u, w) ∈ T ∩ V and ¢(t − s) > ¢(t − v) for some v. Then by Theorem 7 local parsimony is not satisfied for T. Contradiction.
Case 2: (s, t) = (p, x). Since u ≠ t = x, (u, w) ≠ (x, q) and (u, w) ∈ T. By Theorem 7, ¢(x − p) = ¢(t − s) > ¢(t − v) = ¢(x − v) for some v, so that ¢(q − p) = ¢(q − x) + ¢(x − p) > ¢(q − x) + ¢(x − v) = ¢(q − v). Contradiction of local parsimony for T.
Case 3: (s, t) = (x, q). Then ¢(q − x) = ¢(t − s) > ¢(t − v) and ¢(q − p) = ¢(q − x) + ¢(x − p) > ¢(q − v). If (u, w) = (p, x), then v ⊆ x ⊂ q and ¢(q − x) ≤ ¢(q − v); contradiction. Otherwise (u, w) ∈ T and local parsimony is not satisfied for T; contradiction.
Case 4: (u, w) = (p, x) and (s, t) ∈ T. Then for some v, v ⊆ t, p ⊆ v ⊆ x ⊂ q and ¢(t − s) > ¢(t − v). Contradiction of local parsimony for T.
Case 5: (u, w) = (x, q) and (s, t) ∈ T. Then for some v, v ⊆ t, p ⊂ x ⊆ v ⊆ q and ¢(t − s) > ¢(t − v). Contradiction of local parsimony for T. End of Proof.

Theorem 9. Let x ∈ X − {∅}, Y ⊆ X − {x} and T be a locally parsimonious tree on Y. Let ¢(x − q) be minimized over all (p, q) ∈ T such that p ⊆ x and V = [(T − {(p, q)}) ∪ {(p, x ∩ q), (x ∩ q, q), (x ∩ q, x)}]. If V is locally parsimonious for (x ∩ q, x) = (s, t) or (x ∩ q, x) = (u, w) in the sense of Definition 5, then V is locally parsimonious.
Proof. If ¢(x − q) = 0, then x ⊆ q and Theorem 8 completes the proof. Otherwise, consider any (s, t), (u, w) ∈ V and v ⊆ t such that s ≠ u ≠ t and u ⊆ v ⊆ w, as given by Definition 5, and suppose that V is not locally parsimonious.
Case 1: (s, t), (u, w) ∈ T ∩ V and ¢(t − s) > ¢(t − v) for some v. Then by Theorem 7, local parsimony is not satisfied for T. Contradiction.
Case 2: (s, t) = (x ∩ q, x) or (u, w) = (x ∩ q, x). By hypothesis, V is locally parsimonious.
Case 3: (s, t) = (p, x ∩ q). Since u ≠ t = x ∩ q, (u, w) ≠ (x ∩ q, q), (u, w) ≠ (x ∩ q, x) and (u, w) ∈ T. By Theorem 7, there exists a

v such that v ⊆ x ∩ q ⊂ q and u ⊆ v ⊆ w and ¢((x ∩ q) − p) = ¢(t − s) > ¢(t − v) = ¢((x ∩ q) − v). Then ¢(q − p) = ¢(q − (x ∩ q)) + ¢((x ∩ q) − p) > ¢(q − (x ∩ q)) + ¢((x ∩ q) − v) = ¢(q − v). Contradiction of local parsimony for T.
Case 4: (s, t) = (x ∩ q, q). If (u, w) = (x ∩ q, x), then see Case 2. If (u, w) = (p, x ∩ q), then there exists a v such that p ⊆ v ⊆ x ∩ q ⊂ q and ¢(q − x) = ¢(q − (x ∩ q)) = ¢(t − s) > ¢(t − v) = ¢(q − v), and thus (x ∩ q) ⊂ v; contradiction. Therefore (u, w) ∈ T and there exists a v such that v ⊆ q and u ⊆ v ⊆ w and ¢(q − p) > ¢(q − (x ∩ q)) = ¢(t − s) > ¢(t − v) = ¢(q − v). Hence T is not locally parsimonious, contradiction.
Case 5: (u, w) = (p, x ∩ q) and (s, t) ∈ T. By Theorem 7, there exists a v such that v ⊆ t and p ⊆ v ⊆ x ∩ q ⊂ q and ¢(t − s) > ¢(t − v). Contradiction of local parsimony for T.
Case 6: (u, w) = (x ∩ q, q) and (s, t) ∈ T. By Theorem 7, there exists a v such that v ⊆ t and p ⊆ (x ∩ q) ⊆ v ⊆ q and ¢(t − s) > ¢(t − v). Contradiction of local parsimony for T. End of Proof.

Computer program in MUMPS

MUMPS (Massachusetts General Hospital Utility Multi-Programming System) is a dynamic hierarchical computing language which is used largely within the medical computing community.(20) MUMPS has several features which make it particularly suitable for handling medical data, such as patient names, other qualitative identifiers, test results and diagnoses, and these features have resulted in the widespread use of MUMPS for medical applications, such as large, clinical laboratory computing systems.(21-23) Unlike scientific languages such as FORTRAN, PL/I and BASIC, MUMPS has no cumbersome DIMENSION statements, and character string data are handled with the same ease and versatility as numeric data. Sorting is extremely easy in MUMPS: data are simply read into an array in arbitrary order and when the array is read out again, the data emerge in sorted order. Debugging is easier in MUMPS than in compiled computer languages because the entire program does not have to be syntactically correct before it begins to execute. Rather, the program simply executes until it reaches the first incorrect command and displays the values of all variables when it stops. MUMPS is compact. Most MUMPS programs, such as the lattice program for the present paper, can be written on one or two sheets of paper. Finally, MUMPS is easy to learn. The selected MUMPS commands and functions, shown in Tables 1 and 2, can be learned in a few hours and are sufficient for writing most MUMPS programs, including all the computing steps in Table 3. The dynamic storage and implicit sort features make MUMPS especially attractive for lattice theory applications. Arbitrary subsets of the lattice can be built up with the SET command or removed with the KILL command, without regard to whether the storage has been allotted in a previous dimension statement. Pieces of the lattice are implicitly sorted into their correct position as soon as they are constructed.


Table 1. MUMPS commands

Command       Action
BREAK         Interrupts program
DO            Initiates program execution or moves execution to line indicated
FOR X:Y:Z     Causes repeated execution of the remaining line, starting with value X, incrementing by Y and ending at value Z
GOTO X        Moves execution to line X
HANG X        Suspends program execution for X seconds
IF X          Executes remaining line if X is true
KILL X        Removes variable X from memory
QUIT          Exit point for DO, FOR or XECUTE commands
READ X        Inputs value of variable X
SET X = Y     Assigns the value of Y to X, replacing the old value of X
WRITE         Writes out format controls (next line, next page, etc.), character strings and variable values
XECUTE X      Character string X is executed

Table 2. MUMPS functions

Function                 Result
$DATA(X)                 Returns 0 if X has no data, 1 if X has data
$EXTRACT(X,START,END)    Returns character string between positions "START" and "END" within string X
$LENGTH(X)               Returns number of characters in string X
W = $ORDER(X(Y))         Returns next subscript after X(Y), i.e. X(Y) immediately precedes X(W)
$PIECE(X,D,N)            Returns the Nth piece of a string X, as demarcated by delimiter D on either side
X+Y                      X plus Y
X-Y                      X minus Y
X*Y                      X times Y
X/Y                      X divided by Y
X[Y                      X contains character string Y
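For readers unfamiliar with MUMPS, the interplay of SET, KILL and $ORDER on a sparse, implicitly sorted array can be imitated in Python. This is a rough analogy under stated assumptions, not a description of how MUMPS is implemented: the class SparseArray and its methods are hypothetical names introduced here, only integer subscripts are handled, and the end of the array is signalled with None rather than with an empty string.

```python
import bisect
from typing import Dict, List, Optional


class SparseArray:
    """Hypothetical stand-in for a one-level MUMPS array with integer subscripts."""

    def __init__(self) -> None:
        self._keys: List[int] = []       # subscripts, always kept in ascending order
        self._data: Dict[int, str] = {}

    def set(self, key: int, value: str = "") -> None:
        """Analogue of SET: storage is created on demand, in sorted position."""
        if key not in self._data:
            bisect.insort(self._keys, key)
        self._data[key] = value

    def kill(self, key: int) -> None:
        """Analogue of KILL: remove a subscript if it is present."""
        if key in self._data:
            del self._data[key]
            self._keys.remove(key)

    def order(self, key: int) -> Optional[int]:
        """Analogue of $ORDER: next subscript after key, or None at the end."""
        i = bisect.bisect_right(self._keys, key)
        return self._keys[i] if i < len(self._keys) else None


cases = SparseArray()
for number in (50500, 40100, 61234):     # entered in arbitrary order
    cases.set(number)
cases.kill(61234)

visited = []
k = cases.order(-1)                      # start before the first subscript
while k is not None:
    visited.append(k)
    k = cases.order(k)
```

The loop visits the remaining case numbers in ascending order even though they were entered unsorted, which mirrors the traversal idiom used throughout Table 3.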

Table 3. MUMPS program for performing cluster analysis by the maximum parsimony method in lattice theory

VMINTR ; GWMOORE--MINIMUM ENTRY TREE--NEUROBLASTOMA; ENTRY
 K  S C=-1 S UUU=0 S TOTCH=0 S M=0 S AFF=$C(12) S H=0 S DAT=$ZD($H) S V=-1 K ^PG K ^PK K ^PQ K ^PR K ^PP; KILL GLOBAL INTERMEDIATES
 S NO=6,VVV="" F F=1:1:6 S VVV=VVV_"0000000000000000" S ^VMINTR(F)=""
TTL S ^VMINTR(1)=AFF_"MINIMUM ENTRY TREE--NEUROBLASTOMA"_DAT S ^VMINTR(4)="CHARACTERS USED:"
CHR S V=$O(^VMTS(V)) G:V="" ZRO S H=H+1 S ^VMTS(V)=0 S NO=NO+1 S ^VMINTR(NO)=" "_V S ^PK(H)=V G CHR; WRITE CHARACTER ARRAY
ZRO S V=-1 S HF=H S HH=0 S WWW=$E(VVV,1,HF) S I=-1
RCL S I=$O(^VCSN(I)) G:I="" MTS S ^PQ(0,0,I)=WWW S M=M+1 G RCL; READ CASE LIST
MTS S V=$O(^VMTS(V)) G:V="" NWC S HH=HH+1 S S=-1 S VCH(V)=0
MDS S S=$O(^VMDSN(V,S)) G:S="" MTS G:$D(^PQ(0,0,S))=0 MDS S RR=^PQ(0,0,S) S R="" S:HH>1 R=$E(RR,1,HH-1) S R=R_"1" S:HH<HF R=R_$E(RR,HH+1,HF) S ^PQ(0,0,S)=R S VCH(V)=VCH(V)+1 S TOTCH=TOTCH+1 G MDS
NWC F F=1:1:6 S NO=NO+1 S ^VMINTR(NO)=""; LIST INPUT CASES
 S MP=M+1 S MR=MP S II=1 S J=-1 S KK=0 S ^VMINTR(NO-2)=AFF_"INPUT CASES:"
NOM S J=$O(^PQ(0,0,J)) G:J="" ASC S SO=0 S R=^PQ(0,0,J)
 F K=1:1:HF S:$E(R,K,K)="1" SO=SO+1
 S ^PQ(0,SO,J)=^PQ(0,0,J) G:SO<1 EX G NOM; EXIT IF NO METASTATIC SITES
ASC S KK=$O(^PQ(0,KK)) G:KK="" PON S JJ=-1; METASTATIC SITES IN ASC ORDER
LSC S JJ=$O(^PQ(0,KK,JJ)) G:JJ="" ASC S II=II+1 S ^PP(II)=^PQ(0,KK,JJ)
 S ^PQ(II,KK,JJ)=^PP(II) S NO=NO+1 S ^VMINTR(NO)=$J(II,8)_$J(KK,8)_$J(JJ,8) G LSC
PON K ^PQ(0) S ^PQ(1,0,0)=WWW S ^PP(1)=WWW S ^PR(1,2)=""; PRIMARY ONLY NODE
 S I=2 S UUU=-1 S UUU=$O(^PQ(2,UUU))
GNC S I=I+1 S US=3*HF G:I>MP WTL S R=^PP(I) S J=-1; GET NEXT CASE
PND S J=$O(^PR(J)) G:J="" NWB S JJ=-1; PARENT NODE
DND S JJ=$O(^PR(J,JJ)) G:JJ="" PND S Q=^PP(JJ) S P=^PP(J); DAUGHTER NODE
 S U=0 S O="" F G=1:1:HF S RR=+$E(R,G,G) S QQ=+$E(Q,G,G) S PP=+$E(P,G,G) G:PP>RR DND S OO=RR S:QQ=0 OO=0 S:OO'=RR U=U+1 S O=O_OO
 G:U'<US DND S OS=O S US=U S JS=J S JJS=JJ S OP=P S OQ=Q S OR=R G DND
NWB S MR=MR+1 S ^PP(MR)=OS S UUU=UUU+US K ^PR(JS,JJS) S ^PR(JS,MR)="" S ^PR(MR,JJS)="" S ^PR(MR,I)="" S LPN=-1; ADD NEW BRANCH TO TREE
LPN S LPN=$O(^PR(LPN)) G:LPN="" GNC S LDN=-1; LOCAL PARENT NODE
LDN S LDN=$O(^PR(LPN,LDN)) G:LDN="" LPN; LOCAL DAUGHTER NODE
 S LPC=^PP(LPN) S LDC=^PP(LDN) G:OS=LPC LDN G:LPC=OR ALT F F=1:1:HF S L=+$E(LPC,F,F) S R=+$E(OR,F,F) G:L>R ALT


Table 3 continued.

ALT

WTL

WTOP WTOQ WTOR

WRTR WRVR

WRVS CNDS CNDT CNDU CNDV WRCN WRCO WRCP EX

 S DF=0 F F=1:1:HF S L=+$E(LDC,F,F) S R=+$E(OR,F,F) S V=R S:L=0 V=0 S S=+$E(OS,F,F) S:V<R DF=DF+1 S:S<R DF=DF-1
 I DF<0 S NO=NO+1 S ^VMINTR(NO)="PARSIMONY ERROR A AT "_I_" FOR "_DF G LDN
 G:LDC=OS LDN F F=1:1:HF S L=+$E(LDC,F,F) S S=+$E(OS,F,F) G:S>L LDN
 S DF=0 F F=1:1:HF S L=+$E(LDC,F,F) S R=+$E(OR,F,F) S V=R S:L=0 V=0 S S=+$E(LPC,F,F) S:V<L DF=DF+1 S:S MP WRTR
 S KA=-1 S KB=-1 S KA=$O(^PQ(KK,KA)) S KB=$O(^PQ(KK,KA,KB)) S KK=KB S ^PG(K,KK)=EX S ^VMINTR(NO)=$J(K,8)_" ANCESTRAL TO "_$J(KK,8)_" "_EX G WTOQ
 S V=-1 S NO=NO+1 S ^VMINTR(NO)="" S NO=NO+1 S ^VMINTR(NO)="" S NO=NO+1 S ^VMINTR(NO)=AFF_" TRANSITION TOTALS: "_UUU_"/"_TOTCH_" "_DAT S NO=NO+1 S ^VMINTR(NO)=""
 S V=$O(^VMTS(V)) G:V="" CNDS S NO=NO+1 S ^VMINTR(NO)=" "_V_$J(^VMTS(V),8)_"/"_VCH(V) G WRVS
 S K=-1; CONDENSE TREE TOPOLOGY
 S K=$O(^PG(K)) G:K="" WRCN S KK=-1
 S KK=$O(^PG(K,KK)) G:KK="" CNDT S E=^PG(K,KK) G:E'="" CNDU G:KK>1000 CNDU K ^PG(K,KK) S KL=-1
 S KL=$O(^PG(KK,KL)) G:KL="" CNDS S E=^PG(KK,KL) K ^PG(KK,KL) S ^PG(K,KL)=E G CNDV
 S NO=NO+1 S ^VMINTR(NO)="" S NO=NO+1 S ^VMINTR(NO)=AFF_"CONDENSED TOPOLOGY:" S K=-1; WRITE CONDENSED TOPOLOGY
 S K=$O(^PG(K)) G:K="" EX S KK=-1
 S KK=$O(^PG(K,KK)) G:KK="" WRCO S NO=NO+1 S ^VMINTR(NO)=$J(K,8)_" ANCESTRAL TO"_$J(KK,8)_^PG(K,KK) G WRCP
 S NO=NO+1 S ^VMINTR=NO W !!!,!!!,"EXECUTION COMPLETE"
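The attachment step carried out by the program, which corresponds to the operation of Theorem 6, can be sketched in Python for readers who do not read MUMPS. This is an illustrative sketch under stated assumptions, not a transcription of Table 3: corners are modeled as frozensets of site names rather than 0/1 character strings, the function name add_case and the tie-breaking rule are choices made here, and zero-length edges are skipped outright instead of being created and condensed afterwards.

```python
from typing import FrozenSet, List, Set, Tuple

Corner = FrozenSet[str]
Edge = Tuple[Corner, Corner]


def add_case(tree: Set[Edge], x: Corner) -> int:
    """Attach corner x to the tree in place and return the added cost."""
    if not tree:                          # first case hangs off the primary-only corner
        tree.add((frozenset(), x))
        return len(x)
    # Candidate edges (p, q) with p a subset of x; choose the one minimizing |x - q|.
    candidates = [(p, q) for (p, q) in tree if p <= x]
    p, q = min(candidates,
               key=lambda e: (len(x - e[1]), sorted(e[0]), sorted(e[1])))
    v = x & q                             # meet of the new case and the destination
    tree.discard((p, q))
    for a, b in ((p, v), (v, q), (v, x)):
        if a < b:                         # keep only proper (non-degenerate) edges
            tree.add((a, b))
    return len(x - q)


# Hypothetical cases, sorted by ascending metastatic involvement.
cases: List[Corner] = [frozenset(c) for c in (
    {"liver", "lung"}, {"lymph"}, {"liver"}, {"liver", "brain"})]
cases.sort(key=lambda c: (len(c), sorted(c)))

tree: Set[Edge] = set()
total = sum(add_case(tree, x) for x in cases)
```

With these four hypothetical cases the resulting tree has two lines leaving the primary-only corner (liver and lymph), the liver line then branches to brain and to lung, and the total cost is four events.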

Finally, the debugging and bookkeeping chores are made much easier by the use of mnemonics (LIV, LNG, LYN, etc.), rather than numbers, to designate metastatic sites. Data were expressed in the form of mnemonic codes and typed and proofread on a Raytheon VT1303 video display based word processor with communications software. Data were transmitted in asynchronous ASCII code by dial-up or direct line to a Digital Equipment Corporation PDP 11/70 minicomputer with MUMPS operating system and programming language in the Department of Laboratory Medicine of The Johns Hopkins Medical Institutions. The 'minimum entry tree' program given in Table 3 was applied to the data. Cases were sorted in ascending order of degree of metastatic involvement in those anatomic sites in which tumor was actually present. A 'primary-only' corner represented cases with no metastases. Successive cases in the sorted list were attached either directly to the primary-only corner, or to a previously placed corner already on the tree diagram. The criterion used to decide where to place each subsequent case on the tree diagram was that a

minimum additional metastatic load be added to the tree in linking the new case to the pre-existing tree. Unlike evolutionary tree construction methods, in which it is legitimate for a metastatic site ('character') to go from 'more tumor' to 'less tumor' between ancestor case and descendant case ('back replacement'), this possibility was strictly forbidden in the present method. (1 ~) Table 3 gives the MUMPS program for constructing a minimum entry tree, ^VMINTR(). A sample data set for testing this program can be reconstructed from the information displayed in Fig. 3. There are three input global variables which feed into program ^VMINTR. These are: ^VCSN(), the list of valid case numbers; ^VMTS(), the list of valid metastatic sites; ^VMDSN(,), the observed metastatic distribution, where the first subscript is a metastatic site from list ^VMTS() and the second subscript is a case number from list ^VCSN(). Thus ^VCSN(50500) = "" means that 50500 is a valid case number; ^VMTS("LIV4+") = "" means that "LIV4+" is a valid metastatic site; ^VMDSN("LIV4+",50500) = "" means that case 50500 has 4+ liver metastases. Global array ^VMDSN(,) has the property that any case with i+ metastases also has (i − 1)+ metastases, where 1 < i ≤ 4.

[Fig. 3. Tree diagram; edges are labeled with site codes and grades such as LIV4+, LYN2+, MES2+, KID2+, PAN3+, BON3+ and DUR4+.]

Fig. 3. Tree diagram for the progressive metastatic spread of neuroblastic tumors. Two major metastatic lines are identified, liver (LIV4+) and lymph nodes (LYN2+), arising from the 'primary only' corner (9 patients). Each edge shows the additional site or sites involved by metastases and the grade (1+, 2+, 3+, or 4+) of involvement. The possible sites of metastatic involvement are: BON-bone, DUR-dura, GEN-genitalia, GRM-grey matter of brain, LEP-leptomeninges, LIV-liver, LYN-lymph nodes, LNG-lung, MES-mesothelium, PAN-pancreas, SBA-subarachnoid space, SKN-skin, SPL-spleen. Each patient is located at a corner and represented by an oval. Corners indicated by dots do not correspond to an actual patient. The group of patients with neuroblastomas at upper right arising from the LIV4+ (shaded) edge correspond to the Hutchison syndrome of metastases. The branch of neuroblastoma cases at lower left arising from the DUR4+ (shaded) edge correspond to the Pepper syndrome of metastatic distribution.

Let there be n cases (i.e. variable ^VCSN() has n subscripts). In the statements through lines NOM + 1, the program constructs array ^PQ(i, j, k) = l, where i is

numbered consecutively from I through n + 1, k is the ith case number, j is the number of positive metastatic sites for case i and ! is the character string of length m with 0 for metastasis absent and 1 for metastasis present. The element - P Q ( I , 0, 0) is the 'primary only'

98

G.W. MOORE, G. M. HUTCHINS and S. M. oI~ LA MONTE

corner, i.e. the character string o f m consecutive zeroes. Statements ASC t h r o u g h LSC are used to c o n s t r u c t variable - P P ( ) , where - P P ( i ) = - P Q ( i , j , k). T h e tree topology is constructed in statements P O N t h r o u g h N W B + 1 as variable - P R ( i , j ) , where i is a p a r e n t corner a n d j is a d a u g h t e r corner. S t a t e m e n t s L P N t h r o u g h W T L - 1 test for the local p a r s i m o n y criterion (Definition 5). The r e m a i n i n g statements print the results. The actual tree calculation a n d testing are performed between lines C H R a n d W T L - 1, a mere 38 lines of p r o g r a m m i n g statements.
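The input layout and the 0/1 corner strings described for program ^VMINTR can be sketched as follows. This is a minimal illustration in Python, not the MUMPS code of Table 3; the names `GRADES`, `expand` and `encode`, and the choice of a dictionary mapping each site to its highest observed grade, are assumptions made for the example. Each graded site such as "LIV4+" is treated as one column of the character string, and any i+ entry implies all lower grades at the same site.

```python
# Hypothetical Python analogue of the input data; not the authors' MUMPS program.
GRADES = (1, 2, 3, 4)

def expand(observed):
    """Return every (site, grade) pair implied by the highest grade per site.

    A case with i+ at a site is taken to have (i-1)+, ..., 1+ there as well.
    """
    return {(site, g) for site, top in observed.items() for g in range(1, top + 1)}

def encode(observed, sites):
    """Encode a case as a 0/1 string with one character per (site, grade) column."""
    present = expand(observed)
    return "".join("1" if (s, g) in present else "0" for s in sites for g in GRADES)

sites = ["LIV", "LYN", "MES"]
case = {"LYN": 2, "MES": 3}       # LYN2+ and MES3+
code = encode(case, sites)
print(code)                        # 000011001110
print(code.count("1"))             # 5 positive graded sites
print(encode({}, sites))           # 000000000000, the 'primary only' corner
```

With this encoding the number of 1 characters equals the total metastatic load of the case, which is the quantity used to sort cases before they are added to the tree.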

Application to neuroblastic tumors

Attempts to predict tumor behavior have focused largely on the use of cell surface properties,(24) membrane receptors(25-30) and enzyme(31) and metabolic(32) markers to identify subpopulations of malignant cells which are indistinguishable by routine histologic techniques. While this approach has been rewarding in several instances,(27,29) some reports suggest that other aspects of tumor biology, such as inherent metastatic potential of malignant cells,(2s,31,33) might serve as equal or better predictors of tumor behavior. We analyzed the distribution of metastases from primary neuroblastomas in an effort to determine: (i) whether the distributions of extra-central nervous system spread were patterned or random; (ii) whether the distributions of extra-central nervous system spread would predict or correlate with the occurrence and/or distribution of central nervous system metastases. The autopsy records of all patients with neuroblastoma were reviewed. Only patients in whom complete autopsies had been performed at The Johns Hopkins Hospital were included in the study. Patients in whom only partial autopsies had been performed were excluded from the data analysis. Information regarding the location of the primary tumor and the locations of all metastases was obtained from the autopsy protocols. The locations of metastases were confirmed by review of the available histologic sections and recorded with respect to: (i) anatomic region of the body, e.g. head, thorax, abdomen, etc.; (ii) organ or structure involved; (iii) type of tissue involved, e.g. pleura, myocardium, grey matter, etc.; (iv) broad categories of

Table 4. Metastatic spread data for 14 metastatic sites and 40 neuroblastomas in autopsied patients (for site abbreviations see Fig. 3)

Patient no. / Metastatic sites

1-9
(primary only, no metastases)

10 11 12 13 14
LIV4+ LIV4+ LIV4+ LIV4+ LIV4+

15 16 17 18 19 20 21 22
LYN2+ LYN4+ LYN4+ LYN4+ LYN4+ LYN3+ LYN2+ LYN2+
MES4+ MES3+ MES4+ MES4+ MES3+ MES2+
LIV4+ PAN3+ LIV4+ PAN3+ DUR4+ PAN2+ DUR4+ LEP2+ PAN3+ KID2+ LNG2+ PAN3+ KID2+ BON4+

23 24 25 26 27 28 29 30 31 32 33
LYN4+ LYN2+ LYN2+ LYN2+ LYN3+ LYN2+ LYN3+ LYN2+ LYN2+ LYN2+ LYN2+
BON3+ BON3+ BON2+ BON2+ BON3+ BON2+ BON2+ BON3+ BON4+ BON3+ BON4+
DUR4+ DUR4+ DUR4+ DUR4+ DUR4+ DUR4+ DUR4+ DUR4+ DUR4+ DUR4+ DUR4+
MES1+ GEN2+ LNG2+ PAN1+ MES1+ GEN1+ KID2+ LEP1+ MES2+ MES2+ GEN1+ LIV4+ LNG2+ PAN2+ MES2+ GEN3+ LIV4+ LNG2+ PAN3+ LIV4+ GEN1+ LEP4+ LNG4+ GRM2+ SPL2+ LIV4+ GEN3+ LEP4+ LNG2+ PAN3+ SKN2+ LIV1+ GEN3+ PAN2+ LIV2+ GEN4+ PAN3+ SKN2+ LNG1+ LIV2+ SBA1+ LIV2+ LEP4+

34 35 36 37 38 39 40
LYN2+ LYN2+ LYN3+ LYN3+ LYN3+ LYN4+ LYN4+
BON2+ BON2+ BON2+ BON3+ BON2+ BON2+ BON2+
GRM2+ GEN1+ LNG1+ LNG2+ LNG2+ LNG3+ LNG4+
LEP2+ KID2+ MES3+ LIV2+ LIV4+ PAN1+ GEN1+ LIV4+ PAN2+ DUR4+ SKN2+ KID1+ SPL1+ LIV4+ PAN4+ DUR3+ SKN2+ MES3+ LEP2+

KID2+ KID3+ PAN2+ LYN3+ LYN3+
LYN2+ MES2+ LNG3+ LNG3+ MES2+ LNG1+ GEN3+
GEN4+ LNG2+ SKN2+ SPL2+ LIV3+

Fig. 4. Example of a violation of the local parsimony property, obtained during the minimum entry construction of the tree in Fig. 3. Corners s, t, u, v, w correspond to the corner labels in Fig. 3. When corner w is added by the minimum entry criterion to the previously locally parsimonious tree, it attaches at corner u to form edge (u, v). It is then possible to remove edge (s, t) and attach corner t to corner v to form a more parsimonious tree. The additional cost of LYN4+ minus LYN2+ (2 units) is more than compensated for by saving the cost of MES3+ (3 units), a total saving of 1 unit.

embryologic derivation of the affected tissue, e.g. mesoderm, neural crest, endoderm, etc. Fourteen metastatic sites (see Fig. 3) were used in the present analysis. The extent of metastatic involvement of each structure was graded in a semiquantitative manner, on a scale of 0 to 4+, as follows: 1+ for a single metastatic focus or less than 5% organ involvement; 2+ for 2-5 separate metastases or up to 20% organ replacement; 3+ for greater than five separate metastases or up to 50% organ replacement; 4+ for massive or extensive tumor infiltration.
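Treating each grade step on this scale as one unit of metastatic load, the arithmetic quoted for the Fig. 4 example can be checked with a short sketch. The function `edge_cost` and the dictionary representation are hypothetical, and the contents assumed here for corners s, t, u and v are inferred from the costs stated in the Fig. 4 caption rather than read from the figure itself; only corner w is spelled out in the text.

```python
def edge_cost(parent, child):
    """Units of metastatic load added in going from a parent corner to a child corner.

    Corners are dicts mapping site -> highest grade. A decrease at any site
    ('back replacement') is not allowed and raises ValueError.
    """
    cost = 0
    for site in set(parent) | set(child):
        step = child.get(site, 0) - parent.get(site, 0)
        if step < 0:
            raise ValueError(f"{site} decreases from parent to child")
        cost += step
    return cost

# Corner w as listed in the text; s, t, u, v are assumed for illustration.
w = {"LYN": 2, "MES": 3, "KID": 2, "PAN": 3, "SPL": 2, "LNG": 2}
u = {"LYN": 2}
s = {"LYN": 4}
t = {"LYN": 4, "MES": 3}
v = {"LYN": 2, "MES": 3}

print(edge_cost({}, w))                    # 14 events in total for corner w
print(edge_cost(u, w))                     # 12 new events when w enters at u
print(edge_cost(s, t), edge_cost(v, t))    # 3 2
print(edge_cost(s, t) - edge_cost(v, t))   # 1 unit saved by reattaching t at v
```

Under these assumptions, moving corner t from s to v replaces an edge of length 3 (MES3+) with one of length 2 (LYN4+ minus LYN2+), which is the one-unit saving described in the caption.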

Location of primary tumors

A total of 40 cases of neuroblastoma were studied. Nine cases were discovered incidentally at autopsy in the adrenal medulla and had no metastases; 31 cases had demonstrated metastases. In the entire autopsy file of The Johns Hopkins Hospital, there were 62 cases of neuroblastoma, of which 21 cases were excluded because only partial autopsies had been performed and one case was excluded because no tumor was present at autopsy. Review of these records and available histology of cases that were excluded revealed distributions of central nervous system metastasis similar to those reported here.

Analysis of metastatic distribution. The metastatic distribution data are shown in Table 4. Lattice theory analysis (Fig. 3) demonstrated two major lines of metastatic growth: one branch had at least 2+ lymph node metastases (LYN2+); a second branch had at least 4+ liver metastases (LIV4+). The lymph node branch contained 26 cases and the liver branch contained 5 cases. The local parsimony criterion (Definition 5) was not satisfied for three case entry steps representing a total of 6 excess transitions (3%) out of 200 total transitions. Figure 4 shows a subset of the tree in Fig. 3 in which the minimum entry, but not the local parsimony, criterion is satisfied. Corners s, t, u, v, w correspond to the corner labels in Fig. 2. In this case, the rightmost patient of Fig. 4 (corner w), with a total of 14 metastatic events to LYN2+ MES3+ KID2+ PAN3+ SPL2+ LNG2+, finds its minimum entry to the previously locally parsimonious tree by forming link uw, and adds a total of 12 new metastatic events to the lattice (LYN2+ represents metastatic events already present prior to corner u). However, once corner w has been added, the lattice now fails to be locally parsimonious. That is, if link st is removed from its position indicated on Fig. 4 and reattached at corner v to form link vt, then one must now pay the cost of LYN4+ minus LYN2+ (2 units), but save the cost of MES3+ (3 units), a total saving of 1 unit. Thus, the minimum entry inclusion of corner w has in effect destabilized the local parsimony property elsewhere on the lattice. For the minimum entry tree in Fig. 3, this destabilization event took place three times, as indicated by the error message in line ALT + 2 in program ^VMINTR (Table 3).

The resulting tree revealed both Hutchison(34) and Pepper(35) type distributions of metastases from neuroblastomas. The 5 cases arising from the Liver 4+ branch had extensive hepatic metastasis and had few metastases elsewhere in the body. These cases correspond to the Pepper syndrome. Similarly, the Lymph Node 4+ branch, which arises from the Lymph Node 2+ branch, contains cases which have massive intra-abdominal tumor, usually with involvement of only a few different organs. In the sense that these tumors grew to massive sizes without widely disseminated metastases, this group also resembled the Pepper syndrome. Note that in both the Liver 4+ and Lymph Node 4+ groups, central nervous system metastases were rare, i.e. present in 1/9 cases.

A second pattern of extra-central nervous system metastasis from neuroblastoma was associated with bone metastases. As shown in the tree diagram, bone metastases (BON2+ or greater) were associated with more widely disseminated tumor, i.e. an increased total number of metastatic sites (P < 0.001), including the presence of central nervous system metastases (P < 0.001). While the Hutchison syndrome is defined by prominent skull metastases, we found that skull and orbit metastases were strongly correlated with the presence of central nervous system metastases. Therefore, cases which stem from the Dura 4+ branch actually correspond to the Hutchison syndrome, because all these cases had extensive skull and/or orbital metastases.

A third group of cases stems from either the Lymph Node 2+ or Bone 2+ branch. These cases of neuroblastoma do not represent either a Pepper or Hutchison syndrome. Rather, they seem to represent combinations of the two syndromes, in the sense that large masses of intra-abdominal tumor coexisted with widely disseminated tumor. It is interesting that the cases located farthest out on the branches illustrate the overlap between the Hutchison and Pepper syndromes.

DISCUSSION

Autopsy pathologists have long recognized that certain types of primary human tumors have familiar patterns of metastatic spread. It is less clear whether meaningful patterns of spread can be discerned objectively or whether partial knowledge of a metastatic pattern can have diagnostic or therapeutic significance. The data derived from lattice theory analysis in this report suggest several points: (i) despite their general morphologic uniformity, neuroblastomas are heterogeneous and groups of these tumors are identifiable on the basis of their metastatic behavior; (ii) neuroblastomas which metastasize to the central nervous system exhibit different patterns of extra-central nervous system metastasis compared to tumors which do not involve the brain; (iii) the distribution of metastatic neuroblastomas may correspond to either the Pepper or Hutchison syndrome, and both patterns may be present in some patients. These findings support the concept of tumor heterogeneity with regard to the intrinsic metastatic potential of malignant neoplasms.

We conclude from this analysis that development of central nervous system metastasis from neuroblastomas may be predictable, based upon the pattern of distribution of extra-central nervous system tumor. It is of particular interest that metastases to mesodermal and neural crest-derived structures outside the central nervous system most strongly correlate with development of central nervous system metastases and that metastatic lesions within the central nervous system are confined to predominantly mesodermal and neural crest-derived elements, i.e. dura and leptomeninges.

The therapeutic implications of predicting a total metastatic pattern from knowledge of a partial pattern are substantial: diagnostically accessible locations of metastatic spread could be used to suggest the presence of tumor in other locations which are risky or expensive to evaluate. One could make such diagnostic predictions using classical decision theory and conditional probabilities. However, conditional probabilities may be unreliable for small data sets (such as the 40 patients with neuroblastoma obtained from an autopsy experience of over 43,000 cases spanning over 90 years) and the metastatic spread diagram shown in Fig. 3 has pathogenetic implications which are less apparent in lists of conditional probabilities. In the current study, central nervous system metastases were present in 16 cases (51%), including one case of microscopic metastasis only, which presumably would be undetectable by tomographic scans. Findings such as these might potentially justify the administration of prophylactic therapy to the central nervous system in selected patients with a high likelihood of metastases to this location.

There is a substantial literature in the field of cluster analysis, with many applications in the field of evolutionary biology.(1~-17) These may be grouped broadly as character-state methods and matrix methods. Character-state methods, such as the present method, use the actual value of each character (here, metastatic site, with values 'i+ present' versus 'i+ absent') in computing the branching arrangement of the tree. These include the maximum parsimony tree,(12-14) the compatibility tree,(15) the consensus tree(16) and the Camin-Sokal tree.(17) The maximum parsimony and consensus approaches, which are widely used in applications in evolutionary biology, fail to satisfy the 'descendant metastasis property' (captured by the set inclusion relation, r ⊆ s, in Definition 2), i.e. it is possible to have solutions in which a metastasis is present in an ancestral corner, but not in one or more of its descendant corners. Since this set inclusion relation (or some partial ordering) is fundamental to the definition of lattice,(6) lattice theory may not be applicable to problems of evolutionary biology, where back mutations are possible.

Matrix methods do not use character-state observations as the primary data, but rather rely upon a 'dissimilarity index', i.e. a count (or estimate) of the number of character-states by which each pair of patients differs from one another. It is easy to demonstrate by example that the same matrix may be generated by a variety of different character-states. This built-in loss of information in matrix methods renders them inherently less desirable than character-state methods.

It is important that the mathematical properties of the method chosen should 'make sense' for the particular application. It is widely accepted that metastatic disease begins with the patient having tumor in no extra-primary sites, followed by successive involvement of additional metastatic sites. This paradigm is captured by the hyperdimensional cube, or lattice, whose dimensions correspond to the set of possible metastatic sites and whose edges correspond to events of metastatic spread. The lattice tree shown in Fig. 3 indicates which metastatic sites have been added at each edge in the tree. It is possible to see major trends of metastatic spread at a glance from diagrams such as this.

The lattice theory method for determining patterns of metastatic spread is available to any laboratory with a word processor and access to a computer with a MUMPS interpreter. MUMPS is now available on the Apple IIe and IBM personal computers, as well as on the larger minicomputers used in many clinical laboratories. The lattice theory program is particularly well suited for the MUMPS language, since the program involves extensive manipulations on sparse, hierarchically arranged arrays with character-string subscripts. A similar program written in FORTRAN, PL/I or BASIC would probably consume several times as many lines of computer code.

Since lattice theory has such a natural correspondence to the apparent progression of metastatic spread, it seems likely that lattice theory can be used to identify important patient subsets on the basis of metastatic disease. These subsets, in turn, could have diagnostic and prognostic implications in patient care.

SUMMARY

It has long been recognized that different types of malignant tumors, and some subsets of morphologically homogeneous tumors, may have different patterns of metastatic distribution. If each autopsied patient is regarded as a 'snapshot' of the metastatic process, then it may be possible to infer the progression of such snapshots. The starting point is the frame with no metastases, and successive frames contain more inclusive sets of metastatic sites. This paradigm of metastatic spread can be represented using the mathematical theory of lattices, a generalization of the multidimensional cube measuring one unit in each dimension, where each lattice corner is a snapshot of metastatic disease in a single patient at a single point of time and each lattice edge represents an event of progressive metastatic spread. The entire sequence of progression on a lattice can be represented as a tree diagram and reasonably large, finite lattices can be calculated using a hierarchical computing language, such as MUMPS. The length of such a tree diagram is the sum of path lengths it traverses on the lattice; we say that a tree is totally parsimonious if any alternative tree does not have a lower length, but only locally parsimonious if unhooking a single pathway and reconnecting it somewhere else does not decrease the length of the tree. In the 'minimum entry' tree computer construction algorithm, cases are sorted in ascending order of degree of metastatic involvement. A 'primary only' corner represents cases with no metastases. Successive cases in the sorted list are attached either directly to the primary only corner or to a previously placed corner already on the tree diagram. The criterion used to decide where to place each subsequent case on the tree diagram is that a minimum additional metastatic load be added to the tree in linking each new case to the pre-existing tree. Unlike evolutionary tree construction methods, in which it is legitimate for a metastatic site to go from 'more tumor' to 'less tumor' between ancestor case and descendant case ('back replacement'), this possibility is strictly forbidden in the present method. In this report, we demonstrate by mathematical induction that a minimum entry tree is locally parsimonious if every step of the minimum entry construction procedure satisfies the local parsimony condition. The records of all 40 patients with neuroblastoma in whom complete autopsies had been performed at The Johns Hopkins Hospital were reviewed. Information regarding the location of the primary tumor and the locations of all metastases was obtained from the autopsy protocols and the locations of metastases were confirmed by review of the available histologic sections. Fourteen metastatic sites were used in the present analysis. The extent of metastatic involvement of each structure was graded semiquantitatively on a scale of 0-4+. Lattice theory analysis demonstrated two major lines of metastatic growth: one branch (26 cases) had at least 2+ lymph node metastases; a second branch had at least 4+ liver metastases (5 cases). The local parsimony criterion was not satisfied for three case entry steps, representing a total of 6 excess transitions (3%) out of 200 total transitions.
Since lattice theory has such a natural correspondence to the apparent progression of metastatic spread, it seems likely that lattice theory can be used to identify important patient subsets on the basis of metastatic disease. These subsets, in turn, could have diagnostic and therapeutic implications in patient care.
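The minimum entry construction summarized above can be sketched in a few lines. This is a simplified illustration in Python under stated assumptions, not the MUMPS program of Table 3: the names `load`, `contains` and `minimum_entry_tree` are hypothetical, each case is a dictionary of site to highest grade, and a new case may attach only to the 'primary only' corner or to a case already on the tree. Intermediate corners that do not correspond to a patient (the dots in Fig. 3) and the local parsimony test are not modeled.

```python
def load(corner):
    """Total metastatic load: sum of grades over all involved sites."""
    return sum(corner.values())

def contains(child, parent):
    """True if child has at least the parent's grade at every site (no back replacement)."""
    return all(child.get(site, 0) >= grade for site, grade in parent.items())

def minimum_entry_tree(cases):
    """Attach cases in ascending order of load, each at the legal parent adding least load.

    cases: dict of case id -> {site: grade}.
    Returns dict of case id -> (parent id, added load).
    """
    root = "PRIMARY_ONLY"
    placed = {root: {}}            # corners already on the tree
    links = {}
    for cid in sorted(cases, key=lambda c: (load(cases[c]), str(c))):
        corner = cases[cid]
        # Only corners whose tumor is fully contained in the new case are legal parents.
        legal = [p for p in placed if contains(corner, placed[p])]
        parent = min(legal, key=lambda p: load(corner) - load(placed[p]))
        links[cid] = (parent, load(corner) - load(placed[parent]))
        placed[cid] = corner
    return links

# Invented toy cases for illustration; these are not patients from Table 4.
cases = {
    "A": {"LYN": 2},
    "B": {"LIV": 4},
    "C": {"LYN": 2, "MES": 3},
    "D": {"LYN": 4, "MES": 3},
}
tree = minimum_entry_tree(cases)
for cid, (parent, added) in tree.items():
    print(cid, "<-", parent, "adds", added)
print("tree length:", sum(added for _, added in tree.values()))  # 11
```

In this toy run, cases A and B attach to the primary only corner, C attaches to A and D attaches to C, so the tree length is 2 + 4 + 3 + 2 = 11 units.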

REFERENCES

1. S. Paget, The distribution of secondary growths in cancer of the breast, Lancet 1, 571-573 (1889).
2. E. Viadana and K. L. Au, Patterns of metastases in adenocarcinomas of man. Autopsy study of 4,728 cases, J. Med. 6, 1-14 (1975).
3. S. M. de la Monte, G. W. Moore and G. M. Hutchins, Non-random distribution of metastases in neuroblastic tumors, Cancer 52, 915-925 (1983).
4. S. M. de la Monte, G. W. Moore and G. M. Hutchins, Patterned distribution of metastases from malignant melanoma in humans, Cancer Res. 43, 3427-3433 (1983).
5. S. M. de la Monte, G. M. Hutchins and G. W. Moore, Breast carcinoma and endocrine organ metastases (abstract), Lab. Invest. 48, 19a (1983).
6. A. Tarski, A lattice-theoretical fixpoint theorem and its applications, Pacif. J. Math. 5, 285-309 (1955).
7. N. Deo, Graph Theory with Applications to Engineering and Computer Science, pp. 348-354. Prentice-Hall, Englewood Cliffs, NJ (1974).
8. E. M. Reingold, J. Nievergelt and N. Deo, Combinatorial algorithms: theory and practice, Graph Algorithms, Chapter 8, pp. 318-400. Prentice-Hall, Englewood Cliffs, NJ (1977).
9. J. B. Kruskal Jr, On the shortest spanning subtree of a graph and the traveling salesman problem, Proc. Am. math. Soc. 7, 48-50 (1956).
10. R. C. Prim, Shortest connection networks and some generalizations, Bell System Tech. J. 36, 1389-1401 (1957).
11. D. Cheriton and R. E. Tarjan, Finding minimum spanning trees, SIAM J. Comput. 5, 724-742 (1976).
12. G. W. Moore, J. Barnabas and M. Goodman, A method for constructing maximum parsimony ancestral amino acid sequences on a given network, J. theor. Biol. 38, 459-485 (1973).
13. M. S. Waterman and T. F. Smith, On the similarity of dendrograms, J. theor. Biol. 73, 789-800 (1978).
14. J. S. Farris, Methods for computing Wagner trees, Syst. Zool. 19, 83-92 (1970).
15. G. F. Estabrook, C. S. Johnson Jr and F. R. McMorris, A mathematical foundation for the analysis of cladistic character compatibility, Math. Biosci. 29, 181-187 (1976).
16. T. Margush and F. R. McMorris, Consensus n-Trees, Bull. math. Biol. 43, 239-244 (1981).
17. J. H. Camin and R. R. Sokal, A method for deducing branching sequences in phylogeny, Evolution 19, 311-326 (1965).
18. J. Felsenstein, Evolutionary trees from DNA sequences: a maximum likelihood approach, J. molec. Evol. 17, 368-376 (1981).
19. M. D. Hendy, D. Penny and L. R. Foulds, Identification of phylogenetic trees of minimal length, J. theor. Biol. 71, 441-452 (1978).
20. J. Bowie and G. O. Barnett, MUMPS - an economical and efficient time-sharing system for information management, Comput. Programs Biomed. 6, 11-22 (1976).
21. S. J. Robboy, B. S. Altshuler and H. Y. Chen, Retrieval in a computer-assisted pathology encoding and reporting system (CAPER), Am. J. clin. Path. 75, 654-661 (1981).
22. G. O. Barnett, N. S. Justice, M. E. Somand, J. B. Adams, B. D. Waxman, P. D. Beaman, M. S. Parent, F. R. VanDeusen and J. K. Greenlie, COSTAR - a computer-based medical information system for ambulatory care, Proc. IEEE 67, 1226-1237 (1979).
23. R. E. Miller, G. L. Steinbach and R. E. Dayhoff, A hierarchical computer network: an alternative approach to clinical laboratory computerization in a large hospital, Proc. 4th Symp. on Computer Applications in Medical Care, Vol. 1, pp. 505-513 (1980).
24. I. J. Fidler, D. M. Gersten and I. R. Hart, The biology of cancer invasion and metastasis, Adv. Cancer Res. 28, 149-250 (1978).
25. R. T. Prehn, Analysis of antigenic heterogeneity within individual 3-methylcholanthrene-induced mouse sarcomas, J. natn. Cancer Inst. 45, 1039-1045 (1970).
26. G. S. Pinkus and J. W. Said, Specific identification of intracellular immunoglobulin in paraffin sections of multiple myeloma and macroglobulinemia using an immunoperoxidase technique, Am. J. Path. 87, 47-55 (1970).
27. E. Gorelik, M. Fogel, S. Segal and M. Feldman, Tumor-associated antigenic differences between the primary and the descendant metastatic tumor cell populations, J. supramolec. Struct. 12, 385-402 (1979).
28. M. Fogel, E. Gorelik, S. Segal and M. Feldman, Differences in cell surface antigens of tumor metastases and those of the local tumor, J. natn. Cancer Inst. 62, 585-588 (1979).
29. P. M. Comoglio, M. Prat and M. Bertini, Tumor-specific cell surface antigens of Rous sarcoma virus-transformed mammalian fibroblasts, Tumor-Associated Antigens and their Specific Immune Response, F. Spreafico and R. Arnon, eds, pp. 1-20. Academic Press, New York (1979).
30. M. Sluyser and R. VanNie, Estrogen receptor content and hormone-responsive growth of mouse mammary tumors, Cancer Res. 34, 3253-3257 (1974).
31. H. B. Bosmann, G. F. Bieber, A. E. Brown, K. R. Case, D. M. Gersten, T. W. Kimmerer and A. Lione, Biochemical parameters correlated with tumour cell implantation, Nature 246, 487-489 (1973).
32. I. Kiricuta, I. Mustea, I. Rogozan and G. Simu, Relations between tumor and metastasis. I. Aspects of the Crabtree effect, Cancer 18, 978-984 (1965).
33. I. J. Fidler, Tumor heterogeneity and the biology of cancer invasion and metastasis, Cancer Res. 38, 2651-2660 (1978).

About the Author--Dr MOORE received a B.S. from the University of Michigan in 1967, a Ph.D. in Biomathematics from North Carolina State University at Raleigh in 1971, and an M.D. degree from Wayne State University in 1976. He completed a residency in Anatomic Pathology in 1981 and is currently Assistant Professor of Pathology at The Johns Hopkins Medical Institutions.

About the Author--Dr HUTCHINS received a B.A. degree in 1957 and an M.D. degree in 1961 from The Johns Hopkins University. He completed a residency in Anatomic Pathology in 1966 and is currently Professor of Pathology at The Johns Hopkins Medical Institutions.

About the Author--Dr DE LA MONTE received an A.B. degree from Cornell University in 1972 and an M.D. degree from Cornell University in 1977. She is currently Chief Resident in Neuropathology at the Massachusetts General Hospital and Clinical Instructor at the Harvard Medical School.