BULLETIN OF MATHEMATICAL BIOLOGY
VOLUME 39, 1977
ON THE COMPATIBILITY OF BINARY QUALITATIVE TAXONOMIC CHARACTERS
F. R. MCMORRIS
Department of Mathematics, Bowling Green State University, Bowling Green, OH 43403, U.S.A.
In this note we prove that an entire collection of binary qualitative taxonomic characters is compatible if each character in the collection is compatible with every other character. An apparently more general definition of compatibility is presented and shown to be equivalent to the old one.
1. Introduction. In attempting to construct a reasonable estimate of the phylogeny of a particular collection of evolutionary units (EU's), it is important to have methods available that enable the investigator to select those taxonomic characters containing appropriate evolutionary information. When binary characters are used, the working taxonomist has evidently found the concept of a uniquely derived character useful. LeQuesne (1969, 1972) called a character uniquely derived if it "evolved only in one direction on a single occasion in its history." In a more general context we have called such characters "true" (Estabrook et al., 1975), and this terminology will be used for the remainder of this note. One can see that if a binary character is true, then it indicates one of the branches of the actual phylogenetic tree of the study group S. Hence if a sufficient number of non-equivalent true characters can be found on S, then it is theoretically possible to reconstruct this evolutionary history faithfully. Unfortunately, since the actual history of S is unknown, one can never be certain whether a character is true. The best we can do is give conditions under which it is a logical possibility that the characters in a collection are all true. We have called such characters compatible, and in (Estabrook et al., 1975, 1976a, b; McMorris, 1975) we obtained compatibility results for multi-state characters whose states have an evolutionary ordering in the form of a tree. (These characters are called cladistic characters.) In the binary character case our results reduced to the original test announced by LeQuesne, and this will be recalled below.
Not all characters used in a study are likely to be compatible, and it is therefore important to have a method which will pick out maximal compatible subsets of the original collection. The result which easily enables us to give such an algorithm for cladistic characters is the fact (Theorem 4 in Estabrook et al., 1976a) that a pairwise compatible collection of cladistic characters is compatible. For characters whose states are not initially given an ordering (qualitative characters), this result is not true (Fitch, 1975; McMorris, 1975). In fact, we have only recently been able to verify mathematically a procedure suggested by Fitch, and independently by Estabrook and Landrum (1975), for determining when two qualitative characters are compatible (Estabrook and McMorris, 1976). The main purpose of this note is to prove that for binary qualitative characters, a pairwise compatible collection is compatible.
2. Definitions and Results. From now on, S will denote the study collection of EU's. The motivation for the basic definitions presented here can be found in (Estabrook, 1972; Estabrook et al., 1975). All sets are finite. A tree semilattice is a partially ordered set (T, ≤) in which every two elements have a greatest lower bound and which satisfies the tree condition that a ≤ c and b ≤ c for a, b, c ∈ T imply a ≤ b or b ≤ a. We use the notation a ∧ b = g.l.b.{a, b}. A tree poset is a partially ordered set satisfying the above tree condition. The central goal in many phylogenetic studies is to construct a tree semilattice S* containing S in such a way that S* can be considered a plausible guess of the evolutionary history of S.
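As an illustration of these definitions, the following Python sketch checks the tree condition and the existence of greatest lower bounds on a small finite poset. The function names, the parent-map encoding of the order, and the four-element example are assumptions made for this illustration and are not taken from the paper.

```python
from itertools import combinations

def order_from_parents(parent):
    """Return leq(a, b), true when a equals b or a is an ancestor of b.

    `parent` maps each element to its immediate ancestor (None for a root).
    This encoding is a hypothetical convenience, not part of the paper.
    """
    def leq(a, b):
        while b is not None:
            if a == b:
                return True
            b = parent.get(b)
        return False
    return leq

def is_tree_poset(elements, leq):
    """Tree condition: a <= c and b <= c imply a <= b or b <= a."""
    for c in elements:
        below = [x for x in elements if leq(x, c)]
        for a, b in combinations(below, 2):
            if not (leq(a, b) or leq(b, a)):
                return False
    return True

def glb(elements, leq, a, b):
    """Greatest lower bound of a and b within `elements`, or None."""
    lower = [x for x in elements if leq(x, a) and leq(x, b)]
    for g in lower:
        if all(leq(x, g) for x in lower):
            return g
    return None

def is_tree_semilattice(elements, leq):
    """Tree poset in which every two elements have a greatest lower bound."""
    return is_tree_poset(elements, leq) and all(
        glb(elements, leq, a, b) is not None
        for a, b in combinations(elements, 2)
    )

# Hypothetical history: r is ancestral to x and y, and x is ancestral to z.
parent = {"r": None, "x": "r", "y": "r", "z": "x"}
leq = order_from_parents(parent)
S_star = ["r", "x", "y", "z"]  # every pair has a g.l.b.
S = ["x", "y", "z"]            # x and y have no common lower bound in S
```

In this example `is_tree_semilattice(S_star, leq)` holds, while `S` on its own satisfies only the tree condition. This mirrors the setting of the note, in which the study collection is a tree poset to be embedded in a tree semilattice.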
We of course take the ordering of the semilattice to be the usual ancestor relation, where a < b if and only if a "is an ancestor of" b. Since S may sometimes already include EU's that are related, S is assumed to be a tree poset. A binary qualitative character on S is a map from S onto a two-element set, and a binary cladistic character on S is a map from S onto the two-element semilattice. If S* is a tree semilattice containing S as a sub-poset, we define "binary character on S*" analogously. A cladistic character K on S* is ideally related to S* if it satisfies the uniquely derived condition of LeQuesne given in the Introduction. In (Estabrook et al., 1975) we made this concept mathematically precise and showed that K is ideally related to S* if and only if it is a semilattice homomorphism (K(a ∧ b) = K(a) ∧ K(b) for all a, b ∈ S*). We say that a collection of binary cladistic characters K1, ..., Kn on S is compatible if there exists a tree semilattice S* extending S such that each Ki can be extended to a semilattice homomorphism on S*. The usefulness of this definition lies in the fact that incompatibility implies that the characters, as presently structured, cannot all be true on the actual evolutionary history of S, and hence they are not, as a set, useful for our methods of constructing a first estimate of this phylogeny.
In general, qualitative characters are those characters, often not morphological, for which there is no a priori reason for designating one state derived and the other primitive. An amino acid position on cytochrome c is an example of a multi-state qualitative character since there is no universal reason for believing that, within the study group, a particular amino acid is more primitive than any other (Boulter et al., 1972; Estabrook and Landrum, 1975). Another example is the binary character that registers "presence" or "absence" of a certain pigment in plants (Taylor and Campbell, 1969). Even though there may be a great deal of back mutation and/or parallel evolution with qualitative characters, one needs only relatively few true characters to get a good picture of the phylogeny. A collection of binary qualitative characters is said to be compatible if their character state sets can be given orderings such that the collection is a compatible set of cladistic characters. Thus, as before, it is impossible for incompatible qualitative characters to all be true.
Recall (LeQuesne, 1969) that two binary qualitative characters K1: S → Q1 and K2: S → Q2 are compatible if and only if not all four elements of Q1 × Q2 appear in (K1 × K2)(S) = {(K1(x), K2(x)) : x ∈ S}. We must also recall (Estabrook et al., 1976a) that two binary cladistic characters K1, K2: S → {0, 1} (0 primitive, 1 derived) are compatible if and only if not all three of the elements (1, 1), (0, 1), (1, 0) appear in (K1 × K2)(S). If K: S → {a, b} is a binary qualitative character and the set {x ∈ S : K(x) = a} has more elements than the set {x ∈ S : K(x) = b}, we say that state a is larger than state b.
The number of elements in the first set is denoted by #a and similarly for the second. Also let 0 denote "primitive" and 1 "derived", so that writing a = 1 means that state a is designated the derived state. If K1: S → {a, b} and K2: S → {c, d} are compatible binary qualitative characters and are not equivalent (i.e. do not induce the same partition of S), then it is impossible for both #a = #b and #c = #d. It is tacitly assumed that the results are stated for non-equivalent characters. We can now state the main result.
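Before the lemma, the two pairwise tests recalled above and the state counts #a can be sketched in Python. This is a minimal illustration, assuming each character is stored as a dictionary from EU labels to states; the function names and the small data set are hypothetical and do not come from the paper.

```python
from collections import Counter

def observed_pairs(k1, k2):
    """(K1 x K2)(S): state pairs realized by at least one EU."""
    return {(k1[x], k2[x]) for x in k1}

def qualitative_compatible(k1, k2):
    """Binary qualitative test: not all four state pairs are observed."""
    return len(observed_pairs(k1, k2)) < 4

def cladistic_compatible(k1, k2):
    """Binary cladistic test (0 primitive, 1 derived):
    (1, 1), (0, 1) and (1, 0) are not all observed."""
    return not {(1, 1), (0, 1), (1, 0)} <= observed_pairs(k1, k2)

def state_sizes(k):
    """Map each state a of the character to #a."""
    return Counter(k.values())

# Hypothetical data on four EU's (not from the paper).
K1 = {"w": "a", "x": "a", "y": "b", "z": "b"}
K2 = {"w": "c", "x": "d", "y": "c", "z": "d"}
K3 = {"w": "c", "x": "c", "y": "c", "z": "d"}
```

With these data K1 and K2 realize all four state pairs and so fail the qualitative test, whereas K1 and K3 realize only three pairs and pass it. K1 has #a = #b, and, as the remark above requires for compatible non-equivalent characters, K3 does not have equal state sizes.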
LEMMA. Let K1: S → Q1 = {a, b} and K2: S → Q2 = {c, d} be compatible binary qualitative characters. Then, if the larger state in both K1 and K2 is taken as 0, K1 and K2 become compatible cladistic characters in this ordering for Q1 and Q2. If K1 or K2 has an equal number of EU's in each state, then an arbitrary assignment of 0 or 1 to the states of that character can be made.
Proof. The only non-trivial case is when three out of the possible four elements of Q1 × Q2 appear in (K1 × K2)(S), and we assume without loss of generality that (a, c), (a, d), and (b, c) appear as data. The other three cases are handled similarly. If #a > #b and #c > #d, then a = 0 = c and b = 1 = d, and the three ordered pairs become (0, 0), (0, 1), (1, 0). By the compatibility result for cladistic characters mentioned above, K1 and K2 are now compatible as cladistic characters. If #a > #b and #c < #d, then the three ordered pairs become (0, 1), (0, 0), (1, 1). If #a < #b, then #c > #d since (b, d) does not appear. Hence a = 1 = d, b = 0 = c, and we have (1, 0), (1, 1), and (0, 0) appearing. If #a = #b, then again we have #c > #d, and either the assignment a = 0 or a = 1 will result in compatible cladistic characters. Q.E.D.
THEOREM 1. A collection of binary qualitative characters in which the characters are pairwise compatible is a compatible collection.
Proof. From the Lemma, each character state set can be given a fixed ordering so that the collection now consists of pairwise compatible cladistic characters. By Theorem 2 of (Estabrook et al., 1976b) the collection is compatible. Q.E.D.
Notice that the converse to Theorem 1 is obviously true. Thus the Lemma and Theorem 1 tell us that if one has a collection of binary characters that possibly are all true, then taking the ubiquitous state for each character as primitive guarantees that, for these orderings, they are cladistic characters each ideally related to the same estimate of the evolutionary history of S, a necessary condition for them all to be true. In fact, as will be noted in examples below, this ordering must be assigned to be assured compatibility in the cladistic sense. It is interesting to note that the assignment of primitive to the largest state is fairly standard practice (Scora, 1966; Taylor and Campbell, 1969) and the justification is purely biological, while here it comes out in the mathematical wash. Further discussion will be found in (Estabrook, 1976). Also it is important to remark that the above theorem is not true for qualitative characters having more than two states, and a counterexample can be found in (McMorris, 1975).
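Theorem 1 reduces the compatibility of a whole collection of binary qualitative characters to pairwise checks, and the Lemma supplies the ordering (larger state primitive) under which the characters become compatible cladistic characters. The Python sketch below illustrates both points. It is a brute-force illustration under stated assumptions, not an algorithm given in the paper: characters are dictionaries from EU labels to states, and the names `polarize`, `maximal_compatible_subsets` and the three characters A, B, C are hypothetical.

```python
from collections import Counter
from itertools import combinations

def pairs(k1, k2):
    """State pairs realized by at least one EU."""
    return {(k1[x], k2[x]) for x in k1}

def qualitative_compatible(k1, k2):
    """Not all four state pairs are observed."""
    return len(pairs(k1, k2)) < 4

def cladistic_compatible(k1, k2):
    """(1, 1), (0, 1) and (1, 0) are not all observed."""
    return not {(1, 1), (0, 1), (1, 0)} <= pairs(k1, k2)

def polarize(k):
    """Recode a binary character: larger state -> 0, smaller state -> 1.
    When the two states are equally frequent the choice is arbitrary."""
    (larger, _), _ = Counter(k.values()).most_common(2)
    return {x: 0 if s == larger else 1 for x, s in k.items()}

def maximal_compatible_subsets(chars):
    """Exhaustive search over subsets of character names. By Theorem 1 a
    subset is compatible exactly when it is pairwise compatible."""
    names = list(chars)

    def pairwise_ok(sub):
        return all(qualitative_compatible(chars[p], chars[q])
                   for p, q in combinations(sub, 2))

    good = [set(sub)
            for r in range(1, len(names) + 1)
            for sub in combinations(names, r)
            if pairwise_ok(sub)]
    return [g for g in good if not any(g < h for h in good)]

# Hypothetical characters on six EU's (not from the paper).
chars = {
    "A": {1: "p", 2: "p", 3: "p", 4: "p", 5: "q", 6: "q"},
    "B": {1: "r", 2: "r", 3: "r", 4: "s", 5: "r", 6: "r"},
    "C": {1: "u", 2: "v", 3: "u", 4: "v", 5: "u", 6: "v"},
}
subsets = maximal_compatible_subsets(chars)
```

Here A and C realize all four state pairs, so the maximal compatible subsets are {A, B} and {B, C}. The exhaustive search is exponential in the number of characters and is meant only to make the statement of Theorem 1 concrete.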
Examples. These examples show that if K1 and K2 are compatible binary qualitative characters, and 1 is given to the most frequent state in either K1 or K2, then K1 and K2 need not be compatible cladistic characters. Let S = {1, 2, 3, 4, 5} and let K1: S → {a, b}, K2: S → {c, d} be defined by K1(1) = K1(2) = K1(3) = K1(4) = a, K1(5) = b, K2(1) = K2(5) = c, K2(2) = K2(3) = K2(4) = d. K1 and K2 are compatible as qualitative characters since only (a, c), (a, d), (b, c) appear as data. Now #a > #b and #d > #c, and if we assign a = 1, b = 0 and d = 0, c = 1, then (1, 1), (1, 0), (0, 1) appear, implying K1 and K2 are not compatible as cladistic characters in these orderings.
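The first example can be checked mechanically. The sketch below encodes the data exactly as given above and compares the ordering used in the example (a = 1 for the larger state of K1) with the ordering prescribed by the Lemma (larger state = 0 in both characters). The helper names `recode`, `pairs` and `cladistic_compatible` are assumptions made for this illustration.

```python
def recode(k, order):
    """Replace each state by its assigned value (0 primitive, 1 derived)."""
    return {x: order[s] for x, s in k.items()}

def pairs(k1, k2):
    """State pairs realized by at least one EU."""
    return {(k1[x], k2[x]) for x in k1}

def cladistic_compatible(k1, k2):
    """(1, 1), (0, 1) and (1, 0) are not all observed."""
    return not {(1, 1), (0, 1), (1, 0)} <= pairs(k1, k2)

# Data of the first example.
K1 = {1: "a", 2: "a", 3: "a", 4: "a", 5: "b"}
K2 = {1: "c", 2: "d", 3: "d", 4: "d", 5: "c"}

# Ordering used in the example: the larger state of K1 is called derived.
example_pairs = pairs(recode(K1, {"a": 1, "b": 0}),
                      recode(K2, {"d": 0, "c": 1}))

# Ordering of the Lemma: the larger state is primitive in both characters.
lemma_pairs = pairs(recode(K1, {"a": 0, "b": 1}),
                    recode(K2, {"d": 0, "c": 1}))
```

Under the example's ordering the observed pairs are (1, 1), (1, 0), (0, 1), which is the forbidden configuration; under the Lemma's ordering they are (0, 1), (0, 0), (1, 1), which omits (1, 0) and is therefore compatible.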
For another example, suppose S and K1 are as above. Let K2(3) = d and K2(1) = K2(2) = K2(4) = K2(5) = c. Assuming we pick the less frequent state as 0 for both characters, the data (a, c), (a, d), (b, c) become (1, 1), (1, 0), (0, 1). Hence K1 and K2 are not compatible cladistic characters with these orderings for their states.
The next definition, suggested to the author by G. F. Estabrook, extends the usual one for compatibility to include the possibility of ancestral EU's realizing states different from the two which appear in the study collection S. For many studies, this is clearly the more realistic view of the situation. A collection Ki: S → Qi, i = 1, ..., n, of binary qualitative characters is E-compatible if and only if there exist tree semilattices Ti containing Qi, i = 1, ..., n, such that K1, ..., Kn is a compatible collection of cladistic characters. We proved (Estabrook et al., 1976a; McMorris, 1975) that a collection Ki: S → Ti, i = 1, ..., n, of cladistic characters is compatible if and only if ⟨Im(K)⟩ is a tree subsemilattice of T1 × ... × Tn, where K = K1 × ... × Kn, Im(K) denotes the image of S under K, and ⟨Im(K)⟩ is the semilattice generated by Im(K).
THEOREM 2. Two binary qualitative characters are E-compatible if and only if they are compatible.
Proof. Sufficiency is obvious. Assume K1: S → Q1 = {a, b} and K2: S → Q2 = {c, d} are E-compatible but not compatible. Then all the elements (a, c), (a, d), (b, c), (b, d) are in (K1 × K2)(S). Now for any tree semilattices T1, T2 containing Q1 and Q2 respectively, we have (a ∧ b, d), (a, c ∧ d) ∈ ⟨Im(K)⟩ ⊆ T1 × T2, where K = K1 × K2 and ⟨Im(K)⟩ is the semilattice generated by the image of S under K. Since (a ∧ b, d), (a, c ∧ d) ≤ (a, d) and ⟨Im(K)⟩ is a tree, either (a ∧ b, d) ≤ (a, c ∧ d) or (a, c ∧ d) ≤ (a ∧ b, d). [...] A similar argument holds if we assume the second case. Q.E.D.
COROLLARY. A collection of binary qualitative characters is E-compatible if and only if it is compatible.
Hence if binary qualitative characters are used, and it is felt that ancestral EU's have realized states not observed in S, then the above result simply states that the LeQuesne compatibility tests will suffice to test compatibility. However, the ancestral states for those characters must be postulated, and this would be done in such a fashion as to make each character a cladistic character. Now, to check whether these postulated states and character state trees are reasonable, the compatibility results for cladistic characters are employed.
I wish to thank Professor G. F. Estabrook for his helpful comments and ideas.
LITERATURE
Boulter, D., J. A. M. Ramshaw, E. W. Thompson, M. Richardson, and R. H. Brown. 1972. "A Phylogeny of Higher Plants Based on the Amino Acid Sequences of Cytochrome c and its Biological Implications." Proc. R. Soc. B, 181, 441-455.
Estabrook, G. F. 1972. "Cladistic Methodology: A Discussion of the Theoretical Basis for the Induction of Evolutionary History." A. Rev. Ecol. Syst., 3, 427-456.
Estabrook, G. F. 1976. "Are Common Traits Primitive?" Syst. Botany, in press.
Estabrook, G. F., C. S. Johnson, Jr., and F. R. McMorris. 1975. "An Idealized Concept of the True Cladistic Character." Math. Biosci., 23, 263-272.
Estabrook, G. F., C. S. Johnson, Jr., and F. R. McMorris. 1976a. "An Algebraic Analysis of Cladistic Characters." Discrete Math., in press.
Estabrook, G. F., C. S. Johnson, Jr., and F. R. McMorris. 1976b. "A Mathematical Foundation for the Analysis of Cladistic Character Compatibility." Math. Biosci., 29, 181-187.
Estabrook, G. F. and L. Landrum. 1975. "A Simple Test for the Possible Simultaneous Evolutionary Divergence of Two Amino Acid Positions." Taxon, 25, 609-613.
Estabrook, G. F. and F. R. McMorris. 1976. "When are Two Qualitative Taxonomic Characters Compatible?" Bull. Math. Biol., in press.
Fitch, W. M. 1975. "The Relationship between Prim Networks and Trees of Maximum Parsimony." In The Eighth International Conference on Numerical Taxonomy, G. F. Estabrook, ed., pp. 189-230. San Francisco: W. H. Freeman.
LeQuesne, W. J. 1969. "A Method of Selection of Characters in Numerical Taxonomy." Syst. Zool., 18, 201-205.
LeQuesne, W. J. 1972. "Further Studies Based on the Uniquely Derived Character Concept." Syst. Zool., 21, 281-288.
McMorris, F. R. 1975. "Compatibility Criteria for Cladistic and Qualitative Taxonomic Characters." In The Eighth International Conference on Numerical Taxonomy, G. F. Estabrook, ed., pp. 399-415. San Francisco: W. H. Freeman.
Scora, R. W. 1966. "The Evolution of the Genus Monarda (Labiatae)." Evolution, 20, 185-190.
Taylor, R. J. and D. Campbell. 1969. "Biochemical Systematics and Phylogenetic Interpretations in the Genus Aquilegia." Evolution, 23, 153-162.