Cladistics 19 (2003) 554–564
www.elsevier.com/locate/yclad
Brooks Parsimony Analysis: a valiant failure
Mark E. Siddall (a,*) and Susan L. Perkins (b)
(a) Division of Invertebrate Zoology, American Museum of Natural History, Central Park West at 79th Street, New York, NY 10024, USA
(b) Department of Ecology and Evolutionary Biology, Population and Organismic Biology, University of Colorado, Boulder, CO 80309, USA
Accepted 9 October 2003
Abstract
A recent comparison of two methods for examining correlated host and parasite phylogenies, namely TreeMap 1.0 and Brooks Parsimony Analysis (BPA), concluded that the latter method performed better and is to be preferred. Reevaluation of the examples contrived for that study demonstrates that the two methods were only compared on one kind of problem (widespread parasite) for which there is an easy fix for TreeMap 1.0. Other kinds of problems, like host-switching among sister taxa or host-switching over great distances across a host tree, befuddle BPA even as they are readily resolved parsimoniously by TreeMap 1.0. These difficulties, compounded with inaccurate counting of ad hoc hypotheses required by its solutions, render BPA unsuitable for comparison of host and parasite phylogenies.
© 2003 The Willi Hennig Society. Published by Elsevier Inc. All rights reserved.
Introduction
Examination of correlated phylogenies is now moving beyond the realm of host–parasite cospeciation and biogeography. The principles governing these questions may yield new insights from genomic studies in that they hold the promise of being able to unravel histories of gene duplications and lateral transfer of genetic elements not only among species but among cellular pathways (e.g., Planet et al., 2003). As such, it is again germane that methods available to the field be examined thoroughly so that the functioning and limitations of each are sufficiently transparent to enable even nonsystematists access to the technology. Presently there are several working methodologies (Brooks, 1981; Charleston, 1998; Page, 1994b; Ronquist, 1998) and with the exception of Brooks Parsimony Analysis (BPA) (Brooks, 1981) each has an associated software package for implementation (Charleston and Page, 2002; Page, 1995; Ronquist, 2000). Recently, Dowling (2002) confined his performance evaluation of cospeciation to the reconciled tree approach implemented in TreeMap 1.0 (Page, 1995) and BPA implemented with MacClade (Maddison and Maddison, 1992).
* Corresponding author. Fax: 1-212-769-5277. E-mail address: [email protected] (M.E. Siddall).
In light of BPA’s widely acknowledged problem of accurately counting ad hoc events relating to lineage sorting and lateral transfers (Cracraft, 1988; Hovenkamp, 1997; Kluge, 1988; Morrone and Carpenter, 1994; Page, 1988, 1990; Page and Charleston, 1998; Ronquist and Nylin, 1990; Siddall, 1996) it is surprising that Dowling (2002, p. 431) concluded “Brooks Parsimony Analysis was the more reliable method.” This conclusion was based in large part on the sum of events tallied for each method across 62 trials in which TreeMap 1.0, relative to BPA, demonstrated a tendency to overcount duplication events and undercount cospeciation events with respect to the “known” (user-created) histories of hosts and parasites. However, frequencies usually are not meaningful estimates unless there is some stochastic element to what is being measured. The frequency with which a coin is purposely placed heads-up is not a good measure of whether it is biased. For the frequencies of the ad hoc events to be meaningful in Dowling’s (2002) study, the modeled histories would have to have been generated according to some stochastic protocol, which they were not. One might easily imagine (and many have) more modeled histories in which BPA is bound to fail than those that prove difficult for TreeMap 1.0. Moreover, finding few optimal solutions should not be considered a virtue if the data being analyzed are legitimately ambiguous. Thus, BPA’s arrival at a single reconstruction for the gopher/louse dataset relative to TreeMap’s 480 equally optimal solutions carries no more force than does Stoneking’s (1992; see also Saitou and Nei, 1987) preference for neighbor-joining over parsimony because the former always finds only one tree (see Farris et al., 1996). Dowling (2003) has already identified some difficulties with the interpretation of 5 of his 62 trials. Here we reevaluate the remainder of those trials following what we hope will be a clearer exposition of the two methods.
0748-3007/$ - see front matter © 2003 The Willi Hennig Society. Published by Elsevier Inc. All rights reserved. doi:10.1016/j.cladistics.2003.10.002
What is BPA, really?
Evaluation of how well BPA performs cannot be separated from BPA having been proposed for two related but ultimately different tasks. The method was first proposed by Brooks (1981) as a solution to “Hennig’s parasitological method.” Hennig (1966) noted that though there might be a lot of information about relationships of some host group contained in the patterns of association observed for their parasites (Stammer, 1957), no criteria were known for codifying that assumption in a testable way. Before formalizing such a method, Brooks (1979) already had considered several expectations with respect to host and parasite relationships in relation to a variety of long-standing “Parasitological Rules” (e.g., Eichler, Fahrenholz, von Ihering) where both host and parasite phylogenies are known and can be compared to each other. In developing the method itself, Brooks (1981) later provided examples and applications to real data in a manner that was very different. Rather than comparing host and parasite trees, parasite phylogenetic trees would be recoded as additive binary characters and then (through inclusive ORing) would be used to generate hypotheses about host phylogeny (Brooks, 1981). Alternatively, these characters could be added to other host data for that same purpose. Only in one hypothetical example did Brooks (1981, Fig. 31) actually optimize the parasite characters on a host tree that was derived independently of the parasitological characters. The use of an associate tree topology recoded as phylogenetic character information readily lent itself to being applied to biogeographic questions (Brooks, 1985) in which the output would be a phylogenetic history of areas derived from the combined information of the histories of species inhabiting them. Unlike for host taxa, more frequently we do not have independent estimates of the histories of areas. Siddall (1996, p.
55) made the formal distinction between Type I BPA in which “a host (or area) cladogram is unknown and the correlation under consideration involves one or more parasite cladograms” and Type II BPA wherein “the associates are being compared with a defined host cladogram.” That the fundamental purpose of BPA was toward the creation of hypotheses of host
relationship (Type I BPA) rather than comparing host and parasite phylogenies (Type II BPA) seems evident in a perusal of the associated (sic) literature. Even a dozen years after the method was proposed, Brooks and McLennan (1993, p. 27) held to be “rare” those cases in which we have both a host and a parasite phylogeny for comparison adding that “At the moment we usually have detailed information about the parasites but not about their geographic distributions or hosts.” In neither book that describes BPA in detail (Brooks and McLennan, 1991, 1993) is there a single instance of Type II BPA where estimates of host and parasite phylogenies are independently known. O’Grady and Deets (1987) examined the applicability of nonredundant linear coding only in relation to Type I BPA (where there is no independent estimate of host phylogeny). The methodological and theoretical update had a slant toward biogeographic questions (Brooks, 1990) and introduced a duplication to accommodate homoplasy (secondary BPA) only for Type I BPA; it was not clear that this “fix” could be applied to Type II BPA for comparison of two known topologies. This is not to say that Type II BPA had never been addressed. The first instance was sufficiently unusual to be perhaps a Type III BPA in which the independent variable (the host phylogeny) was optimized on the dependent variable, a parasite phylogeny of amphilinids (Bandoni and Brooks, 1987). Analyses of cestodes in relation to their pinniped and seabird hosts, however, followed the expected pattern for Type II BPA with parasite tree characters optimized onto the host phylogeny derived from independent data (Hoberg, 1992; Hoberg and Adams, 1992). Brooks (1988, Figs.
8–10) also described the “mapping” of parasite characters onto a host phylogeny (i.e., Type II BPA), but then confused the distinction by providing an example of a “host phylogeny that results from mapping [sic] the two parasite trees together” which was not Type II BPA mapping of one phylogeny on another but rather classic Type I BPA in which parasite characters were used to construct the host tree. More recently, Van Veller et al. (2000), Van Veller and Brooks (2001), and Brooks et al. (2001), favoring BPA over reconciled tree analysis, each did so exclusively in relation to Type I BPA and specifically with respect to the construction of area cladograms. They did not evaluate Type II BPA. In contrast, Dowling (2002) only examined Type II BPA in which both host and parasite phylogenies are available, the problem that TreeMap was designed to evaluate (Page, 1994b, 1995). As such, Dowling’s (2002, p. 423) uncritical acceptance of Van Veller and Brooks’s (2001) assertion that the difference between a priori and a posteriori methods (cf. Van Veller et al., 2000) is somehow explanatory because all “reconciliation methods sacrifice parsimony” is without merit. The so-called a priori and a posteriori
distinction concerned biogeography, only Type I BPA and its derivatives (“secondary BPA” sensu Brooks, 1990), and not the situation in which both host and parasite cladograms are available (i.e., Type II BPA). That Dowling (2002, p. 421) may have conflated Type I BPA and Type II BPA follows from his characterization of ghost characters being evident “only when a host-switch has occurred and do not provide any support for groupings that were not already supported on the tree.” Such “support for groupings” can only be emergent in Type I BPA wherein one is grouping on the basis of parasite characters. With respect to Type II BPA the groupings exist in advance in the form of the independently acquired (and supported) host phylogeny upon which the parasite tree characters are merely optimized. Similarly, his assertions that ghost characters “do not appear to affect the overall structure of the BPA tree” or “cause any topological changes” (Dowling, 2002, pp. 421, 431) are not applicable: the host tree is independently given and is unchanging in Type II BPA.
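The recoding step described in this section (additive binary coding of the parasite tree followed by inclusive ORing over each host's parasites) can be sketched in a few lines of Python. This is only an illustrative sketch, not code from any BPA implementation. The data are the four-host, five-parasite system discussed later with Fig. 3; the parasite topology (V,(IV,(III,(I,II)))) and the names `PARENT`, `additive_binary_code`, and `bpa_row` are assumptions made for the example, the topology being one arrangement consistent with the matrix printed in the text.

```python
# Parasite tree as child -> parent links. Terminals are Roman numerals,
# ancestors are numbered 6-9.
# ASSUMPTION: this topology is inferred from the matrix in the text,
# not read from the original figure.
PARENT = {"I": 6, "II": 6, "III": 7, 6: 7, "IV": 8, 7: 8, "V": 9, 8: 9}
TERMINALS = ["I", "II", "III", "IV", "V"]  # characters 1-5
ANCESTORS = [6, 7, 8, 9]                   # characters 6-9
HOSTS = {"a": ["I"], "b": ["III"], "c": ["IV"], "d": ["II", "V"]}


def additive_binary_code(terminal):
    """Return the set of characters scored 1 for one parasite:
    the terminal itself plus every ancestor on its path to the root."""
    present = {terminal}
    node = terminal
    while node in PARENT:
        node = PARENT[node]
        present.add(node)
    return present


def bpa_row(parasites):
    """Inclusive OR of the additive binary codes of all parasites on a host."""
    present = set()
    for parasite in parasites:
        present |= additive_binary_code(parasite)
    return "".join("1" if c in present else "0" for c in TERMINALS + ANCESTORS)


bpa_matrix = {host: bpa_row(parasites) for host, parasites in HOSTS.items()}
for host, row in bpa_matrix.items():
    print(host, row)
```

Under these assumptions the rows match the matrix given for Fig. 3. In Type I BPA such a matrix would be used to build a host tree, whereas in Type II BPA its characters are only optimized on a host tree that is already given.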
Tangled trees
TreeMap (Page, 1995), or at least the algorithm for its implementation (Page, 1994b), was derived from methods of Reconciled Tree Analysis (RTA), which in turn followed from previous component compatibility methods (Page, 1990, 1993a,b, 1994a). Dowling (2002) is unclear on the distinction between RTA and TreeMap. He at once asserts that RTA “does not incorporate host-switching at all” (p. 423) and then illustrates 17 trials in which TreeMap postulates at least one host-switch. As noted above, Dowling (2002) was following Van Veller et al. (2000) and Van Veller and Brooks (2001) in his criticisms of reconciliation (RTA) methods as “a priori.” The choice of terms “a priori” and “a posteriori” by Van Veller et al. (2000; but see Ebach and Humphries, 2002) is unnecessarily confusing because these terms usually are applied to statistical hypothesis testing regimes (e.g., Kishino and Hasegawa, 1989; Shimodaira and Hasegawa, 1999). Still, previous treatments have been clear in critiquing reconciled tree analysis only as implemented in Component 2.0 (Brooks et al., 2001; Van Veller and Brooks, 2001; Van Veller et al., 2000) which unlike TreeMap 1.0 does not include host-switching. Any attempts to level such criticisms at TreeMap under the umbrella of RTA miss their mark. Answering critics like Van Veller et al. (2000) in advance, Page (1994b) developed TreeMap 1.0 specifically because of the “unsatisfactory choice between one method (BPA) that incorporates host transfer but can lead to internal inconsistencies, and another method (reconciled trees [i.e., RTA]) that discounts host-switching altogether” (Page, 1994b, p. 155). In formulating the criteria to be followed for the incorporation of host-switching into the reconciled tree approach, Page (1994b, p. 162) was explicit that the method could not be readily applied to situations in which a single parasite species was associated with more than one host and he alluded to complexities of this problem. That and other complexities of the host-switching problem have since been addressed in some detail by Ronquist (1995). Dowling (2002, pp. 422–423) was (or should have been) aware of TreeMap’s limitation to one-host-per-parasite when he established the 62 trials, 22 of which violated that very constraint.
Counting errors
Dowling (2002, Table 5) identified 28 instances in which TreeMap 1.0 arrived at a different number or combination of cospeciations, host-switches, and sorting events than were presupposed in his contrived host–parasite history. Five of those instances were incorrectly assessed in the original (Dowling, 2003). In addition, our examination of trial 47 with TreeMap 1.0 yields 10 cospeciations, one duplication, no host-switches, and a sorting event precisely as indicated by the model and contra Dowling’s (2002, p. 427) having found the ratio to be 8:0:3:1. This leaves 22 instances in which TreeMap seems to be at variance with the contrived model tree.
What is parsimony, really?
Two of Dowling’s (2002, Table 5) trials (45 and 49) reflect an asymmetry between truth and its estimation (notably, in these cases neither BPA nor TreeMap converged on the “true” history). In the case of trial 45 an exhaustive search yields a solution with only one ad hoc event: a host-switch. The latter is simply a more parsimonious explanation of the patterns observed than is the true history. Similarly in trial 49 wherein a lineage sorting erases one of the descendants of a duplication it will always be more parsimonious to postulate a single host-switch than the other two ad hoc explanations. In trial 54, rather than have the parasites colonize the host tree at the base only to sort out one of the immediately descending lineages, it is more parsimonious to postulate that the parasites arrived one node later. The suggestion that the method to prefer necessarily is the one “that deviates the least from the known test case” (Dowling, 2002, p. 423) in these instances is equivalent to criticizing the use of parsimony in phylogenetic analyses of nucleotide data because it does not take into account multiple substitutions (Felsenstein, 1978, 1982; Goldman, 1990; Huelsenbeck and Hillis, 1993; Saether, 1986; Swofford et al., 2001), arguments we do not find particularly convincing (Pol and Siddall, 2001; Siddall, 2002; Siddall and Kluge, 1997; Siddall and Whiting, 1999).
A simple fix
The remaining trials in which TreeMap 1.0 did not provide Dowling (2002) with the expected number of cospeciations, duplications, host-switches, and lineage sorts are those in which a parasite species occurred on more than one host, contravening Page’s (1994b) admonition that this should not be done. The obvious solution to this difficulty is to simply code the parasite for each time it occurs (e.g., Ia and Ib) represented unproblematically on the parasite tree as a monophyletic group (which presumably we admit they are if we are coding them as the same species in the first place). After all, if BPA can duplicate redundant areas or hosts to get around (only) one of its several difficulties, TreeMap 1.0 should be allowed to duplicate associates to get around its sole failing.

Consider Dowling’s (2002) Trial 1. Unmodified TreeMap 1.0 suggests eight lineage sorts, three duplications, and five cospeciations (Fig. 1A) relative to BPA having correctly suggested eight cospeciations and one host-switch (Dowling, 2002, Table 5). But if one simply duplicates parasite II (to IIa and IIb) and reanalyzes, TreeMap 1.0 arrives at the same solution as BPA and the same as the input trial (Fig. 1B). Consider now Trial 8. Dowling (2002, p. 428) concluded “that BPA is the more consistently accurate method” notwithstanding that it considerably overestimates the number of host-switches. In Trial 8, BPA correctly hypothesized eight cospeciations but found two host-switches, one more than was modeled, a direct result of descendants illogically carrying long-dead ancestors (ghost lineages) with them when they host-switch. For this same trial, when widespread parasite VII is multiplied to VIIa,b,c and analyzed with TreeMap 1.0 (Fig. 1D) the correct estimate of eight cospeciations and one host-switch is found.

Fig. 1. Reanalyses of Dowling’s (2002) Trials 1 and 8, which recode widespread parasites as monophyletic lineages. (A) Dowling’s (2002) conclusion from TreeMap 1.0 analysis showing eight lineage sorting events, three duplications, and five cospeciations. (B) The same dataset analyzed with TreeMap 1.0, recoding widespread parasite “II” as “IIa” and “IIb” now produces the eight cospeciations and one host-switch as deemed correct. (C) Trial 8 as analyzed by Dowling (2002) in TreeMap 1.0 yields six cospeciations, two duplications, and four lineage-sorting events. (D) The same dataset reanalyzed after splitting parasite taxon VII into a monophyletic lineage of 3 produces the actual history as defined in the model.

Every single one of the 22 trials involving a widespread species (a parasite in more than one host) can be reanalyzed in this manner and in each case TreeMap 1.0 arrives at the “true” history contrived by Dowling (2002), thus outperforming BPA even for this simple problem. Thankfully, the obvious requirement for representing a widespread parasite as many times as it occurs is no longer an issue. TreeMap 2.0 (Charleston and Page, 2002) has circumvented the problems alluded to by Page (1994b) in a manner consistent with Ronquist’s (1995, 2003) criteria for potency, factuality, and consistency (see also Charleston and Perkins, 2003). In any case, with this simple fix, TreeMap 1.0 is rendered more accurate than BPA, but only for the trials considered by Dowling (2002).
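The recoding described above is mechanical, and a minimal sketch may make it concrete. The function below replaces each widespread parasite tip with a clade containing one copy per host, so that every resulting tip has exactly one host. The function name `recode_widespread`, the nested-tuple tree format, and the toy data are all hypothetical and are not taken from Dowling's trials or from TreeMap. Where a parasite occurs on three or more hosts the copies are left as an unresolved group; how such a group should be resolved for a program requiring binary trees is not addressed here.

```python
def recode_widespread(tree, hosts_of):
    """Return (new_tree, new_assoc) with one parasite copy per host.

    tree     -- parasite tree as nested tuples of tip names
    hosts_of -- dict mapping each tip name to a list of its hosts
    """
    new_assoc = {}

    def walk(node):
        if isinstance(node, tuple):          # internal node: recurse
            return tuple(walk(child) for child in node)
        hosts = hosts_of[node]
        if len(hosts) == 1:                  # already one host per parasite
            new_assoc[node] = hosts[0]
            return node
        copies = []                          # widespread tip: split into copies
        for suffix, host in zip("abcdefghijklmnopqrstuvwxyz", hosts):
            name = node + suffix
            new_assoc[name] = host
            copies.append(name)
        return tuple(copies)                 # the copies form a monophyletic group

    return walk(tree), new_assoc


# Toy data (hypothetical): parasite II occurs on hosts B and D.
toy_tree = ("I", ("II", "III"))
toy_hosts = {"I": ["A"], "II": ["B", "D"], "III": ["C"]}
new_tree, new_assoc = recode_widespread(toy_tree, toy_hosts)
print(new_tree)
print(new_assoc)
```

In this toy case parasite II becomes the clade (IIa, IIb), with IIa assigned to host B and IIb to host D, which is the same kind of recoding applied to parasite II in Trial 1 and to parasite VII in Trial 8.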
The bane of BPA
Dowling (2002) did evaluate a variety of circumstances in his determination of the performance of TreeMap 1.0 and BPA. However, as the previous section shows, those trials were overwhelmingly of a kind in which there are multiple hosts associated with a parasite. Dowling (2002) did not evaluate the other well-known circumstances in which methods are known to behave poorly such as switching among closely related lineages, or lineage sorting after successive cladogenesis. Arguably, had he fashioned enough of these, Dowling (2002) may not have concluded that: (1) “there was not single trial in which Treemap predicted more cospeciation events than BPA,” (2) “BPA produces with fewer duplications and sorting events,” and (3) “BPA seems to perform very well.” These assertions are easily refuted with simple examples (Fig. 2). If the actual history involves a host-switch between sister taxa (Fig. 2A), BPA is forced to hypothesize that both ancestors arose prior to the divergence of those sister taxa, unparsimoniously implying a duplication and an extinction (lineage sort). Similarly, in the case of a host-switch from one member of a clade that is sister to a monotypic lineage (Fig. 2B), BPA prefers the unparsimonious solution entailing a duplication and two lineage-sorting events and omitting host-switching altogether. In both of the foregoing, TreeMap 1.0 yields solutions that are more parsimonious by employing host-switches where BPA cannot. Even in simple examples where BPA does allow host-switching, TreeMap 1.0 can do so more parsimoniously (Fig. 2C).

Fig. 2. Illustration of the unparsimonious nature of BPA via 3-taxon examples. (A) A single host-switch among sister taxa necessitates two ad hoc events (one duplication + one lineage-sorting event) under BPA, but only one (a host-switch) using TreeMap. (B) A host-switch to a monotypic lineage requires three ad hoc events under BPA, but only two in TreeMap. (C) In scenarios where BPA does include a host-switch, TreeMap typically produces more parsimonious solutions.

The better behavior of TreeMap extends beyond these 3-taxon examples. Fig. 3A shows a set of host and parasite phylogenies and the associations of the taxa therein. The inclusive ORed BPA matrix for the parasite tree and host associations is:

a 100001111
b 001000111
c 000100011
d 010011111

Fig. 3B depicts the Type II BPA solution from fitting the BPA matrix to the host tree. Fig. 3C is a lineage trace of what is required by BPA’s optimizing the binary parasite data on the host tree in Fig. 3B. Note that there is one hypothesized cospeciation (ancestor-7 splits at the base with descendants terminal-III and ancestor-6), there is a single host-switch (as evidenced by the two independent ancestor-6: 0 → 1 changes), and one lineage-sorting event (ancestor-7: 1 → 0, concomitant with terminal-I descending from ancestor-6 on host-a). If one wished to count duplications, there are two implied at the base of the tree (see footnote 1). We contrived this case so as to omit the confounding problem of “reticulate histories” and “easily detectable ghost taxa” that might otherwise be rectified with so-called secondary BPA (Brooks, 1990). Fig. 4 illustrates the three TreeMap 1.0 solutions for the same problem, each of which finds two cospeciation events (one more than BPA). Moreover, in Fig. 4B there are two host-switches (BPA only found one), and a single sorting event. In Fig. 4C, the most parsimonious solution of the three, there are two cospeciations, two host-switches, no duplications, and no lineage-sorting events. Thus Dowling’s (2002) assertion that BPA will perform well, finding more cospeciation events and more host-switching events than TreeMap 1.0, is refuted.

Looking more carefully at what the BPA solution requires reveals a difficult problem. BPA suggests only a single sorting event (Fig. 3B), the loss of ancestor-7. Yet for the BPA solution to be at all sensible (Fig. 3C) there are six lineage-sorting events that must have occurred, five of which are simply ignored by BPA. We grant that ignoring events necessarily yields fewer steps, but this is not normally an accepted manner for minimizing ad hoc hypotheses.

Footnote 1. That is, ancestors 8 and 9 predate 7, and all three are present at the base of the tree. Brooks has never identified these as duplications, per se. In Type I BPA, they are merely synapomorphies supporting the monophyly of the whole. In Type II BPA they can only be interpreted as speciation events in the parasite lineages that predate the origins of the host group (i.e., identical to basal duplications in TreeMap). Proper analysis requires a ROOT taxon (see Siddall, 1996).

Fig. 3. A hypothetical example of a 4-host, 5-parasite system in which parasites II and V each are associated with host d. (A) The “tanglegram” depicting associations between hosts and parasites with parasite ancestors numbered for BPA. (B) Type II BPA optimization of parasite data using the matrix depicted in the text. BPA implies ancestors 7 through 9 are synapomorphies at the base of the tree. Character 6 is homoplasious indicating a host-switch and character 7 reverses suggesting a lineage-sorting event. (C) Lineage tracking of parasites and their ancestors (in black) on the host tree (in grey) based on the results of mapping in (B). There are two duplications (boxes), one cospeciation (black circle), and a host-switch (arrow). BPA fails to count five lineage-sorting events (white arrowheads) required by the optimization.

Fig. 4. The three solutions to the same set of associations depicted in Fig. 3 as found by TreeMap 1.0. Each solution (A–C) yields two cospeciations, one more than BPA does. The most parsimonious of the three (C) has two host-switching events (one more than BPA) and no other ad hoc hypotheses.
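Why BPA cannot recover a host-switch between sister hosts (the case of Fig. 2A) can be illustrated with a small step-counting sketch. The code below uses a Fitch-style parsimony count, a standard technique swapped in here for illustration rather than the optimization routine of any particular BPA workflow. The three-host tree ((a,b),c), the parasite labels, and the function name `fitch_steps` are assumptions made for the example: parasite I occurs on host a, its sister II on host b (the product of a host-switch), their ancestor is character "3", and host c is parasite-free.

```python
def fitch_steps(tree, states):
    """Minimum number of state changes for one character on a rooted
    binary tree given as nested tuples (Fitch-style count)."""
    steps = 0

    def down(node):
        nonlocal steps
        if not isinstance(node, tuple):      # tip: its observed state
            return {states[node]}
        left, right = (down(child) for child in node)
        common = left & right
        if common:                           # children agree: no change needed
            return common
        steps += 1                           # children disagree: one change
        return left | right

    down(tree)
    return steps


# ASSUMPTION: toy host tree and associations mimicking Fig. 2A.
host_tree = (("a", "b"), "c")
# Inclusive ORed rows: columns are terminal I, terminal II, ancestor "3".
matrix = {"a": (1, 0, 1), "b": (0, 1, 1), "c": (0, 0, 0)}
characters = ["I", "II", "3"]

steps_per_character = {
    name: fitch_steps(host_tree, {h: row[i] for h, row in matrix.items()})
    for i, name in enumerate(characters)
}
print(steps_per_character)
```

Each character fits the host tree with a single step, so there is no homoplasy for BPA to read as a host-switch. The ancestor character is scored present on both sister hosts and therefore optimizes as a single gain on their common stem, which corresponds to the duplication-plus-sorting interpretation described in the text.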
HIV
The failure of BPA to accurately (or even parsimoniously) reflect patterns of host and parasite association is not difficult to see even in known empirical applications. Mindell et al. (1995) arrived at the unusual conclusion that humans gave HIV to monkeys by way of optimizing hosts on the viral phylogeny, a method that at the time already was obsolete and much repudiated (Siddall, 1997). A reanalysis of the associated HIV and primate phylogenies used by Mindell et al. (1995) and clarified by Siddall (1997) reveals the superiority of TreeMap 1.0 over Brooks Parsimony Analysis (Fig. 5). The TreeMap solution (Fig. 5B) yields five cospeciation events as well as four host-switches, one duplication, and five lineage-sorting events for a cost of 10 ad hoc hypotheses. In contrast, the BPA solution is considerably less parsimonious (Fig. 5C). Examination of the lineage trace (Fig. 5D) implied by the transformations in the BPA solution (Fig. 5C) (leaving aside for the moment that the method does not actually do this of its own accord) shows that BPA finds the same five cospeciations as TreeMap 1.0; however, it yields only one host-switch (ancestor-13 giving rise to HIV2), as well as fully four duplications and a single lineage-sorting event (the loss of ancestor-11 on Mandrillus). Admittedly this is not entirely straightforward. The host-switch by ancestor-13 carries ancestors 14 and 15 with it as two extra-step ghost lineages sensu Dowling (2002). Also, the loss of ancestor-11 causes the simultaneous loss of all of ancestor-11’s ancestors (ancestors 12, 13, 14, 15, and 18). As a result there are seven unnecessary transformations counted in the BPA solution. However, in light of the four duplications (ancestors 19, 17, 15, and 12) it is clear that there are seven additional lineage-sorting events that BPA simply did not bother to count (in the same way this was found for the contrived example in Fig. 3). After correction both for the overcounting and undercounting, the BPA solution has 13 ad hoc events and so is less parsimonious than the 10 events required by TreeMap 1.0. This refutes Dowling’s (2002) assertion that TreeMap 1.0 overestimates lineage sorting events and duplications while underestimating cospeciation; quite the opposite obtains in this case. In two circumstances TreeMap 1.0 more parsimoniously postulates a host-switch from one taxon to its sister, for example, the origins of HIV as a switch from chimps. It is methodologically impossible for BPA to yield such a solution (see Fig. 2A) because the switch places the ancestor widespread over the two sister lineages resulting in a “synapomorphy” for the two in the inclusive ORed matrix. In three separate instances (Fig. 5D) BPA less parsimoniously postulates a duplication requiring (but not counting) a lineage sort immediately thereafter.

Fig. 5. An analysis of HIV and primate phylogenies using both BPA and TreeMap 1.0. (A) The “tanglegram” of primate immunodeficiency viruses (with ancestors numbered for cospeciation analysis) and their hosts, rooted with feline immunodeficiency virus (FIV). (B) The TreeMap solution with 10 ad hoc hypotheses, most notably, two host-switches by the viruses from simian to human hosts. (C) The BPA solution depicted on the host phylogeny. (D) SIV lineages tracked onto the host phylogeny under BPA requires more ad hoc events than TreeMap but does not find as many host-switches. Lineage-sorting events are marked with an asterisk. Unnecessary steps forced illogically by BPA are in outlined type face. Lineage-sorting events that are required by, but are not counted by, BPA are indicated by white arrowheads.
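The bookkeeping behind the 10-versus-13 comparison for the HIV dataset can be written out as a short tally. This sketch only restates the event counts given in the text; the class name `Reconstruction` and its field names are hypothetical, and the cost function assumes, as the text does, that cospeciations are not ad hoc while host-switches, duplications, and all lineage-sorting events (counted or not) are.

```python
from dataclasses import dataclass


@dataclass
class Reconstruction:
    cospeciations: int
    host_switches: int
    duplications: int
    sorting_events: int          # lineage sorts the method itself reports
    uncounted_sorting: int = 0   # sorts the solution requires but does not report

    def ad_hoc_cost(self):
        """Total ad hoc hypotheses; cospeciations are not ad hoc."""
        return (self.host_switches + self.duplications
                + self.sorting_events + self.uncounted_sorting)


# Counts as stated in the text for the HIV/primate dataset.
# The BPA figures are those after removing the seven unnecessary
# (ghost-lineage) transformations described above.
treemap_hiv = Reconstruction(cospeciations=5, host_switches=4,
                             duplications=1, sorting_events=5)
bpa_hiv = Reconstruction(cospeciations=5, host_switches=1,
                         duplications=4, sorting_events=1,
                         uncounted_sorting=7)

print(treemap_hiv.ad_hoc_cost(), bpa_hiv.ad_hoc_cost())
```

Keeping the uncounted sorting events in a separate field makes explicit that the difference between BPA's apparent cost and its corrected cost comes entirely from events that the optimization requires but does not report.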
Malaria Assessing cospeciation of the causative agents of malaria by correlating the phylogeny of Plasmodium species (Perkins and Schall, 2002) on their vertebrate host phylogeny provides an empirical challenge to DowlingÕs (2002, p. 421) conclusion that BPAÕs ‘‘ghost lineages’’ are easy to interpret (Fig. 6). TreeMap yields five cospeciations, seven host-switches, three duplica-
tions, and two lineage-sorting events (Fig. 6B). BPA counts six cospeciations, four host-switches, four duplications, and admits to two lineage-sorting events (Fig. 6C) though more are actually required to explain the patterns observed. Although on its face, it appears that BPA better maximizes cospeciation, it does so at an enormous cost; a cost it mostly neglects to count. As with the HIV dataset there are many (13 in fact) uncounted lineage-sorting events in the BPA solution (Fig. 6D) such that its total cost is really 23, nearly double the cost of the TreeMap solution. Admittedly, BPA did find more cospeciation than TreeMap, but for ancestor-25 to be taken as a sixth cospeciation, ancestor29 must represent a duplication resulting in an uncounted lineage sort forced between ancestors 25 and 24, as well as an additional three uncounted lineage-sorting
562
M.E. Siddall, S.L. Perkins / Cladistics 19 (2003) 554–564
Fig. 6. (A) A ‘‘tanglegram’’ of malaria parasites (Plasmodium and Hepatocystis) based on parasite mitochondrial cytochrome b data (Perkins and Schall, 2002) and host mitochondrial 12S data (unpublished, compiled from GenBank sequences). (B) The TreeMap reconciliation of these phylogenies predicts five cospeciations, seven host-switches, three duplications, and two lineage-sorting events. (C) The parasite phylogeny recoded onto the host phylogeny via BPA. (D) The BPA solution yields more cospeciations (six), but necessitating 13 uncounted lineage-sorting events (arrowheads), rendering a total cost of 23 ad hoc events. The losses of ancestors 23, 24, and 25 (asterisked) are unintelligible.
events between ancestors 24 and 19 (Fig. 6D). TreeMapÕs involving ancestor-25 in a host-switch (Fig. 6B) pushes ancestor-24 up the tree thus eliminating four ad hoc lineage-sorting events even though it makes ancestor-23 host-switch instead of cospeciate. Treating ancestor-29 as a host-switch (Fig. 6B) has a similar effect in eliminating three of the ad hoc lineage-sorting events required by BPA (Fig. 6D). All of this results in the BPA evaluation because the method does not respect the interdependencies between nested ancestors. That Allouatta and Homo each have ancestors 23 and 24 in their respective histories is sufficient to force those ancestral parasites ‘‘down’’ the tree as ‘‘synapomorphies’’ and ignores the fact that there is a host-switch above them. BPA cannot ‘‘see’’ the losses it requires between ances-
tors 24 and 19 because the (unrelated) descendants of ancestor-23 in Alouatta, Mandrillus, and Pan each also have ancestor-24 in their histories masking the fact that the sister lineage failed to speciate. Dowling (2002, p. 421) suggested that the ‘‘ghost lineages’’ are easy to interpret and yet the loss of ancestors 23, 24, and 25 on Pan defies clear explanation. Neither ancestor-24 nor ancestor-25 are any longer in existence by the time Pan exists; their loss is simply required by the loss of their descendant (ancestor-23)—easy enough interpretation. But why is this lineage carried up past the ancestor of Pan and Homo if it has no descendants on either? The loss of 23 should be on the same internode as ancestor17 (Fig. 6C). The cause of the bizarre position for the loss of ancestors 23, 24, and 25 is, in fact, the host-switch
of ancestor-18 to Homo. Ancestor-18 brings its “ghost” ancestors (23, 24, and 25) along for the ride and forces them to be present when Homo and Pan diverged. The fact that Homo and Alouatta have related parasites has nothing at all to do with parasites on Pan or Mandrillus.

Whether or not the obvious failures of BPA in host–parasite cospeciation will somehow be strangely reinterpreted as a virtue of some alternate philosophical perspective, in the way that they have been for biogeography (e.g., Brooks et al., 2001), remains to be seen. Regardless, Type II Brooks Parsimony Analysis yields unparsimonious solutions and fails to count ad hoc hypotheses accurately relative to the alternative methods that are available. Its continued use for comparing host and parasite phylogenies should be discouraged.
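The masking effect described above follows from the additive binary coding that BPA uses to recode a parasite phylogeny as host characters (Fig. 6C). The following is a minimal sketch of that coding step only, not of the analysis in Fig. 6: the three-parasite tree, the host names, and the function names are all hypothetical and chosen for illustration.

```python
# Hypothetical toy parasite tree: child -> parent (None marks the root).
# p1, p2, p3 are parasite species; a4 and a5 are ancestral parasites.
PARENT = {"p1": "a4", "p2": "a4", "a4": "a5", "p3": "a5", "a5": None}

# Hypothetical associations: host -> parasites found on that host.
HOSTS = {"hostA": ["p1"], "hostB": ["p3"], "hostC": ["p2"]}


def lineage(node, parent=PARENT):
    """Return the node together with all of its ancestors up to the root."""
    path = []
    while node is not None:
        path.append(node)
        node = parent[node]
    return path


def bpa_matrix(hosts=HOSTS, parent=PARENT):
    """Additive binary coding: a host scores 1 for every parasite it carries
    and for every ancestor of that parasite, and 0 otherwise."""
    characters = sorted(parent)
    matrix = {}
    for host, parasites in hosts.items():
        present = set()
        for p in parasites:
            present.update(lineage(p, parent))
        matrix[host] = [1 if c in present else 0 for c in characters]
    return characters, matrix


characters, matrix = bpa_matrix()
for host, row in matrix.items():
    print(host, dict(zip(characters, row)))
```

In this toy matrix hostA and hostC both score 1 for ancestor a4 simply because each carries one of its descendants. The coding itself holds no information about whether that shared ancestry arose by cospeciation or by a host-switch, which is the kind of dependency between nested ancestors that the text argues BPA does not respect.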
Acknowledgments

We thank Liz Borda, Kirsten Jensen, Tripp Macdonald, Louise Crowley, and Megan Harrison for their comments on earlier versions of the manuscript. This work was supported by grants from the National Science Foundation including DEB-0108163 to both authors and DBI-0074512 to S.L.P. and from the National Institutes of Health NIGMS 5R01GM062351-02.
References

Bandoni, S.M., Brooks, D.R., 1987. Revision and phylogenetic analysis of the Amphilinidea Poche, 1922 (Platyhelminthes: Cercomeria: Cercomeromorpha). Can. J. Zool. 65, 1110–1128.
Brooks, D.R., 1979. Testing the context and extent of host-parasite coevolution. Syst. Zool. 28, 299–307.
Brooks, D.R., 1981. Hennig’s parasitological method: a proposed solution. Syst. Zool. 30, 299–307.
Brooks, D.R., 1985. Historical ecology: a new approach to studying the evolution of ecological associations. Ann. Mo. Bot. Gard. 72, 660–680.
Brooks, D.R., 1988. Macroevolutionary comparisons of host and parasite phylogenies. Annu. Rev. Ecol. Syst. 19, 235–259.
Brooks, D.R., 1990. Parsimony analysis in historical biogeography and coevolution: methodological and theoretical update. Syst. Zool. 39, 14–30.
Brooks, D.R., McLennan, D.A., 1991. Phylogeny, Ecology, and Behavior: A Research Program in Comparative Biology. University of Chicago Press, Chicago.
Brooks, D.R., McLennan, D.A., 1993. Parascript: Parasites and the Language of Evolution. Smithsonian Press, Washington, DC.
Brooks, D.R., Van Veller, M.G.P., McLennan, D.A., 2001. How to do BPA, really. J. Biogeogr. 28, 345–358.
Charleston, M.A., 1998. Jungles: a new solution to the host/parasite phylogeny reconciliation problem. Math. Biosci. 149, 191–223.
Charleston, M.A., Page, R.D.M., 2002. TREEMAP 2.0 A Macintosh program for the analysis of how dependent phylogenies are related, by cophylogeny mapping. Department of Zoology, University of Oxford, South Parks Road, Oxford OX1 3PS, UK.
Charleston, M.A., Perkins, S.L., 2003. Lizards, malaria and jungles in the Caribbean. In: Page, R.D.M. (Ed.), Tangled Trees: Phylogeny, Cospeciation, and Coevolution. University of Chicago Press, Chicago, pp. 65–92.
Cracraft, J., 1988. Deep-history biogeography: Retrieving the historical pattern of evolving continental biotas. Syst. Zool. 37, 221–236.
Dowling, A.P.G., 2002. Testing the accuracy of TreeMap and Brooks parsimony analyses of coevolutionary patterns using artificial associations. Cladistics 18, 416–435.
Dowling, A.P.G., 2003. Erratum to “Testing the accuracy of Tree Map and Brooks Parsimony analysis of coevolutionary patterns using artificial association.” Cladistics 18, 416–435.
Ebach, M.C., Humphries, C.J., 2002. Cladistic biogeography and the art of discovery. J. Biogeogr. 29, 427–444.
Farris, J.S., Albert, V.A., Kallersjo, M., Lipscomb, D., Kluge, A.G., 1996. Parsimony jackknifing outperforms neighbor-joining. Cladistics 12, 99–124.
Felsenstein, J., 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27, 401–410.
Felsenstein, J., 1982. Numerical methods for inferring evolutionary trees. Q. Rev. Biol. 57, 379–404.
Goldman, N., 1990. Maximum likelihood inference of phylogenetic trees, with special reference to a Poisson process model of DNA substitution and to parsimony analyses. Syst. Zool. 39, 345–361.
Hennig, W., 1966. Phylogenetic Systematics. University of Illinois Press, Urbana, IL.
Hoberg, E.P., 1992. Congruence and synchronic patterns in biogeography and speciation among seabirds, pinnipeds, and cestodes. J. Parasitol. 78, 601–615.
Hoberg, E.P., Adams, A.M., 1992. Phylogeny, historical biogeography and ecology of Anophryocephalus spp. (Tetrabothriidae) among pinnipeds of the Holarctic during the late Tertiary and Pleistocene. Can. J. Zool. 70, 703–719.
Hovenkamp, P., 1997. Vicariance events, not areas, should be used in biogeographical analysis. Cladistics 13, 67–79.
Huelsenbeck, J.P., Hillis, D.M., 1993. Success of phylogenetic methods in the four-taxon case. Syst. Biol. 42, 247–264.
Kishino, H., Hasegawa, M., 1989. Evaluation of the maximum likelihood estimate of the evolutionary tree topologies from DNA sequence data, and the branching order in Hominoidea. J. Mol. Evol. 29, 170–179.
Kluge, A.G., 1988. Parsimony in vicariance biogeography: A quantitative method and a Greater Antillean example. Syst. Zool. 37, 315–328.
Maddison, W., Maddison, D., 1992. MacClade: Analysis of Phylogeny and Character Evolution. Sinauer, Sunderland, MA.
Mindell, D.P., Schultz, J.W., Ewald, P.W., 1995. The AIDS pandemic is new, but is HIV new? Syst. Biol. 44, 77–92.
Morrone, J.J., Carpenter, J.M., 1994. In search of a method for cladistic biogeography: an empirical comparison of component analysis, Brooks parsimony analysis, and three-area statement. Cladistics 10, 99–153.
O’Grady, R.T., Deets, G.B., 1987. Coding multistate characters, with special reference to the use of parasites as characters of their hosts. Syst. Zool. 36, 268–279.
Page, R.D.M., 1988. Quantitative cladistic biogeography: Constructing and comparing area cladograms. Syst. Zool. 37, 254–270.
Page, R.D.M., 1990. Component analysis: a valiant failure? Cladistics 6, 119–136.
Page, R.D.M., 1993a. Genes, organisms, and areas: the problem of multiple lineages. Syst. Biol. 42, 77–84.
Page, R.D.M., 1993b. Parasites, phylogeny, and cospeciation. Int. J. Parasitol. 23, 499–506.
Page, R.D.M., 1994a. Maps between trees and cladistic analysis of historical associations among genes, organisms, and areas. Syst. Biol. 43, 58–77.
Page, R.D.M., 1994b. Parallel “phylogenies”: reconstructing the history of host–parasite assemblages. Cladistics 10, 155–173.
Page, R.D.M., 1995. TreeMap. Computer Program, distributed by the author. University of Glasgow, Glasgow.
Page, R.D.M., Charleston, M.A., 1998. Trees within trees: phylogeny and historical associations. Trends Ecol. Evol. 13, 356–359.
Perkins, S.L., Schall, J.J., 2002. A molecular phylogeny of malarial parasites recovered from cytochrome b gene sequences. J. Parasitol. 88, 972–978.
Planet, P.J., Kachlany, S.C., Fine, D.H., DeSalle, R., Figurski, D.H., 2003. The widespread colonization island of Actinobacillus actinomycetemcomitans. Nature Genetics 34, 193–198.
Pol, D., Siddall, M.E., 2001. Biases in maximum likelihood and parsimony: a simulation approach to a ten-taxon case. Cladistics 17, 266–281.
Ronquist, F., 1995. Reconstructing the history of host–parasite associations using generalized parsimony. Cladistics 11, 73–89.
Ronquist, F., 1998. Three-dimensional cost-matrix optimization and maximum cospeciation. Cladistics 14, 167–172.
Ronquist, F., 2000. TreeFitter. Computer Program, distributed by the author. University of Uppsala, Uppsala.
Ronquist, F., 2003. Parsimony analysis of coevolving species associations. In: Page, R.D.M. (Ed.), Tangled Trees: Phylogeny, Cospeciation, and Coevolution. University of Chicago Press, Chicago, pp. 22–64.
Ronquist, F., Nylin, S., 1990. Process and pattern in the evolution of species associations. Syst. Zool. 39, 323–344.
Saether, O.A., 1986. The myth of objectivity: post-Hennigian deviations. Cladistics 2, 1–13.
Saitou, N., Nei, M., 1987. The neighbor-joining method: A new method for constructing phylogenetic trees. Mol. Biol. Evol. 4, 406–422.
Shimodaira, H., Hasegawa, M., 1999. Multiple comparisons of log-likelihoods with applications to phylogenetic inference. Mol. Biol. Evol. 16, 1114–1116.
Siddall, M.E., 1996. Phylogenetic covariance probability confidence and historical associations. Syst. Biol. 45, 48–66.
Siddall, M.E., 1997. The AIDS pandemic is new, but is HIV not new? Cladistics 13, 267–273.
Siddall, M.E., 2002. Phylogeny and revision of the leech family Erpobdellidae (Hirudinida: Oligochaeta). Invertebr. Taxon. In press.
Siddall, M.E., Kluge, A.G., 1997. Probabilism and phylogenetic inference. Cladistics 13, 313–336.
Siddall, M.E., Whiting, M.F., 1999. Long-branch abstractions. Cladistics 15, 9–24.
Stammer, H.J., 1957. Gedanken zu den parasitophyletischen Regeln und zur Evolution der Parasiten. Zool. Anz. 159, 255–267.
Stoneking, M., Sherry, S.T., Vigilant, L., 1992. Geographic origin of human mitochondrial DNA revisited. Syst. Biol. 41, 384–391.
Swofford, D.L., Waddell, P.J., Huelsenbeck, J.P., Foster, P.G., Lewis, P.O., Rogers, J.S., 2001. Bias in phylogenetic estimation and its relevance to the choice between parsimony and likelihood methods. Syst. Biol. 50, 525–539.
Van Veller, M.G.P., Brooks, D.R., 2001. When simplicity is not parsimonious: a priori and a posteriori methods in historical biogeography. J. Biogeogr. 28, 1–12.
Van Veller, M.G.P., Kornet, D.J., Zandee, M., 2000. Methods in vicariance biogeography: assessment of the implementations of assumptions 0, 1, and 2. Cladistics 16, 319–345.