Limitations of locally sampled characters in phylogenetic analyses of sparse supermatrices

Limitations of locally sampled characters in phylogenetic analyses of sparse supermatrices

Molecular Phylogenetics and Evolution 74 (2014) 1–14 Contents lists available at ScienceDirect Molecular Phylogenetics and Evolution journal homepag...

710KB Sizes 0 Downloads 18 Views

Molecular Phylogenetics and Evolution 74 (2014) 1–14

Contents lists available at ScienceDirect

Molecular Phylogenetics and Evolution journal homepage: www.elsevier.com/locate/ympev

Limitations of locally sampled characters in phylogenetic analyses of sparse supermatrices Mark P. Simmons ⇑ Department of Biology, Colorado State University, Fort Collins, CO 80523, USA

a r t i c l e

i n f o

Article history: Received 10 June 2013 Revised 30 January 2014 Accepted 30 January 2014 Available online 14 February 2014 Keywords: Bayesian unlinked Character partition Maximum likelihood Missing data Parsimony Strict consensus

a b s t r a c t Empirical and simulated examples were used to demonstrate the following four points in the context of sparse supermatrices. First, locally sampled characters, when analyzed with low quality heuristic parsimony, likelihood, or Bayesian searches, can create high resolution and resampling values for clades that are properly unsupported because there is no comparable information among sets of terminals. Second, arbitrary factors that should have no effect on phylogenetic inference can create large fluctuations in congruence of trees inferred by parsimony, likelihood, and Bayesian methods with the simulated topology. Third, phylogenetic signal present in locally sampled characters may be interpreted in radically different ways depending upon the phylogenetic signal present in globally sampled characters. Fourth, application of Bayesian MCMC analyses with unlinked branch lengths among character partitions cannot be expected to universally obviate missing-data artifacts, even when numerous characters are sampled from each partition. The first three points may be addressed by conducting thorough tree searches while allowing numerous equally optimal trees to be saved from each replicate rather than relying entirely upon subtree pruning and regrafting (SPR) while saving a single optimal tree, as is the case in many contemporary empirical sparse-supermatrix analyses. Ó 2014 Elsevier Inc. All rights reserved.

1. Introduction It is widely recognized that locally sampled characters contain phylogenetic signal and generally improve phylogenetic inference (Wiens, 1998, 2003). Furthermore, adding terminals with extensive missing data may improve phylogenetic inference by breaking long branches (Wiens, 2005). Therefore, the serendipitous scaffolding approach (Johnson et al., 2012) of integrating all available characters largely irrespective of their missing-data content into a single supermatrix has been widely embraced. But locally sampled characters have their own limitations, including the potential to mislead parametric methods (Gatesy et al., 2002; Lemmon et al., 2009; Simmons, 2012a,b) and cause tree-search artifacts (Simmons and Goloboff, 2013). Phylogenetic analyses that are based on sparse supermatrices are increasingly common, ranging from broad-scale studies that are based largely or entirely on publicly available sequence data (e.g., Peters et al., 2011; Springer et al., 2012), to studies that are based on partial genomic resources (e.g., restriction-site-associated DNA, transcriptomes; Meusemann et al., 2010; Wagner et al.,

⇑ Fax: +1 970 491 0649. E-mail address: [email protected] http://dx.doi.org/10.1016/j.ympev.2014.01.030 1055-7903/Ó 2014 Elsevier Inc. All rights reserved.

2013), studies that integrate novel organellar-genome data to resolve ancient lineages while also sampling a large number of taxa with comparatively few genes sampled to resolve more recently derived clades (Davis et al., 2013), as well as more conventional studies that are based on individually amplifying several genes while also integrating publicly available data (e.g., Chatrou et al., 2012; Simmons, 2012a,b; Guo et al., 2013). There are often dramatic disparities in the amount of missing (and inapplicable) data between character partitions, such that some partitions are sampled for nearly all terminals (i.e., ‘‘globally sampled’’), whereas many others are sampled for just a minority of the terminals (i.e., ‘‘locally sampled’’; e.g., Simmons and Goloboff, 2013). Note that ‘‘locally sampled’’ is based simply on the fraction of terminals scored and has no necessary correlation with being sampled for a single lineage. Depending on the inferred tree topology and the distribution of missing data within each locally sampled character or character partition, that character may or may not be able to contribute an unambiguously optimized synapomorphy for a given clade (Malia et al., 2003; Sanderson et al., 2010; Simmons, 2012b). For example, consider a phylogenetic analysis that encompasses the Vertebrata, including 100 species of Amniota. If humans and chimps are the only members of Amniota scored for presence or absence of an amniotic egg, then presence of an amniotic egg would be

2

M.P. Simmons / Molecular Phylogenetics and Evolution 74 (2014) 1–14

ambiguously optimized on every branch between the clade of Amniota as a whole and the two-terminal clade of humans + chimps because the two-terminal clade is highly nested within Amniota. Alternatively, if humans and swans are the only members of Amniota scored for presence or absence of an amniotic egg, then presence of an amniotic egg would be unambiguously resolved as a synapomorphy for the Amniota as a whole because the most recent common ancestor of humans and swans is the most recent common ancestor of (extant) Amniota. In this second case the optimization obtained from the incompletely scored amniotic-egg character is identical to that which would be obtained if all members of Amniota were scored for presence or absence of an amniotic egg. The effect of including a locally sampled character that provides phylogenetic signal albeit with ambiguous optimization, as with the human + chimp example described above, may vary depending upon how the character interacts with globally sampled characters and how the data are analyzed. Hence the locally sampled character may be clearly beneficial in one context yet misleading in another. An exploration of these different contexts is needed in order to provide general guidelines for assembly and phylogenetic analysis of sparse supermatrices, and that is the focus of this study. In this study empirical and simulated examples were used to examine interactions between partitions of locally sampled characters relative to each other as well as to globally sampled characters in the context of different optimality criteria (parsimony, maximum likelihood, and Bayesian posterior probabilities; Kluge and Farris, 1969; Fitch, 1971; Felsenstein, 1973; Yang and Rannala, 1997), tree searches, and model partitioning. The results were used to demonstrate misleading artifacts that may occur based on interactions between globally and locally sampled partitions and how different distributions of missing data among locally sampled partitions can contribute to unstable results. Furthermore, the results were used to demonstrate the importance of thorough tree searches that allow multiple equally optimal trees to be held irrespective of which optimality criterion is employed, the extent of phylogenetic-inference artifacts that may occur when low quality tree searches are applied, and the limits of additional character sampling and model partitioning to address these artifacts. The potential phylogenetic-inference artifacts described in this manuscript need to be considered at the following three stages of supermatrix analyses: (1) when determining which characters and terminals to include in a supermatrix, (2) which phylogenetic-inference methods to apply to the supermatrix, and (3) when assessing the branch support conferred by synapomorphies from locally sampled characters. 1.1. Three general principles Three principles described in the following paragraphs may generally (but not always) apply when comparing resolution in the strict consensus of all optimal trees between a matrix that consists entirely of characters that are globally sampled for all terminals in the study, and the same matrix that also includes characters that are only locally sampled for some terminals in the study. These principles are relevant whenever one quantifies the phylogenetic signal contributed by undersampled character partitions to an existing data matrix that is largely complete. For the vast majority of macroscopic eukaryote phylogenetic studies being conducted, there are some publicly available data that can be scored for at least a minority of the terminals included in novel phylogenetic analyses. The added resolution and branch support may initially appear impressive, but cannot always be unequivocally justified, particularly in the context of low-quality heuristic tree searches. The first principle is that any clade resolved in the strict consensus for the global + local matrix can generally only be

unequivocally supported if it is a convex group (i.e., monophyletic or paraphyletic) that is bounded by branches that are supported by synapomorphies from the globally sampled partition. The reason for this first principle is that the locally sampled characters are only informative for inferring relationships among terminals that are sampled for those characters relative to each other. By definition those locally sampled characters cannot exclude any terminals that they are not sampled for from any of the clades that they delimit. That is, the unsampled terminals should behave as wildcards (Nixon and Wheeler, 1991) and collapse any such clades in the strict consensus unless those terminals are excluded from the clade in question by the globally sampled characters. The first principle is fulfilled whenever one of the following three cases apply: (1) the clade that is present in the strict consensus for the global + local matrix is also present in the global-only matrix, (2) the clade in question is nested within another larger clade (in an unrooted sense) that is also present in the strict consensus for the global-only matrix and that larger clade only includes terminals scored for the same locally sampled characters, or (3) the clade in question encompasses another smaller clade (in an unrooted sense) that is also present in the strict consensus for the global-only matrix; this smaller clade includes any terminals that are not scored for the locally sampled character(s) that support the clade in question. The difference between the second and third cases is that the second case focuses on larger clades that encompass the clade in question whereas the third case focuses on a sub-clade within the clade in question. The second principle is that any clades present in the strict consensus for the global + local matrix but not in the strict consensus for the global-only matrix can generally only be unequivocally supported if either the second or third conditions described for the first principle apply. Otherwise, any unsampled terminal should act as a wildcard within the larger clade and collapse any such nested clades in the strict consensus for the same reason cited above for the first principle. The third principle is that the resampling-based branch-support assigned to a given clade resolved by the global + local matrix generally should not be higher than that provided for the same clade (or a larger clade that contains it when the second condition applies, or a smaller clade that includes the terminal that is not scored for the locally sampled characters when the third condition applies) in the strict consensus for the global-only matrix. The rationale for this principle is that if the globally sampled character(s) that provide support for the given clade are not sampled in a bootstrap (BS; Felsenstein, 1985) or jackknife (Farris et al., 1996) pseudoreplicate then neither that clade nor any nested clades within it can be unequivocally supported by the locally sampled characters alone. An example wherein the first and second principles do not hold is presented in Fig. 1. An example that demonstrates the second principle and also shows a case where the third principles does not hold is presented in Fig. 2.

2. Methods 2.1. Empirical examples The empirical examples consist of 347 terminals sampled for the internal-transcribed-spacer (ITS) region of nuclear rDNA (including the 30 terminus of the 18S subunit, ITS 1, the entire 5.8S subunit, ITS 2, and the 50 start of the 26S subunit for most sequences) from the plant order Celastrales. The sequence data were taken from Coughenour et al. (2010, 2011) and Simmons et al. (2012a,b)), to which 51 Madagascan terminals were added by C.D. Bacon et al. (unpublished data). This is the same matrix

M.P. Simmons / Molecular Phylogenetics and Evolution 74 (2014) 1–14

Fig. 1. Contrived example of 11 globally sampled (i.e., for all terminals) nonadditive characters and five locally sampled (for terminals A, B, F, and G only) characters with parsimony jackknife support shown on the arbitrarily rooted strictconsensus trees. The first and second principles do not hold in this example. The first five globally sampled characters place terminal G as most closely related to A + B, whereas the last five globally sampled characters place G as most closely related to E + F. There are two most parsimonious trees, both of which have the same relationships for A ? F, but G is either resolved with A + B or E + F. Consequently, the strict consensus of the 11 globally sampled characters is an unresolved bush. The five locally sampled characters all place G as more closely related to F than it is to A or B, but are uninformative as to the relationship of G to terminals C, D, or E. Based on the interactions of the globally and locally sampled characters, there are three most parsimonious trees, all of which resolve A + B as distinct from C ? G, with 75% jackknife support (as calculated by TNT [Goloboff et al., 2008] for 100,000 pseudoreplicates with a deletion probability of 0.37). This example requires character-conflict among the globally sampled characters for which the locally sampled partition favors one of the possible resolutions of the terminal(s) for which the globally sampled characters conflict.

examined by Simmons and Goloboff (2013) in the context of alternative initial-tree construction and branch-swapping strategies. Because of alignment ambiguity in the ITS 1 and ITS 2 regions when attempting to align those regions across the entire order, an unconventional alignment approach was implemented whereby the 18S, 5.8S, and 26S regions (together with three adjacent positions from ITS 1 and nine adjacent positions from ITS 2) were globally aligned across the Celastrales whereas the remaining positions of ITS 1 and ITS 2 were only locally aligned within each of seven monophyletic or paraphyletic groups consisting of 26 to 88 terminals each that were well supported in previous analyses and/or trees generated by preliminary analyses of four plastid loci (atpB, matK, rbcL, and trnL-F). This alignment approach was derived from a presentation by K.S. MacDonald and M.E. Siddall at Hennig XXVI in 2007, which was based on Barta’s (1997) proposal on how to integrate hypervariable regions into molecular phylogenetic analyses. Preliminary nucleotide alignments were obtained using MAFFT ver. 6.5 (Katoh and Toh, 2008) with Q-INS-i for local alignments and G-INS-i for the global alignment. Manual adjustments to the alignments were then performed using the similarity criterion (Zurawski and Clegg, 1993; Simmons, 2004). Ambiguously aligned regions across all terminals were excluded and ambiguously aligned regions from individual terminals were re-scored as ‘‘?’’ for those terminals. The seven blocks of locally aligned characters were concatenated, one after the other, to the block of 260 globally aligned characters to create the ‘‘ITS_all’’ matrix, which consists of 3814

3

Fig. 2. Contrived example of 11 globally sampled (i.e., for all terminals) nonadditive characters and five locally sampled (for terminals A ? D only) characters with parsimony jackknife support shown on the arbitrarily rooted strict-consensus trees. This serves as an example for the second principle. Clades (B, E, F) and (C, B, E, F) are only present in the global + local strict consensus because the clade (E, F) is present in the global-only strict consensus. Yet the third principle does not hold in this example. The only clade present in the strict consensus for the global-only matrix is (E, F), which is supported by the first character. For the remaining 10 globally sampled characters, terminal A has autapomorphies and therefore behaves as a wildcard relative to the clades that otherwise may be unequivocally supported. In addition to the clade of (E, F), two additional clades are also resolved in the strict consensus for the global + local matrix. The locally sampled characters group terminals A and D together separate from B and C, while the second to eleventh globally sampled characters separate C and D from E and F. Consequently, terminal D no longer behaves as either a global or local wildcard and the parsimony jackknife support for (E, F) increases from 61% to 99%. This increase in resampling-based branch support is beneficial, but contrary to the third principle.

characters, including 2252 variable and 1796 parsimony-informative characters with 111–700 characters scored per terminal (mean = 623 characters). A second matrix (‘‘ITS_conserved’’), consisting of only the 260 globally aligned characters, was also analyzed. This matrix includes 90 variable and 58 parsimonyinformative characters with 8–249 characters scored per terminal (mean = 219). A third matrix (‘‘ITS_no_overlap’’) was also analyzed wherein the 260 globally aligned characters from the ‘‘ITS_all’’ matrix were staggered in the same manner as the 3554 locally aligned characters. Consequently, no scored characters are shared between the seven monophyletic or paraphyletic groups. This third matrix includes 2371 variable and 1865 parsimony-informative characters. All three matrices are posted as supplemental online data at: . 2.2. Examples incorporating simulated and contrived characters The following three examples consist of simulated characters to which a limited number of contrived characters were added to the first character partition. Nucleotide characters were simulated by using the Evolver program with MCbase.dat from the PAML ver. 4.1 package (Yang, 2007). Absolute branch lengths of 0 or 0.004 were applied. When 1000 characters were simulated, a 0.004 branch length equates to an average of four substitutions per branch and an expected maximum of 98% BS (Felsenstein, 1985). The Jukes and Cantor (1969) model was applied with equal nucleotide frequencies and no rate heterogeneity among characters. These simulation parameters were selected to minimize homoplasy while providing a sufficient number of potential synapomorphies for phylogenetic methods to resolve each branch without confounding effects. Ten replicate matrices were created for each set of simulations. Contrived characters, when added, consist of binary parsimonyinformative characters that all have identical character-state

4

M.P. Simmons / Molecular Phylogenetics and Evolution 74 (2014) 1–14

distributions, albeit for different nucleotides. Following Simmons and Freudenstein (2011), all 12 nucleotide substitutions were represented when 12 contrived characters were added so as to conform to the Jukes-Cantor model. When only three contrived characters were added the substitutions represented are: A-T, CG, and T-C. Three contrived characters were added so that at least one of them will be sampled in 95% of the BS pseudoreplicates. 2.3. First simulated examples The first simulated examples were created to test whether the inferred phylogenetic signal present in locally sampled characters may be interpreted differently depending on the phylogenetic signal in globally sampled characters. These examples have the following three partitions: characters 1–1003 (the globally sampled characters); 1004–2003 (locally sampled characters for the odd terminals); and 2004–3003 (locally sampled characters for the even terminals). The first partition, which was scored for all 40 terminals, includes 1000 characters simulated on a pectinate tree wherein the internal branch lengths are all zero and the terminal branch lengths are all 0.004. Consequently, very few (an average of four across the 10 replicates) of these characters are parsimony-informative and all parsimony-informative characters are expected to be homoplasious when optimized (Fitch, 1971) onto the simulated topology. Characters 1001–1003 from the first partition consist of contrived parsimony-informative characters. In example 1A, the three globally sampled contrived synapomorphies are for the clades ((terminals 1–20) (terminals 21–40)), which is consistent with the simulated topology. But in example 1B, the three contrived synapomorphies are for the clades ((odd terminals) (even terminals)), which contradicts the simulated topology. A single block of 1000 characters were simulated for the second and third partitions wherein all branch lengths are 0.004, resulting in an average of 144.25 parsimony-informative characters. These 1000 simulated characters were then subdivided between the second and third partitions. The second partition was only scored for the odd terminals and the third partition was only scored for the even terminals. For example 1A, the three globally sampled contrived characters separate terminals 1–20 from terminals 21–40. The locally sampled characters in partitions 2 and 3 are insufficient to unambiguously resolve relationships within either the (terminals 1–20) or (terminals 21–40) clades. Within the (terminals 1–20) clade, for instance, partition 2 unambiguously resolves relationships among the odd terminals and partition 3 unambiguously resolves relationships among the even terminals. But there is no comparable information in partitions 2 or 3 that can resolve relationships between the odd and even terminals within either the (terminals 1–20) or (terminals 21–40) clades. Partitions 2 and 3 do contain phylogenetic signal with which to infer where to optimally connect the (terminals 1–20) and (terminals 21–40) clades to each other. But because of the distribution of missing data they should be insufficient to unambiguously identify a unique optimal connection. Based on the information noted above, the strict consensus of optimal trees should contain at least one clade (terminals 21–40), and any additional clades may only be unambiguously supported by homoplasious globally sampled simulated characters from the first partition. For example 1B, the three globally sampled contrived characters separate the odd terminals from the even terminals. The locally sampled characters in partition 2 then unambiguously resolve relationships within the (odd terminals) clade and the locally sampled characters in partition 3 unambiguously resolve relationships within the (even terminals) clade. All of these clades are contradictory to the simulated topology. The only comparable information

from parsimony informative characters with which to infer where the (odd terminals) and (even terminals) clades connect to each other comes from homoplasious (relative to the simulated topology) globally sampled simulated characters from the first partition. Because the convergent autapomorphies each only unite two terminals, the number of steps (two) for those characters is identical irrespective of whether those two terminals are adjacent in a paraphyletic group or widely separated as a polyphyletic group. Hence, all possible connections between the (odd terminals) and (even terminals) clades should be equally optimal and the strict consensus unresolved aside from the single branch separating the odd terminals from the even terminals. 2.4. Second and third simulated examples The second and third simulated examples were created to quantify the extent to which different phylogenetic-inference methods can create resolution and support despite there being no comparable information among sets of terminals, and how this resolution and support can alternately be considered advantageous or disadvantageous. The second simulated examples consist of 64 terminals but just two partitions of 1000 simulated characters each without any globally sampled or contrived characters. The characters were alternatively simulated on a completely pectinate tree (as in the first example) or a completely symmetrical tree. In both cases partition 1 was only scored for terminals 1–32 and partition 2 was only scored for terminals 33–64. Although relationships are clear among terminals 1–32 relative to each other and among terminals 33–64 relative to each other, there is no comparable information shared between those two groups of terminals. Hence my null expectation was that the strict consensus should be completely unresolved for matrices derived from both the pectinateand symmetrical-topology simulated characters. The third simulated examples are identical to the second simulated examples except that partition 1 was only scored for the odd terminals and partition 2 was only scored for the even terminals. Once again, my null expectation was that the strict consensus should be completely unresolved for matrices derived from both the pectinate- and symmetrical-topology simulated characters. 2.5. 10,000-Character simulated examples Based on preliminary results generated for the 1000-character simulated examples described in Sections 2.2–2.4, I extended example 1B and the third pectinate example to include 10,000 simulated characters per partition rather than just 1000 simulated characters. This was done to test if those methods that produced erroneous topologies with high branch support for these examples are positively misleading (Felsenstein, 1978) under these conditions. The third pectinate matrices with 10,000 simulated characters per partition are the same ones used by Simmons and Goloboff (2013). For example 1B, rather than increase the number of contrived characters in partition 1 from three to 30, a more modest increase to 12 contrived characters was applied. The use of 12 rather than 30 contrived characters was useful because the lower number of contrived characters made it easier for methods that produced misleading topologies (relative to both the simulated characters and strict consensus that should include a maximum of one clade) to overcome their misleading inference because of the greater ratio of simulated characters to contrived misleading characters. The 10,000-character 1B examples were extended by progressively reducing the number of contrived misleading characters from 12 to 6, 4, 3, 2, 1, and 0. These examples were performed to identify the threshold (when applicable), and rate of decrease in resolution and support, at which the misleading signal from the

M.P. Simmons / Molecular Phylogenetics and Evolution 74 (2014) 1–14

globally sampled contrived characters is overturned by the phylogenetic signal from the locally sampled simulated characters. 2.6. Parsimony tree searches Two alternative parsimony-tree-search methods were applied to the examples – one using TNT ver. 1.1 for relatively thorough searches and a second using PAUP ver. 4.0b10 (Swofford, 2001) for relatively low quality searches that are roughly comparable to the GARLI analyses (Zwickl, 2006) wherein only a single optimal tree is held for the matrix as a whole as well as within each BS pseudoreplicate. Although bootstrapping has known problems caused by uninformative characters and upweighting characters (Carpenter, 1996; Freudenstein and Davis, 2010; Simmons and Freudenstein, 2011), I used this resampling method because it is the only one implemented in GARLI and RAxML (Stamatakis, 2006) and consistent usage of the BS enables a more direct comparison with the resampling values generated by PAUP and TNT. The relatively thorough TNT analyses of the ITS example were conducted by using a two-part tree search. Branches with a minimum possible optimized length of zero were collapsed. In the first part up to 50 trees were held (Davis et al., 2005) within each of 5000 random-addition-sequence (RAS) tree-bisection-reconnection (TBR) searches that also implemented 100 ratchet (Nixon, 1999) iterations, which alternated between equal character weighting and each character having a 10% chance of being upweighted and a 5% chance of being downweighted. The second part of the search consisted of TBR swapping on all trees obtained from the first part of the search with up to 500,000 trees retained, after which the strict consensus was calculated. One thousand BS pseudoreplicates were implemented. Each pseudoreplicate consisted of 1000 TBR searches that each held up to 50 most parsimonious trees. The TNT analyses of the simulated examples consisted of 1000 RAS TBR searches that each held up to 50 most parsimonious trees. Branches with a minimum possible optimized length of zero were collapsed. The strict consensus of all most parsimonious trees found was then saved. All 1000 BS pseudoreplicates implemented the same tree-search strategy, albeit with 100 RAS TBR searches per pseudoreplicate. The relatively low quality PAUP analyses of both the empirical and simulated examples consisted of 1000 RAS TBR searches with only a single tree held per search. These lower quality tree searches were conducted to emulate those performed in GARLI, PhyML (Guindon et al., 2010), and RAxML. Branches with a minimum possible optimized length of zero were collapsed. All 1000 BS pseudoreplicates implemented the same tree-search strategy, albeit with only 1 RAS TBR search per pseudoreplicate for the empirical examples and 10 RAS TBR searches for the simulated examples. Rather than calculating a strict consensus of all most parsimonious trees found for each matrix, only the first tree (of up to 1000 equally parsimonious trees) saved was retained. Likewise for each BS pseudoreplicate, wherein up to 10 equally parsimonious trees could be saved. Any additional trees saved from a given pseudoreplicate were deleted. Bootstrap percentages were then mapped onto the strict consensus (for TNT) or first most parsimonious tree found (for PAUP) by using SumTrees ver. 3.11 (Sukumaran and Holder, 2010) or TreeGraph2 ver. 2.0.45 (Stöver and Müller, 2010). Bootstrap percentages 650 were eliminated in TreeGraph2. 2.7. Parametric tree searches Likelihood analyses were performed on all simulated matrices by using GARLI ver. 2.0.1019. Branches with a length of 1  108 were collapsed in the same manner that PAUP collapses these

5

branches in likelihood analyses (Swofford, 2001). The GARLI analyses were performed by using the lowest settings for an intensive search recommended by Zwickl (2009; streefname = stepwise; attachmentspertaxon = 50, genthreshfortopoterm = 20,000, numberofprecreductions = 20, treerejectionthreshold = 100) for both optimal-tree searches (1000 search replicates) and the BS (1000 pseudoreplicates, each with 1 search for the empirical examples and 10 searches for the simulated examples). The GTR + C model with four rate categories was applied to the three empirical matrices. Both unpartitioned and partitioned GARLI analyses of these empirical matrices were performed for the ITS_all and ITS_no_overlap matrices. For the ITS_all matrix, the 260 globally aligned characters were treated as one partition and each of the seven groups of locally aligned characters was treated as a separate partition. For the ITS_no_overlap matrix, each of the seven groups of locally aligned characters (including the formerly globally aligned characters) was treated as a separate partition. Different model parameter values and a different rate multiplier were allowed for each partition. Because the simulated characters were generated using a Jukes–Cantor model without any invariant sites or rate heterogeneity among characters, the same model was applied for the GARLI analyses of the simulated examples. Each character partition described in Sections 2.3–2.5 was also delimited in the GARLI analyses. The same model was applied to all partitions but each partition was allowed to have its own rate multiplier. To test the generality of the GARLI results to other likelihood implementations, the study was expanded to include selected PAUP and RAxML analyses for simulated examples in which the GARLI results were of particular interest. PAUP likelihood analyses on the third-example matrices were limited to between 62 and 86 optimal-tree TBR searches and 20 BS pseudoreplicates. As with the GARLI analyses, the Jukes–Cantor model was applied in PAUP. No rate multiplier was allowed across partitions. As in the TNT parsimony analyses, searches for the optimal tree(s) were performed using 1000 RAS TBR searches with up to 50 trees held per search and a strict consensus was calculated from all optimal trees found. Each of the 1000 BS pseudoreplicates consisted of 10 RAS TBR searches with up to 50 trees held per search. The frequency-within-replicates BS implementation in PAUP was applied (in contrast to the strict-consensus BS implementation in TNT; Davis et al., 1998) and BS percentages P50 were mapped onto the strict consensus by using TreeGraph2. RAxML ver. 7.2.6 likelihood analyses were run using the GTRGAMMA model and 1000 independent searches starting from randomized parsimony trees. Bootstrap support for the clades resolved on the single optimal tree was calculated in RAxML ver. 7.0.3. Each of the 1000 BS pseudoreplicates consisted of ten searches using the ‘‘–f i’’ option. Each of the character partitions was allowed to have different model-parameter values. Bayesian MCMC analyses were performed by using MrBayes ver. 3.2.1 (Ronquist et al., 2012). As with the GARLI likelihood analyses, the GTR + C model was applied for Bayesian analyses of the empirical examples. Ten independent MCMC runs that each consisted of four chains and 20,000,000 generations were performed for the ITS_conserved and ITS_all matrices, whereas five runs were performed for the ITS_no_overlap matrix. Each run was sampled every 1000 generations and the burn-in was set at 10,000,000 generations. The post-burn-in trees from all five or ten runs (as applicable) were then combined in MrBayes to estimate posterior probabilities for each clade. Convergence between the independent runs was assessed by the potential scale reduction factor being close to 1 for all parameters; in no case were substantial deviations observed. As with the GARLI and PAUP likelihood analyses, the Jukes– Cantor model was applied for Bayesian analyses of the simulated

6

M.P. Simmons / Molecular Phylogenetics and Evolution 74 (2014) 1–14

matrices. Ten independent MCMC runs that each consisted of four chains and 10,000,000 generations were performed for each matrix. Each run was sampled every 1000 generations and the burnin was set at 5,000,000 generations. Posterior probabilities and convergence were assessed as described above for the empirical examples. The ITS_all and ITS_no_overlap matrices were analyzed by using the following two alternative approaches in MrBayes: (1) unpartitioned and (2) partitioned wherein each partition was allowed to have its own rate multiplier (ratepr = variable), nucleotide frequencies, rate matrix, and gamma distribution. The simulated examples were also analyzed by using two alternative approaches in MrBayes. In both approaches each of the character partitions was defined. In the first approach, each partition was allowed to have its own rate multiplier (i.e., the same partitioned approach as for the empirical matrices). In the second approach, each partition was allowed to have its own branch lengths (unlink brlens=(all)).

2.8. Quantification of resolution and branch support Two measures of resolution in the optimal tree or consensus (when applicable) were applied as were two measures that also incorporated branch support for the resolved clades. First, the number of clades resolved in the optimal tree/consensus were simply summed and reported as ‘‘resolution.’’ Second, congruence with the simulated topology (not applicable to the empirical ITS analyses) was quantified by using the overall success of resolution (OSR), which assigns +1 for congruent clades, 0 for unresolved clades, and -1 for incongruent clades (Simmons and Webb, 2006). For example, the OSR can range from 37 to -37 for the 40terminal simulations and would be 0 for a completely unresolved tree. Support was integrated into the first two measures by linearly scaling BS support to that provided by one to four uncontradicted synapomorphies, while also incorporating those clades with 50– 62% support (Simmons et al., 2010). Clades with 50–62% support (i.e., less than that provided by one uncontradicted synapomorphy) were set to 0.2, 63–85% support to 0.4, 86–94% support to 0.6, 95– 97% support to 0.8, and 98–100% support (equivalent to at least four uncontradicted synapomorphies) to 1. For the third measure this is presented as ‘‘scaled support.’’ Finally, for the fourth measure, the averaged overall success of resolution (aOSR) is reported for analyses of the simulated examples. For example, an aOSR score of 37 in the 40-terminal simulations indicates that all 37 possible clades are resolved, congruent with the simulated topology, and received P98% BS support or 0.98 posterior probability. Topological congruence with the simulated topology for each of the simulated examples was initially assessed by using PEST ver. 2.2 (Zujko-Miller and Miller, 2003). But an error in PEST was frequently observed such that the number of incongruent clades (but not the number of congruent clades) was overestimated for trees that included polytomies. Consequently, all PEST results for incongruent clades were examined in TreeGraph2 and manually fixed as necessary. Averages across all 10 replicates for each of the simulated examples are presented below for each of the four quantification measures. In several cases there was high variance for a given measure among the replicates (the raw data are posted at: http://rydberg.biology.colostate.edu/Research/) because some methods were radically unstable to slight permutations among the replicates for a simulated example. In contrast, some methods were invariant for a given measure among the simulated replicates. Consequently, ANOVA analyses were not performed to test for significant differences

between methods because the assumptions would have been violated.

3. Results 3.1. Empirical examples Of the 347 terminals sampled, five are outliers by having P35% missing/inapplicable data for the 260 characters in the ITS_conserved matrix, ranging from 58% to 97% (Table 1). These five terminals were selected for detailed examination to determine if their resolution by some methods in the ITS_all matrix could be unequivocally supported based on the three general principles presented in Section 1.1. If these terminals could not be unequivocally supported based on these principles then their resolution would be regarded as unjustified. Only those clades with P50% support and within which one of the five terminals was nested were examined. The consensus or optimal tree, as applicable, for each method was viewed in MacClade ver. 4.08 (Maddison and Maddison, 2001). All of the ITS_conserved characters for which the given terminal was scored were optimized onto the tree by using Fitch optimization to determine if any of them could be optimized as a synapomorphy for the clade(s) in question, irrespective of how homoplasious the character is or whether the optimization is ambiguous. A qualification with this approach is that consensus trees, as in the TNT and Bayesian results, can hide underlying resolution in the optimal trees that may be relevant for optimization (Nixon and Carpenter, 1996). The results for these five outlier terminals are presented in Table 1. A single (potential, if ambiguously optimized, as for Torralbasia cuneifolia) synapomorphy for one (Euonymus fortunei, Torralbasia cuneifolia) or two (Hippocratea volubilis 6923) of the clades within which the terminal was nested in the Celastraceae ingroup (or those clades separating the terminal from a P 30terminal polytomy, when applicable) could be identified in the ITS_conserved matrix for which the terminal was scored in three of the five cases. In contrast, not a single such synapomorphy could be identified for Astrocassine fusiformis 2917 or Maytenus domingensis. The resolution and support provided by some methods for these two terminals appears to be unjustified based on the first general principle stated in the introduction. For the three cases in which at least one (potential) synapomorphy from the ITS_conserved matrix was identified, it is good to see that those least inclusive clades received at least as high of support as any of the more inclusive clades, which is consistent with the third general principle. Yet those least inclusive clades were generally assigned unusually high support values given that they are each premised on a single globally sampled synapomorphy (including those from homoplasious characters) from the ITS_conserved matrix for which the terminal in question was scored. As noted by Felsenstein (1985) and Farris et al. (1996), a single uncontradicted synapomorphy that shows no homoplasy and is scored for all terminals should confer just 63% BS or jackknife support. If that character from the ITS_conserved matrix is not sampled in a given BS pseudoreplicate then the terminal in question should be completely unresolved in trees inferred from both the ITS_conserved and ITS_all matrices. Yet BS supports ranged up to 99% and posterior probabilities ranged up to 1.0. Another concern is that one or more of those clades that were resolved and assigned P50% support in the ITS_all tree by a given method was also supported in the ITS_no_overlap tree for three of the five terminals (Table 1). Yet all such resolution and support in the ITS_no_overlap tree is by definition spurious because terminals that were exclusively scored for different partitions have no

7

M.P. Simmons / Molecular Phylogenetics and Evolution 74 (2014) 1–14

Table 1 Branch-support values P50% from all three ITS matrices for the five outlier terminals with P35% missing/inapplicable data for the 260 characters in the ITS_conserved matrix. When multiple support values are listed, they are ordered from the least inclusive clade to the most inclusive clade. Support values that are in bold font and underlined are those for which a single potential (if ambiguously optimized) synapomorphy could be identified in the ITS_conserved matrix for which the terminal was scored. ‘‘unresolved’’ = the terminal was part of a large (P30-terminal) polytomy; ‘‘0’’ = the terminal was resolved within Celastraceae but none of the nested clades received P50% support; ‘‘N/A’’ = partitioned analyses are not applicable to the ITS_conserved matrix. Method

ITS_conserved

ITS_all

ITS_no_overlap

Astrocassine fusiformis 2917 TNT ratchet PAUP TBR GARLI unpartitioned GARLI partitioned Bayesian unpartitioned Bayesian partitioned

89% missing/inapp. unresolved 0 0 N/A unresolved N/A

unresolved 67, 67 53, 75 58, 76 99, 100 62, 72, 99, 100

unresolved 0 0 0 unresolved unresolved

Euonymus fortunei TNT ratchet

82% missing/inapp. unresolved

96, 88

unresolved

PAUP TBR

0

94, 84, 87

0

GARLI unpartitioned

0

99, 76, 93, 62

97, 68, 86

GARLI partitioned

N/A

99, 84, 95, 68

96, 71

Bayesian unpartitioned

unresolved

100, 98, 97, 97

83 57

Bayesian partitioned

N/A

100, 99, 96, 94

70% missing/inapp. unresolved 0

unresolved

unresolved

94, 54, 52

88

GARLI unpartitioned

0

97, 66, 75, 60

93

GARLI partitioned

N/A

98, 67, 76, 60

93

Bayesian unpartitioned

55 N/A

100, 96, 100, 51, 100 100, 96, 100, 100

52 unresolved

Maytenus domingensis TNT ratchet PAUP TBR GARLI unpartitioned GARLI partitioned Bayesian unpartitioned Bayesian partitioned

97% missing/inapp. unresolved 0 unresolved N/A unresolved N/A

unresolved 62 0 0 79 69

unresolved 67, 57, 53 80 80 56 unresolved

Torralbasia cuneifolia TNT ratchet PAUP TBR GARLI unpartitioned

58% missing/inapp. unresolved 0 0

unresolved 0

Hippocratea volubilis 6923 TNT ratchet PAUP TBR

Bayesian partitioned

62

unresolved 0 0

GARLI partitioned

N/A

68, 58

0

Bayesian unpartitioned

73, 56 N/A

97

0

94

0

Bayesian partitioned

comparable information and therefore should behave as wildcards relative to each other. These results from the ITS_no_overlap matrix reinforce the conclusion that the support values assigned to the equivalent clades in the ITS_all tree are inflated. Despite there being no supported clades in the ITS_conserved or ITS_no_overlap strict consensus and just 17 such clades in the ITS_all strict consensus, 3, 121, and 170 clades, respectively, were assigned P50% support in the BS majority-rule consensus (Table 2). The BS percentages assigned to properly unsupported clades (i.e., clades that are not present in the strict consensus of all most parsimonious trees; Goloboff et al., 2003) ranged from 54–62% (average of 59%) for the ITS_conserved matrix, 51–85% (average of 66%) for the ITS_no_overlap matrix, and 50–99% (average of 79%) for the ITS_all matrix. For the ITS_all matrix this included 56 clades with P90% BS, including 37 clades with P95% BS and four clades with 99% BS. For comparison, the BS percentages assigned to the 17 properly supported clades ranged from <50–97% (average of 83% for the 16 clades with >50% BS). Hence the BS percentages assigned to properly supported clades were only 4% higher, on average, than those assigned to properly unsupported clades. These results indicate that the highest unjustified BS support was obtained for the ITS_all matrix, followed by the ITS_no_overlap matrix, and lastly the ITS_conserved matrix, for which the unjustified support was minor. The same ordering is also apparent

for the PAUP TBR scaled-support results (Table 2). Furthermore, the same ordering also applies to unsupported resolution as demonstrated by the PAUP TBR resolution results. These orderings of resolution and scaled support are not unique to parsimony—they both directly apply to all of the likelihood and Bayesian results. Extrapolation of the parsimony-based inferences of unsupported resolution and unjustified support suggests that much of the likelihood- and Bayesian-based resolution and support for the ITS_conserved and ITS_all matrices is also liable to be unjustified and that they are artifacts caused by heuristic shortcuts when conducting tree searches. Congruence between the parsimony-based TNT BS MRC and the PAUP TBR trees relative to the likelihood and Bayesian trees for the ITS_all matrix was assessed to check the extent to which properly unsupported parsimony clades that were assigned P50% BS were also assigned P50% BS by likelihood and P0.5 posterior probability (PP) by Bayesian analyses. The partitioned-analysis results were used because those trees were assigned higher scaled support than the equivalent unpartitioned-analysis results (Table 2). The 190 clades with P50% BS in the likelihood-partitioned tree inferred from the ITS_all matrix included 16 of the 17 clades resolved in the parsimony strict consensus, 157 of the 172 properly unsupported clades that were assigned P50% BS in the parsimony-based TNT BS MRC or the PAUP TBR tree, and 17 novel

8

M.P. Simmons / Molecular Phylogenetics and Evolution 74 (2014) 1–14

Table 2 Resolution and scaled-support results for the empirical 347-terminal ITS analyses. Method

Conserved

All

No overlap

Resolution

Support

Resolution

Support

Resolution

Support

Parsimony TNT ratchet TNT bootstrap MRC PAUP hold/1

0 3 121

0 0.6 10.2

17 170 311

8.2 86.6 110.4

0 121 277

0 40.8 69

Likelihood GARLI unpartitioned GARLI partitioned

134 N/A

8.6 N/A

321 314

103.6 107.6

325 320

88.4 86.8

Bayesian MCMC Bayesian unpartitioned Bayesian partitioned

75 N/A

40.8 N/A

236 233

180.2 184.4

118 97

54.2 27.6

clades. Likewise, the 233 clades with P0.5 PP in the Bayesian-partitioned tree inferred from the ITS_all matrix included 16 of the 17 clades resolved in the parsimony strict consensus, 167 of the 172 properly unsupported clades that were assigned P50% BS in the parsimony-based TNT BS MRC or the PAUP TBR tree, and 50 novel clades. Hence, there is high topological congruence between properly unsupported parsimony clades and the likelihood and Bayesian trees. Scatterplots comparing the parsimony-based BS values for the 172 properly unsupported clades with the likelihood BS values P50% for those same clades are presented in Fig. S1A and B. Equivalent results for Bayesian posterior probabilities P0.5 are presented in Fig. S1C and D. These include three conflicting clades with P50% BS in the likelihood tree and three conflicting clades with P0.5 posterior probability in the Bayesian tree. In all four scatterplots a positive correlation is observed, which is particularly strong for the likelihood comparisons. The lower correlations for the Bayesian comparisons is probably caused by the general observation (even for properly supported clades) that Bayesian posterior probabilities are inflated (e.g., Suzuki et al., 2002; Cummings et al., 2003; Simmons et al., 2004), and hence quickly saturate at 1.0 PP. The correlations observed between the parsimony BS values that are assigned to properly unsupported clades with the likelihood topology and BS values as well as the Bayesian topology and posterior probabilities reinforce the earlier inference that much of the likelihood- and Bayesian-based resolution and support for the ITS_all matrix is liable to be unjustified and that they are probably artifacts caused by heuristic shortcuts when conducting tree searches. 3.2. First simulated examples The resolution, OSR, scaled support, and aOSR for the first simulated examples are shown in Table 3. In these examples the first partition was scored for all terminals, and consequently all methods examined, including TNT, produced both resolution and scaled support on their optimal trees. For example 1A, an average of 3.6 clades (rather than an average of just one clade) were resolved on the TNT consensus trees. Of the 11 two-terminal clades resolved in the ten replicates, 10 were identified as having convergent apomorphies among the first 1000 simulated characters. The other 15 non-20-terminal clades (across replicates 2 and 8), all of which received <50% bootstrap support, were resolved based on interactions between the globally and locally sampled parsimonyinformative characters. This unexpected resolution on the parsimony strict consensus was found to occur only when two separate globally sampled homoplasious characters from partition 1 each united an odd terminal and an even terminal within either the (terminals 1–20) or (terminals 21–40) clades. The resolution and scaled support provided by the other four methods were all greater than that provided by TNT, with resolu-

Table 3 Results for the pectinate-topology, 3003-character, 40-terminal simulations (i.e., the first simulated examples), averaged across all ten replicates for each of the two sets of matrices. Method

Terminals 1–20 vs. 21–40 TNT PAUP hold/1 pars. Bayesian partitioned Bayesian unlinked GARLI

Resolution

(example 1A) 3.6 31.5 29.3 14.5 32

Odd vs. even terminals (example 1B) TNT 1 PAUP hold/1 pars. 34.9 Bayesian partitioned 36.7 Bayesian unlinked 36.3 GARLI 35.7 PAUP likelihood 35.2 RAxML 37

Overall success of resolution

Scaled support

Averaged overall success of resolution

-1.6 -26.9 15.9 7.7 18

1 4.4 11.2 4.7 9.7

0.3 -3.2 8 2.6 7.6

-1 -34.9 -36.7 -36.3 -35.7 -35.2 -37

0.4 9.1 17 15.5 13 10.8 12.8

-0.4 -9 -17 -15.5 -13 -10.8 -12.8

tion ranging from 14.5 to 32 clades and scaled support ranging from 4.4 to 11.2. The OSR and aOSR are, on average, negative for PAUP hold/1 parsimony analyses, whereas they are positive for the Bayesian and likelihood analyses. For example 1B, the TNT strict consensus always resolved a single internal branch—that separating the 20 odd terminals from the 20 even terminals, as may be expected. Yet despite there being no comparable information indicating where to connect the (odd terminal) clade to the (even terminal) clade, all other methods provided an average of 34.9–37 resolved clades (the maximum possible is 37), and 9.1–15.5 scaled support. Because of the three misleading globally sampled contrived characters and the distribution of missing data in partitions 2 and 3, generally all of the resolution and scaled support was contradictory to the simulated topology, which is evident when comparing the resolution with the OSR and the scaled support with the aOSR. Hence all other methods, including the added methods of PAUP likelihood and RAxML, provided from 33.9 to 36 lower OSR and 8.6–16.4 lower aOSR than did TNT. Results for the pectinate-topology, P30,000-character, 40-terminal simulations in which the second partition is only scored for the odd terminals and the third partition is only scored for the even terminals are presented in Fig. 3 and Table S1. These simulations are equivalent to the example-1B simulations, except that all three simulated partitions contain an order of magnitude more characters (10,000 vs. 1000) and there are up to 12 globally sampled contrived characters that separate the odd terminals from the even terminals. In comparing these results (using 12 globally sampled contrived characters for the P30,000-character matrices),

9

M.P. Simmons / Molecular Phylogenetics and Evolution 74 (2014) 1–14

there is little change, on average, for any of the methods except for RAxML, wherein the scaled support increased by an average of 60%, and the aOSR correspondingly decreased by an average of 60% from -12.8 to -20.5 (Table S1). Hence the results for all of the other methods were relatively stable despite the increase in character sampling. Within the 30,000-character simulations, the number of globally sampled contrived characters that separate the odd terminals from the even terminals ranged from 12 to 0. Two interesting fluctuations in scaled support and aOSR (Fig. 3B) are as follows. First, the curves for the PAUP likelihood, GARLI partitioned, and RAxML partitioned methods are roughly similar to each other as they approach zero when fewer contrived characters were sampled. PAUP likelihood has the lowest scaled support in the simulations with 12 globally sampled contrived characters and approaches zero fastest, followed by GARLI, and then RAxML. This ordering is identical to that proposed by Simmons and Norton (2013) based on the severity of artifacts entailed by each of these implementations of likelihood. Second, most of the methods gradually approach zero for scaled support and aOSR as the number of globally sampled contrived characters decreased. By contrast, Bayesian unlinked showed, on average, consistent resolution, OSR, scaled support, and aOSR irrespective of whether 12 ? 1 globally sampled contrived characters were included. In particular, the aOSR held steady, on average, from -15.5 to -15.9 and only approached zero (dropping precipitously to -0.2) when no such characters were sampled. The consistently high posterior probabilities followed by a precipitous drop were also reflected in the posterior probabilities that the Bayesian-unlinked method assigned to the branch separating the odd from the even terminals, from 1.0 when 12 ? 1 globally sampled contrived characters were included, to being contradicted when no such characters were sampled (Fig. S2). 3.3. Second and third simulated examples

Fig. 3. Results for the pectinate-topology, P30,000-character, 40-terminal simulations in which the second partition is only scored for the odd terminals and the third partition is only scored for the even terminals. The X-axis indicates the number of synapomorphies added to the first partition for the 20-terminal odd/ even clade. A. Solid lines = resolution, dashed lines = overall success of resolution on the optimal tree or strict consensus (as applicable). B. Solid lines = scaled support, dashed lines = averaged overall success of resolution.

The resolution, OSR, scaled support, and aOSR for the second and third simulated examples are shown in Tables 4 and 5, respectively. As expected given that there is no comparable information between the two groups of terminals scored for partitions 1 and 2, the TNT-based resolution (and consequently all other measures for this method) was always zero for both the second and third examples. In contrast, all other methods provided an average of 0.3–59.4 resolved clades and 0.1–32.7 scaled support for the second example (Table 4). The Bayesian-unlinked method consistently provided at least an order of magnitude lower resolution and scaled support than did PAUP hold/1 parsimony, Bayesian partitioned, or GARLI for both the second and third examples (Tables 4 and 5).

Table 4 Results for the 64-terminal, 2000-character simulations in which terminals 1–32 share scored characters and terminals 33–64 share scored characters (i.e., the second simulated examples), averaged across all ten replicates for each of the two sets of matrices. Method

Resolution

Overall success of resolution

Scaled support

Averaged overall success of resolution

Symmetrical topology TNT PAUP hold/1 pars. Bayesian partitioned Bayesian unlinked GARLI

0 54.9 55 5.3 59.4

0 37.5 54.8 5.3 45

0 19.5 32.7 1.3 26.4

0 19.5 32.7 1.3 26.4

Pectinate topology TNT PAUP hold/1 pars. Bayesian partitioned Bayesian unlinked GARLI

0 55.9 42 0.3 59.4

0 -3.9 0.8 0.3 -10.2

0 9 16.3 0.1 16.1

0 0.5 0.8 0.1 -0.9

10

M.P. Simmons / Molecular Phylogenetics and Evolution 74 (2014) 1–14

Table 5 Results for the 64-terminal simulations in which the odd terminals share scored characters and the even terminals share scored characters (i.e., the third simulated examples), averaged across all ten replicates for each of the three sets of matrices. Overall success of resolution

Scaled support

Averaged overall success of resolution

Symmetrical topology 2000 characters TNT 0 PAUP hold/1 pars. 54.5 Bayesian partitioned 53.8 Bayesian unlinked 0.2 GARLI 61

0 -54.5 -53.8 -0.2 -61

0 18.6 26.4 0.04 25.6

0 -18.6 -26.4 -0.04 -25.6

Pectinate topology 2000 TNT PAUP hold/1 pars. Bayesian partitioned Bayesian unlinked GARLI PAUP likelihood RAxML

0 -57.7 -40 -0.2 -60.8 -0.2 -61

0 12 15.2 0.04 19.4 0.2 21.4

0 -12 -15.2 -0.04 -19.4 -0.2 -21.4

0 -56.1 -13.7 -0.3 -61 -61

0 9.8 4.7 0.1 22.1 21.3

0 -9.8 -4.7 -0.1 -22.1 -21.3

Method

Resolution

characters 0 57.7 40 0.2 61 0.2 61

Pectinate topology 20,000 characters TNT 0 PAUP hold/1 pars. 57.3 Bayesian partitioned 13.7 Bayesian unlinked 0.3 GARLI 61 RAxML 61

The amount of resolution was roughly similar by PAUP hold/1 parsimony, Bayesian partitioned, and GARLI on the symmetrical and pectinate topologies, yet there are large discrepancies for all three of these methods with respect to the OSR, scaled support, and aOSR (Table 4), with consistently higher values for all three measures on the symmetrical topology. For example, the OSR and aOSR are 45 and 26.4, respectively, for GARLI on the symmetrical topology, and -10.2 and -0.9, respectively on the pectinate topology. These discrepancies occurred despite there being similar and low levels of homoplasy on both simulated topologies. The average CI on the most parsimonious PAUP hold/1 trees is 0.93 on both the symmetrical and pectinate topologies, while the average RI is 0.97 on the symmetrical topology and 0.99 on the pectinate topology. The OSR and aOSR were generally positive for PAUP hold/1 parsimony, Bayesian partitioned, Bayesian unlinked, and GARLI on both the symmetrical and pectinate topologies for the second example (Table 4), whereas they were both consistently negative for all four measures on both the symmetrical and pectinate topologies for the third example (Table 5). In comparing the resolution with the OSR and the scaled support with the aOSR for the third example, it is clear that all or almost all of the resolution and scaled support are contradictory to the simulated topology (Table 5). As with the second examples, the contradictory optimal topologies occurred despite low levels of homoplasy on both simulated topologies. The average CI on the most parsimonious PAUP hold/1 trees is 0.87 on the symmetrical topology and 0.92 on the pectinate topology, while the average RI is 0.94 on the symmetrical topology and 0.99 on the pectinate topology. In contrast to GARLI and RAxML results for the pectinate topology, the PAUP likelihood analyses provided an average of nearly zero resolution and scaled support (Table 5). Hence the PAUP likelihood analyses performed similarly to the TNT and Bayesian-unlinked methods. When the pectinate topology for the third example was re-done by increasing the number of simulated characters by an order of magnitude, neither the OSR nor the aOSR substantially changed for GARLI or RAxML (Table 5). In contrast, the OSR increased dramatically from -40 to -13.7 and the aOSR increased from -15.2 to -4.7 for the Bayesian-partitioned method as a result of there being less resolution and scaled support, respectively. A

much lower increase in OSR and aOSR was observed for PAUP hold/1 parsimony. Curiously, there were generally large discrepancies in branch lengths between the two groups of terminals (1–32 vs. 33–64) in the Bayesian partitioned trees for both the symmetrical and pectinate topologies from the second example. Both the internal and terminal branch lengths were very short for the parts of the trees containing terminals 1–32 and very long for the parts of the trees containing terminals 33–64, or vice versa. A similar, but generally less pronounced, discrepancy was observed between the odd-terminal and even-terminal portions of the trees from the third example, but only for the 2000-character matrices. No such branchlength discrepancies were observed in the GARLI trees (results not shown).

4. Discussion 4.1. Empirical examples Comparison of the resolution and scaled support among the ITS_conserved, ITS_all, and ITS_no_overlap matrices (Table 2) demonstrates the extent to which locally sampled characters in sparse supermatrices, when analyzed with low quality heuristic searches, can create high resolution and resampling values for clades that are properly unsupported (i.e., not present in the strict consensus of all most optimal trees). For five of the seven methods examined (not including the TNT ratchet or Bayesian partitioned results), the resolution and scaled support are higher for the ITS_no_overlap matrix than for the ITS_conserved matrix despite there being no possibility of properly supported clades in the former. Still higher resolution and support were inferred by all seven methods for the ITS_all matrix. Some of this resolution and support is justified based on the 17 clades resolved in the TNT-ratchetbased strict consensus with 8.2 scaled support. But the large majority of the far higher resolution and support reported for the ITS_all matrix relative to the ITS_conserved matrix appears to be unjustified based on the following four factors: (1) the three general principles presented in the introduction, (2) that much of this additional resolution and support is also provided by the ITS_no_overlap matrix (Tables 1 and 2), (3) comparison with the ratchet-based strict consensus, and (4) resampling-value congruence for properly unsupported clades (by parsimony) between the parsimony, likelihood, and Bayesian analyses from the ITS_all matrix (Fig. S1). Based on these results, low quality heuristic tree searches, such as those performed by RAxML, should not be exclusively relied upon for analyzing sparse supermatrices that include many terminals (contra, e.g., Fabre et al., 2009; Peters et al., 2011). Rather, more sophisticated tree searches, such as those integrating the ratchet, sectorial searches, tree-drifting, and/or tree-fusing, should be applied while allowing numerous equally optimal trees to be saved from each replicate and preferably implementing a twostage search so that the most optimal trees identified in the first stage may be used to identify other optimal trees on the same tree island(s) during the second stage (Goloboff and Pol, 2005; Davis et al., 2005). The most efficient use of the extra computing power required is probably to apply it to the matrix as a whole and then map support values onto the strict consensus rather than slightly improving tree-search quality within each resampling pseudoreplicate. By doing the former, inferred resampling support will hopefully be limited to properly supported clades even if high resampling values are assigned to properly unsupported clades (e.g., compare scaled support for TNT-ratchet vs. TNT-bootstrapMRC for the ITS_all and ITS_no_overlap matrices in Table 2; Simmons and Freudenstein, 2011).

M.P. Simmons / Molecular Phylogenetics and Evolution 74 (2014) 1–14

In cases where low quality tree searches for sparse supermatrices are unavoidable (as in GARLI, PhyML, and RAxML analyses for which only a single optimal tree may be saved for the matrix as a whole as well as every bootstrap pseudoreplicate and lazy SPR branch-swapping is relied upon), I suggest also employing a thorough parsimony analysis (as may be done in TNT). For any clades that are resolved and highly supported by GARLI, PhyML, or RAxML but not by parsimony, and from which biological conclusions will be based upon, determine if those clades can be unambiguously supported by synapomorphies based on the distribution of scored characters in the context of the inferred topology (Malia et al., 2003; Simmons, 2012b). If they can, then also apply the three general principles from the introduction to determine if the resolution and support assigned to those clades can be justified based on ± globally sampled characters. 4.2. Examples incorporating simulated and contrived characters All three sets of simulated examples are based on characters that were simulated using a Jukes–Cantor model, no rate heterogeneity among characters within each partition, a single topology to simulate characters from all partitions, zero-length and/or equalpositive-length branches, short branch lengths and consequently low homoplasy, and an average of four synapomorphies per branch for the 1000-character-per-partition simulations. The only qualifications to these easy scenarios for phylogenetic inference are the distributions of missing data and the addition of three parsimony-informative contrived characters to the first set of simulated examples that consist of 1000 simulated characters per partition. Hence, any well supported branches (as measured by bootstrap values and posterior probabilities) that are resolved by a given method may be expected to match the simulated topology, with the exception of a single clade in example 1B. But, because of the distribution of missing data, my expectation was that there would be few (first set of simulated examples) or no (second and third sets of simulated examples) unequivocally supported clades. Contra this expectation, most of the phylogenetic-inference methods examined provided high scaled support for numerous clades from each of the simulated examples to which they were applied. Furthermore, the resolved clades frequently contradicted the simulated topology with high support, as indicated by negative OSR and aOSR. These negative sums were generally not caused by a single misplaced terminal. Rather they resulted from large portions of the trees being incorrectly resolved. As examples, the trees inferred by GARLI for the first replicate matrix from all three sets of simulated examples are posted as Fig. S3. 4.3. First simulated examples Results from the first simulated examples demonstrate that phylogenetic signal present in locally sampled characters may be interpreted in radically different ways depending upon the phylogenetic signal present in globally sampled characters. Examples 1A and 1B are identical to each other with respect to 3000 of the 3003 characters. The only way they differ is that in example 1A the three contrived globally-sampled characters are consistent with the topology used to simulate the other characters, whereas in example 1B those three characters contradict the simulation topology. Aside from the one clade delimited by the three contrived characters, the trees should be largely unresolved because of the distribution of missing data in character partitions 2 and 3. This expectation is met in the TNT strict consensus trees for both examples 1A and 1B (Table 3). In contrast, all of the other methods provided much higher resolution as well as support across many branches of the inferred trees. For example 1A, where the three contrived characters are consistent with the simulation topology,

11

the OSR and aOSR for all of the Bayesian and likelihood methods are uniformly positive. Yet in example 1B, the OSR and aOSR for all of the Bayesian and likelihood methods (including PAUP likelihood and RAxML) are uniformly and strongly negative, with the OSR, on average across the ten replicates, ranging from -34.9 to -37 and the aOSR, on average, ranging from -10.8 to -17. For example 1B, the parsimony-informative simulated characters that were used to create partitions 2 and 3 directly conflict with the three globally sampled contrived characters. But because of the distribution of missing data in partitions 2 and 3 this character conflict is almost entirely eliminated, which accounts for the high average CI and RI (0.94 and 0.99, respectively) on the PAUP hold/1 parsimony trees that are almost entirely incongruent with the simulated topology (the OSR averaged -34.9 across the 10 replicates; the worst possible OSR is -37; Table 3). These results reinforce the above stated importance of conducting rigorous tree searches on sparse supermatrices rather than relying upon low quality heuristic tree searches, and the importance of examining the bases for any clades that are present in likelihood or Bayesian trees that are unresolved in the parsimony-based strict consensus (Simmons, 2012a,b; Simmons and Norton, 2013). Comparison of the largely consistent results between the 3000and 30,000-character pectinate-topology simulated matrices from example 1B demonstrates that the strongly negative OSR and aOSR scores for PAUP hold/1 parsimony, Bayesian partition, Bayesian unlinked, GARLI, PAUP likelihood, and RAxML are not necessarily obviated with increased character sampling. By varying the number of globally sampled contrived characters that separate the odd terminals from the even terminals in the 30,000-character simulations, sensitivity of the different methods may be examined. In contrast to most other methods, the Bayesian-unlinked analyses produced dramatically different results when zero vs. just one globally sampled contrived character was sampled, not only for OSR, but, more seriously, also for aOSR (Fig. 3; Table S1) based on posterior probabilities inferred for many clades across the tree. Although the Bayesian-unlinked method generally performed well in the second and third examples (Tables 4 and 5), its performance in the first examples, including the dramatic sensitivity and worse performance than the Bayesian-partitioned method when four-or-fewer contrived characters were sampled in the 30,000-character simulations, demonstrates that application of this method cannot be expected to universally obviate missing-data artifacts in parametric phylogenetic analyses, even when numerous characters are sampled from each partition and stationarity and convergence are obtained (see Johnson et al. (2012) and Simmons and Norton (2013)). The Bayesian-unlinked method is not as conservative as thorough parsimony-based tree searches for making systematic inferences from sparse supermatrices, and hence cannot be relied upon as a parametric-based substitute method to ensure that inferred resolution and support from sparse supermatrices are unequivocally supported by the data. 4.4. Second and third simulated examples Results from the second and third simulated examples demonstrate the extent to which the different methods can create resolution and support despite there being no comparable information among sets of terminals. As expected, the TNT-based resolution (and consequently all other measures for this method) was always zero for both examples (Tables 4 and 5). But despite the lack of comparable information, the PAUP hold/1 parsimony, Bayesian partitioned, GARLI, and RAxML methods resolved an average of P40 clades with a scaled support of P16.1 in the 2000-character matrices. Alternatively, the Bayesian unlinked and PAUP likelihood analyses resolved an average of just 0.2–5.3 clades with 0.04–1.3 scaled support.

12

M.P. Simmons / Molecular Phylogenetics and Evolution 74 (2014) 1–14

The contrasting parsimony (TNT vs. PAUP hold/1 parsimony), likelihood (GARLI and RAxML vs. PAUP ML), and Bayesian posterior probability (partitioned vs. unlinked) results demonstrate that producing high resolution and support despite there being no comparable information is not inherent to any of these optimality criteria. Rather, the extent of this resolution and support is determined based on how each optimality criterion is implemented (only holding a single optimal tree for PAUP hold/1 parsimony, GARLI, and RAxML; extrapolating branch lengths among character partitions that have drastically different distributions of missing data during Bayesian MCMC searches). The high resolution and scaled support produced by PAUP hold/1 parsimony, GARLI, and Bayesian partitioned methods was highly unstable and their congruence with the simulated topologies were subject to large fluctuations caused by arbitrary factors that should have no effect on phylogenetic inference given that the strict consensus should be unresolved. The OSR and aOSR for these methods were highly positive for the second example when applied to characters simulated on a symmetrical topology, relatively close to zero for the second example when applied to characters simulated on a pectinate topology, and strongly negative for the third example when applied to characters simulated on either a symmetrical or pectinate topology (Tables 4 and 5). Comparison of the results between the 2000- and 20,000-character pectinate-topology simulated matrices from the third example demonstrates that increased character sampling can substantially decrease the extent of resolution and support produced by the Bayesian-partitioned method but does not necessarily decrease that produced by PAUP hold/1 parsimony, GARLI, or RAxML (Table 5). Hence, the limitation of holding only a single optimal tree in parsimony or likelihood searches may create just as serious of artifacts when applied to matrices with large numbers of characters as to matrices with comparatively few characters. Based on these results as well as those from example 1B described above, sampling large numbers of characters in sparse supermatrices should not be used as a rationale to rely exclusively upon phylogenetic methods that perform low quality tree searches. 4.5. Connection to empirical sparse supermatrices The empirical and simulated examples presented in this study are simplistic relative to empirical sparse supermatrices with respect to the numbers of taxa and characters sampled as well as the number and overlap of taxon sampling between character partitions. As such, the conclusions from this study may not directly apply to empirical supermatrices as a whole. But the patterns examined in this study often do have direct representations in particular lineages of empirical supermatrices. Soltis et al.’s (2013) supermatrix of 950 terminals from the flowering plant order Saxifragales will be used as an example. The supermatrix includes 48,465 nucleotide characters from 51 concatenated gene alignments from the mitochondrial, nuclear, and plastid genomes. On their most likely RAxML tree, Soltis et al. (2013) show the Crassulaceae as a clade that is distinct from the other families sampled (see their Appendix S1). I compartmentalized the original supermatrix to only include Crassulaceae terminals and then eliminated parsimony-uninformative characters. In the Crassulaceae submatrix, Sedum humifusum Rose is not sampled for any of the 2099 parsimony-informative characters. The characters it is sampled for are also scored for only 15 or 16 other terminals in the complete 950-terminal matrix, and of these only two (Crassula marnieriana H. Huber & H. Jacobsen and Kalanchoe pinnata Pers.) are within the Crassulaceae. These three Crassulaceae species are not resolved as a three-terminal clade in the RAxML tree. If they were then the resolution of Sedum humifusum as a member of a nested two-terminal clade might have been unequivocally sup-

ported given that all three terminals are scored for common characters (Malia et al., 2003; this paper). Rather, Sedum humifusum is separated by at least 10 branches from the other two species in the fully resolved RAxML tree. Therefore, all apomorphies separating Sedum humifusum from those other two Crassulaceae terminals are ambiguously optimized across multiple branches. Despite the fact that Sedum humifusum should act as a wildcard within the Crassulaceae such that there should be no resolution in the strict consensus, RAxML assigned numerous branches within the Crassulaceae 100% bootstrap support and, as always, produced a single fully resolved optimal tree. These artifacts occurred despite Soltis et al.’s (2013) inclusion of 48,465 characters from 51 concatenated gene alignments and taking the three precautions of only sampling gene alignments scored for P10 terminals, removing spuriously resolved terminals from each gene-alignment tree, and ensuring overlap in terminal sampling between every gene alignment and at least one other gene alignment. 5. Conclusions The eight findings of this manuscript are as follows: (1) Locally sampled characters, when analyzed with low quality heuristic parsimony, likelihood, or Bayesian searches, can create high resolution and resampling values for clades that are properly unsupported because there is no comparable information among sets of terminals. (2) A sparse supermatrix may consist of slowly evolving characters with minimal homoplasy and be analyzed using a parametric method with a well-fitting nucleotide-substitution model yet still produce misleading results when analyzed using certain phylogenetic-inference methods. (3) The limitation of holding only a single optimal tree in parsimony and likelihood tree searches may create just as serious of artifacts when applied to matrices with large numbers of characters as it does to matrices with comparatively few characters. (4) Arbitrary factors that should have no effect on phylogenetic inference can create large fluctuations in congruence of trees inferred by parsimony, likelihood, and Bayesian methods with the simulated topology. (5) Phylogenetic signal present in locally sampled characters may be interpreted in radically different ways depending upon the phylogenetic signal present in globally sampled characters. (6) The resolution and branch support that some phylogeneticinference methods attribute to sparse supermatrices with no comparable information can appear comparable to that provided by supermatrices with little to no missing data. But this resolution and support should be regarded as an artifact rather than a strength of the phylogenetic-inference method—even when the bias is positive (Simmons, 2012b). (7) Application of Bayesian MCMC analyses with unlinked branch lengths among character partitions cannot be expected to universally obviate missing-data artifacts, even when numerous characters are sampled from each partition. (8) Bayesian MCMC methods may be more resilient than likelihood methods to presenting resolution and support within clades with no comparable information (see also Simmons, 2012b)—particularly when numerous characters are sampled. The first five findings may be addressed by analyzing sparse supermatrices using thorough tree searches while allowing numerous equally optimal trees to be saved from each replicate rather than relying entirely upon subtree pruning and regrafting (SPR)

M.P. Simmons / Molecular Phylogenetics and Evolution 74 (2014) 1–14

while saving a single optimal tree (as in GARLI, PhyML, and RAxML). For sparse supermatrices that include hundreds of terminals, sufficiently thorough tree searches may only be computationally tractable in a frequentist context by using parsimony as the optimality criterion (Sanderson and Kim, 2000) together with heuristic searches that are specifically designed to handle large matrices (e.g., Nixon, 1999; Goloboff, 1999; Goloboff et al., 2008). Acknowledgments I thank Alfried Vogler and an anonymous reviewer for suggestions with which to improve the manuscript; Christine Bacon for permission to use unpublished ITS sequences; Jerry Davis, John Gatesy, Todd Gilligan, Pablo Goloboff, and Andrew Norton for helpful discussions; and Todd Gilligan for help running the phylogenetic analyses and SumTrees. Appendix A. Supplementary material Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.ympev.2014. 01.030. References Barta, J.R., 1997. Investigating phylogenetic relationships within the Apicomplexa using sequence data: the search for homology. Methods 13, 81–88. Carpenter, J.M., 1996. Uninformative bootstrapping. Cladistics 12, 177–181. Chatrou, L.W., Pirie, M.D., Erkens, R.H.J., Couvreur, T.L.P., Neubig, K.M., Abbott, J.R., Mols, J.B., Maas, J.W., Saunders, R.M.K., Chase, M.W., 2012. A new subfamilial and tribal classification of the pantropical flowering plant family Annonaceae informed by molecular phylogenetics. Bot. J. Linn. Soc. 169, 5–40. Coughenour, J.M., Simmons, M.P., Lombardi, J.A., Cappa, J.J., 2010. Phylogeny of Celastraceae subfamily Salacioideae and tribe Lophopetaleae inferred from morphological characters and nuclear and plastid genes. Syst. Bot. 35, 358–366. Coughenour, J.M., Simmons, M.P., Lombardi, J.A., Yakobson, K., Archer, R.H., 2011. Phylogeny of Celastraceae subfamily Hippocrateoideae inferred from morphological characters and nuclear and plastid genes. Mol. Phylogenet. Evol. 59, 320–330. Cummings, M.P., Handley, S.A., Myers, D.S., Reed, D.L., Rokas, A., Winka, K., 2003. Comparing bootstrap and posterior probability values in the four-taxon case. Syst. Biol. 52, 477–487. Davis, J.I., Simmons, M.P., Stevenson, D.W., Wendel, J.F., 1998. Data decisiveness, data quality, and incongruence in phylogenetic analysis: an example from the monocotyledons using mitochondrial atpA sequences. Syst. Biol. 47, 282–310. Davis, J.I., Nixon, K.C., Little, D.P., 2005. The limits of conventional cladistic analysis. In: Albert, V.A. (Ed.), Parsimony, Phylogeny, and Genomics. Oxford University Press, Oxford, pp. 119–147. Davis, J.I., McNeal, J.R., Barrett, C.F., Chase, M.W., Cohen, J.I., Duvall, M.R., Givnish, T.J., Graham, S.W., Petersen, G., Pires, J.C., Seberg, O., Stevenson, D.W., LeebensMack, J.H., 2013. Contrasting patterns of support among plastid genes and genomes for major clades of the monocotyledons. In: Wilkin, P. (Ed.), Early Events in Monocot Evolution. Cambridge University Press, Cambridge, pp. 315– 349. Fabre, P.-H., Rodrigues, A., Douzery, E.J.P., 2009. Patterns of macroevolution among primates inferred from a supermatrix of mitochondrial and nuclear DNA. Mol. Phylogenet. Evol. 53, 808–825. Farris, J.S., Albert, V.A., Källersjö, M., Lipscomb, D., Kluge, A.G., 1996. Parsimony jackknifing outperforms neighbor-joining. Cladistics 12, 99–124. Felsenstein, J., 1973. Maximum likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Syst. Zool. 22, 240–249. Felsenstein, J., 1978. Cases in which parsimony or compatibility methods will be positively misleading. Syst. Zool. 27, 401–410. Felsenstein, J., 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39, 783–791. Fitch, W.M., 1971. Toward defining the course of evolution: minimum change for a specific tree topology. Syst. Zool. 20, 406–416. Freudenstein, J.V., Davis, J.I., 2010. Branch support via resampling: an empirical study. Cladistics 26, 643–656. Gatesy, J., Matthee, C., DeSalle, R., Hayashi, C., 2002. Resolution of a supertree/ supermatrix paradox. Syst. Biol. 51, 652–664. Goloboff, P.A., 1999. Analyzing large data sets in reasonable times: solutions for composite optima. Cladistics 15, 415–428. Goloboff, P.A., Pol, D., 2005. Parsimony and Bayesian phylogenetics. In: Albert, V.A. (Ed.), Parsimony, Phylogeny, and Genomics. Oxford University Press, Oxford, pp. 148–159.

13

Goloboff, P.A., Farris, J.S., Källersjö, M., Oxelman, B., Ramírez, M.J., Szumik, C.A., 2003. Improvements to resampling measures of group support. Cladistics 19, 324–332. Goloboff, P.A., Farris, J.S., Nixon, K.C., 2008. TNT, a free program for phylogenetic analysis. Cladistics 24, 774–786. Guindon, S., Dufayard, J.-F., Lefort, V., Anisimova, M., Hordijk, W., Gascuel, O., 2010. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst. Biol. 59, 307–321. Guo, X., Wang, R.-J., Simmons, M.P., But, P.P.-H., Yu, J., 2013. Phylogeny of the Asian Hedyotis-Oldenlandia complex (Spermacoceae, Rubiaceae): evidence for high levels of polyphyly and parallel evolution of diplophragmous capsules. Mol. Phylogenet. Evol. 67, 110–122. Johnson, K.A., Holland, B.R., Heslewood, M.M., Crayn, D.M., 2012. Supermatrices, supertrees and serendipitous scaffolding: inferring a well-resolved, genus-level phylogeny of Styphelioideae (Ericaceae) despite missing data. Mol. Phylogenet. Evol. 62, 146–158. Jukes, T.H., Cantor, C.R., 1969. Evolution of protein molecules. In: Murto, H.N. (Ed.), Mammalian Protein Metabolism, vol. 3. Academic Press, New York, pp. 21–132. Katoh, K., Toh, H., 2008. Recent developments in the MAFFT multiple sequence alignment program. Brief. Bioinform. 9, 286–298. Kluge, A.G., Farris, J.S., 1969. Quantitative phyletics and the evolution of Anurans. Syst. Zool. 18, 1–32. Lemmon, A.R., Brown, J.M., Stanger-Hall, K., Lemmon, E.M., 2009. The effect of ambiguous data on phylogenetic estimates obtained by maximum likelihood and Bayesian inference. Syst. Biol. 58, 130–145. Maddison, D.R., Maddison, W.P., 2001. MacClade: Analysis of Phylogeny and Character Evolution, Version 4.03. Sunderland, Sinauer. Malia, M.J., Lipscomb, D.L., Allard, M.W., 2003. The misleading effects of composite taxa in supermatrices. Mol. Phylogenet. Evol. 27, 522–527. Meusemann, K., von Reumond, B.M., Simon, S., Roeding, F., Strauss, S., Kuck, P., Ebersberger, I., Walzl, M., Pass, G., Breuers, S., Achter, V., von Haeseler, A., Burmester, T., Hadrys, M., Wagele, J.W., Misof, B., 2010. A phylogenomic approach to resolve the arthropod tree of life. Mol. Biol. Evol. 27, 2451–2464. Nixon, K.C., 1999. The parsimony ratchet, a new method for rapid parsimony analysis. Cladistics 15, 407–414. Nixon, K.C., Carpenter, J.M., 1996. On consensus, collapsibility, and clade concordance. Cladistics 12, 305–321. Nixon, K.C., Wheeler, Q.D., 1991. Extinction and the origin of species. In: Wheeler, Q.D., Novacek, M. (Eds.), Extinction and Phylogeny. Columbia University Press, New York, pp. 119–143. Peters, R.S., Meyer, B., Krogmann, L., Borner, J., Meusemann, K., Schutte, K., Niehuis, O., Misof, B., 2011. The taming of an impossible child: a standardized all-in approach to the phylogeny of Hymenoptera using public database sequences. BMC Biol. 9, 55. Ronquist, F., Teslenko, M., van der Mark, P., Ayers, D.L., Darling, A., Hohna, S., Larget, B., Liu, L., Suchard, M.A., Huelsenbeck, J.P., 2012. MrBayes 3.2: efficient Bayesian phylogenetic inference and model choice across a large model space. Syst. Biol. 61, 539–542. Sanderson, M.J., Kim, J., 2000. Parametric phylogenetics? Syst. Biol. 49, 817–829. Sanderson, M.J., McMahon, M.J., Steel, M., 2010. Phylogenomics with incomplete taxon coverage: the limits of inference. BMC Evol. Biol. 10, 155. Simmons, M.P., 2004. Independence of alignment and tree search. Mol. Phylogenet. Evol. 31, 874–879. Simmons, M.P., 2012a. Radical instability and spurious branch support by likelihood when applied to matrices with non-random distributions of missing data. Mol. Phylogenet. Evol. 62, 472–484. Simmons, M.P., 2012b. Misleading results of likelihood-based phylogenetic analyses in the presence of missing data. Cladistics 28, 208–222. Simmons, M.P., Freudenstein, J.V., 2011. Spurious 99% bootstrap and jackknife support for unsupported clades. Mol. Phylogenet. Evol. 61, 177–191. Simmons, M.P., Goloboff, P.A., 2013. An artifact caused by undersampling optimal trees in supermatrix analyses of locally sampled characters. Mol. Phylogenet. Evol. 69, 265–275. Simmons, M.P., Norton, A.P., 2013. Quantification and relative severity of inflated branch-support values generated by alternative methods: an empirical example. Mol. Phylogenet. Evol. 67, 277–296. Simmons, M.P., Webb, C.T., 2006. Quantification of the success of phylogenetic inference in simulations. Cladistics 22, 249–255. Simmons, M.P., Pickett, K.M., Miya, M., 2004. How meaningful are Bayesian posterior probabilities? Mol. Biol. Evol. 21, 188–199. Simmons, M.P., Müller, K.F., Norton, A.P., 2010. Alignment of, and phylogenetic inference from, random sequences: the susceptibility of alternative alignment methods to creating artifactual resolution and support. Mol. Phylogenet. Evol. 57, 1004–1016. Simmons, M.P., Bacon, C.D., Cappa, J.J., McKenna, M.J., 2012. Phylogeny of Celastraceae subfamilies Cassinoideae and Tripterygioideae inferred from morphological characters and nuclear and plastid loci. Syst. Bot. 37, 456–467. Simmons, M.P., McKenna, M.J., Bacon, C.D., Yakobson, K., Cappa, J.J., Archer, R.H., Ford, A.J., 2012b. Phylogeny of Celastraceae tribe Euonymeae inferred from morphological characters and nuclear and plastid genes. Mol. Phylogenet. Evol. 62, 9–20. Soltis, D.E., Mort, M.E., Latvis, M., Mavrodiev, E.V., O’Meara, B.C., Soltis, P.S., Burleigh, J.G., Rubio de Casas, R., 2013. Phylogenetic relationships and character evolution analysis of Saxifragales using a supermatrix approach. Am. J. Bot. 100, 916–929. Springer, M.S., Meredith, R.W., Gatesy, J., Emerling, C.A., Park, J., Rabosky, D.L., Stadler, T., Steiner, C., Ryder, O.A., Janecka, J.E., Fisher, C.A., Murphy, W.J., 2012.

14

M.P. Simmons / Molecular Phylogenetics and Evolution 74 (2014) 1–14

Macroevolutionary dynamics and historical biogeography of primate diversification inferred from a species supermatrix. PLoS ONE 7, e49521. Stamatakis, A., 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688– 2690. Stöver, B.C., Müller, K.F., 2010. TreeGraph 2: combining and visualizing evidence from different phylogenetic analyses. BMC Bioinform. 11, 7. Sukumaran, J., Holder, M.T., 2010. DendroPy: a python library for phylogenetic computing. Bioinformatics 26, 1569–1571. Suzuki, Y., Glazko, G.V., Nei, M., 2002. Overcredibility of molecular phylogenies obtained by Bayesian phylogenetics. Proc. Natl. Acad. Sci. USA, 16138–16143. Swofford, D.L., 2001. PAUP: Phylogenetic Analysis using Parsimony (and other methods). Sinauer Associates, Sunderland. Wagner, C.E., Keller, I., Wittwer, S., Selz, O.M., Mwaiko, S., Greuter, L., Sivasundar, A., Seehausen, O., 2013. Genome-wide RAD sequence data provide unprecedented resolution of species boundaries and relationships in the Lake Victoria cichlid adaptive radiation. Mol. Ecol. 22, 787–798. Wiens, J.J., 1998. Does adding characters with missing data increase or decrease phylogenetic accuracy? Syst. Biol. 47, 625–640.

Wiens, J.J., 2003. Incomplete taxa, incomplete characters, and phylogenetic accuracy: is there a missing data problem? J. Vertebr. Paleontol. 23, 297–310. Wiens, J.J., 2005. Can incomplete taxa rescue phylogenetic analyses from longbranch attraction? Syst. Biol. 54, 731–742. Yang, Z., 2007. PAML 4: phylogenetic analysis by maximum likelihood. Mol. Biol. Evol. 24, 1586–1591. Yang, Z., Rannala, B., 1997. Bayesian phylogenetic inference using DNA sequences: a Markov Chain Monte Carlo method. Mol. Biol. Evol. 14, 717–724. Zujko-Miller, C., Miller, J.A., 2003. PEST: Precision estimated by sampling traits. . Program distributed by the authors. Zurawski, G., Clegg, M.T., 1993. RbcL sequence data and phylogenetic reconstruction in seed plants: forward. Ann. Mol. Bot. Gard. 80, 523–525. Zwickl, D.J., 2006. Genetic Algorithm Approaches for the Phylogenetic Analysis of Large Biological Sequence Datasets under the Maximum Likelihood Criterion. Ph.D. Dissertation, The University of Texas at Austin. Zwickl, D.J., 2009. GARLI 0.96 settings cheat sheet (Smithsonian, June 09). Distributed by the author.