Molecular Phylogenetics and Evolution 116 (2017) 69–77
The development of scientific consensus: Analyzing conflict and concordance among avian phylogenies ⁎
Joseph W. Brown, Ning Wang, Stephen A. Smith
Department of Ecology & Evolutionary Biology, University of Michigan, Ann Arbor, MI 48109, USA
ARTICLE INFO
Keywords: Phylogenetic conflict; Concordance; Aves; Supertrees
ABSTRACT
Recent developments in phylogenetic methods and data acquisition have allowed for the construction of large and comprehensive phylogenies. Published phylogenies represent an enormous resource that not only facilitates the resolution of questions related to comparative biology, but also provides a basis on which to gauge the development of concordance across the tree of life. From the Open Tree of Life, we gathered 290 avian phylogenies representing all major groups that have been published over the last few decades and analyzed how concordance and conflict develop among these trees through time. Nine large-scale phylogenetic hypotheses (including a new synthetic tree from this study) were used for comparisons. We found that conflicts were over-represented both along the backbone (higher-level neoavian relationships) and within the oscine Passeriformes. Importantly, although we have made major strides in the resolution of major clades, recently published comprehensive trees, as well as trees of individual clades, continue to contribute significantly to the resolution of relationships throughout the avian phylogeny. Our analyses highlight the need for continued research into the resolution of avian relationships.
1. Introduction
Large and comprehensive phylogenies (i.e., including hundreds of taxa and based on genome-scale datasets) have become more common as inference methods and sequencing techniques capable of constructing enormous datasets have been developed (e.g., Smith and Donoghue, 2008; Rabosky et al., 2013; Zanne et al., 2014; Prum et al., 2015; Simion et al., 2017). These phylogenies have, in many cases, given fresh views to macroevolution and transformed our ability to address diverse sets of comparative biological questions ranging from lineage diversification to morphological evolution to rate heterogeneity (Brockington et al., 2015; Lin et al., 2016; Scholl and Wiens, 2016). Comprehensive phylogenies that include all or nearly all taxa constructed from supertree techniques also provide a means of determining where data collection efforts should be focused (Davis and Page, 2014; Jetz et al., 2012; Hinchliff et al., 2015). While these trees may facilitate interesting biological inquiries, they also provide a resource by which we can better assess the development of congruence among evolutionary hypotheses (e.g., Davis and Page, 2014; Suh, 2016; Reddy et al., 2017). Recent efforts to better understand the development of conflict and concordance among trees have been conducted primarily with molecular data (e.g., Hinchliff and Smith, 2014; Smith and Stamatakis, 2013; Smith et al., 2015). Nevertheless, phylogenetic resources, including TreeBASE (Sanderson et al., 1994) and more recently the Open Tree of Life (Hinchliff et al., 2015; McTavish et al., 2015), are now available to better analyze both the novelty and congruence of inferred relationships across studies and across the tree of life. The Open Tree of Life is an NSF-funded project whose aim is to construct a comprehensive tree of life using published phylogenetic trees along with taxonomic data (Hinchliff et al., 2015). To facilitate this research, the Open Tree of Life has developed and provided the community with several important resources. The Open Tree Taxonomy (hereafter OTT; Rees and Cranston, 2017), unlike many other synthetic taxonomies available, attempts to include only phylogenetically appropriate taxa (i.e., through exclusion of names of dubious taxonomic status). It is also more comprehensive than other more commonly used taxonomies (e.g., NCBI) as it includes taxa regardless of whether they have molecular data associated. The Open Tree of Life also constructs and serves a draft synthetic tree of all described species (Hinchliff et al., 2015), through the grafting of OTT together with published trees identified, uploaded, and curated by the community. This resource, while continually improving, provides significant opportunities to address broad evolutionary questions that previously would have been impossible. Finally, the project also openly provides the database of published phylogenies that have been curated by the community (McTavish et al., 2015). Importantly, the taxa included in each
⁎ Corresponding authors: J.W. Brown and S.A. Smith.
http://dx.doi.org/10.1016/j.ympev.2017.08.002 Received 31 March 2017; Received in revised form 3 August 2017; Accepted 6 August 2017 Available online 07 August 2017 1055-7903/ © 2017 Elsevier Inc. All rights reserved.
few decades were curated through the Open Tree of Life online curator (https://tree.opentreeoflife.org/curator), following the protocol of Hinchliff et al. (2015). Generally, published trees (as newick, NEXUS, or NeXML format) were obtained by appealing to authors, or imported from TreeBASE (Sanderson et al., 1994) and Dryad. We attempted to incorporate the source trees from the Davis and Page (2014) supertree study. However, we found that many trees from this resource were some form of consensus hypothesis (e.g., between parsimony and maximum likelihood) and/or included unsampled taxa (both extinct and extant) from the Davis and Page (2014) taxonomy. In sum, these trees reflected neither a specific hypothesis nor the extent of sampling of the original publication, and so were not included here. The full species-level tree of Sibley and Ahlquist (1990) has, to our knowledge, never been available in electronic format. As part of this study, JWB constructed the tree with branch lengths from Figs. 357–368, 371–385 of Sibley and Ahlquist (1990); this is the UPGMA tree commonly known as the “Sibley-Ahlquist Tapestry”, and is now freely available from the Open Tree of Life curator (study id: ot_427, tree id: tree5). The taxon labels for each source tree sampled here were mapped to the Open Tree of Life taxonomy (i.e., OTT) and trees were rooted with outgroups identified from the original study. In total, 290 avian phylogenetic hypotheses were gathered from the existing resources in the Open Tree of Life. These are all openly available in the git-based phylesystem repository (McTavish et al., 2015; https://github.com/OpenTreeOfLife/phylesystem). The distribution of trees sampled through time (Fig. 1) reflects data availability rather than research effort, as historically phylogenetic hypotheses have not been archived in machine-readable formats (Stoltzfus et al., 2012; Drew et al., 2013). 
Among the sampled trees, seven major hypotheses were used as focal trees for the assessment of concordance and conflict against the remaining tree set: Sibley and Ahlquist (1990), Livezey and Zusi (2007), Hackett et al. (2008), Jetz et al. (2012), Davis and Page (2014), Jarvis et al. (2014), and Prum et al. (2015). We note that we used the Jetz et al. (2012) tree limited to taxa with genetic data (6670 tips) and constrained based on the Hackett et al. (2008) backbone. In addition to the seven trees above, the Open Tree of Life synthetic tree version 7 (hereafter Opentree7, updated in Sep 2016; https://tree.opentreeoflife.org/about/synthesis-release/v7.0) was also included as one of the backbone resources (see Table 1 for a summary of the properties of these trees). Source trees may not be independent of each other. For example, the datasets used to construct phylogenies may have partial overlap (e.g., Wang et al., 2011; Kimball et al., 2013) or datasets can share constraints (e.g., the Jetz et al. (2012) tree has a backbone constraint based on Hackett et al. (2008)). We attempted to minimize these non-independent comparisons, as they may cause overestimation of conflict or concordance. To this end, we filtered the source trees by including only one tree from each study (to avoid comparing identical or largely overlapping trees from the same study). However, despite these efforts, overlap among studies can hardly be avoided (especially given the frequent use of certain genes). So, while we avoided this as much as possible, there are likely to be non-independent edges between trees that were included in these analyses.
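The one-tree-per-study filter described above can be sketched as follows. This is a minimal illustration rather than the authors' script: the record layout (study id, tree id) and the rule of keeping the first tree listed for each study are assumptions, and apart from ot_427/tree5 (taken from the text) the identifiers are hypothetical.

```python
from typing import Dict, List, Tuple

TreeRecord = Tuple[str, str]  # (study_id, tree_id); hypothetical layout


def one_tree_per_study(records: List[TreeRecord]) -> List[TreeRecord]:
    """Keep the first tree encountered for each study (assumed selection rule)."""
    kept: Dict[str, TreeRecord] = {}
    for study_id, tree_id in records:
        # setdefault leaves an existing entry untouched, so later trees
        # from the same study are ignored.
        kept.setdefault(study_id, (study_id, tree_id))
    return list(kept.values())


# "ex_1" and its tree ids are made-up identifiers for illustration only.
records = [("ot_427", "tree5"), ("ex_1", "tree1"), ("ex_1", "tree2")]
filtered = one_tree_per_study(records)
```

Any other selection rule (for example, preferring the tree with the most tips) could be substituted without changing the structure of the filter.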
phylogeny have been mapped to a common taxonomy (i.e., OTT), which allows for comparisons to be performed across datasets without an additional tedious and error-prone step of name reconciliation. Instead, this reconciliation has already been performed by those who uploaded the tree, often researchers with close knowledge of the focal organisms. Here, by utilizing the database of curated phylogenies from the Open Tree of Life, we assess the concordance and conflict among the growing number of avian phylogenies that have been published during the last few decades. The methods used in this study can also be applied to other groups of organisms using the Open Tree of Life resources. As the most diverse extant tetrapod lineage with ∼10,800 recognized extant species (Gill and Donsker, 2016) [and potentially more than twice as many cryptic lineages; Barrowclough et al. (2016)], birds have experienced a rapid inter-ordinal radiation where extremely short internodes exist (Hackett et al., 2008; McCormack et al., 2013; Burleigh et al., 2015; Suh, 2016; Reddy et al., 2017). Substantial progress has been made on reconstruction of the Aves phylogeny, including the discovery of the successive divergence of three monophyletic groups [i.e., Palaeognathae (the tinamous and flightless ratites), Galloanserae (game birds and waterfowl), and Neoaves (all other living birds); Groth and Barrowclough, 1999; Cracraft et al., 2004]. Nevertheless, resolving the avian phylogeny (especially within Neoaves) has continued to prove a difficult task for the avian systematics community since the pioneering efforts of Sibley and Ahlquist (1990). Researchers have started to explicitly assess progress in avian phylogenetics. By constructing a consensus tree based on six genome-scale phylogenies from five independent studies (i.e., Hackett et al., 2008; McCormack et al., 2013; Jarvis et al., 2014; Suh et al., 2015; Prum et al., 2015), Suh (2016) assessed the reproducibility of various avian phylogenetic hypotheses.
Due to the overwhelming conflict among the source trees used (i.e., no higher-level clade could be supported by at least two out of the six trees), Suh (2016) suggested that the very onset of the neoavian radiation produced an irresolvable nine-taxon hard polytomy. Reddy et al. (2017) constructed a nearly identical summary consensus tree to Suh (2016) using a smaller sample of three major hypotheses (i.e., Jarvis et al., 2014; Prum et al., 2015; Reddy et al., 2017), but were more optimistic that more realistic biological-modelling and, importantly, careful selection of data types, will enable further progress. We note that none of the trees considered by Suh (2016) or Reddy et al. (2017) had sufficient sampling of Passeriformes (songbirds; roughly 60% of extant avian species), so conflict could not be ascertained within that clade. To date, these and other studies have mainly focused on identifying causes of conflict, attributing tree differences to various factors including gene tree discordance due to incomplete lineage sorting (ILS; Jarvis et al., 2014; Suh et al., 2015), differences in phylogenetic signal content among data types (Jarvis et al., 2014; Reddy et al., 2017), and the influence of taxon sampling (Prum et al., 2015). However, while these issues of inference are important to keep in mind for future research, little effort has been made to summarize the development and growth of consensus when considering the entire corpus of published phylogenetic hypotheses. In this study, eight large-scale avian trees (Table 1) published in different time intervals are used as exemplars to assess trends of concordance and conflict. Additionally, after filtering 290 avian source trees publicly available from the Open Tree of Life, we constructed a new comprehensive synthetic bird tree and use it for assessment as the largest avian tree to date. 
This synthetic tree also serves as a resource for other researchers, and as a summary point from which we can compare future comprehensive avian phylogenies.
2.2. Construction of a new synthetic tree of Aves
In addition to the individual phylogenetic trees that we collected from the Open Tree of Life, we also assembled a novel synthetic avian tree using the “propinquity” pipeline from Redelings and Holder (2017). This supertree method takes as input a taxonomy tree (i.e., OTT) and a set of ranked source trees. Of the 290 avian trees collected above, 183 were selected that reflect community consensus about phylogenetic hypotheses. In general, the propinquity method constructs a supertree that displays the largest number of input tree edges while avoiding the inclusion of edges in the final tree that are unsupported by any input phylogeny. Because synthesis relies upon supertree
2. Methods
2.1. Source trees
Avian phylogenetic hypotheses that have been published in the last
Table 1 Summary of the nine major avian phylogenies.

| Focal tree | Data type | Analysis type | Number of taxa (mapped/total) [a] |
| Sibley and Ahlquist (1990), Figs. 357–368, 371–385 | DNA-DNA hybridization | UPGMA | 1098/1106 |
| Livezey and Zusi (2007), Figs. 12–18 | 2954 morphological characters | MP | 184/185 |
| Hackett et al. (2008), Fig. 2 | 19 nuclear loci | ML | 171/171 |
| Jetz et al. (2012), Fig. 2 | 4 Mt [b] + 6 nuclear DNA loci | Constrained Bayesian | 6669/6670 |
| Davis and Page (2014), Fig. 1 | 1036 source trees | MRP [c] supertree | 5328/5379 |
| Jarvis et al. (2014), Fig. 1 | 14536 nuclear DNA loci | ML | 48/48 |
| Prum et al. (2015), Fig. 1 | 259 sequence capture DNA loci | Bayesian | 198/198 |
| Opentree 7 | 77 source trees | Synthetic supertree | 13756/13756 |

[a] The number of tips mapped to OTT/the total number of tips in the original tree.
[b] Mt: mitochondrial.
[c] MRP: Matrix Representation with Parsimony.
the deepest (i.e., most tipward) compatible taxonomic node present in the supertree. See Redelings and Holder (2017) for more details.
2.3. Conflict and concordance analyses
In total, nine major phylogenies (eight published and one new synthetic tree as described above) were used to conduct conflict and concordance analyses. To compare these trees with the 290 source trees, we added the comprehensive OTT v3.0 set of taxa to each tree by using the same synthesis approach (see above) but with only the taxonomy and the one phylogeny in question. Adding the full OTT set of taxa to the focal trees ensures an overlap of tip sets with each of the 290 source trees. The result was a comprehensive tree resolved only by the single tree and the taxonomy (i.e., the synthesis procedure preserves all the inferred relationships of the original publication). Conflict and concordance analyses were conducted using the corresponding tools utilized by the Open Tree of Life web service (available from https://github.com/OpenTreeOfLife/reference-taxonomy). Python scripts to perform the analyses are available in the bitbucket repository (https://bitbucket.org/blackrim/opentree_birds). For these analyses, concordant and conflicting edges were identified as in Smith et al. (2013, 2015) and Redelings and Holder (2017). More explicitly, because all source trees are rooted, a source tree edge j defines a rooted bipartition S(j) = S_in | S_out, where S_in and S_out represent the tip sets of the ingroup and outgroup, respectively. For a given edge in focal tree A, concordance/conflict with an edge in source tree B thus involves the overlap of tip sets. We define concordance between A and B (‘A displays B’ in Redelings and Holder (2017)) when B_in ⊂ A_in and B_out ⊂ A_out (that is, the source ingroup is contained in the focal ingroup, and the source outgroup is contained in the focal outgroup). [We note that Redelings and Holder (2017) use ⊆ rather than ⊂ because they deal with the more general case where individual trees may have incomplete tip sets. Because we synthesize OTT with each focal backbone tree (above), we ensure that all of the tips in a source tree are also present in the focal tree]. On the other hand, edges in trees A and B are identified as conflicting if none of the following are empty: A_in ∩ B_in, A_in ∩ B_out, or B_in ∩ A_out (that is, there is reciprocal overlap in the ingroup and outgroup across trees; we note that Redelings and Holder (2017) have a typo in this definition). We computed edge-specific values of concordance and conflict for each of the nine focal trees against the 290 source trees. All analyses were conducted on the focal trees supplemented with unsampled OTT taxa (except Opentree7 and the new synthetic tree, which already possessed a full tip set), but summarized on the original published tree topologies.
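The two definitions above translate directly into set operations. The following is a minimal sketch, not the authors' scripts or the Open Tree of Life tools: the function names and the tuple representation of an edge are assumptions, and the subset test uses the ⊆ form of Redelings and Holder (2017), which also covers the ⊂ case described in the text.

```python
from typing import FrozenSet, Tuple

# A rooted bipartition is represented as (ingroup tips, outgroup tips).
Edge = Tuple[FrozenSet[str], FrozenSet[str]]


def is_concordant(focal: Edge, source: Edge) -> bool:
    """Focal edge A displays source edge B: B_in within A_in and B_out within A_out."""
    a_in, a_out = focal
    b_in, b_out = source
    return b_in <= a_in and b_out <= a_out


def is_conflicting(focal: Edge, source: Edge) -> bool:
    """Conflict: A_in ∩ B_in, A_in ∩ B_out and B_in ∩ A_out are all non-empty."""
    a_in, a_out = focal
    b_in, b_out = source
    return bool(a_in & b_in) and bool(a_in & b_out) and bool(b_in & a_out)


# Toy tip labels, for illustration only.
focal_edge = (frozenset("abc"), frozenset("de"))
supporting = (frozenset("ab"), frozenset("d"))   # nests inside the focal edge
contesting = (frozenset("ad"), frozenset("b"))   # groups a with d, excluding b
```

Here `supporting` is concordant with `focal_edge` and `contesting` conflicts with it; a source edge that shares no ingroup tips with the focal edge is neither concordant nor conflicting.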
Fig. 1. The number of published avian trees per year available in the Open Tree of Life tree repository.
construction, we attempted to exclude both superseded trees (i.e., trees that have been proven to be incorrect by subsequent studies) and previously published supertrees. The propinquity pipeline also requires the ranking of phylogenies. The ranking determines which tree is preferred when resolving conflict between input source trees during synthesis (Hinchliff et al., 2015; Redelings and Holder, 2017). For example, an input tree may have been constructed with a dataset that included good sampling for the focal clade but poor sampling for the putative outgroups. In a synthesis analysis, we may prefer to have a higher ranked tree, with better sampling, resolve the putative outgroups. For this study, after grouping the bird resource trees into separate focal clades (e.g., by order), we ranked the 183 bird trees based on mixed criteria, such as date of publication, extent of taxon and character sampling, and degree of taxonomic overlap. For conflicting clades, JWB and NW ordered the trees so that confident groupings were placed in higher ranks. While our criteria for ranking trees represented our best judgement of the available source trees, others may disagree. As such, we have provided the ranked set of trees used for the construction of the new synthetic tree in Supplementary Information so that they are available for evaluation and comparison. The taxonomic tree with all taxa, derived from OTT, was then used to maximize the leaf set for comparison. After dividing the full data set into subproblems based on uncontested taxa (that is, taxa from OTT that are not contradicted by any source tree), propinquity grafts together solutions from each subproblem (where the ranking of source trees comes into play) into a single supertree.
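To illustrate how a ranking can settle conflict between input trees, the sketch below greedily accepts clades in rank order and skips any clade that overlaps an already accepted one without being nested in it. This is a deliberately simplified stand-in, not the propinquity algorithm: it assumes all trees share one tip set, it ignores the taxonomy and the subproblem decomposition, and the function names and toy tips are hypothetical.

```python
from typing import FrozenSet, List

Clade = FrozenSet[str]


def compatible(x: Clade, y: Clade) -> bool:
    """Clades on a shared tip set are compatible if nested or disjoint."""
    return x <= y or y <= x or not (x & y)


def greedy_ranked_clades(ranked_trees: List[List[Clade]]) -> List[Clade]:
    """ranked_trees: highest-ranked tree first, each given as a list of clades."""
    accepted: List[Clade] = []
    for clades in ranked_trees:
        for clade in clades:
            # A lower-ranked clade is kept only if it contradicts nothing
            # that a higher-ranked tree has already contributed.
            if clade not in accepted and all(compatible(clade, k) for k in accepted):
                accepted.append(clade)
    return accepted


high = [frozenset("ab"), frozenset("abc")]
low = [frozenset("bc"), frozenset("abc"), frozenset("de")]
result = greedy_ranked_clades([high, low])
```

In this toy case the lower-ranked clade {b, c} is rejected because it contradicts {a, b} from the higher-ranked tree, while {d, e} is added because nothing outranks or contradicts it.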
It is worth noting that this procedure resolves conflicts within subproblems by simply deferring to the most highly ranked tree involved, and does not take into account statistical support for the conflicting relationships involved. Finally, taxa not present in any source tree (i.e., taxonomy-only taxa) are grafted into the supertree at
3. Results and discussion
3.1. The synthetic tree of Aves
The synthetic phylogeny constructed for this study contained 13,579 tips (including 3458 subspecies) and 10,795 internal nodes,
Fig. 2. Conflict (left; warmer color represents more conflict), concordance (center; darker green represents more concordance), and chronological contribution (right) analyses based on three backbone trees. Conflict and concordance are measured as the number of source tree edges that conflict with or support a given focal tree edge. Chronological contribution is measured as the first year in which the focal edge appeared in a published tree. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
(2015), which possessed the highest source tree rank regarding the backbone topology. Other studies focusing on individual clades (i.e., order- or family-specific) help to resolve lower level relationships towards the tips. A natural comparison to the synthetic tree constructed as part of this study is that of another supertree, Davis and Page (2014). The Davis and Page (2014) tree is an enormously useful phylogeny as both a tool for researchers and as a means of understanding the available phylogenetic information for birds. There are many similarities between the synthetic tree constructed with supertree methods and the supertree of Davis and Page (2014), but also some important differences. While Davis and Page (2014) included more source trees, the synthetic tree presented here contained more taxa (the complete set of avian taxa vs. 5379; Table 1). Also, the method used to construct the synthetic tree, unlike the Matrix Representation with Parsimony (MRP) method of Davis and Page (2014), minimizes unsupported groups (i.e., clades not present in
leaving 2782 nodes (13,577–10,795, assuming a fully binary tree) to be resolved by future studies. OTT is a synthetic taxonomy comprised of numerous source taxonomies that sometimes disagree on the taxonomic status (e.g., species vs. subspecies) or name of a taxon. As a result, the OTT contains duplicated taxa. These are not expected to influence our results, and resolution of these is beyond the scope of the current study. However, continued improvement to the taxonomy will benefit the broader community and should be the focus of some future work. Opentree7 contained more tips than the synthetic phylogeny presented here (13,756; due to a different version of OTT, v2.10) but far fewer internal nodes (7157). While the changes in taxonomy since Opentree7 resulted in the loss of some tips in our synthetic tree (probably due to improvements in name reconciliation in OTT), our new synthetic tree resolved ∼3500 more internal nodes than Opentree7 through inclusion of more source trees (183 vs. 77). The higher-level inter-ordinal relationships shown by the synthetic tree follow mostly that of Prum et al.
Fig. 3. Distributions of conflict and concordance across the six remaining major avian phylogenies. See Fig. 2 for description of colors.
any of the input trees). As a result, there are edges in Davis and Page (2014) the resolution of which may not be supported by any input tree (Gatesy and Springer, 2004). This was not the case with the synthetic tree (Redelings and Holder, 2017). Finally, the Davis and Page (2014) tree was limited to trees published before 2010, and so does not include many of the major phylogenetic hypotheses published in the last 7 years. Thus, our synthetic tree provides another resource, in addition to those published, that researchers may use to conduct evolutionary analyses. It also serves as another point from which we can measure future improvements in our knowledge of the evolution of this group. Despite these benefits, there are limitations to the synthetic tree as there are limitations to any phylogeny. First, the synthetic algorithm requires ranked input source trees and the resolution is largely influenced by this ranking. Ranking input source trees is not inherently incorrect and is intended to reflect the scientific consensus. In fact, scientific consensus itself functions as an informal ranking where relationships based on published trees are generally accepted or rejected based on several factors (see below). For example, one good study can overturn dozens of previous studies due to better data or methodology, so consensus does not necessarily imply accuracy. Nevertheless, this scientific consensus may be difficult to translate directly to a ranked list. Second, and as is the case with all supertree methods, the synthetic tree is at least one step removed from the original data. Inference, generally, is preferably based on the original data. However, for several reasons this may not be possible and a supertree method may be required. Finally, the synthetic tree lacks branch lengths. This results from several factors: many input source trees lack this information, the synthetic supertree procedure does not currently accommodate branch lengths, and input trees vary in data type (e.g., DNA vs. morphology) and branch length unit (i.e., number of changes vs. time). While branch lengths and divergence times are not the focus of this study, future studies should examine new ways to incorporate divergence times to increase the utility of this resource.
3.2. The distribution of phylogenetic conflict
One major aim of this study is to examine the distribution of concordance and conflict across nine major bird phylogenies. Except for Sibley and Ahlquist (1990), all the exemplar backbone trees we considered were constructed after 2006. Each tree, at the time of publication, represented major improvements in either data collection, analysis, resolution, or support. We found that the most recent trees agreed on many relationships despite being built using different methods and using different data sources. One tree was constructed with morphological data (Livezey and Zusi, 2007), four were built with dramatically different scales of molecular data (Hackett et al., 2008; Jetz et al., 2012; Jarvis et al., 2014; Prum et al., 2015), and the rest were constructed with supertree methods (Davis and Page, 2014; Opentree7; our synthetic tree). Because Sibley and Ahlquist (1990) and Prum et al. (2015) represent the earliest and the most recent comprehensive hypotheses, respectively, we discuss the conflict and concordance from these trees along with the synthetic tree in more detail. We found more conflicting edges (warmer colors, Fig. 2) for rootward relationships (i.e., the backbone from Neognathae to Passeriformes) on the Sibley and Ahlquist (1990) tree. This was expected considering that many phylogenies have been published since 1990 and these have contradicted the Sibley and Ahlquist (1990) “tapestry” hypothesis (see also discussion in Harshman, 1994). For instance, the root of the Sibley and Ahlquist (1990) tree placed Galloanserae as sister to Palaeognathae, a placement that has been refuted by all subsequent studies. Prum et al.
Fig. 4. The earliest published occurrence of branches in each of the major avian phylogenies. The x-axis represents the year in which a branch in the focal tree first appears in a sampled published phylogeny, and the y-axis is the proportional accumulation of clades on each tree. Circles and dotted lines indicate the proportion of branches published in years previously to the focal tree. Trees discussed in detail in the text are indicated by thicker lines.
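The accumulation curve in Fig. 4 can be approximated with a short sketch: for each focal edge, find the earliest year in which a sampled source tree contains a concordant edge, then report the proportion of focal edges dated on or before a given year. This is an illustrative reconstruction under assumptions, not the authors' script: the function names, the (year, edges) input layout, and the use of the concordance test as the criterion for "first appearance" are all hypothetical.

```python
from typing import FrozenSet, List, Optional, Tuple

Edge = Tuple[FrozenSet[str], FrozenSet[str]]  # (ingroup tips, outgroup tips)


def displays(focal: Edge, source: Edge) -> bool:
    """Focal edge displays source edge: both tip sets nest within the focal ones."""
    return source[0] <= focal[0] and source[1] <= focal[1]


def first_years(focal_edges: List[Edge],
                sources: List[Tuple[int, List[Edge]]]) -> List[Optional[int]]:
    """Earliest publication year supporting each focal edge (None if never seen)."""
    result: List[Optional[int]] = []
    for f in focal_edges:
        years = [year for year, edges in sources
                 if any(displays(f, s) for s in edges)]
        result.append(min(years) if years else None)
    return result


def cumulative_proportion(first: List[Optional[int]], upto: int) -> float:
    """Share of focal edges first published in or before the year `upto`."""
    seen = [y for y in first if y is not None and y <= upto]
    return len(seen) / len(first)


# Toy data, for illustration only.
focal = [(frozenset("ab"), frozenset("cd")), (frozenset("abc"), frozenset("d"))]
sources = [(1990, [(frozenset("ab"), frozenset("c"))]),
           (2008, [(frozenset("ab"), frozenset("d")),
                   (frozenset("ac"), frozenset("d"))])]
first = first_years(focal, sources)
```

Focal edges that receive `None` are those not found in any earlier sampled tree, i.e., the edges counted as new to the focal tree.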
report that for clades with < 50% bootstrap support, almost half of the families (195/399) and two thirds of the genera (32/47) were within the oscine songbird clade. Given that Passeriformes represents ∼60% of extant avian species, these results suggest that there is still work to be done in resolving relationships within Aves.
(2015; Fig. 2) had comparatively fewer conflicting edges around Neoaves. This is most likely due to significantly lower taxon sampling (i.e., fewer sampled lineages means a smaller potential number of conflicts). The synthetic tree presented here had higher conflict than Prum et al. (2015) surrounding Neoaves. This is, at least in part, due to the comprehensive nature of the tree and is shared by Jetz et al. (2012), which contributes many of the edges to the synthetic tree due to its extensive taxonomic sampling (Fig. 3). This pattern of conflicting relationships around Neoaves is largely consistent with the neoavian polytomies posited by Suh (2016) and Reddy et al. (2017), though these studies considered a small number of phylogenomic studies for comparison (see Fig. 3). A pattern of conflict shared across the focal trees involved the Passeriformes (songbirds). In fact, although the monophyly of Passeriformes was largely supported, the relationships that diverge soon after crown Passeriformes (i.e., within the oscines) were highly contested among the source trees. We found Sibley and Ahlquist (1990) to have the highest conflict among edges surrounding the Passeriformes clade. Six edges conflicted with 29–37 source trees and were supported by 13–30 trees. Prum et al. (2015) also exhibited this pattern, with exceptionally high conflict among the early-diverging oscine songbirds (Fig. 2). Prum et al. (2015) sampled 15 species from 14 families in oscine songbirds, and this sparse sampling may have impacted the phylogenetic accuracy of this region. The synthetic tree showed fewer conflicts at the base of Passeriformes. The resolution exhibited by the synthetic tree primarily reflected Moyle et al. (2016), though it was also influenced by Barker et al. (2015), Price et al. (2014), and Selvatti et al. (2015) as a result of their more comprehensive sampling.
Furthermore, because many of the source trees used for conflict-concordance analyses also formed the major contribution to the synthetic tree reconstruction, and the synthetic reconstruction algorithm is meant to minimize these conflicts, the synthetic tree was expected to exhibit fewer conflicts. We note that strong phylogenetic conflict within Passeriformes was not discussed by Suh (2016) and Reddy et al. (2017), as they only considered trees focused on inter-ordinal relationships. While conflict and concordance within Passeriformes has not previously been explicitly evaluated, individual assessments of phylogenetic uncertainty have discussed the potential for conflict. For example, Burleigh et al. (2015)
3.3. The resolution of the Aves phylogeny through time
The trees on which we conduct conflict and support analyses represent either major contributions or comprehensive analyses of the Aves clade. One central question regarding these trees is how our knowledge about clades changes over time. For example, did recent genomic studies such as Prum et al. (2015) contribute many unique clades to the literature, or were these resolutions simply recapitulating results that had been published previously? Although limited by the 290 available trees, we can begin to address these questions. When compared to our database of trees, Sibley and Ahlquist (1990) contributed almost all new relationships but for three edges (published before 1990) that involve relationships of flycatchers, Phrygilus, and Junco (Fig. 4). While we are aware that edges had previously been published (e.g., Ho et al., 1976; Strauch, 1978), we did not have access to many of those trees as they were not available in electronic format. Although many studies have been published between 1990 and 2015, Prum et al. (2015) still contributed new edges (Fig. 4). While 143 edges had been reported in previous publications, including many in Sibley and Ahlquist (1990), 47 edges were new to the Prum et al. (2015) tree. In general, the proportion of previously published edges increases towards the present, and the proportion of novel inferred edges appears to be decelerating (Fig. 4). Nevertheless, this highlights that important contributions are still being made by large-scale genomic and molecular studies. We note some temporal trends of congruence/conflict regarding inferred branches in the major focal hypotheses (Fig. 5). In general, conflict with the Sibley and Ahlquist (1990) “tapestry” has increased towards the present, especially in regard to the large recent studies of Jetz et al. (2012), Davis and Page (2014), Hedges et al. (2015), and Burleigh et al.
(2015), which conflict with more than half of the edges in Sibley and Ahlquist (1990). These same trees contribute the most 74
Molecular Phylogenetics and Evolution 116 (2017) 69–77
J.W. Brown et al.
Fig. 5. Counts of branch support and conflict through time. Each point represents the number of supporting (cross) or conflicting (circle) branches in a single source tree relative to the focal tree. Note the different scales: the maximum count for support/conflict is the number of internal branches in the smaller of the two trees (sometimes the focal tree, but more often the source tree). Absolute upper limits on counts are the following: Sibley and Ahlquist, 830; Prum et al., 190; New Synthesis, 10795. (For interpretation of the references to colour in this figure legend, the reader is referred to the web version of this article.)
added from studies published after 2015. This underscores the importance of not only large genomic studies, but also of trees of individual clades to the knowledge of bird phylogeny. Large phylogenies like those analyzed here and created as part of this study have many potential uses. They have been and continue to be used in large scale studies of biodiversity (e.g., Jetz et al., 2012; Cooney et al., 2017) and each has benefits and limitations. There are some benefits to the synthetic trees that can be produced as part of the Open Tree of Life (e.g., comprehensive), but they also carry limitations. As mentioned above, methods need to be developed to apply dates to the synthetic trees and supertrees that are produced as part of comprehensive analyses (Davis and Page, 2014; Hinchliff et al., 2015; Redelings and Holder, 2017). The development of methods for applying divergence times will dramatically improve the utility of comprehensive trees for further comparative studies. In addition to the larger comprehensive trees, the Open Tree of Life tree repository (McTavish et al., 2015), with phylogenetic hypotheses mapped to a common taxonomy (Rees and Cranston, 2017), becomes increasingly useful as it continues to grow. We demonstrate that these trees can be used to gauge how our understanding changes, to determine whether new studies are contributing new edges, and to localize the major sources of conflict. Furthermore, the relationships can serve to construct meaningful prior expectations for the resolution of clades across the tree of life (e.g., as topological priors in a Bayesian reconstruction). It is noteworthy that all these analyses depend on the availability of electronic
conflict to the Prum et al. (2015) tree as well, but in general we see high counts of supporting branches over the past 10 years. Finally, the new synthetic tree tends to have low levels of conflict with recent studies. This is to be expected, as the tree filtering and ranking procedure aimed to reflect community consensus in recognized relationships. Across the corpus of sampled trees we note a few relationships wherein recent studies are generally in agreement. For example, the sister relationship between parrots (Psittaciformes) and songbirds (Passeriformes), suggested by Ericson et al. (2006) and confirmed by Hackett et al. (2008), conflicts strongly with previous studies (e.g., the morphological analysis of Livezey and Zusi, 2007), but is supported by all but two subsequent studies (an mtDNA study by Brown et al. (2008), and the supertree of Hedges et al. (2015)). On the other hand, morphological support for the traditional ‘Falconiformes’ (including New World vultures (Cathartidae), hawks (Accipitridae), Secretarybird (Sagittariidae), and falcons (Falconidae)) displayed in Livezey and Zusi (2007) is rejected by all subsequent studies except the supertree of Davis and Page (2014). General consensus within the avian systematics community is thus borne out by our analysis of the Open Tree of Life tree repository. The synthetic tree reported here provides another interesting contrast. The largest contributing study to the edges of the synthetic tree was Jetz et al. (2012) with 4820 (Fig. 4). This is unsurprising, as Jetz et al. (2012) was, previously, the most comprehensive avian phylogeny published and expected to contribute many of the edges to the synthetic tree. However, 1890 edges were contributed after 2012, and 223 were 75
Molecular Phylogenetics and Evolution 116 (2017) 69–77
J.W. Brown et al.
tree files that are crucially important for any publication that posits phylogenetic hypotheses, but archiving such resources has not been common historically (Stoltzfus et al., 2012; Drew et al., 2013). Fortunately, image processing tools such as TreeSnatcher (Laubach et al., 2012) are available to resurrect unarchived results. As we continue to improve our view of the tree of life, it will be instructive to examine how congruence builds across major clades.
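The bookkeeping behind the edge-novelty tallies (Fig. 4) and the support/conflict counts (Fig. 5) discussed in Section 3.3 can be illustrated with a small Python sketch. This is not the code used in the study (for that, see the repository listed under Data availability). The nested-tuple tree encoding, the function names, and the toy taxa A to F are all assumptions made for illustration, and the rule used here (two clades conflict when they overlap without one nesting inside the other, after pruning both trees to their shared taxa) is a simplified stand-in for the conflict and concordance definitions applied in the paper.

```python
from typing import Dict, FrozenSet, List, Optional, Set, Tuple, Union

Tree = Union[str, tuple]   # hypothetical encoding: a tip name or a tuple of subtrees
Clade = FrozenSet[str]


def tips(tree: Tree) -> Clade:
    """All tip names below this node."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(tips(child) for child in tree))


def clades(tree: Tree) -> Set[Clade]:
    """Tip sets of every internal node except the root (one per internal branch)."""
    found: Set[Clade] = set()

    def walk(node: Tree) -> None:
        if not isinstance(node, str):
            found.add(tips(node))
            for child in node:
                walk(child)

    walk(tree)
    found.discard(tips(tree))
    return found


def restrict(clade_set: Set[Clade], shared: Clade) -> Set[Clade]:
    """Prune clades to the shared taxa and drop those that become uninformative."""
    pruned = {c & shared for c in clade_set}
    return {c for c in pruned if 1 < len(c) < len(shared)}


def support_and_conflict(source: Tree, focal: Tree) -> Tuple[int, int]:
    """Count source-tree branches that the focal tree supports or conflicts with."""
    shared = tips(source) & tips(focal)
    src = restrict(clades(source), shared)
    foc = restrict(clades(focal), shared)
    support = sum(1 for c in src if c in foc)
    conflict = sum(
        1 for c in src
        if any((c & f) and not c <= f and not f <= c for f in foc)
    )
    return support, conflict


def first_year_seen(focal: Tree,
                    dated_sources: List[Tuple[int, Tree]]) -> Dict[Clade, Optional[int]]:
    """Earliest publication year of a source tree displaying each focal clade.

    None marks an edge that no source tree displays, i.e. a novel edge.
    """
    result: Dict[Clade, Optional[int]] = {}
    for clade in clades(focal):
        years = []
        for year, source in dated_sources:
            shared = tips(source) & tips(focal)
            pruned = clade & shared
            if 1 < len(pruned) < len(shared) and pruned in restrict(clades(source), shared):
                years.append(year)
        result[clade] = min(years) if years else None
    return result


# Toy topologies only; these are not published trees.
focal = ((("A", "B"), "C"), ("D", "E"))
source_1990 = (("A", "C"), "B", ("D", "E"))
source_2008 = (("A", "B"), ("C", "F"))
dated = [(1990, source_1990), (2008, source_2008)]

print(support_and_conflict(source_1990, focal))   # (1, 1)
print(support_and_conflict(source_2008, focal))   # (1, 0)
novel = [c for c, year in first_year_seen(focal, dated).items() if year is None]
print(len(novel))                                  # 1
```

In this toy case the focal clade {A, B, C} is displayed by neither source tree and is counted as novel, while {D, E} dates to the 1990 source and {A, B} to the 2008 source. Because each count is taken over the informative branches that survive pruning to the shared taxa, neither count can exceed the number of internal branches in the smaller tree, which corresponds to the scaling noted in the Fig. 5 legend.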
4. Conclusion

A fundamental goal for the field of evolutionary biology and systematics is the resolution and construction of a complete tree of life. The resources for constructing comprehensive trees (e.g., phylogenetic trees, molecular and morphological data sets, and comprehensive taxonomies) are becoming available and are now of sufficient quality that we can not only begin to construct complete trees, but also refine them and identify where more work is needed. Here, we demonstrate that, while we have made major strides in our knowledge of some clades, new studies continue to contribute new edges that resolve previously ambiguous relationships. We make this observation on Aves, a relatively well-studied and small clade of the tree of life. Other parts of the tree of life that are likely to have a lower density of phylogenetic information in the form of published phylogenies or molecular data still need significantly more work before they may be comprehensively resolved.

Data availability

All the software and data used for this study are freely available in repositories online. For the source trees and software used in the analyses, please see https://bitbucket.org/blackrim/opentree_birds.

Acknowledgements

We would like to thank the following for comments on previous versions of this manuscript: Ben Winger, Karen Cranston, four anonymous reviewers, and members of the Smith lab. We especially thank Edward Braun for a particularly thorough and thoughtful review of a previous draft. We thank Ben Redelings and Mark Holder for discussions on the definitions of conflict and concordance. JWB thanks Jon Hill and Katie Davis for discussions on the vexations of avian taxonomy/supertree logistics, and John Harshman for discussions regarding Sibley and Ahlquist (1990). JWB, SAS, and NW were supported by NSF DEB AVATOL grant #1207915.

Author contributions

JWB and NW gathered the input source trees and constructed the synthetic tree. SAS and JWB wrote code to perform the analyses. JWB, NW, and SAS wrote the manuscript and conducted the analyses.

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.ympev.2017.08.002.

References

Barker, F.K., Burns, K.J., Klicka, J., Lanyon, S.M., Lovette, I.J., 2015. New insights into New World biogeography: an integrated view from the phylogeny of blackbirds, cardinals, sparrows, tanagers, warblers, and allies. Auk 132, 333–348.
Barrowclough, G.F., Cracraft, J., Klicka, J., Zink, R.M., 2016. How many kinds of birds are there and why does it matter? PLoS ONE 11, e0166307.
Brockington, S.F., Yang, Y., Gandia-Herrero, F., Covshoff, S., Hibberd, J.M., Sage, R.F., Wong, G.K., Moore, M.J., Smith, S.A., 2015. Lineage-specific gene radiations underlie the evolution of novel betalain pigmentation in Caryophyllales. New Phytol. 207, 1170–1180.
Brown, J.W., Rest, J.S., García-Moreno, J., Sorenson, M.D., Mindell, D.P., 2008. Strong mitochondrial DNA support for a Cretaceous origin of modern avian lineages. BMC Biol. 6, 6.
Burleigh, J.G., Kimball, R.T., Braun, E.L., 2015. Building the avian tree of life using a large-scale, sparse supermatrix. Mol. Phylogenet. Evol. 84, 53–63.
Cooney, C.R., Bright, J.A., Capp, E.J., Chira, A.M., Hughes, E.C., Moody, C.J., Nouri, L.O., Varley, Z.K., Thomas, G.H., 2017. Mega-evolutionary dynamics of the adaptive radiation of birds. Nature 542, 344–347.
Cracraft, J., Barker, F.K., Braun, M., Harshman, J., Dyke, G.J., Feinstein, J., Stanley, S., Cibois, A., Schikler, P., Beresford, P., García-Moreno, J., Sorenson, M.D., Yuri, T., Mindell, D.P., 2004. Phylogenetic relationships among modern birds (Neornithes): towards an avian tree of life. In: Cracraft, J., Donoghue, M.J. (Eds.), Assembling the Tree of Life. Oxford University Press, Oxford, GB, pp. 468–489.
Davis, K.E., Page, R.D.M., 2014. Reweaving the Tapestry: a supertree of birds. PLoS Currents: Tree of Life 1. http://dx.doi.org/10.1371/currents.tol.c1af68dda7c999ed9f1e4b2d2df7a08e.
Drew, B.T., Gazis, R., Cabezas, P., Swithers, K.S., Deng, J., Rodriguez, R., Katz, L.A., Crandall, K.A., Hibbett, D.S., Soltis, D.E., 2013. Lost branches on the tree of life. PLoS Biol. 11, e1001636.
Ericson, P.G.P., Anderson, C.L., Britton, T., Elzanowski, A., Johansson, U.S., Källersjö, M., Ohlson, J.I., Parsons, T.J., Zuccon, D., Mayr, G., 2006. Diversification of Neoaves: integration of molecular sequence data and fossils. Biol. Lett. 2, 543–547.
Gatesy, J., Springer, M.S., 2004. A critique of matrix representation with parsimony supertrees. In: Phylogenetic Supertrees. Springer, Netherlands, pp. 369–388.
Gill, F., Donsker, D., 2016. IOC World Bird List (v 6.2). http://dx.doi.org/10.14344/IOC.ML.6.2.
Groth, J.G., Barrowclough, G.F., 1999. Basal divergences in birds and the phylogenetic utility of the nuclear RAG-1 gene. Mol. Phylogenet. Evol. 12, 115–123.
Hackett, S.J., Kimball, R.T., Reddy, S., Bowie, R.C., Braun, E.L., Braun, M.J., Chojnowski, J.L., Cox, W.A., Han, K.L., Harshman, J., Huddleston, C.J., 2008. A phylogenomic study of birds reveals their evolutionary history. Science 320, 1763–1768.
Harshman, J., 1994. Reweaving the tapestry: what can we learn from Sibley and Ahlquist (1990)? Auk 111, 377–388.
Hedges, S.B., Marin, J., Suleski, M., Paymer, M., Kumar, S., 2015. Tree of life reveals clock-like speciation and diversification. Mol. Biol. Evol. 32, 835–845.
Hinchliff, C.E., Smith, S.A., 2014. Some limitations of public sequence data for phylogenetic inference (in plants). PLoS ONE 9, e98986.
Hinchliff, C.E., Smith, S.A., Allman, J.F., Burleigh, J.G., Chaudhary, R., Coghill, L.M., Crandall, K.A., Deng, J., Drew, B.T., Gazis, R., Gude, K., Hibbett, D.S., Katz, L.A., Laughinghouse IV, H.D., McTavish, E.J., Midford, P.E., Owen, C.L., Ree, R.H., Rees, J.A., Soltis, D.E., Williams, T., Cranston, K.A., 2015. Synthesis of phylogeny and taxonomy into a comprehensive tree of life. Proc. Natl. Acad. Sci. USA 112, 12764–12769.
Ho, C.Y.K., Prager, E.M., Wilson, A.C., Osuga, D.T., Feeney, R.E., 1976. Penguin evolution: protein comparisons demonstrate phylogenetic relationship to flying aquatic birds. J. Mol. Evol. 8, 271–282.
Jarvis, E.D., Mirarab, S., Aberer, A.J., Li, B., Houde, P., Li, C., Ho, S.Y.W., Faircloth, B.C., Nabholz, B., Howard, J.T., Suh, A., Weber, C.C., da Fonseca, R.R., Li, J., Zhang, F., Li, H., Zhou, L., Narula, N., Liu, L., Ganapathy, G., Boussau, B., Bayzid, M.S., Zavidovych, V., Subramanian, S., Gabaldón, T., Capella-Gutiérrez, S., Huerta-Cepas, J., Rekepalli, B., Munch, K., Schierup, M., Lindow, B., Warren, W.C., Ray, D., Green, R.E., Bruford, M.W., Zhan, X., Dixon, A., Li, S., Li, N., Huang, Y., Derryberry, E.P., Bertelsen, M.F., Sheldon, F.H., Brumfield, R.T., Mello, C.V., Lovell, P.V., Wirthlin, M., Schneider, M.P.C., Prosdocimi, F., Samaniego, J.A., Velazquez, A.M.V., Alfaro-Núñez, A., Campos, P.F., Petersen, B., Sicheritz-Ponten, T., Pas, A., Bailey, T., Scofield, P., Bunce, M., Lambert, D.M., Zhou, Q., Perelman, P., Driskell, A.C., Shapiro, B., Xiong, Z., Zeng, Y., Liu, S., Li, Z., Liu, B., Wu, K., Xiao, J., Yinqi, X., Zheng, Q., Zhang, Y., Yang, H., Wang, J., Smeds, L., Rheindt, F.E., Braun, M., Fjeldsa, J., Orlando, L., Barker, F.K., Jønsson, K.A., Johnson, W., Koepfli, K.-P., O’Brien, S., Haussler, D., Ryder, O.A., Rahbek, C., Willerslev, E., Graves, G.R., Glenn, T.C., McCormack, J., Burt, D., Ellegren, H., Alström, P., Edwards, S.V., Stamatakis, A., Mindell, D.P., Cracraft, J., Braun, E.L., Warnow, T., Jun, W., Gilbert, M.T.P., Zhang, G., 2014. Whole-genome analyses resolve early branches in the tree of life of modern birds. Science 346, 1320–1331.
Jetz, W., Thomas, G.H., Joy, J.B., Hartmann, K., Mooers, A.O., 2012. The global diversity of birds in space and time. Nature 491, 444–448.
Kimball, R.T., Wang, N., Heimer-McGinn, V., Ferguson, C., Braun, E.L., 2013. Identifying localized biases in large datasets: a case study using the avian tree of life. Mol. Phylogenet. Evol. 69, 1021–1032.
Laubach, T., von Haeseler, A., Lercher, M.J., 2012. TreeSnatcher plus: capturing phylogenetic trees from images. BMC Bioinf. 13, 110.
Lin, Q., Fan, S., Zhang, Y., Xu, M., Zhang, H., Yang, Y., Lee, A.P., Woltering, J.M., Ravi, V., Gunter, H.M., Luo, W., 2016. The seahorse genome and the evolution of its specialized morphology. Nature 540, 395–399.
Livezey, B.C., Zusi, R.L., 2007. Higher-order phylogeny of modern birds (Theropoda, Aves: Neornithes) based on comparative anatomy. II. Analysis and discussion. Zool. J. Linnean Soc. 149, 1–95.
McCormack, J.E., Harvey, M.G., Faircloth, B.C., Crawford, N.G., Glenn, T.C., Brumfield, R.T., 2013. A phylogeny of birds based on over 1,500 loci collected by target enrichment and high-throughput sequencing. PLoS ONE 8, e54848.
McTavish, E.J., Hinchliff, C.E., Allman, J.F., Brown, J.W., Cranston, K.A., Holder, M.T., Rees, J.A., Smith, S.A., 2015. Phylesystem: A git-based data store for community-curated phylogenetic estimates. Bioinformatics 31, 2794–2800.
Moyle, R.G., Oliveros, C.H., Andersen, M.J., Hosner, P.A., Benz, B.W., Manthey, J.D., Travers, S.L., Brown, R.M., Faircloth, B.C., 2016. Tectonic collision and uplift of Wallacea triggered the global songbird radiation. Nature Commun. 7, 12709.
Price, T.D., Hooper, D.M., Buchanan, C.D., Johansson, U.S., Tietze, D.T., Alström, P., Olsson, U., Ghosh-Harihar, M., Ishtiaq, F., Gupta, S.K., Martens, J., Harr, B., Singh, P., Mohan, D., 2014. Niche filling slows the diversification of Himalayan songbirds. Nature 509, 222–225.
Prum, R.O., Berv, J.S., Dornburg, A., Field, D.J., Townsend, J.P., Lemmon, E.M., Lemmon, A.R., 2015. A comprehensive phylogeny of birds (Aves) using targeted next-generation DNA sequencing. Nature 526, 569–573.
Rabosky, D.L., Santini, F., Eastman, J., Smith, S.A., Sidlauskas, B., Chang, J., Alfaro, M., 2013. Rates of speciation and morphological evolution are correlated across the largest vertebrate radiation. Nature Commun. 4, 1958.
Reddy, S., Kimball, R.T., Pandey, A., Hosner, P.A., Braun, M.J., Hackett, S.J., Han, K.-L., Harshman, J., Huddleston, C.J., Kingston, S., Marks, B.D., Miglia, K.J., Moore, W.S., Sheldon, F.H., Witt, C.C., Yuri, T., Braun, E.L., 2017. Why do phylogenomic data sets yield conflicting trees? Data type influences the avian tree of life more than taxon sampling. Syst. Biol. http://dx.doi.org/10.1093/sysbio/syx041.
Redelings, B.D., Holder, M.T., 2017. A supertree pipeline for summarizing phylogenetic and taxonomic information for millions of species. PeerJ 5, e3058.
Rees, J., Cranston, K.A., 2017. Automated assembly of a reference taxonomy for phylogenetic data synthesis. Biodiv. Data J. 5, e12581.
Sanderson, M.J., Donoghue, M.J., Piel, W., Eriksson, T., 1994. TreeBASE: A prototype database of phylogenetic analyses and an interactive tool for browsing the phylogeny of life. Am. J. Bot. 81, 183.
Scholl, J.P., Wiens, J.J., 2016. Diversification rates and species richness across the Tree of Life. Proc. Roy. Soc. B 283, 20161334.
Selvatti, A.P., Gonzaga, L.P., de Moraes Russo, C.A., 2015. A Paleogene origin for crown passerines and the diversification of the Oscines in the New World. Mol. Phylogenet. Evol. 88, 1–15.
Sibley, C.G., Ahlquist, J.E., 1990. Phylogeny and Classification of Birds: A Study in Molecular Evolution. Yale University Press.
Simion, P., Philippe, H., Baurain, D., Jager, M., Richter, D.J., Di Franco, A., Roure, B., Satoh, N., Quéinnec, É., Ereskovsky, A., Lapébie, P., 2017. A large and consistent phylogenomic dataset supports sponges as the sister group to all other animals. Curr. Biol. 27, 958–967.
Smith, S.A., Donoghue, M., 2008. Rates of molecular evolution are linked to life history in flowering plants. Science 322, 86–89.
Smith, S.A., Stamatakis, A., 2013. Inferring and postprocessing huge phylogenies. Biological Knowledge Discovery Handbook: Preprocessing, Mining, and Postprocessing of Biological Data, pp. 1049–1072.
Smith, S.A., Moore, M.J., Brown, J.W., Yang, Y., 2015. Analysis of phylogenomic datasets reveals conflict, concordance, and gene duplications with examples from animals and plants. BMC Evol. Biol. 15, 150.
Stoltzfus, A., O'Meara, B., Whitacre, J., Mounce, R., Gillespie, E.L., Kumar, S., Rosauer, D.F., Vos, R.A., 2012. Sharing and re-use of phylogenetic trees (and associated data) to facilitate synthesis. BMC Res. Notes 5, 574.
Strauch, J.G., 1978. The phylogeny of the Charadriiformes (Aves): a new estimate using the method of character compatibility analysis. Trans. Zool. Soc. Lond. 34, 263–345.
Suh, A., Smeds, L., Ellegren, H., 2015. The dynamics of incomplete lineage sorting across the ancient adaptive radiation of neoavian birds. PLoS Biol. 13, e1002224.
Suh, A., 2016. The phylogenomic forest of bird trees contains a hard polytomy at the root of Neoaves. Zool. Scripta 45, 50–62.
Wang, N., Braun, E.L., Kimball, R.T., 2011. Testing hypotheses about the sister group of the Passeriformes using an independent 30-locus data set. Mol. Biol. Evol. 29, 737–750.
Zanne, A.E., Tank, D.C., Cornwell, W.K., Eastman, J.M., Smith, S.A., FitzJohn, R.G., McGlinn, D.J., Moles, A.T., O'Meara, B.C., Royer, D.L., Wright, I.J., Aarssen, L., Bertin, R.I., Govaerts, R., Hemmings, F., Leishman, M.R., Oleksyn, J., Reich, P.B., Sargent, R., Soltis, D.E., Soltis, P.S., Stevens, P.F., Swenson, N.G., Warman, L., Westoby, M., Beaulieu, J.M., 2014. Three keys to the radiation of angiosperms into freezing environments. Nature 506, 89–92.