Sulcozoa revealed as a paraphyletic group in mitochondrial phylogenomics


Molecular Phylogenetics and Evolution xxx (2013) xxx–xxx

Contents lists available at ScienceDirect

Molecular Phylogenetics and Evolution journal homepage: www.elsevier.com/locate/ympev

Sen Zhao, Kamran Shalchian-Tabrizi ⇑, Dag Klaveness ⇑
Microbial Evolution Research Group (MERG), Department of Biology, University of Oslo, Norway

Article info

Article history:
Received 19 February 2013
Revised 1 August 2013
Accepted 9 August 2013
Available online xxxx

Keywords: Phylogenomics; Collodictyon; Thecamonas; Transcriptome; Mitochondrial sequences

Abstract

Recently, phylogenomic analyses have been used to assign the vast majority of eukaryotes to only a handful of supergroups. However, a few enigmatic lineages still do not fit into this simple picture. Such lineages may have originated early in the history of eukaryotes and are therefore of key importance for deducing cellular evolution. In this study, we focus on two deeply diverging lineages, Diphyllatea and Thecamonadea. They are classified in the same phylum, Sulcozoa, but previous multigene phylogenetic analyses have included only one of these two lineages. It is therefore unclear whether they constitute one group or two distinct lineages. The study of rare genomic changes reveals that both have fused dihydrofolate reductase (DHFR) and thymidylate synthase (TS) genes (i.e. DHFR–TS), which are separated in all other unikonts that have been investigated, indicating a possible close relationship. Their phylogenetic positions have implications for the classification of Sulcozoa and for early eukaryote evolution. Here we present a phylogenomic analysis of these species that includes Illumina and 454 transcriptome data from two Collodictyon strains. A total of 42 mitochondrial proteins, which correspond to orthologs published from Thecamonas trahens (Thecamonadea), were used to reconstruct their phylogenies. In the resulting trees, Collodictyon appears as sister to Amoebozoa, whereas Thecamonas branches as the closest relative of Opisthokonta (i.e. animals, fungi and unicellular Choanozoa). In contrast, the position of another early diverging eukaryote, Malawimonas, is unresolved. The separation of Collodictyon and Thecamonas in our analyses suggests that the recently proposed Sulcozoa group is most likely paraphyletic. Furthermore, the data support the hypothesis that the two supergroups Opisthokonta and Amoebozoa, which comprise a great diversity of eukaryotes, originated from a sulcozoan ancestor.

Crown Copyright © 2013 Published by Elsevier Inc. All rights reserved.

⇑ Corresponding authors. Address: Kristine Bonnevies hus, Blindernveien 31, 0371 Oslo, Norway. E-mail addresses: [email protected] (K. Shalchian-Tabrizi), dag.klaveness@ibv.uio.no (D. Klaveness).

1055-7903/$ - see front matter Crown Copyright © 2013 Published by Elsevier Inc. All rights reserved. http://dx.doi.org/10.1016/j.ympev.2013.08.005

Please cite this article in press as: Zhao, S., et al. Sulcozoa revealed as a paraphyletic group in mitochondrial phylogenomics. Mol. Phylogenet. Evol. (2013), http://dx.doi.org/10.1016/j.ympev.2013.08.005

1. Introduction

Over the past few years, genomic sequences have been used to resolve the main branching patterns in the eukaryotic tree of life. Such phylogenomic analyses have divided the eukaryotic tree into a small number of so-called supergroups (Cavalier-Smith, 1998; Keeling et al., 2005; Parfrey et al., 2006; Simpson and Roger, 2004). These supergroups encompass the vast majority of the extant eukaryotes, but there are still certain enigmatic lineages that have unclear evolutionary origin and classification. They may constitute ancient and distinct lineages that derived very early in the evolution of eukaryotes (Brown et al., 2012; Burki et al., 2009; Minge et al., 2009; Ruiz-Trillo et al., 2008; Shalchian-Tabrizi et al., 2008; Yabuki et al., 2011). Several of these branches could be sister or closely related to one or a few supergroups. The study of their phylogenies is important for understanding the evolutionary events that have given rise to eukaryote diversity, and revealing key innovations that aid in deducing the ancient features of eukaryotic cells (Cavalier-Smith, 2013).

Recent phylogenomic analyses have consistently assigned most eukaryotes to the supergroups Opisthokonta, Amoebozoa, Archaeplastida and SAR (i.e. Stramenopila, Alveolata and Rhizaria; Bapteste et al., 2002; Burki et al., 2007; Rodriguez-Ezpeleta et al., 2005; Torruella et al., 2012). Another two assemblages, Hacrobia and Excavata, have been suggested but are not always supported as monophyletic groups (Burki et al., 2009; Hampl et al., 2009; Patron et al., 2007).

In this study, we focus on two of the lineages that are often separated from these six supergroups, Diphyllatea (earlier named Diphyllatia) and Thecamonadea, which are both classified in the phylum Sulcozoa (see Cavalier-Smith, 2013). Among these, Diphyllatea has been placed between the earlier proposed megagroup unikonts (i.e. Opisthokonta plus Amoebozoa; also named podiates) and bikonts (all other eukaryotes) in a 124-gene tree, close to the putative root of eukaryotes (Zhao et al., 2012). In contrast, Thecamonadea has a sister relationship to Opisthokonta

(Torruella et al., 2012). Diphyllatea and Thecamonadea share a fusion of the dihydrofolate reductase (DHFR) and thymidylate synthase (TS) genes (i.e. DHFR–TS), which are separated in all other unikonts that have been investigated. This result suggests that the two lineages may be related to each other. However, because no multigene phylogeny has ever included both lineages, it is unclear whether they belong to a single group or to two distinct lineages. In addition, Diphyllatea and Thecamonadea have shown similar attraction to Malawimonadea in independent phylogenomic analyses (Derelle and Lang, 2012; Zhao et al., 2012), so all three lineages may constitute a new group. Alternatively, the different positions of Diphyllatea and Thecamonadea in earlier reported trees may reflect separate origins, whereas Malawimonadea could have been misplaced due to phylogenetic artifacts (Hampl et al., 2009; Cavalier-Smith, 2013). Resolving the phylogenetic relationship between Diphyllatea and Thecamonadea is crucial for understanding the classification of Sulcozoa and the origin of Opisthokonta and Amoebozoa (Cavalier-Smith, 2013). In particular, the phylogenetic positions of these two taxa have implications for whether Sulcozoa is paraphyletic or holophyletic (Glucksman et al., 2013).

To address these questions, we have obtained a new strain of Collodictyon (strain KIINB, isolated from Lake Inba, Japan) and performed a transcriptome survey with Illumina sequencing. This dataset was analyzed together with earlier 454 transcriptome sequences from Collodictyon triciliatum (strain Årungen, Norway; both representing Diphyllatea) to identify a large number of orthologous genes corresponding to the published sequences from Thecamonas trahens (representing Thecamonadea) and Malawimonas jakobiformis and M. californiana (representing Malawimonadea) (Derelle and Lang, 2012).
In total, 42 mitochondrial-derived genes were used to construct single- and multigene alignments (consisting of 11,348 amino acid characters and 84 selected taxa) for maximum likelihood and Bayesian phylogenomic inferences.

2. Materials and methods

Methods for culturing Collodictyon sp. KIINB, cDNA construction, Illumina sequencing and contig assembly are described in the supplementary materials. 454 pyrosequencing and assembly of the Collodictyon triciliatum transcriptome are presented in Zhao et al. (2012).

2.1. Sequence alignment construction

The phylogenomic dataset was constructed from recently published seed protein alignments (Derelle and Lang, 2012) using the following procedure.

(1) Screening and identification of homologs: sequences in the seed single-gene alignments were used as queries in BlastP searches against the translated Illumina and 454 contigs of both Collodictyon strains. To improve taxon sampling, Blast searches were repeated against proteomes, mitochondrial genomes and assembled EST data of 29 additional species (i.e. 1 Katablepharid, 3 Excavata, 2 Amoebozoa, 8 Opisthokonta, 3 Cryptomonads, 8 Archaeplastida and 4 SAR; see Table S1). The detected homologs (E-value < 10^-10) were pooled into the corresponding single-gene dataset.

(2) Building and trimming of alignments: all single-gene datasets were aligned separately with MAFFT using the L-INS-i algorithm (Katoh et al., 2002). Ambiguously aligned positions were automatically removed with Gblocks using default parameters (Castresana, 2000).

(3) Ortholog assessment: the orthology or possible foreign origin (contamination or horizontal gene transfer) of the newly added sequences was assessed by manual inspection of single-gene trees constructed with RAxML v7.2.8 (Stamatakis, 2006), using 100 maximum likelihood (ML) tree searches and 100 ML bootstrap replicates under the LG + GAMMA model. Sequences were excluded from downstream analysis if they had a supported position (i.e. bootstrap value >70%) that deviated from published global eukaryotic trees (Burki et al., 2012; Zhao et al., 2012).

(4) Supermatrix assembly: the cleaned and masked single-gene alignments were concatenated into a supermatrix using Scafos (Roure et al., 2007). The supermatrix contains 11,348 amino acid positions and 84 taxa, with an average of 31.3% missing characters (see Table S2).

All single-gene trees and alignments and the concatenated supermatrix are available at: http://www.mn.uio.no/ibv/english/people/aca/kamran/data/.

2.2. Phylogenomic analyses

Two evolutionary models, the site-homogeneous LG model and the site-heterogeneous mixture CAT model, were separately applied to the phylogenomic inferences. The fit of these models was evaluated by the cross-validation (CV) test implemented in Phylobayes v3.3b (Lartillot and Philippe, 2004). The supermatrix was randomly split (without replacement) into 10 learning replicates, each constituting 90% (10,213 amino acids) of the original alignment, and 10 test replicates, each constituting 10% (1135 amino acids) of the original alignment. For each replicate, a Markov Chain Monte Carlo (MCMC) chain was run for 4000 cycles with a 25% burnin using the CAT model and for 2000 cycles with a 25% burnin using the LG model. The average CV log likelihood scores over the 10 replicates of each model were calculated from the post-burnin cycles. The statistical fit of the CAT model was preferable to that of the LG model (a likelihood score of 2210 ± 458). Bayesian inference was performed with Phylobayes v3.3b under the best-fit model (CAT) in combination with four gamma categories approximating site-rate variation.
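The 90%/10% split used for the cross-validation test can be illustrated with a short sketch. This is not the Phylobayes implementation; the function name `cv_split`, the seed and the use of plain column indices are assumptions made for illustration. It only shows how 11,348 alignment columns divide into 10,213 learning and 1135 test columns when sampled without replacement.

```python
import random


def cv_split(n_sites, n_replicates=10, seed=0):
    """Return a list of (learning, test) column-index lists.

    Hypothetical helper: each replicate shuffles the column indices and
    assigns the first 90% to the learning set and the rest to the test set,
    so no column is drawn twice within a replicate.
    """
    rng = random.Random(seed)
    n_learn = n_sites * 9 // 10  # integer arithmetic: 11,348 -> 10,213
    replicates = []
    for _ in range(n_replicates):
        columns = list(range(n_sites))
        rng.shuffle(columns)
        learn = sorted(columns[:n_learn])
        test = sorted(columns[n_learn:])
        replicates.append((learn, test))
    return replicates


splits = cv_split(11348)
learn, test = splits[0]
print(len(learn), len(test))  # 10213 1135
```

Each learning set would then be used to fit a model and the matching test set to score it; that step depends on the phylogenetic software and is not sketched here.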
Two independent chains were run for a total of 20,000 cycles, and convergence between the chains was determined by examining the discrepancy among the bipartitions (maxdiff < 0.15). The consensus topology and posterior probabilities were calculated from saved trees after a burnin of 7000 cycles. The analysis of CAT-bootstrap proportions was performed on 100 pseudoreplicates generated by Seqboot (Phylip package; Felsenstein, 2001). We ran 8000 cycles for each replicate under the CAT + GAMMA model with a burnin of 4000 cycles. The resulting 100 trees were used to calculate bootstrap support using Consense (Phylip package; Felsenstein, 2001). Due to the computational burden, only one dataset was subjected to the CAT-bootstrap analysis (corresponding to Fig. 1D). In addition, we recoded the amino acids (AA) into six functional categories (Dayhoff6 classes) to reduce possible compositional biases. The recoded data were analyzed with the Phylobayes CAT model and two independent MCMC chains with 15,000 cycles and a burnin of 5000 cycles.

The ML analyses were performed using RAxML v7.2.8. The best topology was established under the PROTGAMMALGF model with 16 random starting trees. Bootstrap support was evaluated with 100 pseudoreplicates under the PROTCATLGF model. Moreover, we considered the evolutionary tempo and mode of each protein composing the concatenated alignment during ML analyses. This investigation was conducted by specifying a separate evolutionary model for each single-gene alignment (Table S3), selected using ProtTest (Darriba et al., 2011).

Fig. 1. Bayesian phylogenomic analyses of Collodictyon and Thecamonas. The topologies are inferred under the CAT + GAMMA model with Phylobayes v3.3b. (A) The tree is constructed with 84 taxa and 11,348 characters. Numbers on the nodes show the posterior probabilities (PP) calculated from the original/trimmed (i.e. 15% fastest evolving site removal) alignments. Nodes with 1.00 PP are marked by a bullet. The taxon Rhizaria is chimerical and composed of sequences from Bigelowiella natans and Paracercomonas marina. (B) The tree is a reduced version of Fig. 1A. (C) The tree is constructed with 82 taxa (i.e. Collodictyon excluded). (D) The tree is constructed with 83 taxa (i.e. Malawimonas excluded). The trees C–D are reduced illustrations of the original topologies (full trees are shown in Figs. S4A and S4B). Bayesian posterior probabilities, bootstrap CAT proportions (i.e. CAT-BP in Fig. 1D) and ML bootstrap values (uniform/separate model) of key nodes labeled by number are listed in the separate tables to the right. Support values were calculated from both the original alignment (Full) and the alignments without the 15% fastest evolving sites (15%). An asterisk (*) indicates that the corresponding node was not observed in the ML bootstrap consensus tree. All phylogenetic trees are rooted using the alpha-proteobacteria as the outgroup.

2.3. Removal of fast evolving sites

To increase the ratio of phylogenetic to non-phylogenetic signal, we progressively removed fast evolving sites from the concatenated alignment. Specifically, the evolutionary rates of sites were estimated using AIR under the LG + GAMMA + F model (Kumar

et al., 2009). As the phylogeny used for site rate estimation can influence the results (Philippe et al., 2011), we applied eight different topologies as input trees (Fig. S7), and the site rates from all eight calculations were averaged before further analyses. The sites were subsequently removed in 5% intervals (i.e. removal of the 5–45% fastest evolving sites) from the full alignment. Finally, these trimmed alignments were used for phylogenomic analyses using RAxML v7.2.8 under the PROTCATLGF model and 100 bootstrap replicates. However, because the use of different tree topologies might influence the calculation of evolutionary rates, we investigated the distribution of site rates separately for all eight datasets. Comparisons of the data demonstrated nearly identical distributions of site rates for the eight datasets (Fig. S8). Nevertheless, we still implemented the average site rates from four datasets lacking prokaryotes (Figs. S7.5–S7.8) and subsequently removed the 15% fastest evolving sites. The use of the ML method described above resulted in essentially the same tree topology and similar bootstrap values as previously obtained (Fig. S9). We are therefore confident that the procedure used here for rate estimation was robust and not significantly affected by the number of input tree

topologies. All bioinformatic analyses were conducted on the freely available Bioportal at the University of Oslo (Kumar et al., 2009; http://www.bioportal.uio.no).

2.4. Tree topology test

Alternative tree topologies were assessed with an approximately unbiased (AU) test. Two different datasets were used in the test: one with complete taxon sampling and the other with Malawimonas excluded. All tree topologies were constructed manually (Fig. S5) and used for the calculation of site likelihoods under the PROTGAMMALGF model implemented in RAxML v7.2.8. The AU test was performed with CONSEL (Shimodaira and Hasegawa, 2001).

3. Results and discussion

3.1. Collodictyon and Thecamonas are separated from Malawimonas

In our phylogenies, Malawimonas was the most unstable taxon. Its position was strongly influenced by the phylogenetic method and the taxa that were used. To investigate the placement of Collodictyon and Thecamonas and their putative relationship to Malawimonas, we reconstructed Bayesian trees with the favored CAT model on several permuted datasets. First, using the original dataset, Malawimonas was placed within the unikonts, close to the base of Amoebozoa (PP = 0.69; Fig. 1A) and separated from both Collodictyon and Thecamonas. Second, we removed the taxa with more than 60% missing characters to assess the impact of missing data. Third, we recoded the protein sequences into six amino acid categories, which reduces amino acid compositional biases. Neither the second nor the third analysis altered the separation of these lineages (Figs. S2A and S3). Furthermore, we assessed the impact of taxon sampling on the phylogeny of Malawimonas by removing Collodictyon from the dataset. Importantly, Malawimonas shifted position in the resulting tree and became sister to all unikonts (Figs. 1C, S1B and S4A), similar to previous observations (Brown et al., 2012; Hampl et al., 2009; Rodriguez-Ezpeleta et al., 2007; Zhao et al., 2012). This demonstrates that the position of Malawimonas is highly sensitive to taxon sampling and to the inclusion of Collodictyon. Malawimonas remained at this basal position even after the exclusion of the taxa with more than 60% missing characters (Fig. S2B) or the removal of up to the 15% fastest evolving sites (Fig. 2A). In all analyses using the CAT model, neither Collodictyon nor Thecamonas was recovered as sister to Malawimonas. Hence, the data did not support earlier reported trees in which these lineages were clustered together (Derelle and Lang, 2012; Zhao et al., 2012).

In contrast to the analyses with the CAT model, trees generated with the less-fitting LG model weakly grouped Malawimonas with Collodictyon or Thecamonas, in both Bayesian and ML analyses. Based on the original dataset, Malawimonas branched as sister to Collodictyon (58% BP; Fig. S1A), whereas it clustered with Thecamonas after the deletion of a large number of fast evolving sites (30–45%), although this grouping always received less than 50% BP (Fig. 2A). Removing such a large portion of sites also weakened the bootstrap support for bikonts and SAR (both <75% BP) as well as Opisthokonta (<80% BP) (Fig. 2B). This result indicates that a massive loss of valuable phylogenetic information occurred, raising serious questions about this particular topology. Nevertheless, the assessment of topologies with the AU test (under an ML framework) could not exclude the possibility that Malawimonas is sister to Thecamonas (P = 0.258) or to Thecamonas plus Opisthokonta (P = 0.205, Fig. S5A). Furthermore, a placement of Malawimonas within Excavata or close to bikonts could not be rejected in the AU test, although we never observed these phylogenies in our trees (Fig. S5A).
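Two of the data treatments referred to in this section, six-state recoding and removal of the fastest evolving sites, can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' pipeline: the six classes are the standard Dayhoff grouping, the function names are hypothetical, and the toy sequences and site rates are invented (in the study the rates were estimated with AIR and averaged over several input topologies).

```python
# Standard Dayhoff six-class grouping of the 20 amino acids.
DAYHOFF6 = {
    "AGPST": "1", "DENQ": "2", "HKR": "3",
    "ILMV": "4", "FWY": "5", "C": "6",
}
RECODE = {aa: code for group, code in DAYHOFF6.items() for aa in group}


def dayhoff6_recode(seq):
    """Map each amino acid to its class; anything else (gaps, X) becomes '-'."""
    return "".join(RECODE.get(aa, "-") for aa in seq.upper())


def remove_fastest_sites(alignment, site_rates, percent):
    """Drop the given percentage of columns with the highest rates."""
    n_sites = len(site_rates)
    n_remove = n_sites * percent // 100
    ranked = sorted(range(n_sites), key=lambda i: site_rates[i], reverse=True)
    dropped = set(ranked[:n_remove])
    keep = [i for i in range(n_sites) if i not in dropped]
    return {name: "".join(seq[i] for i in keep)
            for name, seq in alignment.items()}


# Toy data, invented for illustration only.
toy = {"taxonA": "ACDEFGHIKL", "taxonB": "ACDQFGHLKL"}
rates = [0.1, 0.2, 3.0, 2.5, 0.3, 0.1, 0.4, 1.9, 0.2, 0.1]

trimmed = remove_fastest_sites(toy, rates, 20)
print(trimmed["taxonA"])              # ACFGHIKL
print(dayhoff6_recode("ACDEFGHIKL"))  # 1622513434

# Progressive trimming in 5% steps, from 5% to 45%.
series = {pct: remove_fastest_sites(toy, rates, pct) for pct in range(5, 50, 5)}
```

Each alignment in `series` would then be analyzed separately, which is how support values such as those plotted against the percentage of sites removed can be obtained.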

3.2. Collodictyon and Thecamonas – sisters of two eukaryotic supergroups?

The uncertain position of Malawimonas could have lowered the statistical support for the entire unikont branch. Accordingly, removing Malawimonas from the data significantly increased the support values (71% and 74% BP) for unikonts under the uniform and separate models (Figs. 1D and S1C). Importantly, the removal of Malawimonas did not change the phylogenetic position of either Collodictyon or Thecamonas in the Bayesian or the ML trees. It appears that their positions are less dependent on taxon sampling than that of Malawimonas. In the further assessment of the positions of Collodictyon and Thecamonas, we therefore excluded Malawimonas from the data. With this dataset, Collodictyon was recovered as sister to Amoebozoa (0.73 PP, 68% and 70% BP in Figs. 1D and S1C) and Thecamonas always branched with Opisthokonta (0.99 PP, 89% and 91% BP in Figs. 1D and S1C). These results are consistent with the tree in Fig. 1A. The removal of the fastest evolving sites increased the support for the clustering of Collodictyon with Amoebozoa, up to 81% BP when 15% of the sites were deleted (Fig. 2C). The AU test on this trimmed alignment rejected the possibility that Collodictyon branches with either Thecamonas (P = 0.038) or Opisthokonta (P = 0.009), and favored an affinity to Amoebozoa. Yet an alternative placement as a sister group to all unikonts could not be rejected (P = 0.365; Fig. S5B). Hence, our data suggest that Collodictyon and Thecamonas are two distinct lineages with affinity to the two supergroups Amoebozoa and Opisthokonta, respectively.

A previous phylogenomic analysis placed Collodictyon at the split between unikonts and bikonts, but a relationship to Amoebozoa could not be excluded (Zhao et al., 2012; Yabuki et al., 2013). Here, we inferred the phylogeny from a new and independent mitochondrial dataset (only three genes overlap with the dataset used in Zhao et al., 2012), which consistently showed Collodictyon close to Amoebozoa. However, the AU topology assessment could not reject a deeper placement. Hence, the data presented here and in Zhao et al. (2012) are not contradictory but together provide stronger support for Collodictyon as one of the earliest diverging eukaryotic lineages.

3.3. Use of outgroup did not affect the position of Collodictyon and Thecamonas

Unlike a previous phylogenomic analysis of Collodictyon (Zhao et al., 2012), our study included several alpha-proteobacteria as an outgroup. Using these sequences, the putative root of the eukaryotic tree separated Opisthokonta and Amoebozoa from the other eukaryotes (i.e. bikonts; see Fig. 1), as discussed earlier (Richards and Cavalier-Smith, 2005; Roger and Simpson, 2009; Derelle and Lang, 2012). Although most of the eukaryotic supergroups (e.g. Opisthokonta, Amoebozoa, Excavata, Archaeplastida and SAR) were recovered with moderate to high statistical support values in our analyses (Fig. 1A), the inclusion of an outgroup in our data could potentially affect the phylogeny of these deep branching lineages. Therefore, we reconstructed the tree without the alpha-proteobacteria sequences. The phylogenomic analyses showed essentially the same topology, in which Malawimonas was sister to the unikonts (Fig. S6A), Collodictyon was closely related to Amoebozoa and Thecamonas grouped with Opisthokonta (Fig. S6B). Hence, irrespective of the inclusion of outgroup sequences, the data presented here support the separation of Collodictyon and Thecamonas.

3.4. Paraphyly of Sulcozoa and early evolution of unikonts

Regardless of the placement of Collodictyon at the unikont-bikont split (Zhao et al., 2012) or close to Amoebozoa (in this study), all multigene phylogenies presented so far show Diphyllatea

as a deep lineage at the base of the unikonts (Fig. 3). Recently, Diphyllatea has been classified into a new phylum, Sulcozoa, together with Thecamonadea and a few other basal branches of unikonts, based on cellular features interpreted as shared ancestral characters (Cavalier-Smith, 2013). For instance, a microtubule-supported feeding groove, ventral pointed pseudopodia and a dorsal semi-rigid pellicle have been proposed as homologous phenotypes that evolved in the ancestor of Sulcozoa (Cavalier-Smith, 2013). Here we present a multigene phylogeny of Sulcozoa that includes both subphyla, Apusozoa (represented by Thecamonadea) and Varisulca (represented by Diphyllatea). Given that the sulcozoan cell structures are homologous and that Thecamonadea and Diphyllatea have evolved from a sulcozoan ancestor (Cavalier-Smith, 2013), our trees suggest that Sulcozoa is paraphyletic and not holophyletic (Fig. 3). The evidence for the paraphyly of this phylum is significant because it implies that the two supergroups Opisthokonta and Amoebozoa may have evolved from a sulcozoan ancestor through the transformation and loss of typical sulcozoan cellular structures (Cavalier-Smith, 2013; Fig. 3). However, it should be noted that the phylogenomic analyses presented here lack other early diverging lineages belonging to Sulcozoa, i.e. Discocelida, Mantamonadia, Rigifilida and Breviata. Although the phylogenies need to be further confirmed with a wider sample of taxa and more gene sequences, it is evident that Diphyllatea and Thecamonadea are crucial for the reconstruction of the putative common ancestor of unikonts and the origin of Opisthokonta and Amoebozoa.

[Figure 2: three line plots, panels A–C; x-axis: % sites removed (0–50); y-axis: bootstrap value (0–100); panel B legend: Bikonts, SAR, Excavata, Opisthokonta.]

Fig. 2. Changes of bootstrap support for key branches and major supergroups in the inferred trees relative to the percentage of the fastest evolving sites removed. (A and B) Sites were deleted in 5% increments from the alignment consisting of 82 taxa (i.e. excluding Collodictyon). (C) Sites were deleted from the alignment consisting of 83 taxa (i.e. excluding Malawimonas). Mala and Coll are the abbreviations of Malawimonas and Collodictyon. ML bootstrap values were calculated under the PROTCATLGF model in RAxML v7.2.8. The horizontal dashed lines mark the 80% bootstrap support threshold.

Fig. 3. Evolutionary position of Sulcozoa. The tree is rooted based on the unikonts-bikonts dichotomy hypothesis (Richards and Cavalier-Smith, 2005). The arrow indicates the alternative rooting position (i.e. between Euglenozoa and the other eukaryotes), as suggested by Cavalier-Smith (2010). The box highlighted in red shows the newly proposed protozoan phylum Sulcozoa (Cavalier-Smith, 2013). The red dashed lines indicate the possible phylogenetic position of Diphyllatea in the eukaryotic tree of life. The black dashed line suggests the unstable position of Malawimonadea; its alternative placements in the eukaryotic tree of life are shown by solid circles. (For interpretation of the references to color in this figure legend, the reader is referred to the web version of this article.)
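The argument that Sulcozoa is not holophyletic rests on a simple tree property: no single clade contains Collodictyon and Thecamonas without also containing Amoebozoa or Opisthokonta. The sketch below checks that property on a toy rooted tree. The nested-tuple representation, the function names and the four-taxon topology are assumptions for illustration; the topology is a simplification of the unikont part of the trees reported here, not the full result.

```python
def leaves(tree):
    """Return the set of leaf names under a nested-tuple tree."""
    if isinstance(tree, str):
        return {tree}
    return set().union(*(leaves(child) for child in tree))


def is_monophyletic(tree, group):
    """True if some clade of the rooted tree contains exactly `group`."""
    group = set(group)
    if leaves(tree) == group:
        return True
    if isinstance(tree, str):
        return False
    return any(is_monophyletic(child, group) for child in tree)


# Toy topology: Collodictyon with Amoebozoa, Thecamonas with Opisthokonta.
unikonts = (("Collodictyon", "Amoebozoa"), ("Thecamonas", "Opisthokonta"))

print(is_monophyletic(unikonts, {"Collodictyon", "Thecamonas"}))  # False
print(is_monophyletic(unikonts, {"Collodictyon", "Amoebozoa"}))   # True
```

The check only shows that the two sulcozoan taxa do not form a clade on their own. Calling the group paraphyletic rather than polyphyletic additionally depends on the assumption, stated in the text, that the shared sulcozoan cell structures are homologous and ancestral.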

4. Conclusions

In this study, we have assembled mitochondrial genes from a broad range of eukaryotes, including previously neglected enigmatic lineages. Phylogenomic analyses based on these data favor the hypothesis that Diphyllatea and Thecamonadea are separate and deep eukaryotic lineages. Thecamonadea is strongly grouped with Opisthokonta; Diphyllatea shows an affinity to Amoebozoa, but the deeper placement obtained from other sequence data (Zhao et al., 2012) is difficult to exclude. The split between Thecamonadea and Diphyllatea suggests that Sulcozoa is most likely paraphyletic and has given rise to the Opisthokonta and Amoebozoa supergroups.

Acknowledgments

We thank Koichi Watanabe and Akinori Yabuki for providing the Collodictyon strain used here. We are grateful to B. Franz Lang for sharing mitochondrial gene alignments of Andalucia, Seculamonas and Histiona. Illumina sequencing and bioinformatics services were provided by the Norwegian Sequencing Center and the Bioportal at the University of Oslo (UiO). We thank Dr. Russell J. S. Orr for discussions and comments on the manuscript. This work was supported by a research grant from the University of Oslo to DK and KST as well as a Ph.D. fellowship to SZ.

Appendix A. Supplementary material

Supplementary data associated with this article can be found, in the online version, at http://dx.doi.org/10.1016/j.ympev.2013.08.005.

References
Bapteste, E., Brinkmann, H., Lee, J.A., Moore, D.V., Sensen, C.W., Gordon, P., Durufle, L., Gaasterland, T., Lopez, P., Muller, M., Philippe, H., 2002. The analysis of 100 genes supports the grouping of three highly divergent amoebae: Dictyostelium, Entamoeba, and Mastigamoeba. Proc. Natl. Acad. Sci. USA 99, 1414–1419.
Brown, M.W., Kolisko, M., Silberman, J.D., Roger, A.J., 2012. Aggregative multicellularity evolved independently in the eukaryotic supergroup Rhizaria. Curr. Biol. 22, 1123–1127.
Burki, F., Shalchian-Tabrizi, K., Minge, M., Skjaeveland, A., Nikolaev, S.I., Jakobsen, K.S., Pawlowski, J., 2007. Phylogenomics reshuffles the eukaryotic supergroups. PLoS ONE 2, e790.
Burki, F., Inagaki, Y., Bråte, J., Archibald, J.M., Keeling, P., Cavalier-Smith, T., Sakaguchi, M., Hashimoto, T., Horak, A., Kumar, S., Klaveness, D., Jakobsen, K.S., Pawlowski, J., Shalchian-Tabrizi, K., 2009. Large-scale phylogenomic analyses reveal that two enigmatic protist lineages, Telonemia and Centroheliozoa, are related to photosynthetic chromalveolates. Genome Biol. Evol. 1, 231–238.
Burki, F., Okamoto, N., Pombert, J.F., Keeling, P.J., 2012. The evolutionary history of haptophytes and cryptophytes: phylogenomic evidence for separate origins. Proc. Biol. Sci. 279, 2246–2254.
Castresana, J., 2000. Selection of conserved blocks from multiple alignments for their use in phylogenetic analysis. Mol. Biol. Evol. 17, 540–552.
Cavalier-Smith, T., 1998. A revised six-kingdom system of life. Biol. Rev. Camb. Philos. Soc. 73, 203–266.
Cavalier-Smith, T., 2010. Kingdoms Protozoa and Chromista and the eozoan root of the eukaryotic tree. Biol. Lett. 6, 342–345.
Cavalier-Smith, T., 2013. Early evolution of eukaryote feeding modes, cell structural diversity, and classification of the protozoan phyla Loukozoa, Sulcozoa, and Choanozoa. Eur. J. Protistol. 49, 115–178.
Darriba, D., Taboada, G.L., Doallo, R., Posada, D., 2011. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics 27, 1164–1165.
Derelle, R., Lang, B.F., 2012. Rooting the eukaryotic tree with mitochondrial and bacterial proteins. Mol. Biol. Evol. 29, 1277–1289.
Felsenstein, J., 2011. PHYLIP (phylogeny inference package). Distributed by the author. Seattle (WA): Department of Genetics, University of Washington (03.06.01).
Glucksman, E., Snell, E.A., Cavalier-Smith, T., 2013. Phylogeny and evolution of Planomonadida (Sulcozoa): eight new species and new genera Fabomonas and Nutomonas. Eur. J. Protistol. 49, 179–200.
Hampl, V., Hug, L.A., Leigh, J.W., Dacks, J.B., Lang, B.F., Simpson, A.G.B., Roger, A.J., 2009. Phylogenetic analyses support the monophyly of Excavata and resolve relationships among eukaryotic “supergroups”. Proc. Natl. Acad. Sci. USA 106, 3859–3864.
Katoh, K., Misawa, K., Kuma, K., Miyata, T., 2002. MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res. 30, 3059–3066.
Keeling, P.J., Burger, G., Durnford, D.G., Lang, B.F., Lee, R.W., Pearlman, R.E., Roger, A.J., Gray, M.W., 2005. The tree of eukaryotes. Trends Ecol. Evol. 20, 670–676.
Kumar, S., Skjaeveland, A., Orr, R.J., Enger, P., Ruden, T., Mevik, B.H., Burki, F., Botnen, A., Shalchian-Tabrizi, K., 2009. AIR: a batch-oriented web program package for construction of supermatrices ready for phylogenomic analyses. BMC Bioinformatics 10, 357.
Lartillot, N., Philippe, H., 2004. A Bayesian mixture model for across-site heterogeneities in the amino-acid replacement process. Mol. Biol. Evol. 21, 1095–1109.
Minge, M.A., Silberman, J.D., Orr, R.J., Cavalier-Smith, T., Shalchian-Tabrizi, K., Burki, F., Skjaeveland, A., Jakobsen, K.S., 2009. Evolutionary position of breviate amoebae and the primary eukaryote divergence. Proc. Biol. Sci. 276, 597–604.
Parfrey, L.W., Barbero, E., Lasser, E., Dunthorn, M., Bhattacharya, D., Patterson, D.J., Katz, L.A., 2006. Evaluating support for the current classification of eukaryotic diversity. PLoS Genet. 2, e220.
Patron, N.J., Inagaki, Y., Keeling, P.J., 2007. Multiple gene phylogenies support the monophyly of cryptomonad and haptophyte host lineages. Curr. Biol. 17, 887–891.
Philippe, H., Brinkmann, H., Lavrov, D.V., Littlewood, D.T., Manuel, M., Worheide, G., Baurain, D., 2011. Resolving difficult phylogenetic questions: why more sequences are not enough. PLoS Biol. 9, e1000602.
Richards, T.A., Cavalier-Smith, T., 2005. Myosin domain evolution and the primary divergence of eukaryotes. Nature 436, 1113–1118.
Rodriguez-Ezpeleta, N., Brinkmann, H., Burey, S.C., Roure, B., Burger, G., Loffelhardt, W., Bohnert, H.J., Philippe, H., Lang, B.F., 2005. Monophyly of primary photosynthetic eukaryotes: green plants, red algae, and glaucophytes. Curr. Biol. 15, 1325–1330.
Rodriguez-Ezpeleta, N., Brinkmann, H., Burger, G., Roger, A.J., Gray, M.W., Philippe, H., Lang, B.F., 2007. Toward resolving the eukaryotic tree: the phylogenetic positions of jakobids and cercozoans. Curr. Biol. 17, 1420–1425.
Roger, A.J., Simpson, A.G., 2009. Evolution: revisiting the root of the eukaryote tree. Curr. Biol. 19, R165–167.

Please cite this article in press as: Zhao, S., et al. Sulcozoa revealed as a paraphyletic group in mitochondrial phylogenomics. Mol. Phylogenet. Evol. (2013), http://dx.doi.org/10.1016/j.ympev.2013.08.005

Roure, B., Rodriguez-Ezpeleta, N., Philippe, H., 2007. SCaFoS: a tool for selection, concatenation and fusion of sequences for phylogenomics. BMC Evol. Biol. 7 (Suppl. 1), S2.
Ruiz-Trillo, I., Roger, A.J., Burger, G., Gray, M.W., Lang, B.F., 2008. A phylogenomic investigation into the origin of metazoa. Mol. Biol. Evol. 25, 664–672.
Shalchian-Tabrizi, K., Minge, M.A., Espelund, M., Orr, R., Ruden, T., Jakobsen, K.S., Cavalier-Smith, T., 2008. Multigene phylogeny of choanozoa and the origin of animals. PLoS ONE 3, e2098.
Shimodaira, H., Hasegawa, M., 2001. CONSEL: for assessing the confidence of phylogenetic tree selection. Bioinformatics 17, 1246–1247.
Simpson, A.G., Roger, A.J., 2004. The real ‘kingdoms’ of eukaryotes. Curr. Biol. 14, R693–696.
Stamatakis, A., 2006. RAxML-VI-HPC: maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics 22, 2688–2690.

Torruella, G., Derelle, R., Paps, J., Lang, B.F., Roger, A.J., Shalchian-Tabrizi, K., Ruiz-Trillo, I., 2012. Phylogenetic relationships within the Opisthokonta based on phylogenomic analyses of conserved single-copy protein domains. Mol. Biol. Evol. 29, 531–544.
Yabuki, A., Nakayama, T., Yubuki, N., Hashimoto, T., Ishida, K., Inagaki, Y., 2011. Tsukubamonas globosa n. gen., n. sp., a novel excavate flagellate possibly holding a key for the early evolution in “Discoba”. J. Eukaryot. Microbiol. 58, 319–331.
Yabuki, A., Ishida, K., Cavalier-Smith, T., 2013. Rigifila ramosa n. gen., n. sp., a filose apusozoan with a distinctive pellicle, is related to Micronuclearia. Protist 164, 75–88.
Zhao, S., Burki, F., Bråte, J., Keeling, P.J., Klaveness, D., Shalchian-Tabrizi, K., 2012. Collodictyon-an ancient lineage in the tree of eukaryotes. Mol. Biol. Evol. 29, 1557–1568.
