The complete mitochondrial genome of a turbinid vetigastropod from MiSeq Illumina sequencing of genomic DNA and steps towards a resolved gastropod phylogeny

The complete mitochondrial genome of a turbinid vetigastropod from MiSeq Illumina sequencing of genomic DNA and steps towards a resolved gastropod phylogeny

Gene 533 (2014) 38–47 Contents lists available at ScienceDirect Gene journal homepage: www.elsevier.com/locate/gene The complete mitochondrial geno...

2MB Sizes 0 Downloads 35 Views

Gene 533 (2014) 38–47

Contents lists available at ScienceDirect

Gene journal homepage: www.elsevier.com/locate/gene

The complete mitochondrial genome of a turbinid vetigastropod from MiSeq Illumina sequencing of genomic DNA and steps towards a resolved gastropod phylogeny S.T. Williams ⁎, P.G. Foster, D.T.J. Littlewood Natural History Museum, Life Sciences Department, Cromwell Rd, London SW7 5BD, UK

a r t i c l e

i n f o

Article history: Accepted 2 October 2013 Available online 11 October 2013 Keywords: Mitochondrial genome Phylogeny Gastropoda Mitochondrial enrichment

a b s t r a c t A need to increase sampling of mitochondrial genomes for Vetigastropoda has been identified as an important step towards resolving relationships within the Gastropoda. We used shotgun sequencing of genomic DNA, using an Illumina MiSeq, to obtain the first mitochondrial genome for the vetigastropod family Turbinidae, doubling the number of genomes for the species-rich superfamily Trochoidea. This method avoids the necessity of finding suitable primers for long PCRs or primer-walking amplicons, resulting in a timely and cost-effective method for obtaining whole mitochondrial genomes from ethanol-preserved tissue samples. Bayesian analysis of amino acid variation for all available gastropod genomes including the new turbinid mtgenome produced a well resolved tree with high nodal support for most nodes. Major clades within Gastropoda were recovered with strong support, with the exception of Littorinimorpha, which was polyphyletic. We confirm here that mitogenomics is a useful tool for molluscan phylogenetics, especially when using powerful new models of amino acid evolution, but recognise that increased taxon sampling is still required to resolve existing differences between nuclear and mitochondrial gene trees. Crown Copyright © 2013 Published by Elsevier B.V. All rights reserved.

1. Introduction Mollusca is one of the largest phyla of metazoans. Although the focus of many wide-ranging studies, the evolutionary relationships among different molluscan classes and within some major clades are still uncertain. Mitochondrial genomes have proven to be useful in phylogenetic analyses of many groups (e.g. Cameron et al., 2012; Jex et al., 2010), but have met with limited success in Mollusca. For instance recent studies have shown that whole mitochondrial genomes are not able to resolve well-accepted clades within Lophotrochozoa or deep relationships within Mollusca (e.g. Bernt et al., 2013a; Boore et al., 2004; Dreyer and Steiner, 2004; Stöger and Schrödl, 2013; Waeschenbach et al., 2006). Moreover, a recent phylogenetic analysis of bivalves using partial mitochondrial genomes (Plazzi et al., 2011) resulted in a wellsupported phylogeny, but with many nodes in conflict with phylogenies based on standard nuclear genetic markers (e.g. 18S, 28S, Histone 3) Abbreviations: A + T, percentage of adenosine and thymidine; atp6, gene encoding ATP synthase subunit 6; atp8, gene encoding ATP synthase subunit 8; bp, base pairs; cox1, 2, 3, gene encoding cytochrome c oxidase subunits I, II, and III; cyt b, gene encoding cytochrome b apoenzyme; gDNA, genomic DNA; mtDNA, mitochondrial DNA; mtgenome, mitochondrial genome; nad1–6 and nad4L, gene encoding nicotinamide adenine dinucleotide dehydrogenase subunits 1–6 and 4L; NGS, next generation sequencing; PCR, polymerase chain reaction; tRNA, transfer RNA gene. ⁎ Corresponding author. Tel.: +44 20 7942 5351. E-mail address: [email protected] (S.T. Williams).

and larger transcriptomic studies (Sharma et al., 2012). These studies bring into question the validity of using mitochondrial genomes for phylogenetic analyses within Mollusca. On the other hand it has been noted that sampling within Mollusca is not uniform across groups (Stöger and Schrödl, 2013). Expanded taxon sampling has seen improvements of phylogenies within molluscan subgroups (e.g. cephalopods, Allcock et al., 2011) and may yet help to resolve relationships within Mollusca as a whole. Mitochondrial genomes of metazoans are frequently characterised by accelerated substitution rates and among-lineage compositional heterogeneity (Foster et al., 1997), but denser taxon sampling provides opportunities for developing and refining improved substitution models for phylogeny reconstruction (e.g. Rota-Stabelli et al., 2009). Increased taxon sampling, particularly of non-model taxa, is needed in order to test more robustly the value of mitochondrial genomes in resolving relationships within and among molluscan groups. For instance, there are entire mitochondrial genomes for only four genera assigned to the ancient and diverse Vetigastropoda. The superfamily Trochoidea is the largest of the vetigastropod superfamilies, with more than 2000 species (Geiger et al., 2008). Members of this superfamily occur globally, at all latitudes, from the high intertidal to bathyal depths. Trochoideans have been the focus of many ecological, evolutionary, morphological, molecular and systematic studies over the last century, but to date (May 2013), only one trochoid mitochondrial genome is available on GenBank (Tegula brunnea NC_016954, Simison, unpubl).

0378-1119/$ – see front matter. Crown Copyright © 2013 Published by Elsevier B.V. All rights reserved. http://dx.doi.org/10.1016/j.gene.2013.10.005

S.T. Williams et al. / Gene 533 (2014) 38–47

In this study we present the whole mitochondrial genome for Lunella aff. cinerea, the first representative of the family Turbinidae, an iconic family within the superfamily Trochoidea. Previous studies have shown that Lunella aff. cinerea belongs to a cryptic species complex and as yet its specific name has not yet been confirmed (Williams et al., 2011, 2012). It is endemic to north-western Australia and occurs in large numbers in the high intertidal zone on rocky shores. 2. Materials and methods 2.1. Mitochondrial enrichment and DNA extraction We used a museum specimen of Lunella aff. cinerea, preserved in 95% ethanol in 2008 (SE side of Channel I., Middle Arm, Darwin Harbour, Northern Territory, Australia; NHMUK 20100448). Approximately 25 mg of ethanol-preserved foot tissue was cut into small pieces and pressed onto an absorbent tissue to remove excess ethanol before it was immersed in 0.01 Tris-EDTA buffer for 10 minutes to allow any remaining ethanol in the tissue to exchange with buffer. Once the tissue sank to the bottom of the tube, it was removed from the buffer and excess liquid was removed by again pressing against a clean absorbent tissue. The tissue was then cut into small pieces and a modified version of the mitochondrial enrichment method of Tamura and Aotsuka (1988) was followed. This method was used as it has demonstrated utility for enriching for mitochondrial DNA in studies where pseudogenes were an issue (e.g. Williams and Knowlton, 2001). The tissue was placed in 1 ml of chilled homogenising buffer (0.25 M sucrose, 10 mM EDTA, 30 mM Tris-HCl, pH 7.5) in a 2 ml tube and homogenised using a micro-homogeniser for approximately 30 seconds to ensure cells were broken. The sample was then centrifuged at 1000g for 15 minutes at 4 °C in order to pellet nuclei and cellular debris. The supernatant was transferred to a clean, chilled 2 ml tube and kept on ice. The nuclear pellet was resuspended in 600 μl aliquot of chilled homogenisation buffer, spun at 1000g for 10 minutes at 4 °C in order to repellet nuclei and cellular debris. The nuclear pellet was discarded and the combined supernatants were spun at 12,000g for 30 minutes at 4 °C to pellet the mitochondrion. Instead of following Tamura and Aotsuka's (1988) protocol for DNA extraction, the supernatant was discarded and the putative mitochondrial pellet was resuspended in 180 μl ATL buffer (Qiagen) by gently flicking the pellet. DNA was then extracted using the QIAamp DNA extraction kit, with the minor modifications that the proteinase K was left to incubate in the ATL buffer overnight at 55 °C and the final DNA was eluted twice, each time using 200 μl of buffer AE. 2.2. Illumina sequencing with MiSeq MiSeq sequencing libraries were prepared and run at the University of Cambridge (Department of Biochemistry; Shilo Dickens). The gastropod gDNA library was indexed and run simultaneously with 4 other samples, over 250 cycles yielding paired reads of 150 bp each. 2.3. Assembly Raw Illumina reads were analysed and assembled using Geneious v.6.0. Paired reads were trimmed with a minimum of 18 bp removed from each end, to ensure removal of any indexed tags not removed during post-processing. The first assembly was set at ‘low sensitivity/ fastest’ in order to generate contigs. Ranked on quality (HQ%) the first 500 contigs were BLASTed against NCBI GenBank using BLASTx and BLASTn to search for contigs with mitochondrial protein-coding and RNA genes. Contigs with hits to mitochondrial genes or genomes were identified, disassembled, and reassembled using Geneious. 5′ and 3′ ends of the final assembly were scrutinised for overlap to establish a circular mtDNA.

39

2.4. Genome annotation Genome annotation was carried out using MacVector (v. 12.7, MacVector, Inc.) and using the MITOS pipeline, which has been specifically developed for annotation of metazoan mitochondrial genomes (Bernt et al., 2013b). BLAST searches of open reading frames were used to confirm the positions of protein-coding genes and nucleotide searches were used to confirm the location of ribosomal genes. Gene boundaries were determined by alignment with homologous regions of published mitochondrial genomes of gastropods. ARWEN (Laslett and Canbäck, 2008) and MITOS were used to identify tRNAs. Secondary structures for tRNAs were drawn using mt-tRNA-Draw (BETA Version 0.5; Youngblood and Masta, unpublished). 2.5. Phylogenetic analyses The new mitochondrial genome Lunella aff. cinerea was included in phylogenetic analyses along with all but two complete gastropod genomes available on GenBank, plus a partial genome for Nerita melanotragus (GU810158; Castro and Colgan, 2010). Preliminary analyses showed that two taxa were highly divergent (Lottia digitalis NC_007782.1 and Cepaea nemoralis NC_001816.1). The mt genome for Lottia digitalis is the single published representative for the Patellogastropoda. Other major clades in this study are represented by multiple specimens, therefore helping to break up branch lengths. Cepaea nemoralis is a well studied land snail, which is known to have unusually accelerated rates of mitochondrial evolution with some lineages within a single population estimated to have diverged 20 million years ago (Thomaz et al., 1996). We therefore excluded the sequences for these two taxa as they are likely to be prone to artefactual phylogenetic placement resulting from longbranch attraction (Yang and Rannala, 2012). The Cepaea sequence has also been excluded from other studies (e.g. White et al., 2011). Two freshwater bivalves were used as outgroups for rooting the phylogenetic trees. The decisions to treat Gastropoda as monophyletic and to use bivalves as the outgroup were based on two recent phylogenomic studies (Kocot et al., 2011; Smith et al., 2011). Concatenated amino-acid sequences for all protein-coding genes were used in maximum likelihood and Bayesian analyses. Genes were extracted from GenBank files. For the new sequence, the genes were translated using NCBI translation Table 5, for invertebrate mitochondria. Amino-acid sequences were aligned with Clustal Omega v 1.1.0. An amino acid residue identified as a “J” in ATP6 for Rapana venosa (NC_011193.1) was replaced with an ‘X’. Gblocks v 0.91b was used to identify conserved sites, using default settings except that parameter b5 was set to h (alignment positions with half gaps allowed). A BIONJ tree was made from p-distances. That tree was used with ProtTest v 3.2-20130314 (Darriba et al., 2011) to determine the best model available in PhyML (Guindon and Gascuel, 2003). The best model available in PhyML was MtArt + I + G + F (AIC weight 1.0), and this model was used in a PhyML analysis, bootstrapping with 100 pseudoreplicates. However, ProtTest does not test the Mtzoa model, therefore this model was compared with the MtArt model, using the same BIONJ tree using p4 (Foster, 2004). This test showed that the Mtzoa model was much better than the MtArt model for these data (Table 2), so the Mtzoa + I + G + F model was used in a Bayesian analysis in p4. Two separate MCMC chains were run, each for one million generations, by which time the average standard deviation of split frequency between the two runs had dropped to 0.015, showing good agreement and implying convergence. A burnin of half of the samples was used. A consensus tree was made from both runs together. In both Bayesian and Maximum Likelihood analyses consensus tree branches with less than 50% support were collapsed. Nodal support is indicated by posterior probabilities (PP) or bootstrap support (B/S). Preliminary analyses showed that protein sequences were well diverged amongst the taxa used in this study, and use of DNA sequences in phylogenetic analyses was deemed inappropriate given likely

40

S.T. Williams et al. / Gene 533 (2014) 38–47

Table 1 Position (start, stop), strand (Str) and lengths (Lnt) of genes in the mitochondrial genome of Lunella aff. cinerea, and for protein-coding genes, their initiation and termination codons (In./Term.) and amino acid sequence lengths (Laa). Standard abbreviations are used for protein coding genes and both one and three letter abbreviations are given for tRNAs, along with the codon used. Schematic diagrams for tRNAs are given in Fig. 3. Gene

Start

Stop

Str.

Lnt

In./Term.

Laa

cox1 cox2 tRNA ASP (D-GAC) atp8 atp6 tRNA PHE (F-UUC) nad5 tRNA HIS (H-CAC) nad4 nad4L tRNA SER1 (S1-UCA) Cytb nad6 tRNA PRO (P-CCA) nad1 tRNA LEU1 (L1-UUA) tRNA LEU2 (L2-CUA) 16S tRNA TYR1 (Y1-UAU) tRNA VAL (V-GUA) 12S tRNA MET (M-AUG) tRNA TYR2 (Y2-UAC) tRNA CYS (C-UGC) tRNA TRP (W-UGA) tRNA GLN (Q-CAA) non-coding tRNA GLU (E-GAA) tRNA GLY (G-GGA) cox3 tRNA LYS (K-AAA) tRNA ALA (A-GCA) tRNA ARG (R-CGA) tRNA ASN(N-AAC) tRNA THR (T-ACA) tRNA ILE (I-AUC) nad3 tRNA SER2 (S2-AGC) nad2

1 1705 2488 2555 2841 3569 3734 5489 5573 6955 7339 7444 8644 9155 9402 10,350 10,470 10,539 11,912 12,118 12,184 13,213 13,440 13,509 13,579 13,651 13,720 14,515 14,584 14,658 15,503 15,565 15,654 15,759 15,875 15,946 16,018 16,409 16,481

1536 2397 2554 2734 3536 3639 5488 5554 6961 7251 7405 8583 9150 9222 10,349 10,417 10,537 11,911 11,978 12,183 13,212 13,282 13,505 13,577 13,649 13,719 14,514 14,583 14,651 15,437 15,569 15,632 15,722 15,830 15,945 16,014 16,371 16,477 17,641

+ + + + + − − − − − − − − − − − − − − − − − − − − − + + + + + + + + + + + + +

1,536 693 67 180 696 71 1755 66 1389 297 67 1140 507 68 948 68 68 1373 67 66 1029 70 70 69 71 69 795 69 68 780 67 68 69 72 71 69 354 69 1,161

ATG/TAA ATG/TAG

512 231

ATG/TAA ATG/TAG

60 232

ATG/TAA

585

ATG/TAA ATA/TAA

463 99

ATG/TAA ATG/TAA

380 169

ATG/TAA

316

ATG/TAA

260

ATG/TAG

118

ATG/TAG

387

saturation. Furthermore, recent phylogenomic studies of deep relationships among Arthropoda have shown differences between nucleotide and amino acid trees suggesting that lineages may be differentially biased in their use of amino acid codons and that this synonymous codon-useage bias can lead to false topologies, particularly in nucleotide trees (Rota-Stabelli et al., 2013). 3. Results 3.1. Next generation sequencing for complete mitochondrial genome From the MiSeq run, a total of 251 cycles yielded 3Gb of which 63% were greater than or equal to QC = 30%. Post-sequencing, we were provided with raw output data and, of the five original samples

(one gastropod, four other samples) run simultaneously, 35% of the indexed reads were identified as having come from the gastropod sample suggesting it was overly represented by 15%. For the gastropod data, a total of 4.1 million indexed paired-end 150 bp reads were provided. From the initial Geneious assembly of raw, trimmed, paired reads from the MiSeq, 2 of 500 contigs yielded putative mitochondrial fragments; contig73 [788 reads, 13980bp] and contig450 [137 reads; 3666 bp]. Using Sequencher v. 5.0 (GeneCodes Corp), consensus sequences of these contigs overlapped sufficiently to suggest they contained the complete mtDNA. Mitochondrial gene identity and complement was also verified by a search of ORF identities using MacVector. The consensus sequence of the two overlapping contigs was used to remap the raw (trimmed) data. Just 2072 of the total 4.1 million reads mapped to the consensus sequence, but visualisation of the aligned reads confirmed full genome coverage and overlapping ends (thus completing the circular mitochondrial genome). MacVector, ARWEN and MITOS, in combination with reference to published mtDNAs of other gastropod molluscs was used to annotate the full mtDNA, with special care taken in areas of low depth of coverage. The final annotated mtDNA, with overlapping fragments removed, was mapped again to the raw, trimmed reads for final verification. Of the 1951 reads mapped and used to illustrate coverage (below), 92% were at Q30 or higher. The complete genome of Lunella aff. cinerea (GenBank Accession number: KF700096; Fig. 1) was obtained in a single partial run on MiSeq (coverage shown in Fig. 2). The genome contains the full complement of 13 protein-coding genes, two ribosomal RNA genes and 22 transfer RNA genes plus one additional Tyrosine tRNA (Fig. 1; Table 1). Protein and ribosomal genes are coded on both the plus and minus strands. Protein coding genes and tRNAs are on the plus strand for approximately one third of the length of the genome, and on the minus strand for the remaining two thirds (Fig. 1). Only in one small region of non-coding DNA do tRNAs occur encoded on both strands. A total of 23 putative secondary structures were identified with ARWEN and MITOS (Fig. 3). All putative secondary structures demonstrate the conventional cloverleaf structure. A published sequence for L. undulata (EF406323; as Turbo undulatus; tRNA-Ser) matches to 7339–7405 bp of L. aff. cinerea, which we also identified as Serine tRNA, although the sequences are quite divergent (20 mismatches out of 67 bp). tRNA gene boundaries identified by MITOS were more consistent with secondary structure reconstructions, but ARWEN identified more tRNAs. The gene order is similar to that found in two other vetigastropod genera (Haliotis and Tegula). The genome is 17,670 bp. The genome is A + T rich, with adenine and thymine accounting for 29.8% and 36.3% of nucleotides respectively (66.1% total). Only 14.5% nucleotides are cytosine and 19.4% are guanine. The cox1 sequence for this animal has been published previously (HQ681131) and is identical to a region of 658 bp of the MiSeq sequence (39–696 bp). Sequence for 16S rRNA (10709–11279 bp) is only 1 bp different from previously published sequence for another animal of the same species. Sequence for 12S rRNA (12312–12940 bp) differs by 1 bp from an unpublished sequence for another animal of the same species. Coverage in these regions was quite high, but drops as low as 10×, suggesting that this level of coverage is enough to obtain good sequence. 3.2. Phylogenetic analyses

Table 2 Log likelihood values for the best models of amino acid substitution; MtArt (Abascal et al., 2007); Mtzoa (Rota-Stabelli et al., 2009). G: gamma distributed rate variation; I: invariant sites; F: base composition. Model

p4

PhyML

MtArt + G + F MtArt + G + I + F Mtzoa + G + F Mtzoa + G + I + F

−103793.24 −103667.17 −103213.22 −103087.60

−103793.23 −103667.16

A total of 60 complete mitochondrial genomes were available for Gastropoda on GenBank in May 2013. These were analysed along with the new sequence for Lunella aff. cinerea and an almost complete genome for Nerita melanotragus. All taxa had all 13 protein coding genes and both ribosomal RNA genes. Two taxa were removed after preliminary analyses showed that they were too divergent (Lottia digitalis NC_007782.1 and Cepaea nemoralis NC_001816.1). The final list of taxa used and their accession numbers are given in Supplementary Table 1.

S.T. Williams et al. / Gene 533 (2014) 38–47

41

Fig. 1. Map of the mitochondrial genome of Lunella aff. cinerea (GenBank accession number KF700096). Arrows indicate the direction of transcription; yellow clockwise and green anticlockwise arrows are protein coding genes and red outline arrows are ribosomal genes. The tRNAs are designated by their three letter abbreviations (see Table 1) and the arrows indicate direction of transcription.

Minimum p-distances between taxa ranged from 0.38 up to 0.52 after the most divergent taxa were removed. A total of 2689 amino acid residues remained after ambiguously aligned sites were removed, corresponding to 60% of the total 4468 amino acids in the initial alignment. Conserved sites were found in all genes except atp8 (Table 3), which was not used in phylogenetic analyses.

The two 50% consensus trees resulting from the Bayesian and Maximum Likelihood analyses had a similar topology, with no wellsupported branches (PP N90% or B/S N 70%) in conflict. Both trees show strong support for major clades, however the p4 Bayesian tree was better resolved and had higher support values, particularly within the Neogastropoda. As this analysis used the best model, only the Bayesian tree is shown (with PP and B/S support).

Fig. 2. Distribution and depth coverage of Lunella aff. cinerea mitochondrial genome below horizontal depiction of genome showing gene annotations with direction of transcription.

42

S.T. Williams et al. / Gene 533 (2014) 38–47

Fig. 3. Putative secondary structures for 23 amino acids (including two Tyrosine tRNAs).

S.T. Williams et al. / Gene 533 (2014) 38–47 Table 3 Number of amino acid residues (aa) in alignment before and after removal of nonconserved positions with Gblocks. Gene

Original length (aa)

Length after Gblocks (aa)

% remaining

atp6 atp8 cox1 cox2 cox3 cytb nad1 nad2 nad3 nad4 nad4L nad5 nad6

259 94 518 423 329 397 344 456 155 503 164 624 202 4468

153 0 503 213 251 341 243 137 65 334 57 333 59 2689

59 0 97 50 76 86 71 30 42 66 35 53 29 60

4. Discussion 4.1. Next generation sequencing for complete mitochondrial genomes PCR based strategies used to enrich for mitochondrial DNA or mitochondrial genomes, can be readily confounded by difficult or degraded DNA templates. When quality of DNA is not an issue, PCR and sequencing can fail due to lack of availability of suitable primers, unusual GC content, long homopolymer regions, repeat regions, complex secondary structures or a combination of these problems (Kieleczawa, 2006). Next generation sequencing working directly on gDNA is here shown to be an alternative strategy for capturing the complete sequence of mtDNA. This method is likely to be especially useful in highly divergent groups where universal, or even bespoke, PCR primers routinely fail. Although we originally multiplexed five unrelated samples in a single MiSeq run, over 35% of the resulting sequences were from the Lunella aff. cinerea sequence (other non-molluscan samples not reported here). Despite the uneven sequencing favouring our sample and mitochondrial enrichment, mitochondrial coverage for L. aff. cinerea was low in a few regions. Clearly, for such an approach to be reliable and effective when not running single samples, sufficient reads must be of mitochondrial origin and providing sufficient depth of coverage to capture the entire mtDNA. Previous studies have applied this technology to single samples (e.g. Groenenberg et al., 2012). Opportunities for longer reads (e.g. newly developed chemistries allowing 2 × 250 bp) and the apparent stochastic nature of indexing (library preparation) success need to be taken into account in applying this technique routinely to multiple templates, but we have confirmed an effective and relatively low-cost alternative to characterising mtDNAs using an alcohol-fixed museum specimen. Moreover, we also have to hand a wealth of nuclear markers for interrogation (not reported here). 4.2. Gene order and phylogenetic utility In this paper we present the first mitochondrial genome for the emblematic vetigastropod family Turbinidae. The genome of Lunella aff. cinerea doubles the number of genomes for the superfamily Trochoidea and brings to five the number of total number of vetigastropod genera with published mitochondrial genomes. The Lunella aff. cinerea genome includes 13 protein coding genes, two ribosomal RNA genes, and 23 putative tRNAs, including two tyrosine tRNAs. The mitochondrial genome of Lunella aff. cinerea is 17,670 bp, only 20 bp shorter than that of Tegula brunnea, which belongs in the same superfamily. Excluding tRNAs, which are highly mobile in molluscan mitochondrial genomes (Gissi et al., 2008; Grande et al., 2008; Rawlings et al., 2010), there are two gene orders so far observed in Vetigastropoda: the one found here for Lunella aff. cinerea (Table 1, Fig. 4) and another for Fissurella volcano. The order found in Lunella is shared by other

43

vetigastropods (two species of Haliotis and Tegula brunnea), as well as some cephalopods, polyplacophorans and caenogastropods which has led previous authors to suggest that it is the ancestral gene order for this group (Grande et al., 2008; Rawlings et al., 2010). A partial genome for Nerita melanotragus also shows the same gene order (Castro and Colgan, 2010). Only a single gene rearrangement from the putative ancestral gene order (Fig. 4B) is required to achieve the protein coding gene order in Fissurella and another, longer, inversion results in a gene order commonly observed in caenogastropods. A common gene order observed in euthyneurans, however, shows little similarity to the ancestral gene order (Fig. 4D; Rawlings et al., 2010). Genes differ in their rate of evolution and we showed that in this study the atp8 was too variable for phylogenetic studies of deep relationships within Gastropoda. After removal of ambiguously aligned sites by Gblocks not a single site remained. On the other hand, cox1, cox3, nad1 and cytb all retained a large proportion of sites (N 70%) suggesting that these genes retain phylogenetic signal at the amino acid level even among ancient relationships. This knowledge may be particularly useful for phylogenetic studies focusing on shallower relationships, where more rapidly evolving genes are likely to be more useful and a switch from amino acids to DNA sequence data is necessary. For instance, Lunella aff. cinerea is a common species found on rocky shores and is endemic to north western Australia. It belongs to a species complex of seven species that range across the central Indo-West Pacific and southwest Pacific. Small scale population studies using cox1 were not definitive in defining species within this group (Williams et al., 2012) but recovery of the entire mitochondrial genome will provide the opportunity to design primers to address species delimitation within this species complex. 4.3. Phylogeny Recent studies using mitochondrial genomes to resolve gastropod relationships have not always been successful (e.g. Castro and Colgan, 2010; Stöger and Schrödl, 2013), probably due in part to the limited number of taxa with complete sequences. Rapid and ongoing advances in next generation sequencing techniques have resulted in an equally rapid increase in the number of mitochondrial genomes published for Mollusca. This means that the effect of limited and unequal taxon sampling can now be addressed. For instance, the need to increase sampling of mitochondrial genomes for Vetigastropoda has been recognised as an important step towards resolving molluscan relationships (Stöger and Schrödl, 2013). Another important factor in phylogenetic analyses is the use of an appropriate model (Yang and Rannala, 2012). We used here amino acid, rather than nucleotide, variation and a model that has been specifically designed for mitochondrial genomes of deuterostome and lophotrochozoan datasets (Mtzoa; Rota-Stabelli et al., 2009). We also allowed for lineage-specific compositional heterogeneity in amino acid composition by using p4 (Foster, 2004), which has been shown to overcome systematic artefacts in mitogenomic studies (Blanquart and Gascuel, 2011; Bourlat et al., 2009) and removed the fastest evolving proteins. Such procedures help to minimise the effect of long branch attraction and improve phylogenetic analyses (Yang and Rannala, 2012). In this study we show that the mitochondrial genome is a useful marker for determining gastropod systematics (Fig. 5). The combination of a complex model using amino acid variation incorporating base composition heterogeneity, a larger number of taxa, plus a new vetigastropod mitochondrial genome has resulted in a wellsupported topology that clearly identifies three major gastropod clades (Euthyneura, Vetigastropoda and Neogastropoda). A fourth group, Littorinimorpha is not monophyletic, with two polyphyletic clades corresponding to Vermetoidea and Rissoidea, and one species, Cymatium parthenopeum (Tonnoidea) falling within Neogastropoda. This may be the result of limited taxon sampling, as Tonnoidea have clustered within Neogastropoda in other molecular studies (e.g. Colgan et al., 2007; Cunha

44

S.T. Williams et al. / Gene 533 (2014) 38–47

Fig. 4. Four protein coding gene orders found in gastropods (excluding tRNAs). Genes shown in grey have the same gene order as gene order ‘B’. A. A common protein coding gene order in caenogastropods, with exemplar genera (Rawlings et al., 2010); B. A common gene order found in three of four vetigastropod genera investigated to date, but also occurring in Cephalopoda, Polyplacophora (chitons), and Caenogastropoda; C. The only alternate gene order found to date in Vetigastropoda; D. A common gene order in Euthyneura (Pulmonata and Opistobranchia; White et al., 2011). Arrows show where a single gene inversion is likely to have occurred between A and B and between B and C.

et al., 2009; Plazzi et al., 2013), but are sister to Neogastropoda in a molecular study based on two nuclear and three mitochondrial genes and with much greater taxon sampling (Zou et al., 2011). Moreover, some molecular studies including nuclear genes and greater taxon sampling (but fewer outgroups) have recovered Vermetoidea and Rissoidea as a clade (e.g. Colgan et al., 2007). Relationships among these major gastropod clades, although receiving high support, still need to be tested further by the addition of missing taxon groups. For instance, the single published patellogastropod genome was excluded from this analysis because it was deemed to be too divergent. In future analyses it would be useful to include multiple taxa from this group in order to help break up long branches. Relationships within Euthyneura are consistent with most previous mitogenomic studies (e.g. Grande et al., 2002, 2008; Medina et al., 2011; Plazzi et al., 2013; White et al., 2011; Stöger and Schrödl, 2013;

although see Gaitán-Espitia et al., 2013), although most branches received higher support in this study. Like most previous mitogenomic studies we found the opisthobranchs to be monophyletic (but with the inclusion of Siphonaria) and a pulmonate grade (see Gaitán-Espitia et al., 2013 for an exception). Ellobiidae were also non-monophyletic: the pyramidellid, Pyramidella dolabrata nests within Euthyneura and Systellommatophora is non-monophyletic. Some of these findings are consistent with studies using either nuclear markers alone or nuclear markers in combination with one or more partial mitochondrial genes (e.g. Jörger et al., 2010; Winnepenninckx et al., 1998). One of the main differences is that analyses of nuclear genes have consistently recovered Opisthobranchia as a grade and a monophyletic Pulmonata (including Pyramidella) (e.g. Dayrat et al., 2011; Dinapoli and Klussmann-Kolb, 2010; Jörger et al., 2010; Klussmann-Kolb et al., 2008; Kocot et al., 2013). These differences appear largely to be the effect of differences

Fig. 5. Phylogeny of Gastropoda resulting from Bayesian analysis of amino acid variation of 12 protein-coding mitochondrial genes (atp8 was excluded) using the Mtzoa + I + G + F model as implemented in p4. Taxon groups are identified by lines or blocks. Non-monophyletic groups are indicated with an asterisk. Nodal support is shown as Bayesian posterior probabilities/ML bootstrap values. Full support (PP = 1.0/B/S = 100%) is shown as a star.

S.T. Williams et al. / Gene 533 (2014) 38–47

45

46

S.T. Williams et al. / Gene 533 (2014) 38–47

in rooting of the tree (Kocot et al., 2013; Schrödl et al., 2011; Stöger and Schrödl, 2013), rather than any inherent difference in the choice of markers. For instance, if our tree is rooted on the nudibranch Notodoris, rather than the freshwater bivalves, Opisthobranchia form a basal grade and we recover a monophyletic Pulmonata. In this topology, Siphonaria are sister to a clade with Sacoglossa + Pulmonata, which is also consistent with nuclear gene trees. The correct rooting of the tree is obviously essential for evolutionary studies of this group (e.g. Kocot et al., 2013) and more gastropod outgroups (e.g. Patellogastropoda, as discussed above) should be sequenced in order to further test the utility of mitogenomics within Euthyneura. Neogastropods are a group of active predators that are united by morphological adaptations to a carnivorous lifestyle (Cunha et al., 2009). In our study Neogastropoda was monophyletic except for the inclusion of Cymatium parthenopeum. Although other studies have observed the same result, a recent study by Zou et al. (2011) including additional taxa resulted in tonnoidean families plus Ficidae forming a monophyletic clade as the sister group to Neogastropoda. Relationships within Neogastropoda including non-monophyly of Muricoidea are generally consistent with those found in other mitogenomic studies using amino acid variation (Cunha et al., 2009) and molecular studies including nuclear genes (Colgan et al., 2007; Oliverio and Modica, 2010; Zou et al., 2011). The topology of some branches differed between our study and Plazzi et al. (2013), another mitogenomic study using amino acid variation, but these branches received only poor support in Plazzi et al. (2013, BSb 70%). In the present study Conoidea were monophyletic, unlike some earlier mitogenomic studies (Cunha et al., 2009). Greater taxon sampling is required to confirm these results. In this study a sister relationship between Neritomorpha and Vetigastropoda is strongly supported in the Bayesian tree. In a study based on three nuclear and two mitochondrial genes by Aktipis and Giribet (2010), Neritimorpha was sister to a clade comprising Neomphalina, Cocculinoidea and Vetigastropoda, including Patellogastropoda in the optimal parameter set. Conversely, a previous mitogenomic study using sequence data rather than amino acids did not resolve relationships among Caenogastropoda, Heterobranchia, Vetigastropoda and Neritimorpha (Castro and Colgan, 2010). Relationships within Vetigastropoda are consistent with phylogenetic studies using nuclear and/or mitochondrial genes and greater taxon sampling (e.g. Williams et al., 2012 and references therein). 4.4. Conclusions We present here the first entire mitochondrial genome for a turbinid gastropod using a method that requires neither PCR primers nor any previous knowledge of the organism's mitochondrial gene sequence. Moreover we show that mitogenomics is a powerful tool for investigating phylogenetic relationships within Gastropoda, and suggest (as others have done) that phylogenetic inconsistencies may be in part due to limited taxon sampling. The rise of next generation sequencing and simple methods, such as the one in this study should see a rapid increase in the number of genomes available for molecular study, which will address this issue. Supplementary data to this article can be found online at http://dx. doi.org/10.1016/j.gene.2013.10.005. Conflict of interest None declared. Acknowledgments We thank Andrea Waeschenbach for help with mt-tRNA-draw and access to chemicals, Richard Willan for providing the sample of Lunella, Matthias Bernt for help with MITOS and two anonymous reviewers for helpful comments. We thank Megan Schwartz for help in the lab and

Shilo Dickens for preparing the sample for the MiSeq. This work was part funded by a BBSRC grant to DTJL and PGF (BB/H023534/1).

References Abascal, F., Posada, D., Zardoya, R., 2007. MtArt: a new model of amino acid replacement for Arthropoda. Mol. Biol. Evol. 24, 1–5. Aktipis, S.W., Giribet, G., 2010. A phylogeny of Vetigastropoda and other ‘archaeogastropods’: reorganizing old gastropod clades. Invertebr. Biol. 129, 220–240. Allcock, A.L., Cooke, I.R., Strugnell, J.M., 2011. What can the mitochondrial genome reveal about higher-level phylogeny of the molluscan class Cephalopoda? Zool. J. Linnean Soc. 161, 573–586. Bernt, M., et al., 2013a. A comprehensive analysis of bilaterian mitochondrial genomes and phylogeny. Mol. Phylogenet. Evol. 69, 352–364. Bernt, M., Braband, A., Middendorf, M., Misof, B., Rota-Stabelli, O., Stadler, P.F., 2013b. Bioinformatics methods for the comparative analysis of metazoan mitochondrial genome sequences. Mol. Phylogenet. Evol. 69, 320–327. Blanquart, S., Gascuel, N., 2011. Mitochndrial genes support a common origin of rodent malaria parasites and Plasmodium falciparum relatives infecting great apes. BMC Evol. Biol. 11, 70. Boore, J.L., Medina, M., Rosenberg, L.A., 2004. Complete sequences of two highly rearranged molluscan mitochondrial genomes, those of the scaphopod Graptacme eborea and of the bivalve Mytilus edulis. Mol. Biol. Evol. 21, 1492–1503. Bourlat, S., Rota Stabelli, O., Lanfear, R., Telford, M.J., 2009. The mitochondrial genome of Xenoturbella bocki (phylum Xenoturbellida) is ancestral within the deuterostomes. BMC Evol. Biol. 9, 107. Cameron, S.L., Lo, N., Bourguignon, T., Svenson, G.J., Evans, T.A., 2012. A mitochondrial genome phylogeny of termites (Blattoidea: Termitoidae): robust support for interfamilial relationships and molecular synapomorphies define major clades. Mol. Phylogenet. Evol. 65, 163–173. Castro, L.R., Colgan, D.J., 2010. The phylogenetic position of Neritimorpha based on the mitochondrial genome of Nerita melantragus (Mollusca: Gastropoda). Mol. Phylogenet. Evol. 57, 918–923. Colgan, D.J., Ponder, W.F., Beacham, E., Macaranas, J., 2007. Molecular phylogenetics of Caenogastropoda (Gastropoda: Mollusca). Mol. Phylogenet. Evol. 42, 717–737. Cunha, R.L., Grande, C., Zardoya, R., 2009. Neogastropod phylogenetic relationships based on entire mitochondrial genomes. BMC Evol. Biol. 9, 210. Darriba, D., Taboada, G.L., Doallo, R., Posada, D., 2011. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics 27, 1164–1165. Dayrat, B., et al., 2011. Phylogenetic relationships and evolution of pulmonate gastropods (Mollusca): new insights from increased taxon sampling. Mol. Phylogenet. Evol. 59, 425–437. Dinapoli, A., Klussmann-Kolb, A., 2010. The long way to diversity — phylogeny and evolution of the Heterobranchia (Mollusca: Gastropoda). Mol. Phylogenet. Evol. 55, 60–76. Dreyer, H., Steiner, G., 2004. The complete sequence and gene organization of the mitochondrial genome of the gadilid scaphopod Siphonodentalium lobatum (Mollusca). Mol. Biol. Evol. 31, 605–617. Foster, P.G., 2004. Modeling compositional heterogeneity. Syst. Biol. 53, 485–495. Foster, P.G., Jermiin, L.S., Hickey, D.A., 1997. Nucleotide composition bias affects amino acid content in proteins coded by animal mitochondria. J. Mol. Evol. 44, 282–288. Gaitán-Espitia, J.D., Nespolo, R.F., Opazo, J.C., 2013. The complete mitochondrial genome of the land snail Cornu aspersum (Helicidae: Mollusca): intra-specific divergence of protein-coding genes and phylogenetic considerations within Euthyneura PLOSone: e67299. Geiger, D.L., Nützel, A., Sasaki, T., 2008. Vetigastropoda. In: Ponder, W.F., Lindberg, D.R. (Eds.), Phylogeny and Evolution of the Mollusca. University of California Press, Berkeley and Los Angeles, pp. 297–330. Gissi, C., Iannelli, F., Pesole, G., 2008. Evolution of the mitochondrial genome of Metazoa as exemplified by comparison of congeneric species. Heredity 101, 301–320. Grande, C., Templado, J., Cervera, J.L., Zardoya, R., 2002. The complete mitochondrial genome of the nudibranch Roboastra europaea (Mollusca: Gastropoda) supports the monophyly of opisthobranchs. Mol. Biol. Evol. 19, 1672–1685. Grande, C., Templado, J., Zardoya, R., 2008. Evolution of a gastropod mitochondrial genome arrangements. BMC Evol. Biol. 8, 61. Groenenberg, D.J.S., Pirovano, W., Gittenberger, E., Schilthuizen, M., 2012. The complete mitogenome of Cylindrus obtusus (Helicidae, Ariantinae) using Illumina next generation sequencing. BMC Genomics 13, 114. Guindon, S., Gascuel, O., 2003. A simple, fast and accurate algorithm to estimate large phylogenies by maximum likelihood. Syst. Biol. 52, 696–704. Jex, A.R., Hall, R.S., Littlewood, D.T.J., Gasser, R.B., 2010. An integrated pipeline for nextgeneration sequencing and annotation of mitochondrial genomes. Nucleic Acids Res. 38, 522–533. Jörger, K.M., Stöger, I., Kano, Y., Fukuda, H., Knebelsberger, T., Schrödl, M., 2010. On the origin of Acochlidia and other enigmatic euthyneuran gastropods, with implications for the systematics of Heterobranchia. BMC Evol. Biol. 10, 323. Kieleczawa, J., 2006. Fundamentals of sequencing of difficult templates — an overview. J. Biomol. Tech. 17, 207–217. Klussmann-Kolb, A., Dinapoli, A., Kuhn, K., Streit, B., Albrecht, C., 2008. From sea to land and beyond — new insights into the evolution of euthyneuran Gastropoda (Mollusca). BMC Evol. Biol. 8, 57. Kocot, K.M., et al., 2011. Phylogenomics reveals deep molluscan relationships. Nature 477, 452–456.

S.T. Williams et al. / Gene 533 (2014) 38–47 Kocot, K.M., Halanych, K.M., Krug, P.J., 2013. Phylogenomics supports Panpulmonata: Opisthobranch paraphyly and key evolutionary steps in a major radiation of gastropod molluscs. Mol. Phylogenet. Evol. 69, 764–771. Laslett, D., Canbäck, B., 2008. ARWEN, a program to detect tRNA genes in metazoan mitochondrial nucleotide sequences. Bioinformatics 24, 172–175. Medina, M., et al., 2011. Crawling through time: transition of snails to slugs dating back to the Paleozoic, based on mitochondrial phylogenomics. Mar. Genomics 4, 51–59. Oliverio, M., Modica, M.V., 2010. Relationships of the haematophagous marine snail Colubraria (Rachiglossa: Colubrariidae), within the neogastropod phylogenetic framework. Zool. J. Linnean Soc. 158, 779–800. Plazzi, F., Ceregato, A., Taviani, M., Passamonti, M., 2011. A molecular phylogeny of bivalve mollusks: ancient radiations and divergences as revealed by mitochondrial genes. PLoS One 6, e27147. Plazzi, F., Ribani, A., Passamonti, M., 2013. The complete mitochondrial genome of Solemya velum (Mollusca: Bivalvia) and its relationships with Conchifera. BMC Genomics 14, 409. Rawlings, T.A., MacInnis, M.J., Bieler, R., Boore, J.L., Collins, T.M., 2010. Sessile snails, dynamic genomes: gene rearrangements within the mitochondrial genome of a family of caenogastropod molluscs. BMC Genomics 11, 440. Rota-Stabelli, O., Yang, Z., Telford, M.J., 2009. MtZoa: a general mitochondrial amino acid substitutions model for animal evolutionary studies. Mol. Phylogenet. Evol. 52, 268–272. Rota-Stabelli, O., Lartillot, N., Philippe, H., Pisani, D., 2013. Serine codon-usage bias in deep phylogenomics: pancrustacean relationships as a case study. Syst. Biol. 62, 121–133. Schrödl, M., Jörger, K.M., Wilson, N.G., 2011. A reply to Medina et al. (2011): crawling through time: transition of snails to slugs dating back to the Paleozoic based on mitochondrial phylogenomics. Mar. Genomics 4, 301–303. Sharma, P.P., et al., 2012. Phylogenetic analysis of four nuclear protein-encoding genes largely corroborates the traditional classification of Bivalvia (Mollusca). Mol. Phylogenet. Evol. 65, 64–74.

47

Smith, S.A., et al., 2011. Resolving the evolutionary relationships of molluscs with phylogenomic tools. Nature 480, 364–367. Stöger, I., Schrödl, M., 2013. Mitogenomics does not resolve deep molluscan relationships (yet?). Mol. Phylogenet. Evol. 69, 376–392. Tamura, K., Aotsuka, T., 1988. Rapid isolation method of animal mitochondrial DNA by the alkaline lysis procedure. Biochem. Genet. 26, 815–819. Thomaz, D., Guiller, A., Clarke, B., 1996. Extreme divergence of mitochondrial DNA within species of pulmonate land snails. Proc. R. Soc. Lond. B 263, 363–368. Waeschenbach, A., Telford, M.J., Porter, J.S., Littlewood, D.T.J., 2006. The complete mitochondrial genome of Flustrellidra hispida and the phylogenetic position of Bryozoa among the Metazoa. Mol. Phylogenet. Evol. 40, 195–207. White, T.R., et al., 2011. Ten new complete mitochondrial genomes of pulmonates (Mollusca: Gastropoda) and their impact on phylogenetic relationships. BMC Evol. Biol. 11, 295. Williams, S.T., Knowlton, N., 2001. Mitochondrial pseudogenes are pervasive and often insidious in the snapping shrimp genus Alpheus. Mol. Biol. Evol. 18, 1484–1493. Williams, S.T., Apte, D., Ozawa, T., Kagilis, F., Nakano, T., 2011. Speciation and dispersal along continental coastlines and island arcs in the Indo-West Pacific turbinid gastropod genus Lunella. Evolution 65, 1752–1771. Williams, S.T., Hall, A., Kuklinski, P., 2012. Unravelling cryptic diversity in the Indo-West Pacific gastropod genus Lunella (Turbinidae) using elliptic Fourier analysis. Am. Malacol. Bull. 30, 189–206. Winnepenninckx, B.M.H., Reid, D.G., Backeljau, T., 1998. Performance of 18S rRNA in littorinid phylogeny (Gastropoda: Caenogastropoda). J. Mol. Evol. 47, 586–596. Yang, Z., Rannala, B., 2012. Molecular phylogenetics: principles and practice. Nat. Rev. Genet. 13, 303–314. Zou, S., Li, Q., Kong, L., 2011. Additional gene data and increased sampling give new insights into the phylogenetic relationships of Neogastropoda, within the caenogastropod phylogenetic framework. Mol. Phylogenet. Evol. 61, 425–435.