Gene 233 (1999) 89–99
www.elsevier.com/locate/gene
The complete sequence of the mitochondrial genome of Daphnia pulex (Cladocera: Crustacea)

Teresa J. Crease *

University of Guelph, Department of Zoology, Guelph, Ont. N1G 2W1, Canada

Received 7 January 1999; received in revised form 29 March 1999; accepted 14 April 1999

Abstract

The sequence of the mitochondrial DNA (mtDNA) of the branchiopod crustacean Daphnia pulex has been completed. It is 15 333 bp with an A+T content of 62.3%, and contains the typical complement of 13 protein-coding, 22 transfer RNA (tRNA) and two ribosomal RNA (rRNA) genes. Comparison of this sequence with the sequences of the other eight completely sequenced arthropod mtDNAs showed that gene order and orientation are identical to those of Drosophila but different from Artemia due to the rearrangement of two tRNA genes. Nucleotide composition, codon usage, and amino acid composition are very similar in the crustaceans, but divergent from insects and chelicerates, which show a much higher bias towards A+T. However, with few exceptions, the mitochondrial proteins of Daphnia are more similar to those of the dipteran insects (Drosophila and Anopheles) than to those of Artemia, at both the nucleotide and amino acid levels, suggesting that Artemia mtDNA is evolving at an accelerated rate. These results also show that sequence evolution and the evolution of nucleotide composition can be decoupled. Analysis of nucleotide substitution patterns in COII showed that there has been an unbiased acceleration of the overall substitution rate in Artemia. In contrast, the accelerated substitution rate in Apis is due partly to extreme A+T mutation pressure. Secondary structures are proposed for the Daphnia tRNAs and rRNAs. The tRNAs are similar to those of other arthropods but tend to have TΨC arms that are only 4 bp long. The rRNA secondary structures are similar to those proposed for insects except for the absence of a small number of helices in Daphnia. Phylogenetic analysis of second codon positions grouped Daphnia with Artemia, as expected, despite the latter’s accelerated divergence rate. In contrast, the unusual pattern of mtDNA divergence in Apis led to a topology in which the holometabolous insects (Anopheles, Drosophila, Apis) appeared to be paraphyletic with respect to the hemimetabolous insect, Locusta, due to the early branching of Apis. © 1999 Elsevier Science B.V. All rights reserved.

Keywords: Arthropoda; Codon usage; Gene order; Ribosomal genes
1. Introduction

The mitochondria of metazoan animals contain their own circular genomes which range in size from 14 kb to 42 kb (reviewed in Wolstenholme, 1992). Despite the threefold range in size, the gene content of the molecule is remarkably conserved; with few exceptions it contains 22 tRNA genes, two rRNA genes, 13 protein-coding genes, and a non-coding region containing the origin of replication. In contrast, the arrangement of these genes within the molecule can vary substantially, both within and among animal phyla, and it has been suggested that such gene rearrangements contain phylogenetic information about ancient divergences due to their apparently low frequency of occurrence (e.g., Boore et al., 1995, 1998).

Mitochondrial gene order is now known for a substantial number of arthropod taxa representing all four of the major groups: crustaceans, insects, myriapods and chelicerates (Boore et al., 1998), but only eight mitochondrial genomes have been completely sequenced. Of these, five are from insects, two are from chelicerates (ticks) and only one, from Artemia franciscana, is from a crustacean. Thus, few detailed comparisons of the molecular characteristics (nt composition, codon usage) of non-insect arthropod mtDNA have been done (e.g., Staton et al., 1997).

The sequence of approximately one half of the mtDNA from the branchiopod crustacean, Daphnia pulex, has been described previously (Crease and Little, 1997), and comparisons of this sequence with those of the other insects and Artemia revealed that the Daphnia mitochondrial proteins are more similar to those of dipteran insects (Drosophila, Anopheles) than they are to those of the other crustacean. Even so, patterns of nt bias at 4-fold degenerate third codon positions are very similar within insects and crustaceans, but divergent between the groups (Crease and Little, 1997). The purpose of this paper is to describe the sequence of the remaining half of the D. pulex mitochondrial genome, and to report the results of more detailed comparative analyses between the entire Daphnia mtDNA sequence and those of the other completely sequenced arthropod mtDNAs.

Abbreviations: A, adenine; aa, amino acid(s); C, cytosine; COI–III, cytochrome oxidase subunits I–III; CytB, cytochrome B; G, guanine; lrRNA, large subunit rRNA; mtDNA, mitochondrial DNA; ND1–6, NADH dehydrogenase subunits 1–6; nt, nucleotide(s); rRNA, ribosomal RNA; srRNA, small subunit rRNA; T, thymine; tRNA, transfer RNA; ts, transition(s); tv, transversion(s); U, uracil.
* Tel.: 519-824-4120; fax: 519-767-1656. E-mail address: [email protected] (T.J. Crease)
0378-1119/99/$ – see front matter © 1999 Elsevier Science B.V. All rights reserved. PII: S0378-1119(99)00151-1
T.J. Crease / Gene 233 (1999) 89–99
2. Materials and methods

2.1. mtDNA cloning and sequencing

Extraction of D. pulex mtDNA is described by Stanton et al. (1991), who showed that PstI cuts the molecule into four fragments. These fragments were cloned into the plasmid pBluescript II KS+ (Van Raay and Crease, 1994). Digestion of the largest PstI fragment with BamHI and SstI produced fragments of 5.4 kb, 2.6 kb, and 1.1 kb which were subcloned. The new sequence reported in this study includes the 2.6 kb PstI fragment, and the 5.4 kb PstI–SstI fragment (Fig. 1). An Erase-a-Base kit (Promega) was used to generate a set of unidirectional, deleted subclones of these two subclones from both ends. Sequencing of a partial set of deletion subclones of the large PstI fragment from the COI end (Fig. 1) confirmed that no small fragments were lost during subcloning (Van Raay and Crease, 1994). Plasmid clones were sequenced on an ABI 377 automated sequencer (Perkin-Elmer) using the Dideoxy terminator TaqFS kit (Perkin-Elmer). Plasmid DNA was isolated using the alkaline-lysis method (Lee and Rasheed, 1990) and further purified for sequencing by precipitation in 1 M sodium chloride and 6.5% polyethylene glycol 8000.

Fig. 1. Gene map of Daphnia pulex mtDNA showing the cloning strategy and region of sequence presented in this study. Transfer RNA genes are indicated by their single-letter IUPAC-IUB codes. The sequence of region A was reported in Van Raay and Crease (1994; accession No. Z15015), the sequence of region B was reported in Crease and Little (1997; accession No. U65669) and the sequence of region C is reported in this study. The entire Daphnia mtDNA sequence has been deposited in GenBank under accession No. AF117817.

Table 1
Organization of the Daphnia pulex mitochondrial genomeᵃ

Gene            Cᵇ   Begins   Ends    Length   IGNᶜ   Start    Stop
tRNAIle                   1      64       64      1
tRNAGln         X        66     133       68      0
tRNAMet                 134     197       64      0
ND2                     198    1185      988      0   ATG      T--
tRNATrp                1186    1251       66      1
tRNACys         X      1253    1316       64     11
tRNATyr         X      1328    1391       64      5
COI                    1397    2934     1538      0   (A)TTA   T--
tRNALeu(UUR)           2935    3002       68      1
COII                   3004    3682      679      0   ATG      T--
tRNALys                3683    3752       70      4
tRNAAsp                3757    3821       65     −1
ATPase8                3821    3982      162     −7   GTG      TAG
ATPase6                3976    4649      674      0   ATG      TA-
COIII                  4650    5438      789      0   ATG      TAA
tRNAGly                5439    5499       61      0
ND3                    5500    5852      353      0   ATT      TA-
tRNAAla                5853    5918       66      1
tRNAArg                5920    5984       65      0
tRNAAsn                5985    6051       67      0
tRNASer(AGN)           6052    6116       65      0
tRNAGlu                6117    6184       68     −1
tRNAPhe         X      6184    6249       66      0
ND5             X      6250    7957     1708     −6   ATG      T--
tRNAHis         X      7952    8015       64      0
ND4             X      8016    9336     1321      2   ATG      T--
ND4L            X      9339    9614      276     31   ATT      TAA
tRNAThr                9646    9710       65      0
tRNAPro         X      9711    9775       65      2
ND6                    9778   10290      513      7   ATT      TAA
CytB                  10298   11431     1134      0   ATG      TAA
tRNASer(UCN)          11432   11500       69    −10
ND1             X     11491   12426      936      3   ATG      TAA
tRNALeu(CUN)    X     12430   12496       67      9
LSU rRNA        X     12506   13819     1314      1
tRNAVal         X     13821   13892       72     −1
SSU rRNA        X     13892   14644      753      0
Control region        14645   15333      689    689

ᵃ The location of the 5′ and 3′ ends of all genes has not been confirmed experimentally.
ᵇ X indicates that the gene is transcribed from the complementary strand.
ᶜ Intergenic nucleotides. Negative numbers indicate that adjacent genes overlap.

2.2. Sequence analysis

DNA sequences were aligned using the DNAStar software package (DNAStar). Daphnia mitochondrial genes were first identified by searching for similarities with other mtDNAs. Comparisons were then made between Daphnia and the other eight complete arthropod mtDNA sequences: A. franciscana (X69067, Valverde et al., 1994), Drosophila yakuba (X03240, Clary and Wolstenholme, 1985), Anopheles gambiae (L20934, Beard et al., 1993), Anopheles quadrimaculatus (L04272, Mitchell et al., 1993), Locusta migratoria (X80245, Flook et al., 1995), Apis mellifera (L06178, Crozier and Crozier, 1993), Ixodes hexagonus (AF081828, Black and Roehrdanz, 1998) and Rhipicephalus sanguineus (AF081829, Black and Roehrdanz, 1998). Only one of the two Anopheles sequences was used in some analyses due to their high similarity.

The 5′ and 3′ ends of the Daphnia rRNA genes were inferred from alignment with the other arthropod sequences and by comparison with the secondary structure models of Van de Peer et al. (1997) and De Rijk et al. (1997), and must be considered tentative. Secondary structures were postulated for these rRNAs using structures proposed for A. gambiae, obtained from the rRNA database (http://rrna.uia.ac.be/), as templates. The program RNAdraw (Matzura and Wennborg, 1996) was used to suggest stable secondary structures, based on energetic criteria, for regions in which the primary sequences were very divergent. Secondary structures of the tRNAs were obtained by eye using the general model of Sprinzl et al. (1989).

The aa sequences of protein-coding genes were inferred using the Drosophila mtDNA genetic code. Alignments and similarities between these sequences were determined using the AALIGN program in DNAStar with the PAM250 matrix of Lipman and Pearson (1985). The nt sequences of the Daphnia protein-coding genes were added to the alignment of the other six arthropod sequences generated by Flook et al. (1995). All termination codons and the initiation codon for COI were omitted from this alignment. MEGA (Kumar et al., 1993) was used to estimate nt and aa divergence, and to construct phenograms from the resulting matrices using the neighbor-joining method (Saitou and Nei, 1987). MEGA was also used to construct cladograms from the nt data using the criterion of maximum parsimony. The program DNAML in the software package PHYLIP v3.5c (Felsenstein, 1993) was used to construct a tree from the nt sequence data using the maximum likelihood method.
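The divergence estimates described above can be illustrated with a minimal Python sketch. It is not MEGA's implementation: it only shows, under simple assumptions, how sites at one codon position are pulled out of an in-frame alignment and how differences are tallied as transitions (ts) or transversions (tv) to give an uncorrected proportion of differing sites. The function names and the two toy sequences are hypothetical and do not come from the paper.

```python
"""Sketch: uncorrected divergence between two aligned, in-frame coding sequences."""

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def codon_position_sites(seq, position):
    """Return the nucleotides found at one codon position (1, 2 or 3)."""
    if position not in (1, 2, 3):
        raise ValueError("position must be 1, 2 or 3")
    return seq[position - 1::3]


def classify(a, b):
    """Classify a pair of aligned bases as 'same', 'ts' or 'tv'."""
    if a == b:
        return "same"
    if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
        return "ts"  # purine<->purine or pyrimidine<->pyrimidine
    return "tv"      # purine<->pyrimidine


def divergence(seq1, seq2):
    """Count ts and tv, skipping gaps and ambiguous bases; return counts and p-distance."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    counts = {"same": 0, "ts": 0, "tv": 0}
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a in "ACGT" and b in "ACGT":
            counts[classify(a, b)] += 1
    compared = sum(counts.values())
    p_distance = (counts["ts"] + counts["tv"]) / compared if compared else 0.0
    return counts, p_distance


# Toy sequences (hypothetical, four codons each).
SEQ_A = "ATGGCTTTACGA"
SEQ_B = "ATGACTTTCCGG"

if __name__ == "__main__":
    print(divergence(SEQ_A, SEQ_B))
    second_a = codon_position_sites(SEQ_A, 2)
    second_b = codon_position_sites(SEQ_B, 2)
    print(divergence(second_a, second_b))
```

Restricting the comparison to `codon_position_sites(seq, 2)` mirrors the use of second codon positions in the phylogenetic analyses; a distance correction or a tree-building step would have to be added on top of these raw counts.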
3. Results and discussion

3.1. Genome organization

The mitochondrial genome of Daphnia is 15 333 bp and contains the standard complement of 13 protein-coding genes, 22 tRNA genes, and two rRNA genes (Fig. 1, Table 1). Some of the genes overlap (Table 1) as in other animal mtDNAs. In Daphnia, this occurs six times and involves a total of 26 bp. The average length of overlaps is 4.33 bp. Excluding the large region between tRNAIle and the srRNA gene, there are 14 noncoding regions, totaling 79 bp. The average length of these regions is 5.64 bp. Van Raay and Crease (1994) described the structural features of the large region between tRNAIle and srRNA in detail and concluded that it is very likely to be the mtDNA control region. At 689 bp, this putative control region is less than half as long as the control region in Artemia (Table 2).

The gene order in Daphnia mtDNA is identical to that in Drosophila. Other than the two chelicerates, Ixodes, a tick (Black and Roehrdanz, 1998), and Limulus polyphemus, a horseshoe crab (Staton et al., 1997), these are the only two arthropod taxa from different genera that have the same mtDNA gene order. Since it is unlikely that mtDNA from such distantly related taxa would converge on the same gene order independently, it seems reasonable to suggest that the Daphnia/Drosophila arrangement is ancestral for the crustacean–insect clade.
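The overlap and spacer totals quoted in Section 3.1 follow directly from the IGN column of Table 1. The short Python sketch below recomputes them; the list is transcribed from Table 1 in gene order (tRNAIle through the srRNA gene), the control-region entry is left out as in the text, and the function name is an illustrative choice.

```python
# Intergenic values (IGN column of Table 1), tRNA-Ile through the srRNA gene.
# Negative values are overlaps between adjacent genes; the control region is excluded.
IGN = [
    1, 0, 0, 0, 1, 11, 5, 0, 1, 0, 4, -1, -7, 0, 0, 0, 0, 1, 0, 0,
    0, -1, 0, -6, 0, 2, 31, 0, 2, 7, 0, -10, 3, 9, 1, -1, 0,
]


def summarise_spacing(values):
    """Return count, total and mean length for overlaps and for non-coding spacers."""
    overlaps = [-v for v in values if v < 0]
    spacers = [v for v in values if v > 0]

    def stats(lengths):
        total = sum(lengths)
        mean = total / len(lengths) if lengths else 0.0
        return {"count": len(lengths), "total_bp": total, "mean_bp": round(mean, 2)}

    return {"overlaps": stats(overlaps), "spacers": stats(spacers)}


if __name__ == "__main__":
    for kind, summary in summarise_spacing(IGN).items():
        print(kind, summary)
```

Run as a script, this reports six overlaps totaling 26 bp (mean 4.33 bp) and 14 spacers totaling 79 bp (mean 5.64 bp), in agreement with the figures given in the text.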
3.2. Nucleotide composition

The nt composition of the strand of Daphnia mtDNA that codes for the same genes as the first strand in Drosophila is 62.3% A+T (16.7% G, 31.5% A, 30.8% T, 21.0% C). The overall A+T content of Daphnia mtDNA is most similar to that of Artemia (Table 2). This same pattern holds for the protein-coding and rRNA genes, and for the control region: Daphnia is always much more similar to Artemia than it is to any of the insects or chelicerates (Table 2).

Daphnia is also most similar to Artemia in the nt composition of the three codon positions of the protein-coding genes. This is particularly evident at the third position where the A+T content is 69.4% and 63.3% in Artemia and Daphnia, respectively, but averages 91.8% in the insects and 82.5% in the chelicerates (Table 3). The A+T content of the third position is similar to that of the overall genome in the crustaceans, whereas this position is substantially more A+T-biased than is the overall genome in both the insects and the chelicerates (Tables 2 and 3). Nucleotide composition is most similar among these sequences at the second codon position (Table 3) with a range of only 6.9% (T and C) between the highest and lowest values. In contrast, the range is 20.9% (A) at the third position.

Table 2
Characteristics of the mtDNA of nine arthropods

                     Total     Total   No. of    PCGᵇ    lrRNAᶜ           srRNAᶜ           Control region
Taxon                length    %A+T    codonsᵃ   %A+T    Length   %A+T    Length   %A+T    Length   %A+T
                     (bp)
Ixodes               14539     72.6    3599      71.0    1229     76.9    712      78.7     359     71.9
Rhipicephalus        14710     77.9    3592      77.9    1190     81.4    693      79.1     263     64.1ᵈ
Anopheles gambiae    15363     77.6    3734      75.9    1325     82.5    800      79.6     519     94.2
A. quadrimaculatus   15455     77.4    3729      75.4    1321     82.2    794      80.5     625     93.5
Drosophila           16019     78.6    3728      76.7    1326     83.4    789      79.3    1077     92.9
Locusta              15722     75.3    3714      74.1    1314     78.9    827      76.0     875     86.0
Apis                 16343     84.9    3676      83.2    1371     85.3    786      81.4     827     96.0
Artemia              15770     64.5    3521      63.9    1153     64.0    712      61.4    1770     68.0
Daphnia              15333     62.3    3681      60.4    1314     68.3    753      67.2     689     67.1

ᵃ Termination codons are not included in codon totals.
ᵇ Protein-coding genes.
ᶜ Values are approximate as exact 5′ and 3′ ends have not been mapped in most taxa.
ᵈ Mean of the two control regions present in this genome.

3.3. tRNA genes

Proposed secondary structures have already been published for 15 of the 22 Daphnia mitochondrial tRNAs (Van Raay and Crease, 1994; Crease and Little, 1997). Structures for the remaining seven are shown in Fig. 2. In general, the Daphnia tRNAs are similar to those found in other metazoan animals, although the TΨC arms tend to be only 4 bp, instead of 5 bp, in length. An even shorter TΨC arm of 3 bp occurs in tRNAGly (Crease and Little, 1997). The structures for tRNAGlu (Fig. 2) and tRNASer(AGN) (Crease and Little, 1997) are also somewhat unusual in that their TΨC arms are 6 bp in length.

The anticodon sequences of all 22 Daphnia tRNAs are identical to those of Drosophila, suggesting that both mtDNAs use the same genetic code. As is the case in other metazoan mtDNA (Wolstenholme, 1992), the most commonly used codon in degenerate codon families often does not match the anticodon (Table 4). For example, the most commonly used codon ends in U for six of the eight anticodons with a G in the wobble (first) position. Similarly, the most commonly used codon ends in U in six of the 4-fold degenerate codon families whose anticodon has a U in the wobble position. The tolerance of mismatches at the wobble position in codon–anticodon interactions is a common feature of metazoan mtDNA, but its mechanism has not been conclusively established.
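The wobble-position mismatch described in Section 3.3 can be checked against the Daphnia column of Table 4. The Python sketch below takes the eight 4-fold degenerate families whose anticodon has U in the wobble position, derives the codon that would pair with each anticodon under strict Watson-Crick rules, and compares it with the most frequently used codon. Counts and anticodons are transcribed from Table 4 in DNA letters (T for U); the fourth Leu codon is assumed to be CTG, since the printed table repeats CTA; function names are illustrative.

```python
# Daphnia codon counts (Table 4) for 4-fold families with U at the anticodon wobble
# position. Anticodons are written 5'-3' in DNA letters, as in the table footnote.
# Assumption: the fourth Leu codon is CTG (the printed table lists CTA twice).
FAMILIES = {
    "Thr": ("TGT", {"ACT": 79, "ACC": 54, "ACA": 42, "ACG": 13}),
    "Pro": ("TGG", {"CCT": 60, "CCC": 51, "CCA": 23, "CCG": 11}),
    "Arg": ("TCG", {"CGT": 22, "CGC": 6, "CGA": 24, "CGG": 7}),
    "Leu": ("TAG", {"CTT": 142, "CTC": 66, "CTA": 96, "CTG": 40}),
    "Ala": ("TGC", {"GCT": 82, "GCC": 68, "GCA": 50, "GCG": 28}),
    "Gly": ("TCC", {"GGT": 34, "GGC": 34, "GGA": 59, "GGG": 147}),
    "Val": ("TAC", {"GTT": 97, "GTC": 42, "GTA": 86, "GTG": 53}),
    "Ser": ("TGA", {"TCT": 109, "TCC": 28, "TCA": 50, "TCG": 23}),
}

COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def matching_codon(anticodon):
    """Codon (5'-3') that pairs with the anticodon without any wobble."""
    return "".join(COMPLEMENT[base] for base in reversed(anticodon))


def summarise(families):
    """For each family, report the matching codon and the most used codon."""
    rows = {}
    for amino_acid, (anticodon, counts) in families.items():
        top = max(counts, key=counts.get)
        match = matching_codon(anticodon)
        rows[amino_acid] = {"matching": match, "most_used": top, "mismatch": top != match}
    return rows


if __name__ == "__main__":
    for amino_acid, row in summarise(FAMILIES).items():
        print(amino_acid, row)
```

With these counts, the most used codon ends in T (U in the mRNA) in six of the eight families, and only the Arg family favors the codon that matches its anticodon exactly.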
Table 3
Base composition at each codon position in the protein-coding genes of mtDNA from nine arthropods

                     First codon position      Second codon position     Third codon position
Taxon                %A    %T    %C    %G      %A    %T    %C    %G      %A    %T    %C    %G
Ixodes               35.2  34.1  13.4  17.3    18.4  50.0  18.5  13.1    35.8  39.6  15.3   9.3
Rhipicephalus        38.4  35.2  10.9  15.5    20.4  50.2  17.3  12.1    43.1  46.5   7.0   3.4
Anopheles gambiae    31.1  37.8  11.1  20.0    20.4  46.2  19.4  14.0    45.3  47.0   4.3   3.4
A. quadrimaculatus   31.5  37.6  11.3  19.6    20.5  46.2  19.3  14.2    44.3  46.1   5.0   4.6
Drosophila           31.1  38.6  10.6  19.7    20.2  46.2  19.3  14.2    45.3  48.5   3.3   2.9
Locusta              33.8  35.0  12.4  18.8    20.4  45.7  20.0  13.9    44.4  43.0   8.0   4.6
Apis                 40.2  39.0   8.6  12.2    23.5  51.8  14.2  10.5    47.6  47.6   2.8   2.0
Artemia              27.0  31.7  18.7  22.6    18.6  44.9  21.1  15.4    29.9  39.5  17.0  13.6
Daphnia              25.9  29.4  19.3  25.4    17.3  45.2  21.0  16.5    26.7  36.5  20.5  16.3
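Values of the kind reported in Tables 2 and 3 come from simple base counting. As a minimal sketch, assuming an in-frame coding sequence with no introns, the Python below computes base percentages and A+T content overall and at each codon position. The function names and the toy sequence are hypothetical.

```python
from collections import Counter


def base_composition(seq):
    """Percentage of A, C, G and T among the unambiguous bases of a sequence."""
    counts = Counter(base for base in seq.upper() if base in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    return {base: 100.0 * counts[base] / total for base in "ACGT"}


def at_content(seq):
    """Combined A+T percentage."""
    composition = base_composition(seq)
    return composition["A"] + composition["T"]


def composition_by_codon_position(cds):
    """Base composition at first, second and third codon positions of an in-frame CDS."""
    return {position: base_composition(cds[position - 1::3]) for position in (1, 2, 3)}


# Toy in-frame sequence of four codons (hypothetical).
TOY_CDS = "ATGAAATTTGGC"

if __name__ == "__main__":
    print(round(at_content(TOY_CDS), 1))
    for position, composition in composition_by_codon_position(TOY_CDS).items():
        print(position, composition)
```

Applied to the concatenated protein-coding genes of each genome, the per-position output corresponds to one row of Table 3; summing the %A and %T entries of the third position gives the A+T figures compared in Section 3.2.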
Fig. 2. Proposed secondary structures for seven tRNAs in the newly sequenced region of Daphnia pulex mtDNA. The secondary structures were drawn with the program CARD (Winnepenninckx et al., 1995).
3.4. rRNA genes

The putative lengths of the Daphnia lrRNA and srRNA genes are 1314 bp and 753 bp, respectively. Both genes are longer than their counterparts in Artemia, but are usually shorter than their insect counterparts (Table 2). The structure for the Daphnia srRNA was modified from the one proposed by Van Raay and Crease (1994) to conform to the current model of Van de Peer et al. (1997). It is similar to that of Anopheles, except that helices 8 and 12 seem to be absent (Fig. 3A), although Van de Peer et al. (1997) suggest that very short versions of these helices are present in Artemia. In addition, helix 41 is absent in Daphnia but present in the insects. The two crustacean sequences are most divergent in both length and primary sequence at the 5′ end, upstream of helix 26 (data not shown). The Daphnia lrRNA is also quite similar to that of Anopheles, with greater divergence occurring at the 5′ end, upstream of helix C1 (data not shown). Like the lrRNA of Artemia, it does not appear to possess helix G13 (Fig. 3B), although this helix is present in the insects (De Rijk et al., 1997).

Fig. 3. Proposed secondary structures for the mitochondrial rRNAs of Daphnia pulex. The structures were drawn with the program CARD (Winnepenninckx et al., 1995). (A) Secondary structure of srRNA. Helix numbering is from Van de Peer et al. (1997). (B) Secondary structure of lrRNA. Helix numbering is from De Rijk et al. (1997).

3.5. Protein-coding genes

Three of the initiation codons in Daphnia mtDNA (ATG, ATT and GTG: Table 1) have commonly been found in other animal mtDNAs (reviewed in Wolstenholme, 1992), although there seems to be little conservation in their use across taxa. For example, only four genes (ATPase6, COIII, ND3 and ND1) start with the same codon in both crustaceans. Van Raay and Crease (1994) proposed a 4 bp initiation codon (ATTA) for COI in Daphnia based on comparison with the Drosophila sequence (Clary and Wolstenholme, 1985) and the fact that there are no other initiation codons at the beginning of the gene, but this has not been confirmed experimentally.

Six genes have complete TAA termination codons (Table 1) while ATPase6 and ND3 terminate with TA, although the first nt of the adjacent gene (COIII and tRNAAla, respectively) could be used to complete the termination codon. The remaining five genes all have termination codons consisting only of T. The 3′ end of
each of these genes is immediately adjacent to a tRNA gene from which a complete termination codon could not be generated, even considering overlaps. This association between incomplete termination codons and tRNA genes appears to be quite common in metazoan mtDNA (Wolstenholme, 1992). In these cases, it is likely that complete termination codons are generated by polyadenylation of transcripts as has been shown for some human mtDNA genes (Ojala et al., 1981).

In general, codon usage in Daphnia mtDNA is most similar to that of Artemia (Table 4), as is aa composition of the mitochondrial proteins (Table 5). It is well known that nt composition influences codon usage and indeed, regressions of the ratio of G+C-rich codons (Pro, Ala, Arg, Gly) to A+T-rich codons (Phe, Ile, Met, Tyr, Asn, Lys) against both total A+T content (R²=0.72, p<0.01), and A+T content of the protein-coding genes (R²=0.76, p<0.01) are significant.

The aa sequences of the Daphnia mitochondrial proteins, with the exception of COI, are more similar to those of the dipteran insects than they are to those of Artemia, despite the fact that the two crustaceans are very similar with respect to nt composition, codon usage, and aa composition (Table 6). To determine if this pattern persists despite similarities in nt composition between the crustaceans, nt similarity of the Daphnia protein-coding genes was compared with those of Artemia and Anopheles gambiae (Table 6). The similarity is always highest between Daphnia and Anopheles, reinforcing the conclusion of Crease and Little (1997) that bias in nt composition does not greatly constrain divergence among mitochondrial proteins at either the nt or the aa level. Conversely, aa sequence similarity can persist despite differences in nt composition bias.

Table 4
Codon usage in protein-coding genes of the mtDNA of eight arthropodsᵃ

Amino acid   Anticodon   Codon   Daph.   Arte.   Anop.ᵇ   Dros.   Locu.   Apis   Ixod.   Rhip.
Asn          gtt         AAT       75      93     182      193     141     238    115     179
                         AAC       43      31      22       13      46      11     47      29
Lys          ctt         AAA       48      41      66       76      71     152     98     126
                         AAG       41      40      29        9      29       8     22      10
Thr          tgt         ACT       79      71     107       97      59      53     54      69
                         ACC       54      24       4        3      12       5     29       5
                         ACA       42      45      93       85     131      72     50      67
                         ACG       13      12       2        2       3       1      7       1
Ser          gct         AGT       49      30      52       34      24      18     32      30
                         AGC       20      20       8        1       3       2      2       2
                         AGA       68      68      46       73      81      81     64      72
                         AGG        1       7       0        0       2       2     15       7
Ile          gat         ATT      186     203     315      345     333     476    311     422
                         ATC       93      80      13       15      42      24    125      42
Met          cat         ATA       90     128     198      195     237     312    232     293
                         ATG       52      59      25       18      44      22     64      25
His          gtg         CAT       48      55      61       65      46      57     44      59
                         CAC       42      20      15       12      25       3     23       6
Gln          uug         CAA       40      49      71       70      60      39     43      51
                         CAG       32      20       2        0       3       2      6       5
Pro          tgg         CCT       60      66      78       79      51      36     63      53
                         CCC       51      27       3        3       4       3     36      17
                         CCA       23      33      52       45      78      64     33      47
                         CCG       11      15       1        3       3       0      5       3
Arg          tcg         CGT       22      24       6        8      18       9     14      11
                         CGC        6       7       0        0       1       0      7       0
                         CGA       24      20      47       45      34      29     22      32
                         CGG        7      13       5        6       2       1      5       1
Leu          tag         CTT      142     127      29       36      53      35     67      59
                         CTC       66      50       1        2       5       1     37       6
                         CTA       96     100      40       19      72      36     70      42
                         CTG       40      34       1        2       4       0      7       1
Asp          gtc         GAT       37      50      60       54      67      52     43      49
                         GAC       38      21       7       10      13       5     21       9
Glu          ttc         GAA       44      55      79       82      66      74     43      69
                         GAG       37      33       2        1      11       5     36      10
Ala          tgc         GCT       82      78     105      125      63      20     54      60
                         GCC       68      46      12        9       6       1     18      14
                         GCA       50      40      59       37      79      36     41      41
                         GCG       28      18       0        2       2       0      3       2
Gly          tcc         GGT       34      58      51       67      92      47     34      50
                         GGC       34      20       5        2       1       0      6      13
                         GGA       59      63     137      129     105      85     87      79
                         GGG      147      75      25       22      10       3     54      21
Val          tac         GTT       97      70      91       90      95      67     74      57
                         GTC       42      40       3        3       5       1     11       7
                         GTA       86      83     105       93      80      53     62      70
                         GTG       53      44       5        8       3       0     36       5
Tyr          gta         TAT       44     102     139      142     146     209     98     111
                         TAC       69      45      24       28      34      10     24      19
Ser          tga         TCT      109     117     117      120     122      53    110      89
                         TCC       28      56       5        4       7      11     58      25
                         TCA       50      73      81      102     120     166     99     129
                         TCG       23      21       5        3       4       1      6       1
Cys          gca         TGT       24      33      35       40      36      25     27      28
                         TGC       11      13       5        2       8       0      6       8
Trp          tca         TGA       61      69      98       96      96      78     78      74
                         TGG       42      22       3        6       4       5     17       6
Phe          gaa         TTT      257     215     325      313     251     354    287     344
                         TTC       88      98      34       17      87      26    100      51
Leu          taa         TTA      201     185     518      542     340     472    265     355
                         TTG       73      65      22       25      47      24     52      24

ᵃ The Daphnia tRNA anticodon (5′–3′) is given in lower-case letters for each amino acid.
ᵇ Anopheles gambiae.

Table 5
Amino acid composition (%) of translated protein-coding sequences in the mtDNA of nine arthropods

Amino acid     Daph.   Arte.   A. gam.   A. qua.   Dros.   Locu.   Apis   Ixod.   Rhip.
Ala             6.2     5.2     4.7       4.8       4.6     4.0     1.6    3.2     3.3
Arg             1.6     1.8     1.6       1.6       1.6     1.5     1.1    1.3     1.2
Asn             3.2     3.5     5.5       5.6       5.5     5.0     6.8    4.5     5.8
Asp             2.0     2.0     1.8       1.8       1.7     2.2     1.6    1.8     1.6
Cys             1.0     1.3     1.1       1.0       1.1     1.2     0.7    0.9     1.0
Gln             2.0     2.0     2.0       2.0       1.9     1.7     1.1    1.4     1.6
Glu             2.2     2.5     2.2       2.1       2.2     2.1     2.1    2.2     2.2
Gly             7.4     6.1     5.8       5.8       5.9     5.6     3.7    5.1     4.5
His             2.4     2.1     2.0       2.1       2.1     1.9     1.6    1.9     1.8
Ile             7.5     8.0     8.8       9.1       9.7    10.1    13.6   12.1    12.9
Leu            16.8    15.9    16.4      16.3      16.8    14.0    15.5   13.8    13.5
Lys             2.4     2.3     2.5       2.5       2.3     2.7     4.4    3.3     3.8
Met             4.0     5.3     6.0       6.0       5.7     7.6     9.1    8.2     8.9
Phe             9.4     8.9     9.6       9.8       8.9     9.1    10.3   10.8    11.0
Pro             3.9     4.0     3.6       3.6       3.5     3.7     2.8    3.8     3.3
Ser             9.5    11.1     8.4       8.2       9.1     9.8     9.1   10.7     9.9
Thr             5.1     4.3     5.5       5.5       5.0     5.5     3.6    3.9     4.0
Trp             2.8     2.6     2.7       2.7       2.7     2.7     2.3    2.6     2.2
Tyr             3.1     4.2     4.4       4.4       4.6     4.8     6.0    3.4     3.6
Val             7.5     6.7     5.5       5.0       5.2     4.9     3.3    5.1     3.9
Codon ratioᵃ    0.65    0.53    0.42      0.42      0.43    0.38    0.18   0.32    0.27

ᵃ [Pro+Ala+Arg+Gly]/[Phe+Ile+Met+Tyr+Asn+Lys].

Table 6
Comparison of mitochondrial protein-coding genes in the mtDNA of seven arthropodsᵃ

           Daphnia   Artemia               Anopheles gambiae     Drosophila      Locusta         Apis            Ixodes
Gene       No. aa    No. aa   %aa   %nt    No. aa   %aa   %nt    No. aa   %aa    No. aa   %aa    No. aa   %aa    No. aa   %aa
ATPase6    224       219      57    56     226      69    64     224      66     225      60     226      44     220      46
ATPase8     53        53      26    41      53      38    53      53      28      52      26      52      28      51      31
COI        512       511      83    71     512      83    72     512      83     511      77     520      67     512      77
COII       226       227      65    62     228      69    67     228      70     227      58     225      56     225      59
COIII      262       257      65    62     262      75    69     262      74     263      76     259      51     261      51
CytB       377       381      68    63     378      69    66     378      73     379      67     383      54     366      59
ND1        311       298      53    55     314      62    65     324      59     313      58     305      44     313      46
ND2        329       296      35    36     341      37    44     341      36     342      35     333      20     318      37
ND3        117       111      48    49     117      54    60     117      57     115      47     117      46     111      46
ND4        440       386      42    47     447      44    52     446      45     444      44     447      33     436      42
ND4L        91        85      48    45     101      49    57      96      51      97      47      87      33      91      30
ND5        569       538      41    44     580      48    55     573      47     572      47     554      37     554      42
ND6        170       155      30    33     174      35    48     174      38     173      29     167      17     141      29

ᵃ No. aa=number of amino acids; %aa=percentage amino acid similarity to Daphnia; %nt=percentage nucleotide similarity to Daphnia.

Based on the pattern of divergence among the arthropod taxa, Crease and Little (1997) suggested that Artemia mtDNA is evolving at an accelerated rate, as
has also been suggested for Apis mtDNA (Crozier and Crozier, 1993). To determine if the pattern of substitution differs in the more rapidly evolving taxa, a detailed comparison of sequence divergence in COII was undertaken in three pairs of taxa: Daphnia/Artemia, Daphnia/Anopheles gambiae, and Anopheles/Apis. Daphnia and Anopheles show the highest similarity between insects and crustaceans, while Artemia and Apis both appear to be evolving at an accelerated rate within their group. Thus, the pattern of substitution in the Daphnia/Anopheles intergroup comparison was compared with the pattern of substitution in each intragroup
comparison (Daphnia/Artemia and Anopheles/Apis), each of which involves a rapidly evolving taxon. COII was chosen for this analysis because it displays the typical pattern of sequence divergence and is the most easily aligned gene in all taxa. The only internal gap that had to be introduced into the alignment was the loss of two consecutive aa in Apis relative to the other taxa.

As expected, the highest proportion of substitutions (ts+tv) occurs at the third codon position and the lowest proportion occurs at the second position for all three comparisons (last column, Table 7). None of these distributions is significantly different from one another. The only significant difference in the two comparisons involving Daphnia is the distribution of tv types within the third codon position (χ²=12.65, p<0.01). The proportion of A/T tv is much higher in the Daphnia/Anopheles comparison than in the Daphnia/Artemia comparison. This is consistent with a higher A+T bias at the third codon position in Anopheles relative to the crustaceans (Table 3). Jermiin and Crozier (1994) suggested that this A/T tv bias could lead to a higher proportion of tv relative to ts. However, this was not the case in the two Daphnia comparisons. Indeed, the overall proportion of tv at the third position is actually higher in the Daphnia/Artemia comparison (58%) than in the Daphnia/Anopheles comparison (56%), although this difference is not significant (χ²=0.107, p>0.5).

As above, the distribution of tv types within the third codon position is significantly different (χ²=22.208, p<0.005) in the two comparisons involving Anopheles due to the extreme A/T tv bias in Anopheles/Apis compared to Anopheles/Daphnia. Again, this is consistent with the fact that both insect mtDNAs, but especially Apis, are A+T-biased relative to Daphnia. However,
unlike the first analysis, this bias does influence the overall proportion of tv at the third codon position (χ²=17.796, p<0.01). Only 56% of the third position substitutions are tv in the Anopheles/Daphnia comparison as opposed to 82% in the Anopheles/Apis comparison (Table 7). This is consistent with the suggestion that extreme bias in nt composition can significantly alter the proportion of tv (Jermiin and Crozier, 1994). Moreover, unlike the first analysis, there is also a significant difference in the distribution of ts across codon positions (χ²=9.928, p<0.01) with a higher proportion of ts occurring at first and second codon positions in the Anopheles/Apis comparison.

This analysis suggests that the cause of the evolutionary acceleration in Artemia and Apis mtDNA differs. The rapid rate of divergence of Apis mtDNA is thought to be largely due to A/T mutation pressure which appears to increase the frequency of A/T tv (Crozier and Crozier, 1993; Jermiin and Crozier, 1994), particularly at the third codon position (Table 7). The increased proportion of ts at first and second positions [also noted by Crozier and Crozier (1993) in a comparison of Apis and Drosophila] could also be explained by A+T mutation pressure if C and G in Anopheles tend to be replaced by T and A in Apis. This is indeed the case (66% of ts at codon position one, 61% of ts at codon position two), although the bias is not significant (χ²=2.74, 0.888, p>0.1). Thus, it seems likely that something in addition to A+T mutation pressure may be contributing to the difference in the distribution of ts across codon positions in Apis. In contrast, the overall similarity in patterns of divergence in the two comparisons involving Daphnia suggests that the accelerated rate of divergence in Artemia is unbiased and has little, if anything, to do with nt composition. There are simply more substitutions ‘across the board’. Whether this is due to an increase in the mutation rate itself, or to an increase in the probability that new mutations will become fixed in this lineage is unknown.

Table 7
Nucleotide substitutions in COII between (A) Daphnia and Artemia, (B) Daphnia and Anopheles and (C) Anopheles and Apisᵃ

                            Codon      Transitions           Transversions                       Total   Total   Ts/Tv   Total
Taxa                        position   A/G    C/T    Total   A/T    A/C    G/T    G/C    Total   Ts      Tv      ratio   changes
A  Daphnia vs Artemia       1          0.74   0.26   0.31    0.25   0.34   0.19   0.22   0.25    0.49    0.51    0.97    0.28
                            2          0.31   0.69   0.14    0.29   0.37   0.13   0.21   0.17    0.40    0.60    0.67    0.16
                            3          0.34   0.66   0.55    0.42   0.31   0.16   0.11   0.58    0.42    0.58    0.73    0.57
                            Total      0.46   0.54   112     0.36   0.33   0.16   0.15   143     0.44    0.56    0.78    255
B  Daphnia vs Anopheles     1          0.62   0.38   0.37    0.43   0.31   0.13   0.13   0.25    0.57    0.43    1.30    0.31
                            2          0.23   0.77   0.12    0.30   0.35   0.15   0.20   0.17    0.39    0.61    0.65    0.15
                            3          0.22   0.78   0.51    0.65   0.28   0.04   0.03   0.58    0.44    0.56    0.80    0.55
                            Total      0.36   0.64   107     0.54   0.29   0.08   0.08   119     0.47    0.53    0.90    226
C  Anopheles vs Apis        1          0.72   0.28   0.44    0.53   0.30   0.12   0.05   0.28    0.40    0.60    0.67    0.32
                            2          0.17   0.84   0.27    0.31   0.34   0.21   0.14   0.19    0.38    0.62    0.59    0.21
                            3          0.16   0.84   0.29    0.95   0.04   0.01   0.00   0.53    0.18    0.82    0.23    0.47
                            Total      0.42   0.59   66      0.71   0.17   0.08   0.04   156     0.30    0.70    0.42    222

ᵃ The total number of nt changes within each category is shown in bold at the intersection of row and column totals. All other values are proportions. Distributions that are significantly different between A and B, or between B and C are underlined in the body of the table.

3.6. Phylogenetic considerations

Flook et al. (1995) constructed phylogenies from the protein-coding genes of Artemia and six insect mtDNAs and found, with strong bootstrap support, that the dipteran insects (Drosophila and Anopheles) formed a monophyletic group with Locusta as its sister group. Black and Roehrdanz (1998) constructed phylogenies from the aa sequences of these same taxa plus three chelicerates (two ticks and a horseshoe crab) and the earthworm, Lumbricus terrestris, which was used as an outgroup. In these trees, insects (except Apis) and Artemia formed one monophyletic group and the chelicerates plus Apis formed a second group. There was very strong bootstrap support for a tick/Apis clade. Black and Roehrdanz (1998) showed that this result was due to the convergence in aa sequences between the tick, Rhipicephalus, and Apis which is driven by constraints imposed by the high A+T content of their mitochondrial genomes.

In the present study, the Daphnia protein-coding sequences were added to the alignment of the insects plus Artemia used by Flook et al. (1995) and trees were constructed from it using cladistic, phenetic and maximum likelihood methods. Only the second codon position, with the lowest bias in nt composition, was used in the analyses. In all trees, with or without positions containing gaps, and with or without ts, Daphnia and Artemia clustered with one another as expected. Consistent with previous results, Apis never grouped with the other holometabolous insects.

Naylor and Brown (1998) analyzed mitochondrial nt and aa sequences from 19 animal taxa and failed, with strong statistical support, to recover the accepted phylogeny for chordates.
They argued that a non-random distribution of homoplasy, such as that caused by extreme bias in nt composition, could lead to the inference of incorrect, yet robust, phylogenies because the underlying assumption of all methods that test the reliability of phylogenetic analyses is that homoplasy is random. However, Naylor and Brown (1998) were able to identify sites that recovered the correct chordate phylogeny with strong statistical support. These sites were the first and second codon positions of sites that modally code for proline, cysteine, methionine, glutamine and asparagine.

To determine whether these sites could recover the ‘correct’ insect topology, they were used to construct a maximum likelihood tree for the seven insect and crustacean taxa. The ts/tv ratio was set to 0.6 based on the average ratio for first and second codon positions in the six previously sequenced taxa (Flook et al., 1995). The resulting tree, ((((Anopheles, Drosophila) Locusta) Apis) (Daphnia, Artemia)), is identical to the maximum likelihood tree based on second position nt from all codons. This suggests one of three things: that Apis really is outside the Locusta/Diptera clade; that the genes yielding correct results vary among data sets, a possibility suggested by Naylor and Brown (1998); or that homoplasy is nonrandom in Apis, even at the ‘informative’ sites. The last argument is supported by the convergence in aa sequence between Apis and Rhipicephalus and by the elevated proportion of ts at first and second codon positions in Apis (Crozier and Crozier, 1993; Table 7). Thus, it is clear that the unusual pattern of divergence in Apis mtDNA has partially obscured its phylogenetic signal. Conversely, the unbiased evolutionary acceleration in Artemia mtDNA has not obscured the phylogenetic signal, so that Artemia correctly groups with Daphnia, even though the Daphnia nt and protein sequences are both more similar to those of dipteran insects.
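The two data-handling steps described in this section (restricting the alignment to second codon positions, with or without gapped positions, and separating ts from tv) can be illustrated with a minimal Python sketch. It is not the pipeline used in this study. The function names `second_codon_positions` and `count_ts_tv`, the toy sequences, and the choice to drop a codon whenever its second-position column contains a gap are all assumptions made for illustration. The sketch also assumes an in-frame alignment in which gaps are written as `-`.

```python
from itertools import combinations

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def second_codon_positions(aligned, drop_gap_columns=True):
    """Keep only the second position of each codon in an in-frame alignment.

    `aligned` maps a taxon name to its aligned coding sequence (hypothetical
    input format). If `drop_gap_columns` is true, any second-position column
    that contains a gap in at least one taxon is skipped.
    """
    if not aligned:
        raise ValueError("alignment is empty")
    names = list(aligned)
    length = len(aligned[names[0]])
    if any(len(aligned[name]) != length for name in names):
        raise ValueError("all sequences must have the same aligned length")
    if length % 3 != 0:
        raise ValueError("aligned length must be a multiple of 3")

    kept = {name: [] for name in names}
    for start in range(0, length, 3):
        column = [aligned[name][start + 1].upper() for name in names]
        if drop_gap_columns and "-" in column:
            continue
        for name, base in zip(names, column):
            kept[name].append(base)
    return {name: "".join(bases) for name, bases in kept.items()}


def count_ts_tv(seq_a, seq_b):
    """Count transitions (ts) and transversions (tv) between two sequences.

    Sites where either sequence has a gap or an ambiguous base are ignored.
    """
    ts = tv = 0
    for a, b in zip(seq_a, seq_b):
        if a == b or a not in "ACGT" or b not in "ACGT":
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            ts += 1  # purine <-> purine or pyrimidine <-> pyrimidine
        else:
            tv += 1  # purine <-> pyrimidine
    return ts, tv


# Toy alignment of four codons; these are made-up sequences, not real data.
toy_alignment = {
    "taxon_a": "ATGGCTCCAAAT",
    "taxon_b": "ATGGTTCGAA-T",
    "taxon_c": "ATGGCTCAAAGT",
}

second_positions = second_codon_positions(toy_alignment)
for name_a, name_b in combinations(second_positions, 2):
    ts, tv = count_ts_tv(second_positions[name_a], second_positions[name_b])
    print(name_a, name_b, "ts =", ts, "tv =", tv)
```

Passing `drop_gap_columns=False` keeps the gapped positions, which corresponds to the "with positions containing gaps" variant of the analysis. Excluding ts would then amount to scoring only the `tv` counts.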
Acknowledgements

This research was supported by a research grant from NSERC of Canada. Comments by two anonymous reviewers greatly improved the paper. I thank A. Holliss for expert sequencing of the plasmid DNA templates, P. Flook for providing the mtDNA alignment of insects and Artemia, and W. Black for providing the tick mtDNA sequences prior to their release to GenBank.
References

Beard, C.B., Hamm, D.M., Collins, F.H., 1993. The mitochondrial genome of the mosquito Anopheles gambiae: DNA sequence, genome organization, and comparisons with mitochondrial sequences of other insects. Insect Mol. Biol. 2, 103–104.
Black IV, W.C., Roehrdanz, R.L., 1998. Mitochondrial gene order is not conserved in arthropods: prostriate and metastriate tick mitochondrial genomes. Mol. Biol. Evol. 15, 1772–1785.
Boore, J.L., Collins, T.M., Stanton, D., Daehler, L.L., Brown, W.M., 1995. Deducing the pattern of arthropod phylogeny from mitochondrial DNA rearrangements. Nature 376, 163–167.
Boore, J.L., Lavrov, D.V., Brown, W.M., 1998. Gene translocation links insects and crustaceans. Nature 392, 667–668.
Clary, D.O., Wolstenholme, D.R., 1985. The mitochondrial DNA molecule of Drosophila yakuba: nucleotide sequence, gene organization, and genetic code. J. Mol. Evol. 22, 252–271.
Crease, T.J., Little, T.J., 1997. Partial sequence of the mitochondrial genome of the crustacean Daphnia pulex. Curr. Genet. 31, 48–54.
Crozier, R.H., Crozier, Y.C., 1993. The mitochondrial genome of the honeybee Apis mellifera: complete sequence and genome organization. Genetics 133, 97–117.
De Rijk, P., Van de Peer, Y., De Wachter, R., 1997. Database on the structure of large ribosomal subunit RNA. Nucleic Acids Res. 25, 117–122.
Felsenstein, J., 1993. PHYLIP version 3.5. University of Washington, Seattle.
Flook, P.K., Rowell, C.H.F., Gellissen, G., 1995. The sequence, organization, and evolution of the Locusta migratoria mitochondrial genome. J. Mol. Evol. 41, 928–941.
Jermiin, L.S., Crozier, R.H., 1994. The cytochrome b region in the mitochondrial DNA of the ant Tetraponera rufoniger: sequence divergence in Hymenoptera may be associated with nucleotide content. J. Mol. Evol. 38, 282–294.
Kumar, S., Tamura, K., Nei, M., 1993. MEGA: molecular evolutionary genetics analysis version 1.01. The Pennsylvania State University, University Park, PA.
Lee, S., Rasheed, S., 1990. A simple procedure for maximum yield of high-quality plasmid DNA. BioTechniques 9, 676–679.
Lipman, D.J., Pearson, W.R., 1985. Rapid and sensitive protein similarity searches. Science 227, 1435–1441.
Matzura, O., Wennborg, A., 1996. RNAdraw: an integrated program for RNA secondary structure calculation and analysis under 32-bit Microsoft Windows. CABIOS 12, 247–249.
Mitchell, S.E., Cockburn, A.F., Seawright, J.A., 1993. The mitochondrial genome of Anopheles quadrimaculatus species A: complete nucleotide sequence and gene organization. Genome 35, 1058–1073.
Naylor, G.J.P., Brown, W.M., 1998. Amphioxus mitochondrial DNA, chordate phylogeny, and the limits of inference based on comparisons of sequences. Syst. Biol. 47, 61–76.
Ojala, D., Montoya, J., Attardi, G., 1981. tRNA punctuation model of RNA processing in human mitochondria. Nature 290, 470–474.
Saitou, N., Nei, M., 1987. The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol. Biol. Evol. 4, 406–425.
Sprinzl, M., Hartmann, T., Weber, J., Blank, J., Zeidler, R., 1989. Compilation of tRNA sequences and sequences of tRNA genes. Nucleic Acids Res. 17, 1–172.
Stanton, D.J., Crease, T.J., Hebert, P.D.N., 1991. Cloning and characterization of Daphnia mitochondrial DNA. J. Mol. Evol. 33, 152–155.
Staton, J.L., Daehler, L.L., Brown, W.M., 1997. Mitochondrial gene arrangement of the horseshoe crab Limulus polyphemus L: conservation of major features among arthropod classes. Mol. Biol. Evol. 14, 867–874.
Valverde, J.R., Batucecas, B., Moratilla, C., Marco, R., Garesse, R., 1994. The complete mitochondrial DNA sequence of the crustacean Artemia franciscana. J. Mol. Evol. 39, 400–408.
Van de Peer, Y., Jansen, J., De Rijk, P., De Wachter, R., 1997. Database on the structure of small ribosomal subunit RNA. Nucleic Acids Res. 25, 111–116.
Van Raay, T.J., Crease, T.J., 1994. Partial mitochondrial DNA sequence of the crustacean Daphnia pulex. Curr. Genet. 25, 66–72.
Winnepenninckx, B., Van de Peer, Y., Backeljau, T., De Wachter, R., 1995. CARD: a drawing tool for RNA secondary structure models. BioTechniques 18, 1060–1063.
Wolstenholme, D.R., 1992. Animal mitochondrial DNA: structure and evolution. Int. Rev. Cytol. 141, 173–216.