Gene 396 (2007) 66 – 74 www.elsevier.com/locate/gene
The complete nucleotide sequence of the mitochondrial genome of the oriental fruit fly, Bactrocera dorsalis (Diptera: Tephritidae)

D.J. Yu a,b,⁎, L. Xu a, F. Nardi c, J.G. Li d, R.J. Zhang b

a Shenzhen Entry-Exit Inspection & Quarantine Bureau, Shenzhen, PR China
b Institute of Entomology & State Key Laboratory for Biocontrol, Sun Yat-Sen University, Guangzhou, PR China
c Department of Evolutionary Biology, University of Siena, Siena, Italy
d Beijing Entry-Exit Inspection & Quarantine Bureau, Beijing, PR China

Received 4 July 2006; received in revised form 30 January 2007; accepted 20 February 2007
Available online 15 March 2007
Received by: G. Pesole
Abstract

The complete mitochondrial genome of the oriental fruit fly Bactrocera dorsalis s.s. has been sequenced, and is here described and compared with the homologous sequences of Bactrocera oleae and Ceratitis capitata. The genome is a circular molecule of 15,915 bp, and encodes the set of 37 genes generally found in animal mitochondrial genomes. The structure and organization of the molecule are typical and similar to those of the two closely related species B. oleae and C. capitata, although it presents an interesting case of putative intra-molecular recombination. The relevance of the growing comparative dataset of complete tephritid mitochondrial genomes is discussed in relation to the possibility of developing robust assays for species discrimination in quarantine and agricultural monitoring practices, as well as for basic phylogeographic and population genetic studies.

© 2007 Elsevier B.V. All rights reserved.

Keywords: Mitochondrial genome; Oriental fruit fly; Bactrocera dorsalis complex; Tephritidae; Intramolecular recombination; Species discrimination
Abbreviations: atp6 and atp8, genes encoding ATP synthase subunits 6 and 8; cob, gene encoding cytochrome b; cox1-3, genes encoding cytochrome oxidase subunits 1-3; CR, control region; nad1-6 and nad4L, genes encoding NADH dehydrogenase subunits 1-6 and 4L; NCR, non-coding region; rrnL and rrnS, genes encoding the large (16S) and small (12S) subunits of ribosomal RNA; LSU and SSU, large and small ribosomal RNA subunits; PCG, protein coding gene; trnX, genes encoding transfer RNA molecules, with the corresponding amino acid denoted by the one-letter code and anticodon indicated in parentheses (XXX) when required; tRNA-X, transfer RNA molecules with corresponding amino acids denoted with a one-letter code; kb, kilobases; nt, nucleotides; PCR, polymerase chain reaction; bp, base pair; mtDNA, mitochondrial DNA.
⁎ Corresponding author. Shenzhen Entry-Exit Inspection and Quarantine Bureau, 2049 Heping Road, 518001, Shenzhen, Guangdong, PR China. Tel.: +86 755 82117990; fax: +86 755 25588630. E-mail address: [email protected] (D.J. Yu).
0378-1119/$ - see front matter © 2007 Elsevier B.V. All rights reserved. doi:10.1016/j.gene.2007.02.023

1. Introduction

The oriental fruit fly, Bactrocera dorsalis (Hendel), is one of the most economically important fruit fly pests (Clarke et al., 2005). This polyphagous species has been recorded in Asia from 117 host species, in 76 genera and 37 families (Allwood et al., 1999), among which are a number of commercially grown fruits (White and Elson-Harris, 1992) such as cherry, plum and peach. The species is widespread throughout much of South East Asia, Pakistan, India, Sri Lanka, Sikkim, Myanmar, Southern China, Japan (eradicated from Ryukyu Island), and some Pacific islands including Hawaii (Carroll et al., 2002). In addition, given its high polyphagy and capability of adapting to new areas, B. dorsalis is causing serious concern in terms of its possible introduction into other economically significant fruit growing regions worldwide. As an example, the annual investigation plan for fruit flies in China reported the presence of B. dorsalis in more than 10 provinces, and its range may expand to Southeast and Central China as a consequence of global warming (Hou, 2005).

B. dorsalis s.s. belongs to a large group of morphologically similar species generally referred to as the B. dorsalis complex. Beginning with the revision of Drew and Hancock (1994), this group of species has been subdivided in the last decade to include 75 recognized entities. Among these, a group of sibling
species, hardly distinguishable on morphological grounds, has drawn a lot of attention as it includes, besides some non-pest species and B. dorsalis s.s., at least four species of major agricultural importance: Bactrocera carambolae Drew & Hancock, Bactrocera papayae Drew & Hancock, Bactrocera occipitalis (Bezzi) and Bactrocera philippinensis Drew & Hancock. An efficient and reliable way to discriminate among these (as well as other) species has become crucial in terms of international trade and quarantine controls (Follett and Neven, 2006), and is foreseeable only using molecular methodologies (for a review of earlier methods and recent perspectives: Clarke et al., 2005).

The animal mitochondrial genome is a closed circular molecule ranging in size from 14 to 19 kb (Wolstenholme, 1992; Boore, 1999). The gene content and organization are generally conserved. Mitochondrial genome sequences have frequently been used as molecular markers in studies focusing on the species level (for a comprehensive review see Avise, 2000), although some limitations, such as their dependency on the female lineage only and the possible detrimental effects of non-neutrality and introgression, are emerging (Ballard and Whitlock, 2004; Ballard and Rand, 2005). Substitution rates are generally higher in the mitochondrial than in the nuclear genome (Brown et al., 1982), providing more resolution at shallow divergence levels. Furthermore, mtDNA is inherited via the female line only, and generally does not recombine (Scheffler, 1999). Most importantly, the effective population size of mitochondrial genes is one fourth that of their nuclear counterparts, and the mean coalescence time is correspondingly shorter. Therefore mitochondrial phyletic lineages reach monophyly faster after populations/species are genetically separated (Avise, 2000).
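The one-fourth relationship mentioned above can be illustrated with a small calculation. This is a minimal sketch under textbook assumptions that are not stated in the paper: a neutral Wright-Fisher population, an equal sex ratio, strictly maternal inheritance and no heteroplasmy. The function name and the population size are hypothetical.

```python
def expected_pairwise_coalescence(n_individuals, marker):
    """Expected number of generations back to the common ancestor of two
    sampled gene copies, under the illustrative assumptions listed above."""
    if marker == "nuclear":
        # 2N autosomal gene copies, transmitted by both sexes
        return 2 * n_individuals
    if marker == "mitochondrial":
        # one effective copy per female, transmitted by females only (N/2)
        return n_individuals / 2
    raise ValueError(f"unknown marker type: {marker}")


# Hypothetical population of 10,000 breeding individuals
nuc = expected_pairwise_coalescence(10_000, "nuclear")
mito = expected_pairwise_coalescence(10_000, "mitochondrial")
print(mito / nuc)  # ratio of mitochondrial to nuclear coalescence time
```

Under these assumptions the ratio is one fourth regardless of the population size chosen; departures from an equal sex ratio or from neutrality would change it.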
In recent years, as the sequencing of multiple complete mitochondrial genomes has become more commonplace, comparative analyses at the genus/species level have been produced that use complete molecules instead of one or a subset of genes (Ballard, 2000a,b; Yukuhiro et al., 2002; Nardi et al., 2003; see also Rand, 2001). About 60 insect mitochondrial genomes are now available (Amiga database: Feijao et al., 2006), 11 among higher Diptera and four (from three species, including the genome of B. dorsalis presented here) from tephritid fruit flies. Therefore the comparative study of complete mitochondrial genomes is becoming a viable approach to the study of certain insect groups at the species level, and could be
applied to species diagnostics in key taxa, such as tephritid fruit flies. In the short term, background information on complete mitochondrial genomes in closely related taxa can be used to select one or more genes as the most variable/informative for specific problems. This strategy was applied in Nardi et al. (2003) where, following a comparison of two completely sequenced mitochondrial genomes, the gene nad1 was chosen for a wider phylogeographic analysis in B. oleae (Nardi et al., 2005).

Here we report and describe the complete nucleotide sequence of the mitochondrial genome of B. dorsalis s.s. from a Chinese population, and compare the sequence and genome organization with the two other available complete mitochondrial genomes from tephritid fruit flies: the congeneric B. oleae (Nardi et al., 2003) and C. capitata (Spanos et al., 2000), which belongs to a different tribe in the same subfamily. The possible utility of mitochondrial sequences for species discrimination in the B. dorsalis complex is also discussed.

2. Materials and methods

Larvae of B. dorsalis s.s. were collected from longan (Dimocarpus longan Lour.) in Shenzhen, Guangdong Province of China, and reared to adulthood under laboratory conditions (Yu et al., 2004). Adult specimens were identified to the species level according to the taxonomic key of Drew and Hancock (1994). Total DNA was isolated from single adult specimens using the DNeasy Tissue Kit (QIAGEN, Germany), and the DNA of one single individual was used for all amplifications and sequencing that resulted in the complete genome sequence. The entire mitochondrial genome was amplified via PCR in 22 overlapping pieces ranging in size from 150 bp to 2 kb (Fig. 1; primer sequences and amplification conditions available upon request from D.J. Yu) using a recombinant Taq DNA Polymerase (TaKaRa, Japan). Most primers are based on Simon et al. (1994), with modifications based on the available sequences of C. capitata (GenBank accession no. AJ242872) and B. oleae (GenBank accession no. AY210702); additional primers were designed on available sequences. Most fragments were gel purified (QIAquick Gel Extraction Kit: QIAGEN, Germany) and directly sequenced on both strands using PCR and internal primers. To ensure maximum accuracy, each fragment was sequenced twice independently, and in case of discrepancies a third PCR product was sequenced. The fragment encompassing
Fig. 1. Sequencing strategy. Full lines identify amplification products (1–20 and 22), the double line (21) identifies a fragment that was cloned before sequencing. See section 2 for details.
the CR and nad2 gene (fragment 21 in Fig. 1) was cloned into a pGEM-T vector (Promega, USA) and three clones were sequenced. For this region cloning was carried out to obtain better sequencing reads in areas rich in homopolymer runs. In both cases discrepancies among different reads (roughly 3‰ of bases) were corrected on a two out of three basis.

Primers 1007SpacerFw (GAGGGTTACCCCCATTTATTG) and 1684SpacerRev (TTGATCGTCACCGATTAAAGC) were used to amplify and sequence a fragment of 878 bp (encompassing the area around trnW/C/Y) in three additional specimens of B. dorsalis s.s. from China (Shanghai, Yunnan and Guangdong provinces) as well as B. dorsalis from the USA, Thailand, Vietnam and Japan (laboratory strain), B. papayae from Malaysia, B. carambolae from Thailand, and B. philippinensis and B. occipitalis from the Philippines. These additional sequences were used for comparison, and not included in the complete genome sequence.

Sequencing and cloning were performed at Beijing Aoke Biological Limited Inc using an ABI 3730 DNA Sequencer (PE Applied Biosystems, USA) and BigDye™ chemistry. Electropherograms were manually inspected for accuracy of the base calls and assembled. Sequence annotation and the comparison with C. capitata and B. oleae were performed using the DNAStar package (DNAStar, USA) and the on-line BLAST tools available through the NCBI web site (Altschul et al., 1997). Folding of the repeated fragment and free energy values were studied in the trnC-spacer-trnY region and in an equally long area in the CR using the mfold software (default parameters for DNA; Zuker, 2003). The complete sequence of B. dorsalis mtDNA was deposited in GenBank under accession no. DQ845759, and additional sequences of the trnW/C/Y region under accession nos. EF377349–59.
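The two out of three correction rule described above can be sketched as a per-column majority vote. This is only an illustration: the authors do not describe any software for this step, the function name is hypothetical, the reads are invented, and the sketch assumes the three reads are already aligned and of equal length.

```python
from collections import Counter


def consensus_two_of_three(reads):
    """Majority-rule consensus of three aligned reads of equal length."""
    if len(reads) != 3 or len({len(r) for r in reads}) != 1:
        raise ValueError("expected three aligned reads of equal length")
    consensus = []
    for column in zip(*reads):
        base, count = Counter(column).most_common(1)[0]
        # keep a base seen in at least two reads, otherwise flag it as ambiguous
        consensus.append(base if count >= 2 else "N")
    return "".join(consensus)


# Invented reads; the third one carries a single discrepant base call
reads = ["ACGTTTTTA", "ACGTTTTTA", "ACGATTTTA"]
print(consensus_two_of_three(reads))  # prints ACGTTTTTA
```

Columns where all three reads disagree are reported as N here, a choice made for the sketch rather than something stated in the text.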
3. Results and discussion

3.1. Genome organization and base composition

The mitochondrial genome of B. dorsalis is a closed circular molecule of 15,915 bp, well in the range of the other two tephritid genomes available (15,815 bp in B. oleae and 15,980 bp in C. capitata). The gene content is typical of other metazoan mitochondrial genomes (Fig. 2, Table 1): 13 PCGs (cox1-3 and cob, nad1-6 and nad4L, atp6 and atp8), 22 tRNA genes (one for each amino acid, two each for leucine and serine) and two genes for the mitochondrial rRNA subunits (rrnS and rrnL). Gene order follows the basic Pancrustacean arrangement (Crease, 1999). Only one long unassigned region is present between rrnS and trnI, and it was deemed homologous to the CR (also known as A + T rich region in insects) by positional homology, general structure and base content. Genome organization is very compact, with only 218 nucleotides dispersed in 14 intergenic spacers and contiguous genes overlapping at 5 boundaries by a total of 27 bases. Besides the aforementioned CR, two of the intergenic spacers appear of significant length: 66 bp between trnQ and trnM and 45 bp between trnC and trnY. This is unlike the situation in B. oleae, where the longest intergenic spacer is 28 bp, and more similar to C. capitata, where two longer (> 40 bp) spacers are observed on both sides of trnQ. Neither B. oleae nor C. capitata displays significant intergenic spacers in the region between trnC and trnY (1 bp and 16 bp, respectively).

Base composition shows both genome-wide and strand-specific compositional biases. Overall composition is A = 39.3%, C = 16.2%, G = 10.2%, T = 34.3%, giving a total A + T content of 73.6%. This figure is very similar to that of the congeneric B. oleae (72.6%) and slightly lower than that of C. capitata (77.5%). The A + T content of isolated protein coding genes, rRNAs, tRNAs and the CR is 71.2%, 77.8%, 75.2% and 88.1%, respectively, again similar to B. oleae and slightly lower than in C. capitata (Table 2). Considering the two strands separately, an asymmetrical compositional bias can be observed (Hassanin et al., 2005), and is most evident when comparing gene sequences on opposite strands: genes encoded on the J strand (nad2-3 and nad6, cox1-3 and cob, atp6 and atp8) have a comparable content of As and Ts (32.9% A; 37.6% T, on average), while genes on the N strand (nad1, nad4-5 and nad4L, rrnS and rrnL) display a strong bias towards a higher percentage of A than T (48.4% A; 26.5% T). This parallels observations from B. oleae (Nardi et al., 2003), apart from the fact that here the bias can also be observed in rRNA genes, though to a lesser extent.

3.2. Protein coding genes and codon usage

The mitochondrial genome of B. dorsalis encodes the regular set of 13 PCGs found with few exceptions in all animal mitochondrial genomes. All protein coding genes, with the exception of cox1 and atp8, start with a typical ATN initiation codon (cox2-3, cob, atp6, nad4 and nad4L with ATG; nad2-3
Fig. 2. Graphical representation of the mitochondrial genome of Bactrocera dorsalis. Orientation of genes and tRNAs is specified by the direction of arrows. Positive number at gene boundaries mark intergenic spacers (in bp, see Table 1), negative numbers gene overlaps. Shading identifies different regions (PCGs, rRNAs and CR).
Table 1
Genes and gene regions in the mtDNA of B. dorsalis

Gene          Span                IGS a   Start   Stop
trnI          1–66                −3      –       –
trnQ          64–132 b            66      –       –
trnM          199–267             0       –       –
nad2 c        268–1290            10      ATT     TAA
trnW c        1301–1369           −8      –       –
trnC c        1362–1424 b         45      –       –
trnY c        1470–1536 b         −2      –       –
cox1 c        1535–3069           0       TCG     TA
trnL(UUR)     3070–3135           4       –       –
cox2          3140–3829           4       ATG     TAA
trnK          3834–3904           0       –       –
trnD          3905–3971           0       –       –
atp8          3972–4133           −7      GTG     TAA
atp6          4127–4803           0       ATG     TA
cox3          4804–5592           9       ATG     TAA
trnG          5602–5666           0       –       –
nad3          5667–6018           0       ATT     T
trnA          6019–6083           7       –       –
trnR          6091–6154           11      –       –
trnN          6166–6230           0       –       –
trnS(AGY)     6231–6298           0       –       –
trnE          6299–6365           18      –       –
trnF          6384–6448 b         0       –       –
nad5          6449–8168 b         15      ATT     T
trnH          8184–8249 b         0       –       –
nad4          8250–9590 b         −7      ATG     TAG
nad4L         9584–9880 b         2       ATG     TAA
trnT          9883–9947           0       –       –
trnP          9948–10,013 b       2       –       –
nad6          10,016–10,539       0       ATT     TA
cob           10,540–11,674       0       ATG     T
trnS(UCN)     11,675–11,741       15      –       –
nad1          11,757–12,696 b     10      ATA     T
trnL(CUN)     12,707–12,771 b     0       –       –
rrnL          12,772–14,104 b     0       –       –
trnV          14,105–14,176 b     0       –       –
rrnS          14,177–14,966 b     0       –       –
CR            14,967–15,915       0       –       –

a Intergenic nucleotides observed after the indicated gene. Positive numbers identify spacers (in bp), negative numbers identify gene overlaps.
b Encoded on the N strand (Simon et al., 1994).
c Region sequenced in additional geographic samples and species from the dorsalis complex.
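The spacer and overlap totals quoted in Section 3.1 can be recomputed from the spans in Table 1. The sketch below is illustrative only: the coordinates are transcribed from the table, while the function and variable names are hypothetical and not part of the authors' annotation workflow.

```python
GENOME_LENGTH = 15915  # bp, circular molecule

# (feature, start, end) in map order, transcribed from Table 1
FEATURES = [
    ("trnI", 1, 66), ("trnQ", 64, 132), ("trnM", 199, 267),
    ("nad2", 268, 1290), ("trnW", 1301, 1369), ("trnC", 1362, 1424),
    ("trnY", 1470, 1536), ("cox1", 1535, 3069), ("trnL(UUR)", 3070, 3135),
    ("cox2", 3140, 3829), ("trnK", 3834, 3904), ("trnD", 3905, 3971),
    ("atp8", 3972, 4133), ("atp6", 4127, 4803), ("cox3", 4804, 5592),
    ("trnG", 5602, 5666), ("nad3", 5667, 6018), ("trnA", 6019, 6083),
    ("trnR", 6091, 6154), ("trnN", 6166, 6230), ("trnS(AGY)", 6231, 6298),
    ("trnE", 6299, 6365), ("trnF", 6384, 6448), ("nad5", 6449, 8168),
    ("trnH", 8184, 8249), ("nad4", 8250, 9590), ("nad4L", 9584, 9880),
    ("trnT", 9883, 9947), ("trnP", 9948, 10013), ("nad6", 10016, 10539),
    ("cob", 10540, 11674), ("trnS(UCN)", 11675, 11741), ("nad1", 11757, 12696),
    ("trnL(CUN)", 12707, 12771), ("rrnL", 12772, 14104), ("trnV", 14105, 14176),
    ("rrnS", 14177, 14966), ("CR", 14967, 15915),
]


def intergenic_gaps(features, genome_length):
    """Nucleotides between each feature and the next one on the circular map.

    Positive values are spacers, negative values are overlaps.
    """
    gaps = []
    for i, (name, _start, end) in enumerate(features):
        next_start = features[(i + 1) % len(features)][1]
        if i == len(features) - 1:
            next_start += genome_length  # wrap around the circular molecule
        gaps.append((name, next_start - end - 1))
    return gaps


gaps = intergenic_gaps(FEATURES, GENOME_LENGTH)
spacers = [g for _, g in gaps if g > 0]
overlaps = [-g for _, g in gaps if g < 0]
print(len(spacers), sum(spacers))    # prints 14 218
print(len(overlaps), sum(overlaps))  # prints 5 27
```

The CR is treated here as an annotated feature, so it does not count as a spacer, in line with the IGS column of the table.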
and nad5-6 with ATT; nad1 with ATA: Table 1). The initiation codons ATG and ATA, encoding methionine, are the most typical among insects (Bae et al., 2004; Kim et al., 2005); the ATT codon is less common but often observed among tephritids (Table 3). The gene for cox1 is initiated by a TCG codon, as in B. oleae and C. capitata, and the hexanucleotide ATTTAA, proposed as an initiation signal in mosquitoes and observed in most dipterans analyzed so far, is exactly adjacent to this triplet. The start codon for gene atp8 (GTG) is different from both B. oleae (ATC) and C. capitata (ATT).

Canonical TAA and TAG termination codons are found in five (cox2-3, nad2, nad4L and atp8) and one (nad4) PCGs, respectively. The remaining genes show an incomplete termination codon (TA in cox1, atp6 and nad6; T in nad3, nad5, cob and nad1: Table 1), likely extended to TAA during the maturation of the transcript, a phenomenon commonly observed in metazoan mitochondrial genes (Clary and Wolstenholme, 1985; Bae et al., 2004; Junqueira et al., 2004). In four cases (cob, nad3, nad6 and atp6) a complete stop codon TAA or TAG is present on the genomic sequence, but the last one or two nucleotides overlap with the neighboring gene, and the codon was therefore considered to be incomplete. Considering only those three genes that strictly do not have complete termination codons in B. dorsalis, one (nad1) is in common with both B. oleae and C. capitata, one (cox1) with B. oleae, and one (nad5) with C. capitata. This is somewhat expected given the close taxonomic proximity of the three species, but not fully justified by the 10–20% of differences observed at the nucleotide level between PCGs in the three species, suggesting some role for selection. A comparison of the length of protein coding genes among the three tephritid species shows that this figure is very conserved, with a difference of 2 codons at most (nad4L: Table 3).

3.3. rRNAs and tRNAs

The two genes encoding the large and the small ribosomal subunits are located between trnL(CUN) and trnV, and between trnV and the CR, respectively. The length of B. dorsalis rrnS and rrnL was determined to be 790 bp and 1333 bp, respectively, based on the location of neighboring tRNAs and a comparison with other related sequences and structures (B. oleae, unpublished data). These figures are very similar to those of B. oleae and C. capitata and well in the range of other dipterans (Kim et al., 2005).

The complete set of 22 tRNAs typical of metazoan mitochondrial genomes is present in B. dorsalis. Their secondary structures generally conform to a regular cloverleaf structure (Table 4). Anticodon sequences are the same as in B. oleae and C. capitata. In tRNA-S(AGN) eight unpaired
Table 2
Length and base composition of different genomic regions in B. dorsalis, B. oleae and C. capitata

Species and               mtDNA             CR              rRNAs           tRNAs           PCGs
accession number          Size a   AT%      Size   AT%      Size   AT%      Size   AT%      Size     AT%
B. dorsalis DQ845759 b    15,915   73.6     949    88.1     2123   77.8     1467   75.2     11,185   71.2
B. oleae AY210702         15,815   72.6     949    86.9     2116   77.1     1484   75.1     11,188   70.1
C. capitata AJ242872      15,980   77.5     1004   91.1     2123   80.2     1472   76.8     11,272   75.5

a In base pairs.
b GenBank accession number.
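The A + T percentages reported in Section 3.1 and Table 2 are simple base counts. A minimal sketch of such a calculation follows; the function name and the toy sequence are hypothetical, ambiguous bases are simply ignored (an assumption, since the paper does not say how they were handled), and only the four reported genome-wide percentages are taken from the text.

```python
from collections import Counter


def base_composition(seq):
    """Percentage of each unambiguous base plus the combined A+T percentage."""
    counts = Counter(b for b in seq.upper() if b in "ACGT")
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no unambiguous bases")
    pct = {b: 100.0 * counts[b] / total for b in "ACGT"}
    pct["AT"] = pct["A"] + pct["T"]
    return pct


# Toy sequence for illustration only, not taken from the genome
comp = base_composition("ATTAATGCATATTAAC")
print(comp["AT"])

# Genome-wide percentages reported in the text for B. dorsalis
reported = {"A": 39.3, "C": 16.2, "G": 10.2, "T": 34.3}
print(round(reported["A"] + reported["T"], 1))  # prints 73.6
```

Applied to the full GenBank record (DQ845759), the same function would be expected to return values close to those in Table 2, although this has not been checked here.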
Table 3
Position, length, initiation and termination codons in B. dorsalis, B. oleae and C. capitata PCGs

         B. dorsalis                              B. oleae                                 C. capitata
Gene     From     To       Length  Start  Stop    From     To       Length  Start  Stop    From     To       Length  Start  Stop
nad2     268      1290     1023    ATT    TAA     206      1228     1023    ATT    TAA     293      1315     1023    ATT    TAA
cox1     1535     3069     1535    TCG    TA      1428     2961     1534    TCG    T       1527     3062     1536    TCG    TAA
cox2     3140     3829     690     ATG    TAA     3033     3722     690     ATG    TAA     3138     3824     687     ATG    TAA
atp8     3972     4133     162     GTG    TAA     3864     4025     162     ATC    TAA     3974     4135     162     ATT    TAA
atp6     4127     4803     677     ATG    TA      4019     4696     678     ATG    TAA     4129     4806     678     ATG    TAA
cox3     4804     5592     789     ATG    TAA     4696     5484     789     ATG    TAA     4806     5594     789     ATG    TAA
nad3     5667     6018     352     ATT    T       5559     5921     354     ATC    TAG     5666     6019     354     ATA    TAA
nad5     6449     8168     1720    ATT    T       6357     8075     1719    ATT    TAA     6452     8168     1717    ATT    T
nad4     8250     9590     1341    ATG    TAG     8156     9496     1341    ATG    TAA     8260     9600     1341    ATG    TAA
nad4L    9584     9880     297     ATG    TAA     9490     9786     297     ATG    TAA     9600     9890     291     ATG    TAA
nad6     10,016   10,539   524     ATT    TA      9922     10,446   525     ATC    TAA     10,026   10,550   525     ATT    TAA
cob      10,540   11,674   1135    ATG    T       10,446   11,582   1137    ATG    TAG     10,550   11,686   1137    ATG    TAG
nad1     11,757   12,696   940     ATA    T       11,663   12,602   940     ATG    T       11,767   12,706   940     ATT    T
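The incomplete stop codons discussed in Section 3.2 can be cross-checked against the annotated gene lengths: if the annotated span starts at the first base of the initiation codon and ends with the (possibly truncated) stop codon, the length modulo 3 should be 0 for a complete stop, 1 for T and 2 for TA. The sketch below applies that reasoning to the B. dorsalis columns of Table 3. The helper names are hypothetical and the check is an illustration, not the authors' annotation procedure.

```python
# (gene, annotated length in nt, annotated stop) for B. dorsalis, from Table 3
BD_PCGS = [
    ("nad2", 1023, "TAA"), ("cox1", 1535, "TA"), ("cox2", 690, "TAA"),
    ("atp8", 162, "TAA"), ("atp6", 677, "TA"), ("cox3", 789, "TAA"),
    ("nad3", 352, "T"), ("nad5", 1720, "T"), ("nad4", 1341, "TAG"),
    ("nad4L", 297, "TAA"), ("nad6", 524, "TA"), ("cob", 1135, "T"),
    ("nad1", 940, "T"),
]


def expected_stop(length_nt):
    """Stop codon state implied by the gene length alone."""
    return {0: "complete", 1: "T", 2: "TA"}[length_nt % 3]


def is_consistent(length_nt, stop):
    """True when the annotated stop agrees with the length-based expectation."""
    expected = expected_stop(length_nt)
    if expected == "complete":
        return stop in ("TAA", "TAG")
    return stop == expected


mismatches = [gene for gene, n, stop in BD_PCGS if not is_consistent(n, stop)]
print(mismatches)  # prints []
```

All 13 B. dorsalis entries pass this check, and the lengths sum to the 11,185 bp given for PCGs in Table 2.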
nucleotides appear to replace the DHU arm (Table 4), as in B. oleae and C. capitata (personal observation based on sequence AJ242872). In B. dorsalis and C. capitata, but not in B. oleae, an alternative structure with a paired stem can be drawn by allowing the suboptimal pairing [CGT]TC[AAG].

3.4. Intra-molecular recombination and the origin of the intergenic spacers

The longer intergenic spacer (66 bp: Table 1) shows no significant similarity (e-value < 0.01) to other known sequences in the GenBank database or to other regions in the B. dorsalis genome, and it is therefore impossible to formulate hypotheses on its origin. On the other hand, the shorter spacer (45 bp: Table 1) has a clear counterpart in the CR, where the first 33 of the 45 bases of the spacer are found exactly repeated (BLAST e-value: e−5). Sequences encompassing the two repeated regions have been independently confirmed by comparison with GenBank entries AB191470–3, which encompass the whole CR in four B. dorsalis s.s. specimens from different locations (Nakahara et al., unpublished data), and by re-sequencing the area around trnW/C/Y in additional specimens. In detail, bases 1425–1457 correspond to bases 15309–15341 in inverted orientation, while the remaining 12 bp of the spacer (bases 1458–1469) have no resemblance to the colinear bases 15297–15308 in the CR.

Interestingly, this repeated segment lies in regions capable of forming extremely rich secondary structure motifs (Fig. 3). Bases 1425–1469 are surrounded by two tRNA genes, and are predicted to form a small stem (5 bp) themselves (total dG = −11.62). Bases 15297–15341 are predicted to form a long stem structure together with a partially complementary neighboring sequence (dG = −19.76). Secondary structures, formed by tRNA genes or self-complementary sequences, are believed to play a major role as hotspots for recombination (Stanton et al., 1994). We hypothesize that this duplicated sequence could mark a recent recombination event.
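A search for an inverted copy of a short segment, of the kind described above, can be sketched as a reverse-complement lookup. The paper reports BLAST comparisons rather than this procedure, so the code is a simplified stand-in. It assumes that "inverted orientation" means the reverse complement and that only exact matches are of interest. The function names and the sequences are invented for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of a DNA string containing only A, C, G and T."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_inverted_copy(query, target):
    """1-based (start, end) of an exact reverse-complement copy of query
    in target, or None when no such copy exists."""
    hit = target.upper().find(reverse_complement(query))
    if hit == -1:
        return None
    return hit + 1, hit + len(query)


# Invented sequences: the target carries the reverse complement of the query
query = "GATTACAGGC"
target = "TTTT" + reverse_complement(query) + "AAAA"
print(find_inverted_copy(query, target))  # prints (5, 14)

# The two repeated segments reported in the text span the same number of bases
print(1457 - 1425 + 1, 15341 - 15309 + 1)  # prints 33 33
```

For diverged copies, such as the CR copy in B. oleae with 25 of 33 identical bases, an exact search of this kind would fail and an alignment-based tool would be needed.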
Intra-mitochondrial recombination has been described in recent years for mitochondrial genomes (reviewed in Dowton and Campbell, 2001). This process seems to be responsible for the formation of the molecular chimeras
observed in human mitochondria (Kajander et al., 2000) and in the somatic tissue of the Manila clam (Passamonti et al., 2003), as well as for the apparently abnormal rate of gene rearrangement in certain hymenopteran lineages (Dowton and Austin, 1999). The alternative possibility of the duplication of a large fragment encompassing trnY to CR and subsequent loss, in one copy, of all 7 genes but not of the repeated sequence (duplication and random loss model: Boore, 2000) seems highly unlikely in this case, as there is no other trace of this supposed duplication anywhere in the region.

With regard to the direction of the recombination, not enough comparative data are available to draw firm conclusions, but since the spacer copy of the sequence is absent in the closely related B. oleae and C. capitata, while the CR copy is present (25 out of 33 identical bases) at least in B. oleae, we hypothesize that the CR copy is the original one. This could have been duplicated as a second insertion between trnC and trnY. In order to better determine the lineage where the recombination took place, the occurrence of both copies of the repeated sequence was explored across available Bactrocera species. The CR copy is clearly recognizable in B. carambolae (30 bp identity, BLAST e-value < 0.005) and B. papayae (28/29 bp identity, BLAST e-value < 0.005), but also, with decreasing similarity, across all Bactrocera species for which sequence data are available (AF033920–34: Hoeben and Ma, unpublished data), including the aforementioned B. oleae (Nardi et al., 2003). This is consistent with the hypothesis that this is the original copy. The spacer copy is present with minor modifications (1/3 changes) across all geographic samples and species studied from the dorsalis complex (sequences obtained in this study: B. dorsalis s.s., B. papayae, B. carambolae, B. philippinensis and B. occipitalis) but absent in B. oleae.
Therefore the recombination event can be placed, to the best of our knowledge, on the lineage leading to B. dorsalis after its separation from B. oleae, but before the diversification of the B. dorsalis species group. Additional sequence data from species of Bactrocera inside and outside the dorsalis complex would be useful to pinpoint this event with more precision.
Table 4
Sequence, with secondary structure landmarks, of B. dorsalis tRNAs

tRNA       Acceptor arm  DHU arm                   Anticodon arm                     TψC arm                    Acceptor arm
trnI       [AATGAAT] TG  [CCT] GATAAA [AGG] G      [TTACC] TT GAT AG [GGTAA] ATAAT   [GCAAT] TAGT [ATTGC]       [ATTCATT] A
trnQ       [TATATTT] TG  [GTGTA] TGA [TGCAC] A     [AAAGT] TT TTG AT [ACTTT] TAGA    [AATAG] TTTAATT [CTATT]    [AAATATA] A
trnM       [AAAAAGA] TA  [AGCT] AATTA [AGCT] A     [CTGGG] TT CAT AC [CCCAT] TTAT    [AAAGG] TTCTAAT [CCTTT]    [TCTTTTT] A
trnW       [AAGGCTT] TA  [AGTT] AATTA [AACT] A     [ATAGC] CT TCA AA [GCTAT] AAAT    [ATAAG] TATAAAT [CTTTT]    [AAGCCTT] A
trnC       [GGCTTTA] TA  [GTCA] ATAA [TGAC] A      [TTAGA] CT GCA AT [TCTAA] AGGA    [GTAA] TAAA [TTAC]         [TAAGGCT] T
trnY       [GATTAAA] TG  [GCT] GAAGTTTA [GGC] G    [ATAGA] CT GTA AA [TCTAT] TTAT    [AAGAA] TTTA [TTCTT]       [TTTAATC] A
trnL(UUR)  [TCTAATA] TG  [GCA] GATTAG [TGC] A      [ATGGA] TT TAA GC [TCCAT] ATAT    [AAAGTA] TTT [TACTTT]      [TATTAGA] A
trnK       [CATTAGA] TG  [ACT] GAAAGCA [AGT] A     [CTGGT] CT CTT AA [ACCAT] CTTAT   [AGTAA] ATTAGCAC [TTACT]   [TCTAATG] G
trnD       [AAAAAAT] TA  [GTTA] AAAAA [TAAC] A     [TTAGT] AT GTC AA [ACTAA] AATT    [ATTAAA] CTA [TTTAAT]      [ATTTTTT] G
trnG       [ATCTATA] TA  [GTAT] ATAA [GTAT] A      [TTTGA] CT TCC AA [TCATA] AGGT    [CTACT] AATT [AGTAG]       [TATAGAT] A
trnA       [AGGGTTG] TA  [GTTA] ATTA [TAAC] A      [TTTGA] TT TGC AT [TCAAA] AAGT    [ATTGA] AATA [TCAAT]       [CTACCTT] A
trnR       [GAATATG] AA  [GCGA] TTTA [TTGC] A      [ATTAG] TT TCG AC [CTAAT] CTTA    [GGTAT] TTT [ATGCC]        [CTTATTC] T
trnN       [TTAATTG] AA  [GCC] AAAAAGA [GGC] T     [TATCA] CT GTT AA [TGATA] GAACT   [GAGA] TTGA [TCTC]         [CAATTAA] G
trnS(AGY)  [GAAGTAT] GA  CGTTCAAG a A              [AAAAG] CT GCT AA [CTTTT] ATCTTT  [TAATGG] TTAAACT [CCATTT]  [GTACTTC] T
trnE       [GTTTATA] TA  [GTTT] AATAA [AAAC] A     [TTACA] TT TTC AT [TGTAA] AAAT    [AAAAT] TTTCT [ATTTT]      [TATAAAT] T
trnF       [ATTCAAG] TA  [GCT] TAAAATAG [AGC] A    [TAACA] CT GAA GA [TGTTA] GGGT    [AATT] GAAT [AATT]         [CTTGGAT] G
trnH       [ATCTAAA] TA  [GTTT] ATTTA [AAAT] A     [TTGAT] TT GTG GT [GTCAA] TGAA    [ATGA] GGTTTA [TCAT]       [TTTAGAT] C
trnT       [GTTTTAA] TA  [GTTT] AATAA [AAAC] A     [TTGGT] CT TGT AA [ACCAA] AAAT    [AAGAT] TTT [GTCTT]        [TTAAAAC] T
trnP       [CAGGAGG] TA  [GTTT] ATTTA [AAAT] A     [TTAAT] TT TGG GG [ATTAA] TGAT    [AAAG] TTATTT [CTTT]       [TCTCTTG] A
trnS(UCN)  [AGTTAAT] GA  [GCTT] GAAC [AAGC] A      [TATGT] TT TGA AA [ACATA] AGAT    [AGAAA] TCAACC [TTTCT]     [ATTGACT] T
trnL(CUN)  [ACTATTT] TG  [GCA] GATTAG [TGC] A      [ATAAA] TT TAG AA [TTTAT] TTAT    [GTAAT] TTAT [ATTAC]       [AAATAGT] A
trnV       [CAATTTA] AA  [GCTT] ATTTAGTA [AAGC] A  [TTTCA] TT TAC AT [TGAAA] AGAT    [TTTTG] TGCAAAT [CAATA]    [TAAATTG] A

Paired stem regions are enclosed in square brackets; the anticodon is the isolated triplet in the anticodon arm.
a Non paired nucleotides replacing the DHU arm.
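The bracketed stem regions in Table 4 can be checked for base pairing by reading one strand 5' to 3' against the other strand in reverse. The sketch below counts Watson-Crick pairs and, as an assumption made for this illustration, also accepts G-T wobble pairs (G-U in the RNA). The function name is hypothetical and the stems are transcribed from the table.

```python
# Accepted pairs, written in DNA letters as in Table 4; G-T stands for a G-U wobble
PAIRS = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G"), ("G", "T"), ("T", "G")}


def stem_pairs(five_prime, three_prime):
    """Number of paired positions between two stem strands, both written 5'->3'."""
    if len(five_prime) != len(three_prime):
        raise ValueError("stem strands must have the same length")
    return sum((a, b) in PAIRS for a, b in zip(five_prime, reversed(three_prime)))


# trnI acceptor and anticodon stems, and the trnY DHU stem, from Table 4
print(stem_pairs("AATGAAT", "ATTCATT"))  # prints 7
print(stem_pairs("TTACC", "GGTAA"))      # prints 5
print(stem_pairs("GCT", "GGC"))          # prints 3 (includes one G-T wobble)
```

A count lower than the stem length would point to a mismatch within the bracketed region.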
Fig. 3. Hypothetical secondary structures in the regions surrounding the repeated sequence (in bold) between trnC and trnY (panel A) and in the CR (panel B).
3.5. Levels of variability and informativeness of tephritid mitochondrial genome sequences

Comparing the nucleotide sequences of the 13 PCGs, 2 rRNA subunits and the CR between B. dorsalis and the other two published tephritid mitochondrial genomes (Table 5), the highest similarity is observed, for most genes, between B. dorsalis and B. oleae. This is as expected given the taxonomic proximity of these species, which belong to the same genus, compared to C. capitata, which is in a different tribe within the subfamily Dacinae (White and Elson-Harris, 1992). It is also consistent with phylogenetic analyses incorporating these three species (Smith et al., 2003). The most conserved sequences appear to be the two rRNA genes, with identities above 92% between the two Bactrocera species. The cytochrome oxidase and the ATPase/NADH dehydrogenase complexes of genes are slightly less conserved (85–86% and 81–91% identity, respectively). The CR, with 77% identity between the two Bactrocera species, is the most variable region. Amino-acid identities (Table 5) follow the same trend, with slightly higher values for the cytochrome oxidase complex (97–98% identity) than for the ATPase and NADH dehydrogenase complexes (88–97%).

The percentage of identity across gene regions for the three tephritid species is rather similar, meaning that at this shallow level of divergence the observed differences in variability across genes/genomic regions are limited. This observation does not contradict the notion that some genes in the mitochondrial genome, namely the cytochrome oxidase complex, are more conservative than others, such as the NADH dehydrogenase and ATPase complexes (Simon et al., 1994), but rather indicates that the limited number of differences observed in comparisons between congeneric species is not sufficient for this difference to emerge. This parallels what has been observed in a comparison between two genomes from different geographic samples of the same species (B. oleae), where the distribution of mutations across genomic regions did not significantly depart from randomness (Nardi et al., 2003).

These estimates can provide useful guidelines for gene selection if sequences from one or a limited number of genes are
to be applied for studying evolutionary phenomena at the species/genus level. Two additional factors important for choosing a gene to be used as a molecular marker are the presence of mitochondrial pseudogenes in the nuclear genome and the possibility of efficiently modeling sequence evolution parameters. One would tend to exclude the nad2/cox1 region in any case, as the presence of pseudogenes was reported in the closely related B. oleae (Nardi et al., 2003). If sequences are to be used in a phylogenetic framework, i.e. applying methods that involve some model of sequence change through time, the CR is unlikely to be useful due to the presence of stretches of identical nucleotides, short microsatellites, and other low complexity sequences that may complicate comparative sequence analysis. The most variable PCGs would be more appropriate and, taking into account only genes that provide a significant amount of sequence (> 1 kb), nad4–5 would be the most suitable markers.

Table 5
Percentage of identity at the nucleotide level between B. dorsalis, B. oleae and C. capitata calculated in different gene/gene regions. Corresponding amino acid identities are given in parentheses

Region   Bd a/Bo b        Bd/Cc c          Bo/Cc
atp6     85.69 (96.89)    84.22 (97.78)    81.42 (96.44)
atp8     87.65 (90.57)    77.78 (79.25)    80.25 (79.25)
cox1     85.34 (98.04)    85.48 (97.45)    84.51 (98.23)
cox2     86.52 (96.89)    82.75 (94.62)    82.90 (94.71)
cox3     86.82 (98.47)    88.09 (96.56)    84.28 (96.95)
cob      85.58 (97.62)    83.63 (97.09)    85.84 (95.77)
nad1     88.62 (94.57)    87.34 (92.83)    87.55 (92.18)
nad2     81.72 (88.69)    80.35 (85.80)    78.98 (84.08)
nad3     84.18 (92.31)    84.46 (94.02)    82.20 (89.74)
nad4     88.07 (95.07)    85.61 (91.48)    84.71 (88.57)
nad4L    91.25 (98.98)    85.19 (91.67)    85.19 (92.71)
nad5     85.99 (92.57)    85.81 (90.21)    85.57 (89.03)
nad6     81.52 (81.98)    78.29 (79.07)    78.86 (79.89)
rrnS     93.44 (N/A)      89.13 (N/A)      87.39 (N/A)
rrnL     92.65 (N/A)      90.08 (N/A)      88.26 (N/A)
CR       77.19 (N/A)      65.88 (N/A)      65.65 (N/A)

a B. dorsalis.
b B. oleae.
c C. capitata.
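Identity percentages such as those in Table 5 are computed from pairwise alignments. The sketch below shows one common way to do this on two pre-aligned sequences. The paper does not state how gaps were treated, so skipping columns that contain a gap is an assumption of this example, and the function name and the sequences are invented.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percentage of identical positions between two aligned sequences.

    Columns in which either sequence has a gap are skipped (an assumption
    made for this sketch).
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap or b == gap:
            continue
        compared += 1
        identical += a == b
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * identical / compared


# Invented alignments for illustration
print(percent_identity("ACGTACGT", "ACGTTCGT"))  # prints 87.5
print(percent_identity("ACG-T", "ACGAT"))        # prints 100.0
```

Counting gap columns as mismatches instead would lower the values for regions with many indels, such as the CR.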
If sequences are to be used to distinguish among species/ geographic groups, and therefore the sheer number of differences is at premium, the CR and/or intergenic spacers are likely the best option. Care must be taken to avoid overweighting mutations such as unit gain/loss in homopolymer runs or microsatellites that, being much more frequent than regular point mutations, would likely lead to problems of convergence. The informativeness of the CR is supported by some preliminary sequence information (Nakahara et al., unpublished data: GenBank accession nos: AB191470/3) in four B. dorsalis specimens, that display a difference of 3.4% at the nucleotide level. On the contrary, few point mutations were observed in the 45 bp intergenic spacer across five species and seven geographical samples of the dorsalis complex (this study: GenBank accession nos. EF377349–59), but an 11 bp discrete insertion appears as an autoaphomorphy of B. carambolae. This is a very promising marker for species diagnostics, and, as well as other possible discrete genomic changes, could be easily targeted in a PCR assay. Nevertheless additional population data will be needed to confirm that this insertion is fixed and exclusive for the species. Talking about fruit flies in general, the cox1 gene is today the most commonly used molecular marker in phylogenetics (Barr and McPheron, 2006; Jamnongluk et al., 2003; Smith-Caldas et al., 2001) and phylogeography (Mun et al., 2003), and the available amount of cox1 information is likely to arise dramatically with the advent of DNA barcoding projects (Hebert et al., 2003). 
Focusing on species discrimination for import/export controls and general biosecurity, most available tests are based on PCR–RFLP assays targeting sequences from the nuclear ribosomal cluster (Armstrong et al., 2000) and the mitochondrial CR, rrnS and rrnL. None of these tests is currently capable of discriminating among all the key species, and attempts are under way to develop more sensitive and efficient tests, e.g. taking advantage of barcoding data (Armstrong and Ball, 2005) or in the form of a micro-array biochip (Frey and Pfunder, 2006). Nevertheless, both phylogeography/population genetic studies and tests for species diagnosis depend heavily on the availability of basic comparative data on intra- and inter-specific variability, and on the informativeness (that is, the presence of discriminating mutations) of the gene to be targeted in the assay. In this respect, complete mitochondrial genome sequences are likely to play a major role in the near future (Cameron et al., 2007), and the completion of the B. dorsalis genome sequence adds a significant piece to the puzzle.
Acknowledgements
We wish to thank all colleagues who helped with sample collection and determination: Mr Chenzhilin, Dr. Haymer, Mr Jiangxiaolong, Dr Muraji and Mr Yejun. We also wish to thank the Editor and three anonymous Reviewers for their useful comments and suggestions, and Prof. Baldari for her assistance. This research was supported by funds of the 2008 Beijing Olympic Game (2004BA904B06) and 973 program (2002CB111405) of Ministry of Science and Technology of P. R. China, and the National Natural Sciences Foundation of
China (No. 30471162). F.N. was supported by a grant of the Monte dei Paschi di Siena Foundation.
References
Allwood, A.J., et al., 1999. Host plant records for fruit flies (Diptera: Tephritidae) in South-East Asia. Raffles Bull. Zool. 7, 92 Supp.
Altschul, S.F., et al., 1997. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 25, 3389–3402.
Armstrong, K.F., Ball, S.L., 2005. DNA barcodes for biosecurity: invasive species identification. Philos. Trans. R. Soc. Lond., B. Biol. Sci. 360, 1813–1823.
Armstrong, K.F., Cameron, C.M., Frampton, E.R., 2000. Fruit fly (Diptera: Tephritidae) species identification: a rapid diagnostic technique for quarantine application. Bull. Entomol. Res. 87, 111–118.
Avise, J.C., 2000. Phylogeography, the History and Formation of Species. Harvard University Press, Cambridge, Massachusetts, USA.
Bae, J.S., Kim, I., Sohn, H.D., Jin, B.R., 2004. The mitochondrial genome of the firefly, Pyrocoelia rufa: complete DNA sequence, genome organization, and phylogenetic analysis with other insects. Mol. Phylogenet. Evol. 32, 978–985.
Ballard, J.W.O., 2000a. Comparative genomics of mitochondrial DNA in members of the Drosophila melanogaster subgroup. J. Mol. Evol. 51, 48–63.
Ballard, J.W.O., 2000b. Comparative genomics of mitochondrial DNA in Drosophila simulans. J. Mol. Evol. 51, 64–75.
Ballard, J.W.O., Rand, D.M., 2005. The population biology of mitochondrial DNA and its phylogenetic implications. Annu. Rev. Ecol. Evol. Syst. 36, 621–642.
Ballard, J.W.O., Whitlock, M.C., 2004. The incomplete natural history of mitochondria. Mol. Ecol. 13, 729–744.
Barr, N.B., McPheron, B.A., 2006. Molecular phylogenetics of the genus Ceratitis (Diptera: Tephritidae). Mol. Phylogenet. Evol. 38, 216–230.
Boore, J.L., 1999. Animal mitochondrial genomes. Nucleic Acids Res. 27, 1767–1780.
Boore, J.L., 2000. The duplication/random loss model for gene rearrangement exemplified by mitochondrial genomes of deuterostome animals. In: Sankoff, D., Nadeau, J.H. (Eds.), Comparative genomics. Kluwer Academic Publishers, Dordrecht.
Brown, W.M., Prager, E.M., Wang, A., Wilson, A.C., 1982. Mitochondrial DNA sequences of primates: tempo and mode of evolution. J. Mol. Evol. 18, 225–239.
Cameron, S.L., Lambkin, C., Barker, S.C., Whiting, M.F., 2007. A mitochondrial genome phylogeny of Diptera: whole genome sequence data accurately resolve relationships over broad timescales with high precision. Syst. Ent. 32, 40–59.
Carroll, L.E., White, I.M., Freidberg, A., Norrbom, A.L., Dallwitz, M.J., Thompson, F.C., 2002 onwards. Pest Fruit Flies of the world. http://delta-intkey.com.
Clarke, A.R., et al., 2005. Invasive phytophagous pest arising through a recent tropical evolutionary radiation: the Bactrocera dorsalis complex of fruit flies. Annu. Rev. Entomol. 50, 293–319.
Clary, D.O., Wolstenholme, D.R., 1985. The mitochondrial DNA molecule of Drosophila yakuba: nucleotide sequence, gene organization, and genetic code. J. Mol. Evol. 22, 252–271.
Crease, T.J., 1999. The complete sequence of the mitochondrial genome of Daphnia pulex (Cladocera: Crustacea). Gene 233, 89–99.
Dowton, M., Austin, A.D., 1999. Evolutionary dynamics of a mitochondrial rearrangement ‘hotspot’ in the Hymenoptera. Mol. Biol. Evol. 16, 298–309.
Dowton, M., Campbell, N.J.H., 2001. Intramitochondrial recombination – is it why some mitochondrial genes sleep around? TREE 16, 269–271.
Drew, R.A.I., Hancock, D.L., 1994. The Bactrocera dorsalis complex of fruit flies in Asia. Bull. Entomol. Res. Supp. Ser. 2, 1–68.
Feijao, P.C., Neiva, L.S., de Azeredo-Espin, A.M., Lessinger, A.C., 2006. AMiGA: the arthropodan mitochondrial genomes accessible database. Bioinformatics 22, 902–903.
Follett, P.A., Neven, L.G., 2006. Current trends in quarantine entomology. Annu. Rev. Entomol. 51, 359–385.
Frey, J.E., Pfunder, M., 2006. Molecular techniques for identification of quarantine insects and mites: the potential of microarrays. In: Rao, J.R., Fleming, C.C., Moore, J.E. (Eds.), Molecular Diagnostics: Current Technology and Applications. ch. 6.
Hassanin, A., Leger, N., Deutsch, J., 2005. Evidence for multiple reversals of asymmetric mutational constraints during the evolution of the mitochondrial genome of metazoa, and consequences for phylogenetic inferences. Syst. Biol. 54, 277–298.
Hebert, P.D., Ratnasingham, S., deWaard, J.R., 2003. Barcoding animal life: cytochrome c oxidase subunit 1 divergences among closely related species. Proc. Biol. Sci. 270, S96–S99.
Hou, B.H., 2005. Risk analysis for Oriental Fruit Fly, Bactrocera dorsalis (Hendel) (Diptera: Tephritidae). PhD dissertation, Sun Yat-Sen University. 1–103.
Jamnongluk, W., Baimai, V., Kittayapong, P., 2003. Molecular evolution of tephritid fruit flies in the genus Bactrocera based on the cytochrome oxidase I gene. Genetica 119, 19–25.
Junqueira, A.C., et al., 2004. The mitochondrial genome of the blowfly Chrysomya chloropyga (Diptera: Calliphoridae). Gene 339, 7–15.
Kajander, O.A., et al., 2000. Human mtDNA sublimons resemble rearranged mitochondrial genomes found in pathological states. Hum. Mol. Genet. 9, 2821–2835.
Kim, I., et al., 2005. The complete nucleotide sequence and gene organization of the mitochondrial genome of the oriental mole cricket, Gryllotalpa orientalis (Orthoptera: Gryllotalpidae). Gene 353, 155–168.
Mun, J., Bohonak, A.J., Roderick, G.K., 2003. Population structure of the pumpkin fruit fly Bactrocera depressa (Tephritidae) in Korea and Japan: Pliocene allopatry or recent invasion? Mol. Ecol. 12, 2941–2951.
Nardi, F., Carapelli, A., Dallai, R., Frati, F., 2003. The mitochondrial genome of the olive fly Bactrocera oleae: two haplotypes from distant geographical locations. Insect Mol. Biol. 12, 605–611.
Nardi, F., Carapelli, A., Dallai, R., Roderick, G.K., Frati, F., 2005. Population structure and colonization history of the olive fly, Bactrocera oleae (Diptera, Tephritidae). Mol. Ecol. 14, 2729–2738.
Passamonti, M., Boore, J.L., Scali, V., 2003. Molecular evolution and recombination in the gender-associated mitochondrial DNAs of the Manila clam Tapes philippinarum. Genetics 164, 603–611.
Rand, D., 2001. Mitochondrial genomics flies high. Trends Ecol. Evol. 16, 2–4.
Scheffler, I.E., 1999. Mitochondria. John Wiley & Sons, Inc., New York.
Simon, C., Frati, F., Beckenbach, A., Crespi, B., Liu, H., Flook, P., 1994. Evolution, weighting, and phylogenetic utility of mitochondrial gene sequences and a compilation of conserved polymerase chain reaction primers. Ann. Entomol. Soc. Am. 87, 651–701.
Smith-Caldas, M.R.B., McPheron, B.A., Silva, J.G., Zucchi, R.A., 2001. Phylogenetic relationships among species of the fraterculus group (Anastrepha: Diptera: Tephritidae) inferred from DNA sequences of mitochondrial cytochrome oxidase I. Neotropical Entomol. 30, 565–573.
Smith, P.T., Kambhampati, S., Armstrong, K.A., 2003. Phylogenetic relationships among Bactrocera species (Diptera: Tephritidae) inferred from mitochondrial DNA sequences. Mol. Phylogenet. Evol. 26, 8–17.
Spanos, L., Koutroumbas, G., Kotsyfakis, M., Louis, C., 2000. The mitochondrial genome of the Mediterranean fruit fly, Ceratitis capitata. Insect Mol. Biol. 9, 139–144.
Stanton, D.J., Daehler, L.L., Moritz, C.C., Brown, W.M., 1994. Sequences with the potential to form stem-and-loop structures are associated with coding region duplications in animal mitochondrial DNA. Genetics 137, 233–241.
White, I.M., Elson-Harris, M.M., 1992. Fruit Flies of Economic Significance: Their Identification and Bionomics. CAB International, Wallingford, UK.
Wolstenholme, D.R., 1992. Animal mitochondrial DNA: structure and evolution. Int. Rev. Cytol. 141, 173–216.
Yu, D.J., Zhang, G.M., Chen, Z.L., Zhang, R.J., Yin, W.Y., 2004. Rapid identification of Bactrocera latifrons (Dipt. Tephritidae) by real-time PCR using SYBR Green chemistry. J. Appl. Entomol. 128, 670–676.
Yukuhiro, K., Sezutsu, H., Itoh, M., Shimizu, K., Banno, Y., 2002. Significant levels of sequence divergence and gene rearrangements have occurred between the mitochondrial genomes of the wild Mulberry Silkmoth, Bombyx mandarina, and its close relative, the domesticated Silkmoth, Bombyx mori. Mol. Biol. Evol. 19, 1385–1389.
Zuker, M., 2003. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31, 3406–3415.