Complete chloroplast genome sequences of Praxelis (Eupatorium catarium Veldkamp), an important invasive species


Gene 549 (2014) 58–69

Contents lists available at ScienceDirect

Gene journal homepage: www.elsevier.com/locate/gene

Ying Zhang, Lei Li 1, Ting Liang Yan, Qiang Liu ⁎
College of Life Science, Hainan Normal University, Haikou 571158, China

Article info

Article history: Received 27 January 2014; received in revised form 5 June 2014; accepted 12 July 2014; available online 15 July 2014

Keywords: Praxelis; Eupatorieae; Asteraceae; Chloroplast genome; Invasive

Abstract

Praxelis (Eupatorium catarium Veldkamp) is a new hazardous invasive plant species that has caused serious economic losses and environmental damage in tropical and subtropical regions of the Northern Hemisphere. Although previous studies focused on the biological characteristics of this plant in order to prevent its expansion, little effort has been made to understand the impact of Praxelis on the ecosystem from an evolutionary perspective. Genetic information on Praxelis is required for further phylogenetic identification and evolutionary studies. Here, we report the complete Praxelis chloroplast (cp) genome sequence. The Praxelis chloroplast genome is 151,410 bp in length, including a small single-copy region (18,547 bp) and a large single-copy region (85,311 bp) separated by a pair of inverted repeats (IRs; 23,776 bp). The genome contains 85 unique and 18 duplicated genes in the IR region. The gene content and organization are similar to those of other Asteraceae cp genomes. We also compared the whole cp genome sequence, repeat structure, codon usage, contraction of the IR and gene structure/organization between native and invasive Asteraceae plants, in order to understand the evolution of organelle genomes in native and invasive Asteraceae. Comparative analysis identified 14 markers containing greater than 2% parsimony-informative characters, indicating that they are potentially informative markers for barcoding and phylogenetic analysis. Moreover, a sister relationship between Praxelis and seven other species in Asteraceae was found based on phylogenetic analysis of 28 protein-coding sequences. Complete cp genome information is useful for plant phylogenetic and evolutionary studies within this invasive species and also within the Asteraceae family. © 2014 Elsevier B.V. All rights reserved.

1. Introduction

Chloroplasts (cps) are essential plant cell organelles that conduct photosynthesis in the presence of sunlight. Most sequenced chloroplast (cp) genomes range from 120 to 160 kb in length and have a GC content of 30% to 40%. Almost all cp genomes share a quadripartite organization consisting of a large single-copy region (LSC; 80–90 kb), a small single-copy region (SSC; 16–27 kb) and two copies of an inverted repeat (IR) of approximately 20 to 28 kb (Serrano et al., 2013; Zhang et al., 2012). The gene content and structure of angiosperm cp genomes are highly conserved (Chumley et al., 2006; Yang et al., 2010), although expansion and contraction of the IR as well as gene and intron losses have been documented in a wide range of angiosperms (Chang et al., 2006; Hansen et al., 2007). The highly conserved nature and slow evolutionary rate of the cp genome make it uniform enough for comparative studies across different species yet sufficiently divergent to capture evolutionary events, and therefore a suitable and invaluable tool for molecular phylogeny and molecular ecology studies (Nie et al., 2012).

The number of complete cp genomes has increased dramatically in recent years; to date, 329 complete cp genomes have been deposited in the GenBank Organelle Genome Resource. The majority of angiosperm cp genomes are highly conserved (Chumley et al., 2006). However, in Asteraceae, the gene order in the LSC region is reversed compared with Nicotiana tabacum (Liu et al., 2013). The family Asteraceae is the second largest family of plants in the world, consisting of 2400 species distributed in 170 genera (Maia et al., 2011). Among these, seven complete cp genomes are available in GenBank: Guizotia abyssinica (Dempewolf et al., 2010), Helianthus annuus (Timme et al., 2007), Parthenium argentatum, Lactuca sativa (Timme et al., 2007), Jacobaea vulgaris (Kumar et al., 2009), Artemisia frigida (Liu et al., 2013) and Ageratina adenophora (Nie et al., 2012). Complete cp genomes are available for only three invasive species, although many invasive plants belong to Asteraceae, including A. adenophora (Nie et al., 2012), J. vulgaris (Doorduin et al., 2011), P. argentatum (Kumar et al., 2009), Eupatorium odoratum (Rani and Abraham, 2006) and Wedelia chinensis (Martin et al., 2003).

Abbreviations: cp, chloroplast genome; SSC, small single-copy; LSC, large single-copy; IRs, inverted repeats; bp, base pair(s); rRNA, ribosomal RNA; tRNA, transfer RNA; EDTA, ethylenediaminetetraacetic acid; BSA, bovine serum albumin; Tris–HCl, tris(hydroxymethyl)aminomethane; ds, double strand(ed); MP, maximum parsimony; ML, maximum likelihood; A, adenosine; C, cytidine; G, guanosine; T, thymidine; MPTs, the congruent trees; −lnL, log likelihood; CDS, coding sequence; SIs, small inversions; SSRs, simple sequence repeats.
⁎ Corresponding author. E-mail addresses: [email protected] (Y. Zhang), [email protected] (Q. Liu).
1 Equal contributor.

http://dx.doi.org/10.1016/j.gene.2014.07.041
0378-1119/© 2014 Elsevier B.V. All rights reserved.

Y. Zhang et al. / Gene 549 (2014) 58–69

Praxelis (Eupatorium catarium Veldkamp) belongs to the tribe Eupatorieae of the family Asteraceae. It is a perennial weed native to South America and has an unpleasant smell when crushed. Praxelis spreads mainly by seed, and each plant produces hundreds of small black seeds. Liu et al. (2011) reported heavy damage to cropland, artificial pastures and gardens. A few studies have also investigated the characteristics of Praxelis, including its biology (Wei et al., 2007), stress resistance (Kan et al., 2009a, 2009b), chemistry (Maia et al., 2011) and allelochemical composition (Tang et al., 2011). However, its genetic diversity and population evolution have not been previously reported. Here, we report the complete cp genome sequence of E. catarium. The chloroplast genome sequence will provide helpful genetic tools for population studies of E. catarium and help to determine the genetic and evolutionary mechanisms of invasion by this alien species.

2. Materials and methods

2.1. Plants, materials and cp DNA isolation

Fresh Praxelis leaves were collected in Haikou City (N 20°1′, E 110°19′) in Hainan Province, China. To remove starch and sugar from the cells, the fresh leaves were kept in the dark for 48 h at 0 °C prior to organelle isolation. The leaf tissues were ground using a conventional blender in sorbitol/TE isolation buffer (0.35 M sorbitol, 50 mM Tris–HCl, 5 mM EDTA, pH 8.0, 0.1% BSA, 0.1% 2-mercaptoethanol). The homogenate was filtered through two layers of Miracloth (Calbiochem, Germany) and centrifuged at 1000 × g for 15 min at 4 °C. The intact chloroplasts were purified using sucrose step gradient centrifugation (Chumley et al., 2006). High-purity chloroplasts were obtained from the 52–30% sucrose interface and were collected from a total of 12 sucrose gradient tubes of 50 ml each. After the chloroplasts were carefully washed in wash buffer (0.35 M sorbitol, 50 mM Tris–HCl, 5 mM EDTA, pH 8.0, 0.1% BSA), cp DNA was isolated from the lysed chloroplasts by ultracentrifugation in a cesium chloride/ethidium bromide gradient (Yi and Kim, 2012). The cp DNA quality was analyzed on a 1% agarose gel following BamHI and SacI restriction enzyme digestion.

Fig. 1. The mapped E. catarium circular chloroplast genome.

2.2. Genome sequencing and assembly

The genomic DNA from chloroplasts was isolated using the Sarkosyl method (Chang et al., 2006). A short-insert sequence library was constructed following the manufacturer's protocol (Illumina, USA). Chloroplast DNA (5 μg) was fragmented using dsDNA fragmentase (NEB, USA) at 37 °C for 30 min, and the fragmented DNA was purified using a MinElute column (Qiagen, Germany) and eluted in 30 μl elution buffer. T4 DNA polymerase, Klenow polymerase and T4 polynucleotide kinase (Takara, Japan) were then added to blunt the DNA fragments at 20 °C for 30 min. After purification, A-tailing was performed at the 3′ end of the DNA fragments using Klenow fragment, and adaptors (SEQ 6 + 7) were then ligated to the ends of the DNA fragments using T4 DNA ligase. The adaptor-ligated DNA was purified using a MinElute column and eluted in 10 μl ddH2O. DNA fragments ranging between 200 and 500 bp were recovered from an agarose gel using a gel extraction kit (Tiangen, China) and amplified by PCR to construct the sequencing library.

A single lane of one flow cell was sequenced on the Illumina GAII, according to the manufacturer's instructions, at the Beijing Genomics Institute (BGI) in Shenzhen, China. The sequencing was carried out as a single-end run of 51 bp. Image analysis and base calling were performed using the Illumina Pipeline 1.3.2 (Nie et al., 2012).

2.3. PCR-based assembly validation and genome annotation

To avoid assembly errors from homopolymer runs and to acquire a high-quality complete cp genome sequence, we designed four pairs of primers to confirm the four junctions between the IRs and the SSC/LSC regions (Table S1). The cp genome was annotated using the program DOGMA (Dual Organellar GenoMe Annotator) (Yi and Kim, 2012), coupled with manual corrections of start and stop codons. Protein-coding genes were identified using the plastid/bacterial genetic code. Codon usage was determined using CodonW (http://codonw.sourceforge.net/). The annotated GenBank files of the E. catarium cp genome were used to draw gene maps using the OrganellarGenomeDRAW tool (OGDRAW) (Hansen et al., 2007). The maps were then examined for further comparison of gene order and content.

2.4. Comparison of E. catarium chloroplast DNA with other analyzed Asteraceae genomes

The full E. catarium cp genome was compared with all other complete Asteraceae cp genomes (H. annuus, NC007977; L. sativa, DQ383816; P. argentatum, GU120098; G. abyssinica, EU549769; J. vulgaris, HQ234669; A. adenophora, JF826503; A. frigida, NC020607.1) using the mVISTA program in Shuffle-LAGAN mode (Ebihara et al., 2008). The divergent sequences extracted from all eight cp genomes were aligned using ClustalW (Thompson et al., 1994) and used for marker identification.
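The codon tally that Section 2.3 obtains with CodonW can be illustrated with a minimal Python sketch. This is not CodonW itself: the function names (`count_codons`, `amino_acid_fractions`) and the toy sequence are hypothetical, and the sketch assumes in-frame coding sequences and the standard codon-to-amino-acid assignments, which NCBI translation table 11 shares.

```python
from collections import Counter

# Standard codon assignments in the conventional TCAG ordering.
# (NCBI translation table 11 uses the same codon-to-amino-acid mapping.)
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def count_codons(cds_list):
    """Count codons over a list of in-frame coding sequences (DNA or RNA)."""
    counts = Counter()
    for cds in cds_list:
        seq = cds.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
        for i in range(0, usable, 3):
            codon = seq[i:i + 3]
            if codon in CODON_TABLE:  # skip codons with ambiguous bases
                counts[codon] += 1
    return counts


def amino_acid_fractions(codon_counts):
    """Return the fraction of each amino acid, excluding stop codons."""
    per_aa = Counter()
    for codon, n in codon_counts.items():
        per_aa[CODON_TABLE[codon]] += n
    total = sum(n for aa, n in per_aa.items() if aa != "*")
    return {aa: n / total for aa, n in per_aa.items() if aa != "*"}


if __name__ == "__main__":
    toy_cds = ["ATGTGTTTATAA"]  # hypothetical CDS: Met-Cys-Leu-stop
    counts = count_codons(toy_cds)
    print(dict(counts))
    print(amino_acid_fractions(counts))
```

The same arithmetic applies to the published counts: the two Cys codons in Table 3 (UGC, 83; UGU, 216) sum to 299, which is about 1.22% of the 24,523 codons.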

Table 1. Genes present in the Praxelis cp genome.

No.  Group                              Genes
1    Photosystem I                      psaA, B, C, I, J, ycf3(a), ycf4
2    Photosystem II                     psbA, C, D, E, F, H, I, J, K, L, M, N, T, lhbA
3    Cytochrome b6/f                    petA, B, D, G, L, N
4    ATP synthase                       atpA, B, E, F(a), H
5    Rubisco                            rbcL
6    NADH oxidoreductase                ndhA, B(b,c), C, D, E, F, G, H, I, J, K(b)
7    Large subunit ribosomal proteins   rpl2(b,c), 14, 16, 20, 22, 23(c), 32, 33, 36
8    Small subunit ribosomal proteins   rps2, 3, 4, 7(c), 8, 11, 12(b,c,d), 14, 15, 16, 18, 19(c)
9    RNAP                               rpoA, rpoB, C1(a), C2
10   Other proteins                     accD, ccsA, cemA, clpP(a), matK, infA
11   Proteins of unknown function       ycf1(c,e), ycf2(c), ycf15(c), ycf68(c,e)
12   Ribosomal RNAs                     rrn23(c), 16(c), 5(c), 4.5(c)
13   Transfer RNAs                      trnA–UGC(b,c), trnC–GCA, trnD–GUC, trnF–GAA(c), trnfM–CAU, trnG–UCC, trnH–GUG, trnI–CAU(c), trnI–GAU(c), trnK–UUU, trnL–CAA(c), trnL–UAA(b), trnL–UAG, trnM–CAU, trnN–GUU(c), trnP–UGG, trnQ–UUG, trnR–ACG(c), trnR–UCU, trnS–GCU, ctrnS–GGA, trnT–GGU, trnT–UGU, trnV–GAC(c), trnV–UAC(b), trnW–CCA, trnY–GUA, trnE–UUC

(a) Gene containing two introns.
(b) Gene containing a single intron.
(c) Two gene copies in the IRs.
(d) Gene divided into two independent transcription units.
(e) Pseudogene.

2.5. Repeat structure and small inversion

The repeat structures in the E. catarium cp genome were analyzed following the method of Maia et al. (2011) with some modifications. Tandem repeats were analyzed using the Tandem Repeats Finder program (Timme et al., 2007). REPuter (Castro et al., 2013) was used to identify and locate dispersed repeats, including direct (forward) and inverted (palindromic) repeats, with settings that retained repeats longer than 30 bp with at least 90% sequence identity (Hamming distance of 3). After program analysis, tandem repeats shorter than 15 bp and redundant REPuter results were removed manually. To identify possible small inversions, REPuter was used as previously described to search for inverted repeats of 11 to 24 bp, and candidate small inversions were retained when the two repeat copies were less than 50 bp apart (Yang et al., 2010). The likely secondary structures of these small inversions were evaluated using MFOLD (version 3.2). The putative SIs were compared against other cp genomes using BLAST and compiled into a final list of small inversions.

2.6. Phylogenetic analysis

Sequences were aligned using ClustalW in MEGA5 (Tamura et al., 2011), and the alignment was edited manually. Phylogenetic analyses

Table 2. Genes with introns.

Gene       Location  Exon I (bp)  Intron I (bp)  Exon II (bp)  Intron II (bp)  Exon III (bp)
rps12      LSC-IR    114          –              249
rpoC1      LSC       1113         664            657           16              312
atpF       LSC       159          711            123           13              411
ycf3       LSC       153          754            228           788             126
ndhK       LSC       714          55             126
clpP       LSC       228          623            291           814             69
rpl2       IR        348          559            384
ndhB       IR        756          670            777
trnL–UAA   LSC       37           434            50
trnV–UAC   LSC       37           573            38
trnA–UGC   IR        37           652            37

rps12 is a trans-spliced gene with its 5′ end exon located in the LSC region and the duplicated 3′ end exon located in the IR regions.
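The repeat criteria of Section 2.5 (a minimum repeat length and a bounded Hamming distance, for forward and palindromic copies) can be made concrete with a small Python sketch. It is a naive quadratic scan for illustration only, not REPuter's algorithm; the function names and the toy sequences are hypothetical, and the default thresholds (30 bp, distance 3) are taken as assumptions from the text.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def reverse_complement(seq):
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_repeats(seq, min_len=30, max_dist=3):
    """Naively list non-overlapping window pairs of length min_len that
    match within max_dist mismatches, either directly ("forward") or as
    reverse complements ("palindromic")."""
    seq = seq.upper()
    n = len(seq)
    hits = []
    for i in range(n - min_len + 1):
        first = seq[i:i + min_len]
        # start the second window after the first one ends (no overlap)
        for j in range(i + min_len, n - min_len + 1):
            second = seq[j:j + min_len]
            if hamming(first, second) <= max_dist:
                hits.append(("forward", i, j))
            if hamming(first, reverse_complement(second)) <= max_dist:
                hits.append(("palindromic", i, j))
    return hits


if __name__ == "__main__":
    unit = "ACGGTCAT"  # hypothetical 8-bp unit, so min_len=8 is used below
    forward_seq = unit + "TTTT" + unit
    palindromic_seq = unit + "TTTT" + reverse_complement(unit)
    print(find_repeats(forward_seq, min_len=8, max_dist=0))
    print(find_repeats(palindromic_seq, min_len=8, max_dist=0))
```

A production tool would extend each hit to its maximal length and merge redundant windows, which corresponds to the manual removal of redundant results described above.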


Table 3. The codon–anticodon recognition pattern and codon usage.

Amino acid  Codon  No.(a)    Amino acid  Codon  No.(a)    Amino acid  Codon  No.(a)
Val         GUC    174       Ala         GCG    147       Stop        UAA    82
Thr         ACC    236       Arg         CGA    344       Ile         AUU    1002
Thr         ACA    397       Gln         CAG    213       His         CAU    449
Thr         ACG    126       Arg         CGC    101       Leu         CUA    351
Val         GUU    485       Tyr         UAU    753       Ala         GCC    224
Asn         AAC    264       Arg         CGG    118       Lys         AAA    894
Arg         AGG    168       Ser         UCG    148       Lys         AAG    299
Trp         UGG    438       Pro         CCU    397       Gln         CAA    636
Stop        UAG    47        Gly         GGG    306       Phe         UUU    903
Ser         AGC    118       Gly         GGA    668       Asp         GAC    197
Ile         AUC    461       Gly         GGC    190       Val         GUA    506
Arg         AGA    456       Glu         GAG    306       Cys         UGC    83
Asn         AAU    860       Ser         UCC    293       Ala         GCU    587
Thr         ACU    497       Tyr         UAC    178       Cys         UGU    216
His         CAC    148       Arg         CGU    328       Leu         CUC    175
Val         GUG    198       Glu         GAA    865       Leu         UUG    537
Pro         CCG    157       Ile         AUA    631       Leu         UUA    776
Pro         CCA    296       Ala         GCA    395       Asp         GAU    758
Ser         AGU    362       Leu         CUU    550       Phe         UUC    514
Pro         CCC    185       Ser         UCA    406       Stop        UAA    82
Gly         GGU    550       Met         AUG    567       Ile         AUU    1002

tRNAs listed in the tRNA column: trnD–GUC, trnR–ACG, trnN–GUU, trnP–UGG, trnL–UAG, trnH–GUG, trnW–CCA, trnT–GGU, trnL–UAA, trnS–GGA, trnG–UCC, trnV–UAC, trnF–GAA, trnC–GCA, trnfM–CAU, trnM–CAU, trnL–CAA, trnK–UUU, trnV–GAC, trnY–GUA, trnA–UGC, trnS–GCU, trnT–UGU, trnQ–UUG, trnI–GAU, trnE–UUC, trnL–UAA

(a) Numerals indicate the frequency of usage of each codon in 24,523 codons in 87 potential protein-coding genes.

Fig. 2. Percent identity plot for comparison of eight Asteraceae chloroplast genomes using the mVISTA program.


Table 4 Comparison with the homologues between the A. adenophora cp genome and Helianthus annuus (Ha), Lactuca sativa (Ls), Guizotia abyssinica (Ga), Parthenium argentatum (Pa) and Jacobaea vulgaris (Jv) by the percent identity of coding and non-coding regions. Name

ccsA–trnL–UAG trnG–UCC–trnfM–CAU rpl33–rps18 lhbA–trnG–UCC rpoC2–rps2 cemA–petA ndhG–ndhE psbK–psbI rpl16–rps3 psaI–ycf4 matK–trnK–UUU clpP matK ycf3 trnV–UAC–trnM–CAU 23RNA–trnA–UGC psbN–psbH rps15 psbH psbI rbcL ycf4 ndhK atpF rpl20 ndhI rps8 rpoA infA cemA rps14 ndhG ndhH petA rpl36 atpA rps2–atpI ndhJ petL rpl16 rps18 rps2 ycf3 psaC petD trnN–GUU–ycf1 atpI rps4 clpP ycf15 psaA atpH ndhE trnV–GAC–16SRNA pbsC lhbA atpF psaB trnA–UGC–23SRNA psbA petN ycf3 rpl23–trnL–CAU rpl2 ndhB psaI ndhB psbE rps7 clpP rps12 psbF

Type

Intergenic Intergenic Intergenic Intergenic Intergenic Intergenic Intergenic Intergenic Intron Intergenic Intergenic Intron Exon Intron Intergenic Intergenic Intron Exon Exon Exon Exon Exon Exon Exon Exon Exon Exon Exon Exon Exon Exon Exon Exon Exon Exon Exon Intergenic Exon Exon Exon Exon Exon Exon Exon Exon Intergenic Exon Exon Exon Exon Exon Exon Exon Intergenic Exon Exon Exon Exon Intergenic Exon Exon Exon Intron Exon Intron Exon Exon Exon Exon Exon Exon Exon

Start

122,960 36,117 67,747 35,725 23,195 61,844 119,172 7903 82,181 59,354 3657 70,083 2103 42,340 51,613 132,033 74,532 113,963 74,634 8259 54,970 59,744 48,833 27,834 68,472 117,785 80,727 78,230 80,369 61,154 36,530 118,641 114,333 62,084 79,794 28,314 24,099 48,277 65,842 81,770 67,916 23,388 44,010 119,882 77,262 108,161 24,314 45,463 70,706 100,162 39,196 26,222 119,383 100,707 32,449 35,536 26,828 36,966 104,175 491 10,219 43,094 86,977 85,286 96,045 59,243 96,715 64,352 97,781 71,811 69,587 64,223

End

123,084 36,301 67,915 36,045 23,387 62,083 119,382 8258 83,374 59,743 4348 70,705 3656 43,093 51,787 132,190 74,633 114,241 74,855 8420 56,394 60,298 49,546 28,244 68,852 118,285 81,128 79,243 80,602 61,843 36,832 119,171 115,514 63,043 79,907 29,840 24,313 48,753 65,937 82,180 68,221 24,098 44,135 120,127 77,807 108,487 25,057 46,068 70,996 100,353 41,448 26,467 119,688 100,940 34,879 35,724 26,986 39,170 104,326 1552 10,314 43,321 87,141 85,633 96,714 59,353 97,491 64,603 98,248 71,879 69,700 64,342

Length (bp)

125 185 169 321 193 240 211 356 1194 390 692 623 1554 754 175 158 102 279 222 162 1425 555 714 411 381 501 402 1014 234 690 303 531 1182 960 114 1527 215 477 96 411 306 711 126 246 546 327 744 606 291 192 2253 246 306 234 2431 189 159 2205 152 1062 96 228 165 348 670 111 777 252 468 69 114 120

%Identity Aa

Gu

Ls

Ha

Pa

Af

Jv

82.8 88.7 90.2 92.2 91.4 97.1 94.3 95.2 94.1 95.6 93.6 93.4 96.2 97.1 96 96.8 98 98.2 98.6 95.2 97.9 98.2 89.5 98.5 96.9 98 98 97.9 97.9 98.8 98.7 98.7 98.6 98 93.2 99 99.5 98.3 97.9 98.5 98.7 99 100 99.2 99.6 99.4 99.5 99.5 99.7 99 99.1 99.2 99.7 99.6 99.3 99.5 98.7 99.4 100 99.4 97.9 99.6 100 99.7 99.9 98.2 99.7 100 99.8 98.6 100 100

79.7 87.6 88.1 91.4 92.9 90.8 96.2 93.3 91.1 90.3 92.1 91.6 96.8 96.6 97.7 95.6 97.1 96.8 97.3 96.4 97.7 97.3 99.2 98.8 98.2 97.8 98.3 98.2 98.7 98.4 98.3 98.5 98.5 99 100 99 98.1 99 100 99 99 98.3 96.8 99.2 99.5 99.1 99.3 99.3 99.3 100 99 98.8 98.7 99.6 99.1 99.5 99.4 99.2 98.7 99.2 100 99.6 100 100 99 100 99.6 99.6 100 100 100 100

79.7 76.7 86.7 84.7 87.4 83.1 87.9 92.4 90.5 88.4 91.4 93.3 93.9 94.3 95.4 94.9 96.1 96.1 96.4 99.4 97 96 98.3 96.8 97.1 97.4 97.3 97.1 97.4 97.7 97.7 96.6 97.5 97.3 98.2 97.2 98.1 97.9 95.8 98.3 98.7 97.7 98.4 97.6 98 96.6 98.4 97.9 98.6 97.4 98.5 98.4 98 98.3 98.3 99.5 99.4 98.6 98 99.1 99 99.1 100 99.7 99.4 99.1 99.6 99.6 99.4 100 100 99.2

76 84.8 89.1 84.9 94.4 86.7 93.8 90.4 92.3 95.1 92.1 94.4 95.7 96.6 97.1 96.2 98 97.5 96.8 95.7 96.9 97.7 99 97.8 97.9 97.8 97.5 98.5 98.3 98.4 97.4 98.5 98.2 98.4 100 98.7 99.5 99 99 99 99.3 98.9 99.2 99.2 99.3 99.4 98.8 99.2 98.6 100 98.9 98.8 99 99.6 99.3 98.9 98.7 99.3 99.3 98.7 100 99.6 100 99.1 99.7 100 99.6 100 99.8 100 100 100

76 84.3 92.6 91.4 92.9 92.9 92.1 88.1 91.7 90.6 91.7 92.6 96.2 95.6 95.4 96.8 98 97.5 98.2 95.2 97.8 97.1 98.6 98.8 97.6 97.8 98.3 98.1 98.7 98.1 98.3 98.7 98.2 98.8 100 98.6 98.6 98.7 100 99 99 99.2 98.4 99.6 99.1 99.4 99.1 98.8 99.3 100 99.1 99.2 99.7 99.1 99.3 99.5 99.4 99.4 100 99.3 100 99.1 100 99.4 99.9 100 99.7 100 99.8 100 100 100

72.4 82.2 90.9 88.9 82.1 90.9 87.7 88.2 88.7 87.8 92.2 91.9 93 93.5 97.1 94.9 95.1 96.4 95.5 98.1 95.4 96 98.3 95.6 97.6 96.8 96.3 96.9 97 97 96.7 96.4 96.7 97.6 95.6 97 98.6 97.5 97.9 96.8 96.4 97.7 98.4 97.2 97.4 97.9 97.7 98 97.9 97.4 98 98.4 98.4 99.6 98.4 97.9 98.1 98.5 98 98.3 99 99.6 99.4 99.1 99.6 100 99.6 99.6 100 100 100 100

71.2 85 79.1 84.1 79.7 86.9 79.7 84.2 87.8 92.3 91.1 92.3 93.1 94.2 91.4 95.6 95.1 95 95 98.1 95.8 96.2 98.2 96.1 97.1 96.8 97 96.9 95.7 95.8 97.7 97.4 97.3 97.9 100 97.6 95 97.7 97.9 98.3 98 98.5 98.4 98 97.3 98.5 97.6 98 97.3 96.9 98.4 98.4 98.4 96.2 98.4 97.4 98.7 98 98.7 99 97.9 98.2 96.4 99.7 99.6 100 99.6 99.6 99.6 100 99.1 100
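As a reading aid for the %Identity columns of Table 4, the following Python sketch computes a percent identity between a reference region and its aligned homologue. It is a simplified illustration, not the mVISTA calculation: the function name is hypothetical, the inputs are assumed to be pre-aligned strings of equal length, columns gapped in both sequences are skipped, and a gap in only one sequence is counted as a mismatch.

```python
def percent_identity(reference, other, gap="-"):
    """Percent of aligned columns at which two pre-aligned sequences agree.

    Columns that are gaps in both sequences are ignored; a gap opposite a
    base counts as a mismatch (an assumption of this sketch).
    """
    if len(reference) != len(other):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(reference.upper(), other.upper()):
        if a == gap and b == gap:
            continue
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


if __name__ == "__main__":
    # Hypothetical aligned fragments: 4 of 5 comparable columns match.
    print(percent_identity("ACGT-A", "ACGA-A"))
```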


Table 5. Characteristics of the eight Asteraceae chloroplast genomes.

Species        Length (bp)  GC (%)  LSC (bp)  SSC (bp)  IR (bp)  Genes  IR repeat genes  CDS  rRNA  tRNA
E. catarium    151,410      37.22   85,311    18,547    23,776   138    18               85   8     37
A. adenophora  150,689      37.5    84,829    18,359    23,755   130    18               80   8     37
P. argentatum  152,803      37.6    84,565    19,390    24,424   129    17               87   8     37
J. vulgaris    150,686      37.56   82,855    18,277    24,777   124    18               87   8     37
G. abyssinica  151,762      37.52   83,636    18,227    24,950   132    18               85   8     37
H. annuus      151,104      38      83,530    18,308    24,633   138    18               85   8     43
L. sativa      152,772      38      84,105    18,599    25,034   128    18               84   7     37
A. frigida     151,076      37.48   82,740    18,394    24,971   134    18               87   8     37

using maximum parsimony (MP) and maximum likelihood (ML) were performed using MEGA5, and the parameters were the same as those described by Young et al. (2011).

2.7. Detection of polymorphic loci

Following the method of Doorduin et al. (2011), potential microsatellite regions were identified by searching for runs of five or more A or T nucleotides using MISA.
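The search described in Section 2.7 (runs of five or more A or T nucleotides) can be sketched with a regular expression. This is an illustration in Python rather than MISA itself; the function name, the returned tuple layout and the toy sequence are assumptions made for the example.

```python
import re


def find_mono_at_repeats(seq, min_repeats=5):
    """Return (start, base, length) for each run of A or T that is at
    least min_repeats long. Start positions are 0-based."""
    pattern = re.compile(r"A{%d,}|T{%d,}" % (min_repeats, min_repeats))
    return [
        (m.start(), m.group()[0], len(m.group()))
        for m in pattern.finditer(seq.upper())
    ]


if __name__ == "__main__":
    toy = "GGAAAAACCTTTTTTTGAAAAC"  # hypothetical fragment
    # The final AAAA run has only four bases and is not reported.
    print(find_mono_at_repeats(toy))
```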

3. Results and discussion

3.1. Features of Praxelis cp genome

The E. catarium cp genome is 151,410 bp in length and has a GC content of 37.22%, which is consistent with other reported Asteraceae cp genomes (Liu et al., 2013; Nie et al., 2012). Like those of other angiosperms, the E. catarium cp genome maps as a circular molecule with a typical quadripartite structure: an LSC (85,311 bp, covering 56.3%) and an SSC (18,547 bp, covering 12.2%), which are separated by a pair of IRs (23,776 bp each, together covering 31.4%) (Fig. 1). To validate the assembly, the four junction regions between the IRs and the SSC/LSC were confirmed using PCR amplification and Sanger sequencing. After annotation, the genome sequence was submitted to GenBank (GenBank ID: KF922320).

In Fig. 1, features on the transcriptionally clockwise and counter-clockwise strands are drawn on the inside and outside of the outer circle, respectively. Genes belonging to different groups are color-coded. The genome coordinates and GC content are shown in the inner circle. The thick lines indicate the inverted repeats (IRA and IRB), which separate the genome into small (SSC) and large (LSC) single-copy regions. The map was drawn using OGDRAW (Hansen et al., 2007).

Fig. 3. Comparison of the border position of SSC, LSC and IR regions among the eight Asteraceae chloroplast genomes. Selected genes or portions of genes are indicated by the boxes above the genome.

3.2. Genome content and organization

The E. catarium cp genome contains 119 unique genes, including 37 tRNA genes, 4 rRNA genes and 85 predicted protein-coding genes (Table 1). In addition, 18 genes are duplicated in the IR, for a total of 137 genes in the E. catarium cp genome (Fig. 1). Of the 18 duplicated genes, 7 are predicted protein-coding genes (ycf15, rps7, ndhB, rpl23, ycf2, rps19 and rpl2); the others are 7 tRNA genes and 4 rRNA genes, all of which are completely duplicated in the IR regions. The same pattern is found in other Asteraceae such as P. argentatum (Doorduin et al., 2011) and A. adenophora (Nie et al., 2012), but not in A. frigida (Liu et al., 2013). Among the 137 annotated genes, 11 contain introns: 4 protein-coding genes and 3 tRNA genes have a single intron, and 4 protein-coding genes have two introns. Of these 11 genes, seven (five protein-coding genes and two tRNA genes) are located in the LSC, three (two protein-coding genes and one tRNA gene) are located in the IR, and one (rps12) spans the IR and LSC (summarized in Table 2).
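The region lengths reported in Section 3.1 can be checked for internal consistency, since in a quadripartite genome the LSC, the SSC and two IR copies must add up to the total length. The sketch below does this in Python; the function name and the dictionary keys are hypothetical, and the percentages it prints are derived only from the reported lengths.

```python
def quadripartite_summary(total, lsc, ssc, ir):
    """Check LSC + SSC + 2 x IR == total and return each region's share
    of the genome in percent (one decimal place)."""
    if lsc + ssc + 2 * ir != total:
        raise ValueError(
            "region lengths sum to %d, expected %d" % (lsc + ssc + 2 * ir, total)
        )
    return {
        "LSC": round(100 * lsc / total, 1),
        "SSC": round(100 * ssc / total, 1),
        "IRs": round(100 * 2 * ir / total, 1),  # both IR copies together
    }


if __name__ == "__main__":
    # Lengths as reported for E. catarium: 151,410 bp total,
    # LSC 85,311 bp, SSC 18,547 bp, each IR 23,776 bp.
    print(quadripartite_summary(151410, 85311, 18547, 23776))
```

With the reported lengths the check passes, which gives 56.3% for the LSC, 12.2% for the SSC and 31.4% for the two IR copies together.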
Sequence analysis indicates that 48.66%, 2.81% and 5.60% of genome sequences encode proteins, tRNAs and rRNAs, respectively, and 52.7% of genome sequences are non-coding and filled with introns, intergenic spacers and pseudogenes. In addition, 87 protein-coding genes in this genome represent 73,569 bp nucleotides coding for 24,523 codons. Based on the sequences of protein-coding genes and tRNA genes within the cp genome, Leu (10.37%) and Ile (1.22%) are the most and least used amino acids (see Table 3). rps12 is a transspliced gene with a 5′ end exon located in the LSC region and the duplicated 3′ end exon is located in the IR regions. The same location is found in all other Asteraceae plants (Liu et al., 2013). ycf3 has the largest intron (754 bp), which is different from A. adenophora (Nie et al., 2012). 3.3. Comparison with other Asteraceae cp genomes To analyze this invasive plant at the genome-level, the sequence of all the eight Asteraceae cp genomes was plotted using the VISTA program with the annotation of E. catarium as reference (Fig. 2, percent identity plot as summarized in Table 4). E. catarium cp genome size is between H. annuus (151,104 bp) and G. abyssinica (151,762 bp), thus far, the fourth largest among the eight reported Asteraceae cp genomes (Table 5). Moreover, the genome is approximately 0.721 kb and 0.712 kb larger than the J. vulgaris and A. adenophora genomes, respectively. The E. catarium cp genome contains one of the largest LSCs and smallest IR regions among the eight cp genomes. Another plant A. frigida has the smallest LSC region. The SSC region is inverted compared with other angiosperm species, such as Arabidopsis (Sato et al., 1999), in the other six Asteraceae cp genomes. However, the gene order in the SSC region in A. frigida is inverted compared to other Asteraceae plants, which is similar to tobacco (Liu et al., 2013). The two LSC region inversions, a large 23 kb and a smaller 3.4 kb, were found in the E. 
catarium cp genome, which is consistent with other Asteraceae plants, indicating that the two inversions may be a key feature of the Asteraceae chloroplast genome (Nie et al., 2012). The length of angiosperm cp genomes is variable primarily due to expansion and contraction of the inverted repeat IR region and the

single copy boundary regions. To elucidate this mechanism, the IR/SC boundary regions of the eight Asteraceae cp genomes were compared (Fig. 3). At the two SSC boundaries, the general structure was revealed in dicots (i.e., tobacco, Panax and Arabidopsis), and includes ycf1 spans and a ycf1 pseudogene adjacent to JSB in IRb (Yang et al., 2010). The E. catarium IRs expanded 421 bp into the 5′ portion of the ycf1 gene, and that of A. adenophora, H. annuus, J. vulgaris and G. abyssinica expanded 467 bp, 576 bp, 567 bp and 563 bp, respectively. However, it is noteworthy that P. argentatum is 174 bp apart from the IRb/SSC border. In addition, in L. sativa and A. frigida, the ycf1 gene was not found in the SSC region, and SSycf1 was found in the IRB regions in A. frigida. In addition to expansion to the ycf1 gene, the IR region was also expanded to the rps19 gene in all the eight Asteraceae species. It was expanded to 97 bp, 100 bp, 101 bp, 41 bp, 97 bp, 95 bp, 58 bp and 60 bp in E. catarium, A. adenophora, H. annuus, J. vulgaris, G. abyssinica, L. sativa, P. argentatum and A. frigida, respectively. The ndhF varied in distance from the IRa/SSC border, and was entirely located in the SSC region in all the eight species. In both L. sativa and A. frigida, ndhF was located only 1 bp and 75 bp from the IRb/SSC border, and both species are invasive plants. Compared with other monocot and dicot species, the position of the trnH gene in the cp genome is highly conserved. In general, the trnH gene is located in the IR region in the monocots, compared with its location in the LSC region in the dicots (Kuang et al., 2011; Wiegert et al., 2012). Similar to other dicots, all eight Asteraceae species had the trnH gene located in the LSC region. All aligned sequences indicate that the Asteraceae cp genomes are highly conserved, which is similar to Nie's research (Nie et al., 2012). 
The coding region is more conserved than the non-coding counterpart similar to other angiosperms such as A. frigida Wild (Liu et al., 2013) and oil palm (Uthaipaisanwong et al., 2012). The ycf1 gene, which is a pseudogene in A. adenophora and P. argentatum, is the most divergent of all the genes (Kumar et al., 2009). The rpoC1 gene contains two introns, which also show high sequence divergence, similar to A. adenophora. In

Table 6 Promising regions identified for developing phylogenetic markers in Asteraceae family. Region

Length (bp)

Tree length

CI

RI

Pars. inf. char (%)

Topology gene versus species tree

ccsA–trnL–UAG trnG–UCC–trnfM–CAU rpl33–rps18 lhbA–trnG–UCC rpoC2–rps2 cemA–petA ndhG–ndhE psbK–psbI rpl16–rps3 clpP matK ycf3 rps15 psbH psbI rbcL ycf4 ndhK atpF rpl20 ndhI rps8 rpoA infA cemA rps14 ndhG ndhH Combined regions

125 185 169 321 193 240 211 356 1194 623 1554 754 279 222 162 1425 555 714 411 381 501 402 1014 234 690 303 531 1182 14,931

61 57 24 46 39 43 119 79 200 70 232 74 188 31 8 156 46 96 32 27 278 33 78 17 68 19 352 672 3266

0.95 0.91 0.95 0.96 0.92 0.93 0.95 0.86 0.90 0.91 0.94 0.97 0.96 0.84 1.00 0.82 0.89 0.97 1 0.93 0.97 0.91 0.90 0.88 0.94 0.89 0.97 0.98 0.93

0.57 0.69 0.90 0.89 0.57 0.73 0.50 0.62 0.67 0.84 0.77 0.83 0.50 0.61 1.00 0.54 0.72 0.75 1 0.67 0.47 0.70 0.68 0.71 0.67 0.67 0.42 0.44 0.54

6.92% 5.16% 3.47% 4.32% 3.85% 4.14% 5.22% 6.17% 2.98% 4.74% 2.35% 1.05% 4.48% 3.15% 0.60% 2.90% 1.98% 0.98% 1.70% 1.05% 2.17% 1.50% 1.38% 2.14% 1.30% 1.32% 3.33% 1.87% 2.60%

Congruent Congruent Congruent Congruent Congruent Congruent Congruent Incongruent Congruent Incongruent Congruent Congruent Congruent Incongruent Congruent Congruent Congruent Incongruent Incongruent Congruent Congruent Congruent Incongruent Congruent Congruent Incongruent Congruent Incongruent Congruent

Y. Zhang et al. / Gene 549 (2014) 58–69

addition, several regions are found to show high divergence, including trnk–psbK, aptL–aptF, trnS–trnG, ndhC–trnM and psbL–petG (Table 4). 3.4. Phylogenetic trees E. catarium belongs to the tribe Eupatorium in the Asteraceae family. Several studies have been conducted to analyze the phylogenetic relationship in the Asteraceae family based on chloroplast coding or noncoding sequences (Doorduin et al., 2011; Kumar et al., 2009; Liu et al., 2013; Nie et al., 2012). In general, the phylogenetic trees of the molecular markers should be congruent with that of the species because the sequence evolution rates are linked to the evolution and life history of the species (Nie et al., 2012). Thus, DNA regions that have the congruent trees (MPTs) can be identified, and may be molecular markers to analyze the evolution history of the Asteraceae family. When comparing the E. catarium full cp genome with all complete Asteraceae chloroplast genomes, the regions that showed moderate sequence divergence and

65

that could be aligned between these eight species are listed in Table 6. Fourteen markers (ccsA–trnL, trnG–trnfM, rpl33–rps18, lhbA–trnG, rpoC2–rps2, cemA–petA, ndhG–ndhE, rpl16–rps3, matK, rps15, rbcL, ndhI, infA, ndhG) contained greater than 2% parsimony-informative characters. Among these, the ccsA–trnL region contained the highest parsimony-informative character values, at 6.92%, but Nie's study showed that it was the ndhD–ccsA region with a parsimony-informative character value of 4.5% (Nie et al., 2012). In Fig. 4, the most complete parsimony phylogenetic trees, using all the alignable divergent regions (28 regions in all) were constructed. In a previous comparison of H. annuus against L. sativa (Timme et al., 2007) and G. abyssinica (Dempewolf et al., 2010), with J. vulgaris against H. annuus, L. sativa, G. abyssinica and P. argentatum (Zhang et al., 2012), and with A. adenophora against J. vulgaris, H. annuus, L. sativa, G. abyssinica and P. argentatum (Nie et al., 2012), the ndhC–trnV, psbM– trnD, Matk and clpP regions were already identified as divergent regions that contained high phylogenetic information as phylogeny markers in

Fig. 4. Maximum parsimony (MP) trees of all the selected 28 chloroplast regions of 8 Asteraceae species. The phylogram of “combined regions” was constructed from the MP analysis using all the 28 regions together.


the Asteraceae. In our current study, the MPTs of the clpP regions are incongruent, which is consistent with Doorduin's research (Doorduin et al., 2011); however, the Inf. value is as high as 4.74%. The ndhC–trnV and psbM–trnD regions were not used to align the MPTs because of their low Inf. values (less than 0.6%), although ndhC–trnV had Inf. values of 4% (Doorduin et al., 2011) and 2.3% (Nie et al., 2012) in earlier studies, and the MPTs were congruent in both. matK, however, was consistent with those studies, with an Inf. value of 2.35%. Among the 20 regions whose trees were congruent with the species tree, 17 are new markers, with Inf. values from 1.05% to 5.22%. Many of these regions are not yet used in molecular phylogenetic studies, and further study may be worthwhile. The selected 37 chloroplast regions were also extracted from cp genomes belonging to 22 taxa. MP analysis constructed a single tree with a length of 17,393, a consistency index of 0.5116 and a retention index of 0.6360 (Fig. 5). ML bootstrap values were high and all the 19 nodes had 100% bootstrap support. MP and ML had the same phylogenetic topologies, and the phylogenetic tree formed two major clades: monocots and eudicots (Fig. 5). The six species in the Asteraceae family were clustered into Asterales and placed within the euasterids II. However, E. catarium was not grouped together with A. adenophora in the tribe Eupatorieae.
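The Inf. values discussed above are percentages of parsimony-informative characters per aligned region. The text does not say how they were computed, so the following is only a minimal sketch under the usual definition: an alignment column is parsimony-informative when at least two different bases each occur in at least two sequences. The function name `parsimony_informative_fraction` and the toy alignment are hypothetical, and gaps or ambiguous bases are simply ignored, which is an assumption.

```python
from collections import Counter


def parsimony_informative_fraction(alignment):
    """Return (informative_sites, total_sites, percent) for aligned sequences.

    `alignment` is a list of equal-length strings, one per taxon.
    Characters other than A, C, G, T (gaps, N) are ignored; this is an
    assumption of the sketch, not something stated in the paper.
    """
    lengths = {len(seq) for seq in alignment}
    if len(lengths) != 1:
        raise ValueError("all sequences must have the same aligned length")
    n_sites = lengths.pop()
    if n_sites == 0:
        raise ValueError("alignment has no columns")

    informative = 0
    for column in zip(*alignment):
        counts = Counter(base for base in column.__iter__() if base in "ACGT")
        # Informative: at least two states, each seen in two or more taxa.
        if sum(1 for c in counts.values() if c >= 2) >= 2:
            informative += 1
    return informative, n_sites, 100.0 * informative / n_sites


# Toy alignment of four taxa (invented data, for illustration only).
toy_alignment = [
    "ACGTACGT",
    "ACGTACGA",
    "ACTTACGA",
    "ACTTACGT",
]
result = parsimony_informative_fraction(toy_alignment)
print(result)
```

Under this definition, a region would pass the 2% threshold used above when the third element of the returned tuple exceeds 2.0.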

3.5. Repeat structure and small inversions

Repeat regions play important roles in genome recombination and rearrangement (Yang et al., 2010). Tandem and dispersed repeats were analyzed in the E. catarium cp genome. Twenty-eight tandem repeats were identified, of which 10 were 15–20 bp, 6 were 21–30 bp, 5 were 31–40 bp, 1 was 51–60 bp, 1 was 71–80 bp, 2 were 81–90 bp and 3 were more than 91 bp. Sixty dispersed repeats were identified, of which 6 were 21–30 bp, 12 were 31–40 bp, 11 were 41–50 bp, 3 were 51–60 bp, 4 were 61–70 bp, 6 were 71–80 bp, 3 were 81–90 bp, and 15 were more than 91 bp. In total, 88 repeats were identified: 69% in intergenic spacer regions, 10% in introns and 9% in CDS regions (Fig. 6, Table S1). These repeat motifs will provide an informative source for developing markers for population studies and phylogenetic analysis (Nie et al., 2012). The repeat structures in the eight species of Asteraceae were also analyzed using REPuter (Fig. 7). Forward repeats and inverted repeats are common in these species, but the repeat structures of the four invasive species have no obvious Asteraceae characteristics. In addition, although Praxelis and A. adenophora belong to the same tribe, their repeat structures differ. Of the 8 Asteraceae cp genomes studied, E. catarium contains the greatest total number of repeats that are 75 bp or greater in length. A. adenophora

Fig. 5. The MP phylogenetic tree is based on 32 protein-coding sequences from 37 plant taxa.


belongs to the Eupatorieae tribe along with E. catarium, and a higher number of repeats that are 40 bp or greater in length was also reported by Liu et al. (2013).

Small inversions (SIs) are flanked by a pair of IRs ranging from 11 to 24 bp in size, and the SIs themselves vary in length from 5 to 50 bp. SIs can generally be determined through pairwise comparison of sequences from closely related taxa. Sixteen SIs have been reported previously for plant chloroplasts (Kim and Lee, 2005). Two additional SIs have been found recently in the intergenic regions of psbA–trnH and psbC–trnS (Catalano et al., 2008; Jasen, 2006), and one additional SI is located in the psaB coding region of the date palm (Yang et al., 2010). In the mungbean cp genome, 28 SIs of 30 bp or longer were identified with a sequence identity of 90% (Tangphatsornruang et al., 2010). In the E. catarium cp genome, we detected 20 SIs when searching for IRs. Fourteen SIs were confirmed. Four SIs had highly homologous sequences in Asteraceae plants. The remaining two were putative SIs because we could not find homologous sequences among other monocots (Table S1). The folded stem–loop structures of the six unconfirmed SIs of Praxelis are shown in Fig. 8. If the putative SIs are shown to be shared among genomes, they may provide phylogenetic information or may even play functional roles in stabilizing their corresponding mRNAs (Yang et al., 2010).
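The description of SIs as a short segment flanked by a pair of inverted repeats can be illustrated with a brute-force search for perfect stem–loop candidates. This is only a sketch, not the search procedure used in the study, which is not described here. The function names are hypothetical, the default arm length (11–24 bp) and loop length (5–50 bp) are taken from the ranges quoted above, and requiring perfectly complementary arms is a simplifying assumption.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an uppercase DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_stem_loops(seq, stem_min=11, stem_max=24, loop_min=5, loop_max=50):
    """Return (start, stem_len, loop_len) for every perfect stem-loop candidate.

    `start` is the 0-based index of the left arm. Overlapping and nested
    hits are all reported; no merging is attempted in this sketch.
    """
    seq = seq.upper()
    n = len(seq)
    hits = []
    for start in range(n):
        for stem_len in range(stem_min, stem_max + 1):
            if start + 2 * stem_len + loop_min > n:
                break  # no room left for two arms plus the shortest loop
            right_arm = reverse_complement(seq[start:start + stem_len])
            for loop_len in range(loop_min, loop_max + 1):
                right_start = start + stem_len + loop_len
                if right_start + stem_len > n:
                    break
                if seq[right_start:right_start + stem_len] == right_arm:
                    hits.append((start, stem_len, loop_len))
    return hits


# Toy sequence (invented): flank + 11 bp arm + 5 bp loop + inverted arm + flank.
arm = "GATTACAGGCT"
toy = "CCCC" + arm + "TTTTT" + reverse_complement(arm) + "CCCC"
hits = find_stem_loops(toy)
print(hits)
```

The nested loops make this slow on a full cp genome, so it is best read as a statement of the pattern being searched for. Confirming an inversion would still require comparison with related taxa, as described above.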

3.6. SSR marker

Simple sequence repeats (SSRs), also called microsatellites, are considered valuable molecular markers for population genetics because they exhibit high variation within the same species (Tsai et al., 2008). SSRs are highly polymorphic due to a high mutation rate that affects the number of repeat units. Within the E. catarium cp genome, 23 different SSR loci were repeated at least 5 times (Table 7). Of these, 20 loci are homopolymers, one is a di-polymer, and two are tri-polymers. Nineteen of the homopolymer loci contain multiple A or T nucleotides, while the remaining one contains multiples of the C nucleotide. Both tri-polymers consist of repeated ATA or GAA units. These SSR loci contribute to the A–T richness of the E. catarium cp genome.

Table 7
The SSR loci of E. catarium cp genome.

| Position of repeat | Repeat consensus | Repeat length | Locus | Region |
|---|---|---|---|---|
| 1982 | T | 13 | psbA–matK | Intergenic spacer |
| 2351 | T | 11 | matK | CDS |
| 5467 | C | 10 | rps16–trnQ–(UUG) | Intergenic spacer |
| 8927 | T | 10 | trnS–(GCU)–trnC–(GCA) | Intergenic spacer |
| 13,511 | A | 10 | rpoB | CDS |
| 17,401 | T | 12 | rpoC1 | Intron |
| 18,094 | A | 10 | rpoC1 | CDS |
| 28,280 | T | 11 | atpF–atpA | Intergenic spacer |
| 30,084 | A | 11 | trnR–(UCU)–trnT–(GGU) | Intergenic spacer |
| 43,389 | ATA | 5 | ycf3 | Intron |
| 44,257 | TA | 8 | ycf3–trnS–(GGA) | Intergenic spacer |
| 50,281 | T | 16 | ndhC–trnV–(UAC) | Intergenic spacer |
| 59,379 | T | 10 | psaI–ycf4 | Intergenic spacer |
| 63,247 | T | 10 | petA–psbJ | Intergenic spacer |
| 77,952 | A | 15 | petD–rpoA | Intergenic spacer |
| 85,241 | T | 12 | rps19–rpl2 | Intergenic spacer |
| 109,342 | GAA | 5 | ycf1 | CDS |
| 109,801 | A | 10 | ycf1 | CDS |
| 110,011 | A | 11 | ycf1 | CDS |
| 110,293 | T | 10 | ycf1 | CDS |
| 120,157 | T | 10 | psaC–ndhD | Intergenic spacer |
| 123,371 | A | 12 | trnL–(UAG)–rpl32 | Intergenic spacer |
| 151,110 | A | 12 | rpl2–rps19 | Intergenic spacer |
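A scan for loci of the kind listed in Table 7 can be sketched with regular expressions. The tool and thresholds used in the study are not stated, so the minimum repeat counts below (10 for homopolymers, 5 for di- and tri-polymers) are assumptions chosen only because they are compatible with the values in Table 7. The function name `find_ssrs` and the toy sequence are hypothetical.

```python
import re


def find_ssrs(seq, min_repeats=None):
    """Return sorted (start_1based, motif, repeat_count) tuples for SSRs.

    `min_repeats` maps motif length to the minimum number of repeat units.
    The defaults are assumed thresholds, not values taken from the paper.
    """
    if min_repeats is None:
        min_repeats = {1: 10, 2: 5, 3: 5}
    seq = seq.upper()
    hits = []
    for unit_len, min_n in sorted(min_repeats.items()):
        # One captured unit followed by at least (min_n - 1) further copies.
        pattern = re.compile(r"([ACGT]{%d})\1{%d,}" % (unit_len, min_n - 1))
        for match in pattern.finditer(seq):
            motif = match.group(1)
            # Skip "AA" or "TTT": those runs are homopolymers, counted above.
            if unit_len > 1 and len(set(motif)) == 1:
                continue
            hits.append((match.start() + 1, motif, len(match.group(0)) // unit_len))
    return sorted(hits)


# Toy sequence (invented): T x13, then ATA x5, then TA x8, with short spacers.
toy = "CG" + "T" * 13 + "CAG" + "ATA" * 5 + "GC" + "TA" * 8 + "G"
ssrs = find_ssrs(toy)
print(ssrs)
```

Each returned tuple corresponds to the first three columns of Table 7; assigning a locus and region would additionally need the genome annotation.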


4. Conclusion

Using Illumina high-throughput sequencing technology, we obtained the complete E. catarium cp genome. This is the second chloroplast genome sequenced in the Eupatorieae tribe and the eighth in the Asteraceae family. The evolutionary features of the cp genomes of eight Asteraceae plants were analyzed, together with the whole genome sequences. Phylogenetic analysis demonstrates a sister relationship between Praxelis and seven other species in Asteraceae: A. adenophora, H. annuus, G. abyssinica, L. sativa, A. frigida, J. vulgaris and P. argentatum. These results could be useful for molecular phylogenetic and molecular ecological studies within this species and within the Asteraceae family.

Supplementary data to this article can be found online at http://dx.doi.org/10.1016/j.gene.2014.07.041.

Acknowledgments

This work was supported by grants from the General Program of the National Science Foundation of China (No. 31060095 and No. 31360173).

References

Castro, I., Pinto-Carnide, O., Ortiz, J.M., Martin, J.P., 2013. Chloroplast genome diversity in Portuguese grapevine (Vitis vinifera L.) cultivars. Mol. Biotechnol. 54 (2), 528–540. http://dx.doi.org/10.1007/s12033-012-9593-9.
Catalano, S., Saidman, B., Vilardi, J., 2008. Evolution of small inversions in chloroplast genome: a case study from a recurrent inversion in angiosperms. Cladistics 25, 93–104.
Chang, C.C., Lin, H.C., Lin, I.P., Chow, T.Y., Chen, H.H., Chen, W.H., Cheng, C.H., Lin, C.Y., Liu, S.M., Chaw, S.M., 2006. The chloroplast genome of Phalaenopsis aphrodite (Orchidaceae): comparative analysis of evolutionary rate with that of grasses and its phylogenetic implications. Mol. Biol. Evol. 23, 279–291.
Chumley, T.W., Palmer, J.D., Mower, J.P., Fourcade, H.M., Calie, P.J., Boore, J.L., Jansen, R.K., 2006. The complete chloroplast genome sequence of Pelargonium × hortorum: organization and evolution of the largest and most highly rearranged chloroplast genome of land plants. Mol. Biol. Evol. 23, 2175–2190.
Dempewolf, H., Kane, N.C., Ostevik, K.L., Geleta, M., Barker, M.S., Lai, Z., Stewart, M.L., Bekele, E., Engels, J.M., Cronk, Q.C., Rieseberg, L.H., 2010. Establishing genomic tools and resources for Guizotia abyssinica (L.f.) Cass.—the development of a library of expressed sequence tags, microsatellite loci, and the sequencing of its chloroplast genome. Mol. Ecol. Resour. 10, 1048–1058.
Doorduin, L., Gravendeel, B., Lammers, Y., Ariyurek, Y., Chin, A.W.T., Vrieling, K., 2011. The complete chloroplast genome of 17 individuals of pest species Jacobaea vulgaris: SNPs, microsatellites and barcoding markers for population and phylogenetic studies. DNA Res. 18, 93–105.
Ebihara, A., Farrar, D.R., Ito, M., 2008. The sporophyte-less filmy fern of eastern North America Trichomanes intricatum (Hymenophyllaceae) has the chloroplast genome of an Asian species. Am. J. Bot. 95, 1645–1651.
Hansen, D.R., Dastidar, S.G., Cai, Z., Penaflor, C., Kuehl, J.V., Boore, J.L., Jansen, R.K., 2007. Phylogenetic and evolutionary implications of complete chloroplast genome sequences of four early-diverging angiosperms: Buxus (Buxaceae), Chloranthus (Chloranthaceae), Dioscorea (Dioscoreaceae), and Illicium (Schisandraceae). Mol. Phylogenet. Evol. 45, 547–563.
Jasen, J., 2006. A chloroplast DNA hairpin structure provides useful phylogenetic data within tribe Senecioneae (Asteraceae). Can. J. Bot. 84, 862–868.
Kan, L., Xie, G., Wang, J., 2009a. Effect of drought stress on the growth and eco-physiologic characteristics of invasive plant Eupatorium catarium seedlings. Chin. J. Trop. Crops 30, 425–432.
Kan, L., Xie, G., Wang, J., 2009b. Seed germination of Eupatorium catarium under salt stress. Chin. J. Trop. Agric. 29, 26–31.
Kim, K.J., Lee, H.L., 2005. Widespread occurrence of small inversions in the chloroplast genomes of land plants. Mol. Cells 19, 104–113.
Kuang, D.Y., Wu, H., Wang, Y.L., Gao, L.M., Zhang, S.Z., Lu, L., 2011. Complete chloroplast genome sequence of Magnolia kwangsiensis (Magnoliaceae): implication for DNA barcoding and population genetics. Genome 54, 663–673.
Kumar, S., Hahn, F.M., McMahan, C.M., Cornish, K., Whalen, M.C., 2009. Comparative analysis of the complete sequence of the plastid genome of Parthenium argentatum and identification of DNA barcodes to differentiate Parthenium species and lines. BMC Plant Biol. 9, 131.
Liu, J.-f., Liu, Q., Li, C.-l., Wang, H., 2011. Distribution of invasive plants Eupatorium catarium along different levels of the highways in Haikou city. J. Guangdong Agric. Sci. China 1–5.
Liu, Y., Huo, N., Dong, L., Wang, Y., Zhang, S., Young, H.A., Feng, X., Gu, Y.Q., 2013. Complete chloroplast genome sequences of Mongolia medicine Artemisia frigida and phylogenetic relationships with other plants. PLoS One 8, e57533.
Maia, G.L., Falcao-Silva Vdos, S., Aquino, P.G., de Araujo-Junior, J.X., Tavares, J.F., da Silva, M.S., Rodrigues, L.C., de Siqueira-Junior, J.P., Barbosa-Filho, J.M., 2011. Flavonoids from Praxelis clematidea R.M. King and Robinson modulate bacterial drug resistance. Molecules 16, 4828–4835.


Fig. 6. Repeat structure analysis in the E. catarium cp genome. The cutoff value is 15 bp for tandem repeats and 30 bp for dispersed repeats. A. Frequency of repeats by length; B. Repeat type; C. Location distribution of all the repeats.

Martin, K.P., Beena, M.R., Joseph, D., 2003. High frequency axillary bud multiplication and ex vitro rooting of Wedelia chinensis (Osbeck) Merr.—a medicinal plant. Indian J. Exp. Biol. 41, 262–266.
Nie, X., Lv, S., Zhang, Y., Du, X., Wang, L., Biradar, S.S., Tan, X., Wan, F., Weining, S., 2012. Complete chloroplast genome sequence of a major invasive species, crofton weed (Ageratina adenophora). PLoS One 7, e36869.
Rani, D.N., Abraham, T.E., 2006. Kinetics and thermal stability of two peroxidase isozymes from Eupatorium odoratum. Appl. Biochem. Biotechnol. 128, 215–226.
Sato, S., Nakamura, Y., Kaneko, T., Asamizu, E., Tabata, S., 1999. Complete structure of the chloroplast genome of Arabidopsis thaliana. DNA Res. 6, 283–290.
Serrano, M., Wang, B., Aryal, B., Garcion, C., Abou-Mansour, E., Heck, S., Geisler, M., Mauch, F., Nawrath, C., Metraux, J.P., 2013. Export of salicylic acid from the chloroplast requires the multidrug and toxin extrusion-like transporter EDS5. Plant Physiol. 162, 1815–1821.
Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., Kumar, S., 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol. Biol. Evol. 28, 2731–2739.
Tang, L.C., Wang, N., Yang, X.H., Deng, S.M., 2011. Analysis of the volatile component of the invasive plant—Eupatorium catarium Veldkamp. J. Anhui Agric. Sci. China 39, 5805–5806.
Tangphatsornruang, S., Sangsrakru, D., Chanprasert, J., Uthaipaisanwong, P., Yoocha, T., Jomchai, N., Tragoonrung, S., 2010. The chloroplast genome sequence of mungbean (Vigna radiata) determined by high-throughput pyrosequencing: structural organization and phylogenetic relationships. DNA Res. 17, 11–22.
Thompson, J.D., Higgins, D.G., Gibson, T.J., 1994. CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 22, 4673–4680.

Fig. 7. Repeat structures in the eight species of Asteraceae.

Timme, R.E., Kuehl, J.V., Boore, J.L., Jansen, R.K., 2007. A comparative analysis of the Lactuca and Helianthus (Asteraceae) plastid genomes: identification of divergent regions and categorization of shared repeats. Am. J. Bot. 94, 302–312.
Tsai, L.C., Wang, J.C., Hsieh, H.M., Liu, K.L., Linacre, A., Lee, J.C., 2008. Bidens identification using the noncoding regions of chloroplast genome and nuclear ribosomal DNA. Forensic Sci. Int. Genet. 2, 35–40.
Uthaipaisanwong, P., Chanprasert, J., Shearman, J.R., Sangsrakru, D., Yoocha, T., Jomchai, N., Jantasuriyarat, C., Tragoonrung, S., Tangphatsornruang, S., 2012. Characterization of the chloroplast genome sequence of oil palm (Elaeis guineensis Jacq.). Gene 500, 172–180.
Wei, C., Guo-yu, L., Feng, A., Ju-sheng, J., Zhen-hui, W., Qiu-bo, C., Xin-min, W., 2007. Niche characteristics of Eupatorium catarium community in Hainan. J. Northwest For. Univ. China 22, 24–27.
Wiegert, K.E., Bennett, M.S., Triemer, R.E., 2012. Evolution of the chloroplast genome in photosynthetic euglenoids: a comparison of Eutreptia viridis and Euglena gracilis (Euglenophyta). Protist 163, 832–843.
Yang, M., Zhang, X., Liu, G., Yin, Y., Chen, K., Yun, Q., Zhao, D., Al-Mssallem, I.S., Yu, J., 2010. The complete chloroplast genome sequence of date palm (Phoenix dactylifera L.). PLoS One 5, e12762.
Yi, D.K., Kim, K.J., 2012. Complete chloroplast genome sequences of important oilseed crop Sesamum indicum L. PLoS One 7, e35872.
Young, H.A., Lanzatella, C.L., Sarath, G., Tobias, C.M., 2011. Chloroplast genome variation in upland and lowland switchgrass. PLoS One 6, e23980.
Zhang, T., Fang, Y., Wang, X., Deng, X., Zhang, X., Hu, S., Yu, J., 2012. The complete chloroplast and mitochondrial genome sequences of Boea hygrometrica: insights into the evolution of plant organellar genomes. PLoS One 7, e30531.


Fig. 8. Folded stem–loop structures in six SIs of Praxelis.
