Journal Pre-proofs
Plastome evolution and phylogeny of subtribe Aeridinae (Vandeae, Orchidaceae)
Young-Kee Kim, Sangjin Jo, Se-Hwan Cheon, Myounghai Kwak, Young-Dong Kim, Ki-Joong Kim
PII: S1055-7903(19)30189-7
DOI: https://doi.org/10.1016/j.ympev.2019.106721
Reference: YMPEV 106721
To appear in: Molecular Phylogenetics and Evolution
Received Date: 27 March 2019
Revised Date: 8 December 2019
Accepted Date: 12 December 2019
Please cite this article as: Kim, Y-K., Jo, S., Cheon, S-H., Kwak, M., Kim, Y-D., Kim, K-J., Plastome evolution and phylogeny of subtribe Aeridinae (Vandeae, Orchidaceae), Molecular Phylogenetics and Evolution (2019), doi: https://doi.org/10.1016/j.ympev.2019.106721
© 2019 Published by Elsevier Inc.
Plastome evolution and phylogeny of subtribe Aeridinae (Vandeae, Orchidaceae)
Young-Kee Kim a, Sangjin Jo a, Se-Hwan Cheon a, Myounghai Kwak b, Young-Dong Kim c, Ki-Joong Kim a,*
a Division of Life Sciences, Korea University, Seoul 02841, Korea
b Department of Plant Resources, National Institute of Biological Resources, Incheon 22689, Korea
c Department of Life Science, Hallym University, Chuncheon 24252, Korea
* Corresponding author. E-mail: [email protected]
Abstract
Subtribe Aeridinae (Vandeae, Epidendroideae, Orchidaceae) consists of 83 genera and 2,345 species. The present study completely decoded the plastomes and nuclear ribosomal (nr) RNA gene clusters of seven species of Aeridinae belonging to Gastrochilus, Neofinetia, Pelatantheria, and Thrixspermum and compared them with existing data to investigate their genome evolution and phylogeny. Although no large structural variations were observed among the Aeridinae plastomes, 14 small inversions (SIs) were found in Orchidaceae for the first time. Therefore, the evolutionary trends of SIs and their usefulness as molecular identification markers were evaluated. Since all 11 ndh genes in the Aeridinae plastomes were lost or pseudogenized, the evolutionary trends of ndh genes are discussed at the tribe and family levels. In the maximum likelihood tree reconstructed from 83 plastome genes, the five Orchidaceae subfamilies were shown to have diverged in the following order: Apostasioideae, Vanilloideae, Cypripedioideae, Orchidoideae, Epidendroideae. Divergence times for major lineages were found to be 5–10 million years more recent than those in previous studies, which used only two or three genes. Vandeae, which includes Aeridinae, formed a sister group with Cymbidieae and Epidendreae. The Vandeae, Cymbidieae, and Epidendreae lineages were inferred to have diverged at 25.31 Mya; thus, numerous speciation events within Aeridinae have occurred since then. Furthermore, the present study reconstructed a phylogenetic tree from 422 nrITS sequences belonging to Aeridinae and allied taxa and used it to discuss the phylogenetic positions and species identities of five endangered species.
Keywords: Plastome, Orchidaceae, Aeridinae-Vandeae, Endangered species, Molecular clock, nrRNA gene repeating unit.
1. Introduction
Orchidaceae is one of the most species-rich flowering plant families, consisting of five subfamilies, 22 tribes, 736 genera, and 28,000 species (Chase et al., 2015; Christenhusz and Byng, 2016). Among these, tribe Vandeae is divided into the subtribes Adrorhizinae, Aeridinae, Angraecinae, and Polystachyinae (Chase et al., 2015). Aeridinae is the largest of these, consisting of 83 genera and 1,314 species (Chase et al., 2015). Aeridinae occurs mainly in tropical and subtropical Asia and northern Australia, but some species are also distributed in northeastern Asia and Africa (Hidayat et al., 2012). Aeridinae is mainly composed of monopodial epiphytes (Hidayat et al., 2012; Pridgeon et al., 2014). Morphological features, such as the number of pollinia, presence or absence of nectar spurs, shape and size of the labellum, and shape of the column foot, have been used to differentiate the genera in Aeridinae (Pridgeon et al., 2014). However, these morphological features have been derived independently many times during the evolution of Aeridinae (Topik et al., 2005). Phylogenetic studies have been conducted to identify the phylogenetic relationships among Aeridinae taxa using a few plastome genes or the nuclear ribosomal internal transcribed spacer (nrITS) (Carlsward et al., 2006; Fan et al., 2009; Gardiner et al., 2013; Kocyan et al., 2008; Liu et al., 2011; Padolina et al., 2005; Topik et al., 2005; Tsai et al., 2010). Many studies have supported Aeridinae as monophyletic, but the relationships among the taxa in Aeridinae have not yet been reliably established.
The plastomes of 130 Orchidaceae species in 39 genera have been decoded thus far (NCBI database, March 1, 2019); however, most studies have concentrated on Cymbidieae (11 species of Cymbidium) and Malaxideae (40 species of Dendrobium), which are horticulturally important. After these, Epidendreae (10 species of Corallorhiza), Nervilieae (two species of Epipogium), and Neottieae (six species of Neottia), which include evolutionarily interesting mycoheterotrophic species, have been intensively studied. The 61 species in the remaining 34 genera have only been studied sporadically. Coverage of tribe Vandeae has been particularly insufficient, because only three plastomes of Phalaenopsis have been decoded (Chang et al., 2006; Jheng et al., 2012; Kim et al., 2016).
Plastomes of orchids show various major structural modifications, including massive gene losses, large inverted repeat (IR) expansions/contractions, and several genome rearrangements. These major structural modifications are common in mycoheterotrophic orchid species that have lost photosynthetic capacity (Logacheva et al., 2011; Barrett et al., 2018; Kim et al., 2019). In contrast, the plastomes of photosynthetic orchid species are relatively stable in their structures and gene contents (Yang et al., 2014; Kim et al., 2015; Kim et al., 2017). Only small IR shifts and a few parallel pseudogenizations or losses of ndh genes have been reported from photosynthetic orchid species (Chang et al., 2006; Yang et al., 2013).
In other families, several small inversions (SIs) have been described since they were first reported in Poaceae (Kelchner and Wendel, 1996; Kim and Lee, 2004, 2005; Jo et al., 2019); however, there have been no studies regarding SIs in orchid plastomes. SIs are usually located in hairpin structures and arise from flip-flop mutations between the two stem regions. Stem-loops that form hairpin structures are often present downstream of the 3' end of several plastome genes and are thought to contribute to mRNA stabilization.
Orchids are widely used for medicinal and industrial purposes depending on the species, and most species are widely cultivated for ornamental purposes because of their colorful and fragrant flowers. As a result, wild orchids have been collected by enthusiasts to such an extent that many wild species have become extinct or endangered in their native habitats. In addition, factors such as habitat destruction and loss of pollinators are accelerating the extinction of orchid species (Swarts and Dixon, 2009). Orchids are therefore very important for endangered species conservation, and phylogenetic and genetic diversity studies of the relevant species are essential for identifying why and how these species have become endangered. Specifically, studies on the plastomes of these species can later serve as basic data for further studies related to the conservation of endangered orchids.
Genus Gastrochilus consists of 56 species, mainly distributed in areas ranging from Indonesia in southeast Asia to China, Japan, and Korea in northeast Asia (Chase et al., 2015; Lee, 2011; Lee et al., 2007). Among these, G. japonicus and G. fuscopunctatus are distributed in the southern parts of Korea, the northern limit of the distribution range of the genus. Genus Neofinetia is composed of three species distributed in the southern regions of Korea, China, and Japan (Chen et al., 2009). Neofinetia falcata grows wild in Korea (Lee, 2011). The wild type of N. richardsiana is not known, as the species has only been reported from cultivated individuals (Christenson, 1996). N. falcata ‘CheongSan’ is a variety of N. falcata characterized by shorter, thicker leaves.
Genus Pelatantheria comprises eight species distributed in areas ranging from Sumatra to Korea and the southern region of Japan (Chen et al., 2009). P. scolopendrifolia is distributed in the southern part of Korea (Lee, 2011). The plants have a characteristic shape resembling that of chilopods (centipedes). Genus Thrixspermum is a large genus consisting of 161 species that are widely distributed in the tropics, ranging from Sri Lanka to the Pacific Ocean (Chen et al., 2009). Among these species, T. japonicum is a small, 2–13 cm-long epiphytic plant that grows in Korea, Japan, and China (Lee, 2011).
The present study completely decoded the plastomes and nrRNA gene clusters of these seven species (Gastrochilus japonicus, G. fuscopunctatus, Neofinetia falcata, N. falcata ‘CheongSan’, N. richardsiana, Pelatantheria scolopendrifolia, and Thrixspermum japonicum) to reveal their structures and evolutionary genomic characteristics. The Orchidaceae phylogenetic tree was reconstructed using the full plastome data produced in the present study and other available data to discuss the phylogenetic relationships of Aeridinae. Furthermore, using a molecular clock based on all plastome genes, the divergence times of major Orchidaceae clades and Aeridinae taxa were estimated and compared with previous results based on only two or three regions. In addition, a phylogenetic tree was constructed from the nrITS sequences of Aeridinae and related species to identify the phylogenetic positions of endangered species and to discuss outstanding systematic questions concerning them. Moreover, SIs, simple sequence repeats (SSRs), and large repeats, which are likely to be useful in studies of endangered species and hybrid taxa, are discussed as candidate markers for studies on the genetic structure of populations.
2. Materials and Methods
2.1. Taxon sampling and DNA extraction
The leaf materials used in the next-generation sequencing (NGS) study and their voucher information are given in Table 1. Fresh leaf samples were ground to a powder in liquid nitrogen using a mortar and pestle. Genomic DNA was extracted using a G-spin™ II plant genomic DNA extraction kit (iNtRON, Seoul). The quality of the genomic DNA was evaluated with a UV/VIS spectrophotometer (Thermo, Wilmington, DE, USA). The genomic DNAs were deposited in the Plant DNA Bank in Korea under the accession numbers listed in Table 1.
2.2. Plastome sequencing, assembly, and annotation
Following library construction, the DNAs of N. falcata, N. falcata ‘CheongSan’, and N. richardsiana were sequenced using an Illumina HiSeq 2000 (Illumina, San Diego, CA, USA). The other four species, G. fuscopunctatus, G. japonicus, P. scolopendrifolia, and T. japonicum, were sequenced using an Illumina MiSeq (Illumina, San Diego, CA, USA). Raw reads were imported into Geneious 6.1.8 and trimmed using a 0.05 error probability limit. Phalaenopsis equestris (NC_017609) was used as the reference sequence for reference-guided assembly with the Geneious assembler in Geneious 6.1.8 (Kearse et al., 2012). Complete plastome sequences were annotated using BLASTn, tRNAscan-SE 2.0 (Lowe and Chan, 2016), and the annotation search functions in Geneious 6.1.8.
For the ndh gene analysis, the 11 gene sequences corresponding to ndhA through ndhK were extracted from 45 Orchidaceae plastomes and aligned to determine which plastomes contained the ndh gene complex. Based on these alignments, sequence similarity was compared against species with a complete open reading frame (ORF) for each ndh gene. Sequences in which 50% or less of the complete CDS remained, or whose similarity was 50% or lower, were scored as lost genes. Sequences that retained more than 50% similarity to the relevant CDS but contained many internal stop codons, or in which the gene was hard to identify because of frameshift mutations or similar disruptions, were scored as pseudogenes. Sequences in which the CDS was complete, or in which only a few (five or fewer) internal stop codons existed and correction by RNA editing was possible, were scored as genes. These criteria follow a study of character reconstructions of ndh gene status in Orchidaceae (Kim et al., 2019). Circular plastome maps were constructed using the OGDRAW web server (Lohse et al., 2007).
The nuclear ribosomal (nr) DNA unit was also assembled from the NGS datasets. A Corallorhiza bulbosa nrDNA sequence (KM390008) was downloaded as a reference. A complete nrDNA unit, except for the external transcribed spacer regions, was recovered by reference-guided assembly using the Geneious assembler.
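The three-way scoring rule for ndh loci described in Section 2.2 can be written as a small decision function. The following is only a minimal sketch under stated assumptions: the function names, the input representation (fraction of the CDS retained, similarity as a fraction, a frameshift flag), and the reading of "many" stop codons as "more than five" are hypothetical choices made for illustration, not the authors' actual procedure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def count_internal_stops(cds):
    """Count in-frame stop codons, ignoring the final (terminal) codon."""
    cds = cds.upper()
    codons = [cds[i:i + 3] for i in range(0, len(cds) - 2, 3)]
    return sum(codon in STOP_CODONS for codon in codons[:-1])


def classify_ndh(length_fraction, similarity, internal_stops, frameshift=False):
    """Score one ndh locus as 'lost', 'pseudogene', or 'gene'.

    length_fraction: retained CDS length / complete CDS length (0..1)
    similarity:      similarity to an intact reference CDS (0..1)
    The threshold of five stops comes from the text; treating anything
    above five as 'many' is an assumption of this sketch.
    """
    if length_fraction <= 0.5 or similarity <= 0.5:
        return "lost"
    if frameshift or internal_stops > 5:
        return "pseudogene"
    return "gene"


# Toy values, not measurements from the study.
print(classify_ndh(0.40, 0.90, 0))    # lost
print(classify_ndh(0.95, 0.80, 20))   # pseudogene
print(classify_ndh(1.00, 0.97, 2))    # gene
```

Keeping the stop-codon count separate from the classification makes it easy to change either the threshold or the way similarity is measured without touching the other step.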
2.3. Phylogenetic analysis
Forty-four plastome sequences (38 Orchidaceae plastomes and six outgroup plastomes) were downloaded from NCBI for phylogenetic analysis (Supplementary Table S1). We chose these 38 previously sequenced Orchidaceae species (of the 130 sequenced species in the family) for the following reasons: three species distantly related to Cymbidium and Dendrobium were selected, and photosynthetic species were preferred over non-photosynthetic ones to represent the tribes because chloroplast genes have often been lost in non-photosynthetic species. The 38 selected species therefore represent the complete range of the major Orchidaceae lineages whose plastomes have been decoded. Seventy-nine CDS and four rRNA regions were extracted from a total of 51 plastome sequences. Each extracted region was aligned using MUSCLE v3.8.425 (Edgar, 2004). The alignments were concatenated to a total length of 86,823 bp. Modeltest in PAUP (Posada and Crandall, 1998), run within Geneious 6.1.8, was used to select the best nucleotide substitution model (GTR). A maximum likelihood (ML) tree was constructed using RAxML BlackBox (Stamatakis et al., 2008) with 100 bootstrap replicates (ML value: -346508.438747). The same alignment and GTR substitution model were used to construct a Bayesian inference tree using MrBayes 3.2.6 (Huelsenbeck and Ronquist, 2001) implemented within Geneious 6.1.8. PartitionFinder v2.1.1 (Lanfear et al., 2012) was used to determine the best-fit model for each CDS and rRNA alignment (53 genes under the GTR estimated model, 22 genes under the HKY estimated model, three genes under the HKY all equal model, two genes under the TRN all equal model, and three genes under the TRN estimated model). All alignments were grouped by these five best-fit models. The partitioned data matrix was used to construct a Bayesian inference tree using the MrBayes_CIPRES api (Miller et al., 2015) with a Markov chain Monte Carlo (MCMC) chain length of 1,000,000.
A total of 414 nrITS sequences (389 sequences for Aeridinae, 10 sequences for Vandeae, and 15 sequences for Orchidaceae) were downloaded from NCBI (Supplementary Table S1). We selected only sequences with voucher-specimen information. A total of 422 sequences (414 NCBI sequences, seven sequences from the present study, and additional Cymbidium macrorhizon sequences) were used in the nrITS sequence alignment. Sequences were aligned using MAFFT (aligned length 1,156 bp), and an ML tree (GTR substitution model) was generated using RAxML BlackBox (ML value: -20792.820555). Bootstrap values were mapped onto the tree nodes using TreeGraph v.2.14.0-771 (Stöver and Müller, 2010).
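The concatenation step in Section 2.3, in which per-gene alignments are joined into a single supermatrix, can be sketched as follows. This is an illustrative sketch rather than the workflow used in the study (which was carried out in Geneious); the function name `concatenate_alignments`, the dictionary-based input format, and the choice to pad a taxon that lacks a gene with gap characters are all assumptions.

```python
def concatenate_alignments(alignments, taxa):
    """Join per-gene alignments into one supermatrix.

    alignments: list of dicts mapping taxon name -> aligned sequence;
                all sequences within one dict must have equal length.
    taxa:       ordered list of every taxon to include.
    A taxon missing from a gene alignment is padded with '-' for that gene.
    """
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences in one alignment must share a length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, "-" * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy example with two hypothetical genes and two hypothetical taxa.
gene1 = {"taxonA": "ATG-", "taxonB": "ATGC"}
gene2 = {"taxonA": "GG"}
matrix = concatenate_alignments([gene1, gene2], ["taxonA", "taxonB"])
print(matrix["taxonA"])  # ATG-GG
print(matrix["taxonB"])  # ATGC--
```

Recording the width of each gene as it is appended would also give the partition boundaries needed for a partitioned analysis.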
2.4. Time estimation
A total of 83 genes (79 CDS and four rRNA) were used to estimate the divergence time of each node. The base substitution models were the same as those used in the Bayesian inference analysis. BEAUti v2 and BEAST v2.0 (Bouckaert et al., 2014) were used to estimate divergence times. Three fossil calibrations, Asparagales (normal distribution, mean 105.3, sigma 8.0), Dendrobium (log-normal distribution, sigma 2.0, offset 23.2), and Goodyera (log-normal distribution, sigma 2.0, offset 15.0), were used to calibrate the nodes (Conran et al., 2009; Gustafsson et al., 2010; Iles et al., 2015; Ramírez et al., 2007). The effective sample size (ESS) was evaluated from the log files using Tracer v1.6.0. The coefficient of variation in our data set was 0.578. Therefore, the relaxed log-normal clock model (Drummond et al., 2006) and the Yule model were used, and the Markov chain Monte Carlo (MCMC) analysis was run for 50,000,000 generations. Trees were sampled every 5,000 generations, yielding a total of 10,000 trees. An identical run was performed to collect a total of 20,000 trees. The 20,000 trees were combined using TreeAnnotator v.1.8.0 with a burn-in of 20% and a posterior probability limit of 0.9. The combined maximum clade credibility tree was annotated using the R packages ‘strap’ (Bell and Lloyd, 2015) and ‘phytools’ (Revell, 2012).
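The tree counts in Section 2.4 follow from simple arithmetic on the chain length, sampling interval, and burn-in. The short sketch below reproduces that arithmetic. It assumes two runs in total (inferred from 10,000 trees per run and 20,000 trees combined) and that the 20% burn-in is applied to the combined set; the function name is hypothetical, and the initial state that some samplers also log is ignored.

```python
def retained_trees(chain_length, sample_every, runs, burnin_percent):
    """Return (trees per run, combined trees, trees kept after burn-in)."""
    per_run = chain_length // sample_every
    combined = per_run * runs
    discarded = combined * burnin_percent // 100
    return per_run, combined, combined - discarded


per_run, combined, kept = retained_trees(
    chain_length=50_000_000, sample_every=5_000, runs=2, burnin_percent=20
)
print(per_run, combined, kept)  # 10000 20000 16000
```

Under these assumptions, the maximum clade credibility tree would be summarized from 16,000 post-burn-in trees.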
2.5. Identification of small inversions, SSRs, and large repeats
Small inversions (SIs) were confirmed after validation with the Mfold web server (Zuker, 2003) and Geneious 6.1.8. The ancestral state of each SI site was estimated 100 times, and the results were combined using the ‘phytools’ package in R. Perfect simple sequence repeats (SSRs) with a minimum total length of 10 bp were identified using Phobos v3.3.12 (Mayer, 2010). REPuter (Kurtz, 2001) was used to identify large forward and palindromic repeats with a minimum total length of 24 bp.
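To make the SSR criterion concrete (perfect repeats of mono- to penta-nucleotide units with a total length of at least 10 bp), the following sketch scans a sequence with regular expressions. It is a simplified stand-in and not a reimplementation of Phobos: the function names are hypothetical, only whole repeat units are counted, and a repeat whose unit is itself a repetition of a shorter unit (for example AA) is reported only under the shorter unit.

```python
import re


def is_primitive(motif):
    """True if the motif is not itself a repetition of a shorter unit."""
    n = len(motif)
    return not any(
        n % k == 0 and motif[:k] * (n // k) == motif for k in range(1, n)
    )


def find_ssrs(seq, max_unit=5, min_total=10):
    """Return sorted (start, motif, repeat_count) for perfect SSRs."""
    seq = seq.upper()
    hits = []
    for k in range(1, max_unit + 1):
        pattern = re.compile(r"([ACGT]{%d})\1+" % k)
        for match in pattern.finditer(seq):
            motif = match.group(1)
            total = len(match.group(0))
            if is_primitive(motif) and total >= min_total:
                hits.append((match.start(), motif, total // k))
    return sorted(hits)


# Toy sequence: a 12-bp poly-A run and an (AT)6 repeat.
toy = "GGC" + "A" * 12 + "CGG" + "AT" * 6 + "GCC"
print(find_ssrs(toy))  # [(3, 'A', 12), (18, 'AT', 6)]
```

Because the regular expression search is greedy and non-overlapping, repeats that overlap in different frames may be reported with a shifted start position; a dedicated tool handles such cases more carefully.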
3. Results
3.1. General plastome features and gene contents
Among the plastomes of the seven newly sequenced species of Aeridinae, the coverage of the four species sequenced with MiSeq was 501–1,045× and the coverage of the three species sequenced with HiSeq was 335–347×. Table 1 shows the numbers of raw, trimmed, and aligned reads, the average read lengths, and the coverage depths. Table 2 shows the lengths of the entire annotated plastomes, the lengths by region, and the GC contents. Plastome sizes ranged from 146,183 bp for G. fuscopunctatus to 149,221 bp for T. japonicum, both of which are within the range of common angiosperm plastome sizes. No large differences were found in the lengths of the LSC, SSC, or IR regions, and GC content remained within the narrow range of 36.1 to 36.8% (Table 2). The seven Aeridinae plastomes exhibit the typical quadripartite structure consisting of the LSC, two IR copies, and the SSC, as in other orchids.
Although they are typical plastomes with no large inversions or rearrangements, 14 small inversions (SIs) were identified in the 10 Aeridinae plastomes (Fig. 1). All 14 were found in regions that form hairpins composed of stems and loops; 12 SIs were located in intergenic spacers (IGS) and two in introns (Fig. 2). By region, 12 SIs were located in the LSC and the other two in the IR and SSC, respectively (Fig. 1). Among these 14 main types of SIs, those in which a base substitution occurred in part of the stem or loop were recorded as subtypes, and the ΔG values that indicate the stability of each hairpin are given in Supplementary Table S2. The SI state changes are indicated on the phylogenetic tree (Fig. 3) so that the character distributions and evolutionary processes of the 14 SIs can be easily identified.
The number of genes in the Aeridinae plastomes is similar to that of other plants, with the exception of the 11 genes that constitute the ndh gene complex (Supplementary Table S3). The plastomes of all species consist of 68 CDS, 30 tRNA, and four rRNA genes. Among these, 16 genes, including trnK-UUU and rps16, have one intron, while clpP and ycf3 have two introns.
The phylogenetic tree of these 45 species (see the phylogenetic tree analysis results) and the presence or absence of ndh genes are shown in Figure 4.
None of the 11 ndh genes were complete in any of the 10 Aeridinae species; they were either lost or pseudogenized depending on the taxon. Four genes (ndhA, ndhF, ndhH, and ndhI) were lost in all taxa. The seven other genes were also lost or pseudogenized depending on the taxon. In N. falcata, N. falcata ‘CheongSan’, and N. richardsiana, ndhB, ndhD, ndhE, and ndhG were pseudogenized, and the remaining seven genes were lost. In G. fuscopunctatus, three genes were pseudogenized and eight were lost; in G. japonicus, five genes were pseudogenized and six were lost; in P. scolopendrifolia, five genes were pseudogenized and the remaining six genes were lost; and in T. japonicum, 10 genes were lost and only ndhB remained as a pseudogene (Fig. 4).
3.2. IR/SSC boundary and sequence divergence among plastomes
To investigate the degree of IR contraction/expansion, the LSC-IR and IR-SSC junction sites were compared across the 10 Aeridinae species. In the three previously published Phalaenopsis species, the ycf1 genes were located in the SSC; however, in the seven species decoded in the present study, the ycf1 genes spanned the SSC/IR-A boundary. Therefore, pseudogene segments of 60–163 bp were found on the opposite IR-B (Fig. 5). The rpl2 genes spanned the LSC/IR-B boundary. The IR portion of the rpl2 gene was 56 bp in Thrixspermum, but 31 bp in the other nine taxa. The distance from the IR-A/LSC boundary to psbA was 96 bp in Gastrochilus, Neofinetia, and Pelatantheria, 123 bp in the three Phalaenopsis species, and 159 bp in Thrixspermum.
The numbers of SSRs and their regions of distribution were analyzed to elucidate variation among allied species and within species. First, we analyzed regions in which mono-, di-, tri-, tetra-, and penta-nucleotide SSRs extended over at least 10 bp. G. fuscopunctatus had a total of 56 SSRs and G. japonicus had a total of 61 SSRs. N. falcata and N. falcata ‘CheongSan’ each had a total of 57 SSRs and N. richardsiana had a total of 58 SSRs. P. scolopendrifolia had a total of 71 SSRs and T. japonicum had a total of 58 SSRs (Fig. 6A). Most SSRs were located in the IGS, with a few located in CDS or introns. Among these, intron SSRs were mainly located in the LSC region, whereas CDS SSRs were mainly located in the SSC region (Fig. 6B). Data that cannot be easily shown in Figure 6 are organized in Supplementary Table S4. A total of 29 large repeats over 20 bp were found (Table 3), with exact lengths ranging from 24 to 56 bp. The repeat shared by the largest number of species was 34 bp long and was found in the ycf2 region of seven species. Large repeats located in CDS regions were found in accD, ccsA, ycf1, and ycf2, and one large repeat was found in trnS (GCU).
3.3. Phylogenetic relationships and origin times of Vandeae and the major Orchidaceae groups
Each of the five subfamilies of Orchidaceae formed a monophyletic group and showed associative relationships within each subfamily (Figure 7). Each node was supported by a bootstrap value of 100% and a BI value of 1.0. Within Epidendroideae, each of the eight tribes used in this study formed a monophyletic group. The clade consisting of Vandeae, Cymbidieae, and Epidendreae was monophyletic and supported by a 99% bootstrap value and a 0.99 BI value. In addition, the monophyly of each of the three tribes was supported by a 100% bootstrap value. They show the relationship (Vandeae (Cymbidieae, Epidendreae)). The Cymbidieae-Epidendreae clade was supported by a 69% bootstrap value and a 0.95 BI value. Within Vandeae, the five genera form the clade (Phalaenopsis (Thrixspermum (Neofinetia (Pelatantheria, Gastrochilus)))), and all internal nodes were supported by bootstrap values of 95% or 100% and a BI value of 1.0.
In the present study, whole nrRNA repeating unit (18S-ITS1-5.8S-ITS2-28S) sequences were identified in all seven taxa for which plastomes were determined; however, since the sequences of all other taxa used for comparison were available only for the ITS region, only ITS sequences were used to prepare the phylogenetic tree. Since the number of taxa was too large to show the entire phylogenetic tree, only the clades of the four genera that include the species sequenced in the present study are enlarged (Fig. 8). The full ML phylogenetic tree, with bootstrap support values higher than 70%, is shown in Supplementary Fig. S1. Thrixspermum japonicum formed a monophyletic group with the T. saruwatarii–T. laurisilvaticum clade (Fig. 8A), and Neofinetia richardsiana was nested within a N. falcata clade (Fig. 8B). Of the two Gastrochilus species analyzed, G. fuscopunctatus formed a sister group with the G. formosanus–G. raraensis clade and G. japonicus formed a sister group with the G. obliquus–G. dasypogon clade (Fig. 8C). The Pelatantheria scolopendrifolia accession analyzed in the present study clustered with another sequence of the same species (Fig. 8D).
Divergence time was estimated for each internal node of the phylogenetic tree (Fig. 9). This is a Bayesian tree and is relatively similar to the ML tree in Figure 7. Orchidaceae was inferred to have originated at 101.82 (88.03–117.00) Mya, the five subfamilies at 68.93–39.75 Mya; Apostasioideae at 68.93 (55.56–85.54) Mya, Vanilloideae at 59.33 (44.66–73.73) Mya, Cypripedioideae at 52.92 (42.87–66.05) Mya, Orchidoideae at 44.65 (36.22–55.76) Mya, Epidendroideae at 44.65 (36.22–55.76) Mya, and Vandeae at 25.31 Mya (Table 4).
4. Discussion
4.1. Evolution of small inversions in Aeridinae
The 14 SIs were plotted on the phylogenetic tree to infer their evolutionary direction and, consequently, their usefulness as molecular markers (Fig. 3). The three Neofinetia taxa shared the absence of SI 10, which distinguished them more clearly than the other genera of Aeridinae. All the remaining taxa could be distinguished by their SI distribution patterns; in particular, the two species of Gastrochilus and two Phalaenopsis taxa were easily distinguished, but the varieties of Phalaenopsis could not be distinguished. The present study is meaningful in that it demonstrates the usefulness of SI characters as markers for identifying allied taxa or hybrid parentage. Since all SIs are located in introns or IGS regions, primers can be easily designed from the conserved exons or coding regions on both sides to identify taxa by PCR and sequencing. However, since SIs can easily occur through flip-flop mutations (Rogalski et al., 2006), the taxonomic level at which SI markers can be used has not yet been strictly evaluated. Nevertheless, although the SI markers gave limited information about the 10 species of Aeridinae, they appeared to be useful for distinguishing species, varieties, or genera. SI markers are considered particularly useful for tracking the maternal lineages of orchids cultivated for horticulture, since many of these orchids have interspecific or intergeneric hybrid origins (Rowley, 1982). In addition, as shown in Figure 3, if the SIs from all plastome regions are combined, they should be useful in studies of phylogenetic relationships among allied genera.
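The SI character states discussed in Section 4.1 rest on a simple sequence relationship: in a hairpin, the two stem arms are reverse complements of each other, and an inverted loop is the reverse complement of the loop in the other state. The sketch below illustrates that check. It is a minimal illustration with hypothetical function names and toy sequences, not the validation procedure used in the study (which relied on Mfold and Geneious), and it ignores the stem or loop substitutions that define the SI subtypes.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def stems_pair(stem5, stem3):
    """True if the 3' stem arm is the exact reverse complement of the 5' arm."""
    return reverse_complement(stem5) == stem3.upper()


def loop_state(loop, reference_loop):
    """Compare a loop with a reference loop from a homologous hairpin."""
    loop, reference_loop = loop.upper(), reference_loop.upper()
    if loop == reference_loop:
        return "same"
    if loop == reverse_complement(reference_loop):
        return "inverted"
    return "other"


# Toy hairpin: stem arms GGCAT / ATGCC flanking a 4-bp loop.
print(stems_pair("GGCAT", "ATGCC"))   # True
print(loop_state("ACTT", "AAGT"))     # inverted
print(loop_state("AAGT", "AAGT"))     # same
```

Scoring each taxon's loop against a chosen reference in this way yields a binary character per SI site that can then be mapped onto a tree.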
4.2. Evolution of ndh genes in Aeridinae
The plastomes of the 10 Aeridinae taxa were unique in that all 11 ndh genes were pseudogenized or deleted (Fig. 4). No common pattern could be found among them, because the number of plastomes decoded was small compared to the size of Orchidaceae and the phenomenon was sporadic throughout the lineages.
Since the IR contributes to the structural stability of plastomes in plants (Palmer and Thompson, 1982), the ndhB genes present in the IR are thought to be the most stable among the 11 ndh genes and are predicted to have been lost or pseudogenized most recently. However, numerous stop codons in ndhB, ranging from 17 to 25 in number, were found in the 10 studied species. Moreover, the ndhB genes were identified as pseudogenes because of frameshift mutations following sequence deletion or duplication. Among these, the ndhB gene in Pelatantheria scolopendrifolia was found to contain only exon 1, with many stop codons. Therefore, it seems impossible to restore gene function for the ndhB pseudogenes by RNA editing in any species.
The ndhD and ndhG genes present in the SSC disappeared in T. japonicum, while existing as pseudogenes in the remaining nine taxa (Fig. 4). In the ndhD pseudogenes, large deletions of 384–547 bp were observed, depending on the species; thus, restoration of function by RNA editing would also be impossible. In the ndhG pseudogenes, although only one internal stop codon exists in Phalaenopsis, three to eight internal stop codons exist in the other taxa; therefore, only under numerous assumptions might restoration of function by RNA editing be possible. The ndhK-ndhJ-ndhC cistron present in the LSC was found only in G. japonicus and P. scolopendrifolia, and even there it contained many indels; thus, its function cannot be restored by RNA editing in any of the species. Even if it is assumed that the function of one subunit can be restored by RNA editing, ndh gene complexes can function as enzymes only when all the other subunits are present, so the overall restoration of ndh complex function by plastid RNA editing can be considered impossible. However, pseudogenized ndh genes are reported to have regained their function through RNA editing in ferns such as Adiantum capillus-veneris (Wolf et al., 2004).
In that case, it was shown that the functions of the ndhB, ndhG, and ndhH genes could be restored through RNA editing at one to five sites; however, as mentioned above, in Aeridinae orchids only the function of the ndhG gene is potentially restorable by RNA editing, and only in a limited number of species.
The ndh protein complexes are essential for photosynthesis, so it is very likely that these genes were actually transferred from the plastome to the nucleus or mitochondria. However, recent sequencing of Phalaenopsis equestris showed that neither the nuclear nor the mitochondrial genome contains plastid ndh genes (Kim et al., 2015; Lin et al., 2015, 2017; Liu, 2015). Nevertheless, the possibility remains that homologous plastid ndh genes have simply not been detected because the nuclear genome sequence is incomplete. Another possibility is that other genes in the nucleus, or mitochondrial ndh genes, may have replaced the functions of the plastid ndh genes; however, no studies regarding this issue have been conducted.
There is a tendency towards losing certain plastome genes in non-photosynthetic species (Graham et al., 2017; Lam et al., 2018; Wicke et al., 2011). Among these, ndh genes are known to have been lost first in Corallorhiza and Neottia, in which full and partial mycoheterotrophs and photosynthetic taxa coexist (Barrett et al., 2018; Feng et al., 2016; Logacheva et al., 2011). In addition, among photosynthetic taxa, ndh genes have been lost in orchids such as Oberonia (Kim et al., 2017), Paphiopedilum (Hou et al., 2018; Lin et al., 2015), Phalaenopsis (Chang et al., 2006; Jheng et al., 2012; Kim et al., 2016), and Phragmipedium (Kim et al., 2015). It has also been reported that ndh genes are often lost in taxa other than Orchidaceae. In particular, the loss of ndh genes has been reported in Cuscuta and Pinus, among others, but there is no explanation regarding alternative functional restoration in such cases either. In photosynthetic orchids, given that ndh gene indels appear more frequently in epiphytic species than in terrestrial ones, it can be inferred that the loss of ndh genes may be part of a molecular preadaptation that leads to the evolution of a new life form. This inference is also consistent with the fact that ndh genes are lost first from mycoheterotroph plastomes (Graham et al., 2017; Lam et al., 2018).
In addition, the loss of ndh genes appears to be more widely distributed in Epidendroideae than in Orchidoideae, a fact that requires further in-depth study. According to the present study’s findings, the loss or pseudogenization of ndh genes in plastomes is a common phenomenon in most epiphytic Aeridinae.
4.3. Evolution of the LSC/IR/SSC boundaries and hot spots in Aeridinae
The sizes and structures of the 10 Aeridinae plastomes were similar to those of angiosperms in general and of photosynthetic orchids; however, the positions of their IR-SC boundaries differed slightly (Fig. 5). This difference arises because in Phalaenopsis, which diverges first in the phylogenetic tree, the ycf1 genes are located within the SSC, whereas in the remaining seven species, which form one clade, the ycf1 genes span the SSC-IR boundary and occur as short pseudogenes approximately 60–156 bp long. To determine whether this pattern is genus-specific in orchid plastomes, we examined the plastome sequences of Cymbidium and Dendrobium, for which many plastomes have been sequenced. The results showed no phylogenetic trend, even though the distances from rpl22 and psbA to the start of the LSC varied greatly within the same genus.
Aeridinae comprises many natural species and numerous cultivars derived from interspecific or intergeneric hybrids, for example in Phalaenopsis and Vanda; therefore, highly variable plastome markers such as SSRs, SIs, and large repeats may be useful for identifying closely related species. Plastome SSRs are useful markers because they show high polymorphism (Cato and Richardson, 1996; Tautz, 1989); the SSRs found in plastomes have therefore been proposed as markers for identifying allied species or varieties (Yi and Kim, 2012). The SIs are likewise located in non-coding sequences and have been proposed to be helpful for identifying species or related taxa and for reconstructing phylogenies (Kim and Lee, 2004). These non-coding markers can be used not only to identify allied taxa but also to evaluate the genetic diversity of endangered plants. Moreover, they can be used to measure the genetic diversity of wild populations or species and thus to assess their risk of extinction, so that threatened taxa can be protected in advance or their genetic diversity can be taken into account when propagating populations for restoration.
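As an illustration only (this is not part of the analysis pipeline used in the study), the following Python sketch checks whether the three large repeats transcribed from Table 3 are reverse-complement palindromes, that is, sequences that could in principle pair with themselves to form a stem-loop. The function and variable names are hypothetical, and the sequences are copied from Table 3 as they appear in this manuscript.

```python
# Illustrative sketch: test whether a DNA repeat is a reverse-complement palindrome.
# Sequences are transcribed from Table 3; all names below are hypothetical.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


# Region -> repeat sequence, as listed in Table 3.
TABLE3_REPEATS = {
    "trnG(UCC)-trnfM(CAU)": "ACTACTATGCATACTAGTATGCATAGTAGT",
    "petN-psbM": "ATAGTGTGGTAGAAAGAGCTATATATAGCTCTTTCTACCACACTAT",
    "clpP-psbB": "CTATATATTCTATATAGAATATATAG",
}


def summarize(repeats: dict) -> dict:
    """Map each region to (length in bp, is the repeat palindromic?)."""
    return {region: (len(seq), is_palindromic(seq)) for region, seq in repeats.items()}


if __name__ == "__main__":
    for region, (length, palindromic) in summarize(TABLE3_REPEATS).items():
        print(f"{region}: {length} bp, palindromic={palindromic}")
```

For the three sequences above, the computed lengths are 30, 46, and 26 bp, matching the lengths given in Table 3, and each sequence equals its own reverse complement.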
4.4. Origin of the subtribe Aeridinae and the major groups of Orchidaceae using whole plastomes
The ML tree based on the sequences of 83 plastome genes showed that each of the five Orchidaceae subfamilies formed a monophyletic group and that they diverged in the order Apostasioideae, Vanilloideae, Cypripedioideae, Orchidoideae, and Epidendroideae. Each of these major nodes was supported by a bootstrap value of 100% and a Bayesian inference (BI) value of 1.0. These results agree with the Orchidaceae phylogeny obtained using the sequences of three genes: atpB, psaB, and rbcL (Givnish et al., 2015). However, they differ from those of a previous study using two genes (matK and rbcL), which indicated that Cypripedioideae diverged first, followed by Vanilloideae (Gustafsson et al., 2010). These discrepancies may be due to differences in the length and variability of the sequences used, in taxon sampling, and in phylogenetic tree reconstruction methods. Because our data matrix includes more taxa, we believe that the tree in the present study, based on 83 genes, is more reliable than those of the previous studies that used two or three genes.
The estimated times of origin for Orchidaceae and its five subfamilies are shown in Fig. 9 and Table 4. In the present study, the time of origin of Orchidaceae was estimated to be 101.82 (88.03–117.00) Mya, which is in accordance with previous studies (Givnish et al., 2015; Gustafsson et al., 2010). The divergence times of the subfamilies were estimated to be 68.93 (55.56–85.54) Mya for Apostasioideae, 59.33 (44.66–73.73) Mya for Vanilloideae, 52.92 (42.87–55.76) Mya for Cypripedioideae, and 44.65 (36.22–55.76) Mya for Orchidoideae–Epidendroideae (Table 4), which are more recent than those reported in the same previous studies (Givnish et al., 2015; Gustafsson et al., 2010). In particular, the divergence time of Orchidoideae–Epidendroideae, estimated to be 44.65 Mya, is approximately 15 Myr more recent than reported in the two previous studies (Givnish et al., 2015; Gustafsson et al., 2010). These differences can be attributed to the data used to construct the phylogenetic tree and/or to the dating methodology. The present study used the same fossil data as the previous studies; however, the branch lengths of several internal nodes may differ from those in previous studies because the entire chloroplast genome was used in the present study rather than two or three genes.
The ML tree shows a (Vandeae (Cymbidieae, Epidendreae)) clade. The sister-group relationship between Cymbidieae and Epidendreae was relatively weakly supported, with a bootstrap (BP) value of 0.69 and a BI value of 0.95, and its internal branch was much shorter than those of other nodes. Previous studies reported differences between the ML and BI topologies for these three tribes (Givnish et al., 2015; Gustafsson et al., 2010). Our tree supports one of their results, although the support values are not strong.
The divergence time of Vandeae was estimated to be 25.31 (21.17–31.05) Mya in the present study, which is 8–10 Myr more recent than the 32.67–35.00 Mya estimated in previous studies (Givnish et al., 2015; Gustafsson et al., 2010). However, this result is similar to that of a study using nrITS, matK, and ycf1, which showed that a Vandeae group including Campylocentrum, Dendrophylax, Phalaenopsis, and Angraecum diverged at 24.0 (stem age)–20.0 (crown age) Mya (Pessoa et al., 2018). In addition, the divergence between Phalaenopsis and the other genera was estimated at 14.11 Mya (Table 4), which is approximately 4.5 Myr older than estimated in previous studies (Givnish et al., 2015; Gustafsson et al., 2010). This difference may be due to the use of different DNA markers and taxa. Moreover, the differences appear only in the median values and fall within the error ranges.
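The comparison of divergence-time estimates can be made explicit with a small interval check. The Python sketch below is illustrative only and uses hypothetical names. It treats the Vandeae estimate of the present study as a 95% HPD interval and, as a simplifying assumption, treats the ranges quoted from other studies (crown to stem age in Pessoa et al., 2018; the 32.67–35.00 Mya span of earlier point estimates) as plain intervals.

```python
from typing import NamedTuple, Tuple


class AgeEstimate(NamedTuple):
    """Median divergence time and 95% HPD bounds, all in Mya."""
    median: float
    hpd_lower: float
    hpd_upper: float


def ranges_overlap(lower_a: float, upper_a: float, lower_b: float, upper_b: float) -> bool:
    """True if the closed intervals [lower_a, upper_a] and [lower_b, upper_b] intersect."""
    return lower_a <= upper_b and lower_b <= upper_a


def overlaps_hpd(estimate: AgeEstimate, other_range: Tuple[float, float]) -> bool:
    """Check whether another study's age range intersects the 95% HPD of an estimate."""
    return ranges_overlap(estimate.hpd_lower, estimate.hpd_upper, *other_range)


# Values quoted in Section 4.4 for Vandeae.
VANDEAE = AgeEstimate(median=25.31, hpd_lower=21.17, hpd_upper=31.05)
PESSOA_2018 = (20.0, 24.0)        # crown age to stem age; treated as an interval (assumption)
EARLIER_STUDIES = (32.67, 35.00)  # span of earlier point estimates; treated as an interval (assumption)

if __name__ == "__main__":
    print("Overlaps Pessoa et al. (2018):", overlaps_hpd(VANDEAE, PESSOA_2018))
    print("Overlaps earlier estimates:", overlaps_hpd(VANDEAE, EARLIER_STUDIES))
```

Under these assumptions the Vandeae HPD interval intersects the range from Pessoa et al. (2018) but not the 32.67–35.00 Mya span, which is consistent with the description in the text.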
4.5. Phylogenetic positions of the five endangered taxa in Aeridinae
The ITS sequence tree of 422 Aeridinae taxa (Fig. 8, Supplementary Fig. S1) is the most comprehensive phylogenetic tree estimated thus far. It elucidates the phylogenetic positions and taxonomic statuses of the endangered species used in the present study. The tree constructed using all the plastome genes shows that the five Aeridinae genera diverged 7.97–13.75 Mya (Fig. 9). The five genera comprise 298 species; that speciation progressed over a relatively short period of time has also been reported in previous studies (Freudenstein and Chase, 2015; Givnish et al., 2015; Hidayat et al., 2012; Zou et al., 2015). The nrITS phylogenetic tree showed some differences in intergeneric and interspecific relationships compared to previous chloroplast phylogenetic trees (Fig. 8, Supplementary Fig. S1). These differences are probably due to the larger number of species in the nrITS tree of the present study.
Aeridinae was monophyletic, and the four genera that include the endangered species were placed in four different clades. Thrixspermum is a paraphyletic group containing Cleisomeria, Dimorphorchis, Microsaccus, Robiquetia, Sarcochilus, etc., and appears in two clades. Some Thrixspermum species are mixed with Dimorphorchis species. T. japonicum formed a monophyletic group with the T. saruwatarii–T. laurisilvaticum clade (Fig. 8A). Therefore, detailed phylogenetic studies on all six genera are necessary to reveal the phylogenetic relationships among these genera and species.
Neofinetia richardsiana was nested within N. falcata (Fig. 8B). Neofinetia is a small genus of only three species that is widely grown for ornamental purposes. N. richardsiana is characterized by the absence of floral spurs, and a cultivated individual was recorded as the type specimen (Christenson, 1996). It is not known as a wild species, and growers have long recognized it as a variety of N. falcata. The N. richardsiana ITS sequence used in the present study fell within the range of variation of N. falcata. The plastome of N. falcata is 146,491 bp long and that of N. richardsiana is 146,497 bp; they differ at only seven sites (four base substitutions and three indels). Furthermore, all these variations were observed in the IGS regions, not in the CDS (Supplementary Fig. S3). Neofinetia falcata ‘CheongSan’ has very short, thick leaves compared to the type species. Its ITS sequence is identical to that of the type species, and its plastome is 146,498 bp long, with differences at seven sites (five base substitutions and two indels). However, it differs from the N. richardsiana plastome by only 1 bp in length, one 1-bp substitution, and one 2-bp indel. In addition, N. richardsiana and N. falcata ‘CheongSan’ share four base substitutions and two indels, indicating that they were derived from a similar stock of N. falcata. These small differences observed in the whole plastomes of Neofinetia are within a range that can be regarded as intraspecific variation or NGS sequencing error. Therefore, given not only the ITS tree but also the similarity of the whole plastomes, it is more reasonable to treat N. richardsiana as a variety derived from N. falcata than as an independent species.
The genus Gastrochilus formed a monophyletic group in the ITS tree. However, the two endangered species distributed within Korea appear to have originated from different lineages, each of which appears to have extended its distribution to the southern part of the Korean Peninsula. For instance, G. fuscopunctatus formed a sister group with the G. formosanus–G. raraensis clade, and G. japonicus formed a sister group with the G. obliquus–G. dasypogon clade (Fig. 8C). A simple comparison of the plastomes shows a difference in length between the two species: G. japonicus is 147,697 bp long and G. fuscopunctatus is 146,183 bp (Table 2).
This 1,514-bp difference appears to result from variations scattered across many regions rather than concentrated in any one area of the plastome. The difference between the two species of Gastrochilus is very large compared with the 6-bp difference between N. falcata and N. richardsiana.
Pelatantheria scolopendrifolia clustered with another accessioned sequence of the same species (Fig. 8D); however, the ITS tree suggests that this genus is polyphyletic. Pelatantheria was nested within Cleisostoma, which is a more inclusive group, and appeared in two different major clades. Of these, the larger clade included Schoenorchis. Pelatantheria scolopendrifolia was placed outside this large clade. The ITS tree suggests that the recent transfer of this species from Cleisostoma scolopendrifolia to Pelatantheria scolopendrifolia is invalid. More comprehensive phylogenetic studies of the Cleisostoma, Pelatantheria, and Schoenorchis species should be conducted to clarify these relationships.
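The plastome length comparisons quoted in this section can be reproduced with simple arithmetic. The Python sketch below is illustrative only; the dictionary and function names are hypothetical, and the lengths are those given in the text and in Table 2.

```python
from itertools import combinations

# Plastome lengths in bp, as reported in the text and Table 2.
PLASTOME_LENGTH_BP = {
    "Gastrochilus japonicus": 147_697,
    "Gastrochilus fuscopunctatus": 146_183,
    "Neofinetia falcata": 146_491,
    "Neofinetia richardsiana": 146_497,
    "Neofinetia falcata 'CheongSan'": 146_498,
}


def length_difference(a: str, b: str, lengths: dict = PLASTOME_LENGTH_BP) -> int:
    """Absolute difference in plastome length (bp) between two taxa."""
    return abs(lengths[a] - lengths[b])


def within_genus_differences(lengths: dict) -> dict:
    """Pairwise length differences for taxa that share a genus name (first word)."""
    result = {}
    for a, b in combinations(sorted(lengths), 2):
        if a.split()[0] == b.split()[0]:
            result[(a, b)] = abs(lengths[a] - lengths[b])
    return result


if __name__ == "__main__":
    for (a, b), diff in within_genus_differences(PLASTOME_LENGTH_BP).items():
        print(f"{a} vs {b}: {diff} bp")
```

With these values, the two Gastrochilus plastomes differ by 1,514 bp, N. falcata and N. richardsiana by 6 bp, and N. richardsiana and N. falcata ‘CheongSan’ by 1 bp.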
Acknowledgments
We thank two anonymous reviewers for their helpful comments, which improved the manuscript. We also thank Mr. Noah Last for English editing of the manuscript. This work was supported by the National Research Foundation of Korea (NRF) under grant no. NRF2015M3A9B8030588 to KJK and by the National Institute of Biological Resources (NIBR) under the genetic diversity research program (2016, 2018) for endangered species to KJK and MK.
Appendix A. Supplementary material
Supplementary data to this article can be found online.
References
Barrett, C.F., Wicke, S., Sass, C., 2018. Dense infraspecific sampling reveals rapid and independent trajectories of plastome degradation in a heterotrophic orchid complex. New Phytol. 218, 1192–1204. https://doi.org/10.1111/nph.15072
Bell, M.A., Lloyd, G.T., 2015. Strap: an R package for plotting phylogenies against stratigraphy and assessing their stratigraphic congruence. Palaeontology 58, 379–389. https://doi.org/10.1111/pala.12142
Bouckaert, R., Heled, J., Kühnert, D., Vaughan, T., Wu, C.H., Xie, D., Suchard, M.A., Rambaut, A., Drummond, A.J., 2014. BEAST 2: a software platform for Bayesian evolutionary analysis. PLoS Comput. Biol. 10, 1–6. https://doi.org/10.1371/journal.pcbi.1003537
Carlsward, B.S., Whitten, W.M., Williams, N.H., Bytebier, B., 2006. Molecular phylogenetics of Vandeae (Orchidaceae) and the evolution of leaflessness. Am. J. Bot. 93, 770–786. https://doi.org/10.3732/ajb.93.5.770
Cato, S.A., Richardson, T.E., 1996. Inter- and intraspecific polymorphism at chloroplast SSR loci and the inheritance of plastids in Pinus radiata D. Don. Theor. Appl. Genet. 93, 587–592. https://doi.org/10.1007/BF00417952
Chang, C.C., Lin, H.C., Lin, I.P., Chow, T.Y., Chen, H.H., Chen, W.H., Cheng, C.H., Lin, C.Y., Liu, S.M., Chang, C.C., Chaw, S.M., 2006. The chloroplast genome of Phalaenopsis aphrodite (Orchidaceae): comparative analysis of evolutionary rate with that of grasses and its phylogenetic implications. Mol. Biol. Evol. 23, 279–291. https://doi.org/10.1093/molbev/msj029
Chase, M.W., Cameron, K.M., Freudenstein, J.V., Pridgeon, A.M., Salazar, G., van den Berg, C., Schuiteman, A., 2015. An updated classification of Orchidaceae. Bot. J. Linn. Soc. 177, 151–174. https://doi.org/10.1111/boj.12234
Chen, X.Q., Liu, Z.J., Zhu, G.H., Lang, K.Y., Ji, Z.H., Luo, Y.B., Jin, X.H., Cribb, P.J., Wood, J.J., Gale, S.W., 2009. Orchidaceae, in: Chen, X.Q., Wood, J.J. (Eds.), Flora of China. Science Press, Beijing.
Christenhusz, M.J.M., Byng, J.W., 2016. The number of known plants species in the world and its annual increase. Phytotaxa 261, 201–217. https://doi.org/10.11646/phytotaxa.261.3.1
Christenson, E.A., 1996. A new species of Neofinetia from China and northern Korea (Orchidaceae: Aeridinae). Lindleyana 11, 220–221.
Conran, J.G., Bannister, J.M., Lee, D.E., 2009. Earliest orchid macrofossils: early Miocene Dendrobium and Earina (Orchidaceae: Epidendroideae) from New Zealand. Am. J. Bot. 96, 466–474. https://doi.org/10.3732/ajb.0800269
Drummond, A.J., Ho, S.Y.W., Phillips, M.J., Rambaut, A., 2006. Relaxed phylogenetics and dating with confidence. PLoS Biol. 4, 699–710. https://doi.org/10.1371/journal.pbio.0040088
Edgar, R.C., 2004. MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 32, 1792–1797. https://doi.org/10.1093/nar/gkh340
Fan, J., Qin, H.N., Li, D.Z., Jin, X.H., 2009. Molecular phylogeny and biogeography of Holcoglossum (Orchidaceae: Aeridinae) based on nuclear ITS, and chloroplast trnL-F and matK. Taxon 58, 849–861. https://doi.org/10.2307/27756950
Feng, Y.L., Wicke, S., Li, J.W., Han, Y., Lin, C.S., Li, D.Z., Zhou, T.T., Huang, W.C., Huang, L.Q., Jin, X.H., 2016. Lineage-specific reductions of plastid genomes in an orchid tribe with partially and fully mycoheterotrophic species. Genome Biol. Evol. 8, 2164–2175. https://doi.org/10.1093/gbe/evw144
Freudenstein, J.V., Chase, M.W., 2015. Phylogenetic relationships in Epidendroideae (Orchidaceae), one of the great flowering plant radiations: progressive specialization and diversification. Ann. Bot. 115, 665–681. https://doi.org/10.1093/aob/mcu253
Gardiner, L.M., Kocyan, A., Motes, M., Roberts, D.L., Emerson, B.C., 2013. Molecular phylogenetics of Vanda and related genera (Orchidaceae). Bot. J. Linn. Soc. 173, 549–572. https://doi.org/10.1111/boj.12102
Givnish, T.J., Spalink, D., Ames, M., Lyon, S.P., Hunter, S.J., Zuluaga, A., Iles, W.J.D., Clements, M.A., Arroyo, M.T.K., Leebens-Mack, J., Endara, L., Kriebel, R., Neubig, K.M., Whitten, W.M., Williams, N.H., Cameron, K.M., 2015. Orchid phylogenomics and multiple drivers of their extraordinary diversification. Proc. R. Soc. B Biol. Sci. 282. https://doi.org/10.1098/rspb.2015.1553
Graham, S.W., Lam, V.K.Y., Merckx, V.S.F.T., 2017. Plastomes on the edge: the evolutionary breakdown of mycoheterotroph plastid genomes. New Phytol. 214, 48–55. https://doi.org/10.1111/nph.14398
Gustafsson, A.L.S., Verola, C.F., Antonelli, A., 2010. Reassessing the temporal evolution of orchids with new fossils and a Bayesian relaxed clock, with implications for the diversification of the rare South American genus Hoffmannseggella (Orchidaceae: Epidendroideae). BMC Evol. Biol. 10. https://doi.org/10.1186/1471-2148-10-177
Hidayat, T., Weston, P.H., Yukawa, T., Ito, M., Rice, R., 2012. Phylogeny of subtribe Aeridinae (Orchidaceae) inferred from DNA sequences data: advanced analyses including Australasian genera. J. Teknol. (Sciences Eng.) 59, 87–95. https://doi.org/10.11113/jt.v59.1591
Hou, N., Wang, G., Zhu, Y., Wang, L., Xu, J., 2018. The complete chloroplast genome of the rare and endangered herb Paphiopedilum dianthum (Asparagales: Orchidaceae). Conserv. Genet. Resour. 10, 709–712.
Huelsenbeck, J.P., Ronquist, F., 2001. MrBayes: Bayesian inference of phylogenetic trees. Bioinformatics 17, 754–755. https://doi.org/10.1093/bioinformatics/17.8.754
Iles, W.J.D., Smith, S.Y., Gandolfo, M.A., Graham, S.W., 2015. Monocot fossils suitable for molecular dating analyses. Bot. J. Linn. Soc. 178, 346–374. https://doi.org/10.1111/boj.12233
Jheng, C.F., Chen, T.C., Lin, J.Y., Chen, T.C., Wu, W.L., Chang, C.C., 2012. The comparative chloroplast genomic analysis of photosynthetic orchids and developing DNA markers to distinguish Phalaenopsis orchids. Plant Sci. 190, 62–73. https://doi.org/10.1016/j.plantsci.2012.04.001
Jo, S., Kim, Y.-K., Cheon, S.-H., Fan, Q., Kim, K.-J., 2019. Characterization of 20 complete plastomes from the tribe Laureae (Lauraceae) and distribution of small inversions. PLoS One 14. https://doi.org/10.1371/journal.pone.0224622
Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., Buxton, S., Cooper, A., Markowitz, S., Duran, C., Thierer, T., Ashton, B., Meintjes, P., Drummond, A., 2012. Geneious basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647–1649. https://doi.org/10.1093/bioinformatics/bts199
Kelchner, S.A., Wendel, J.F., 1996. Hairpins create minute inversions in non coding regions of chloroplast DNA. Curr. Genet. 30, 259–262. https://doi.org/10.1007/s002940050130
Kim, G.B., Kwon, Y., Yu, H.J., Lim, K.B., Seo, J.H., Mun, J.H., 2016. The complete chloroplast genome of Phalaenopsis “Tiny star”. Mitochondrial DNA 27, 1300–1302. https://doi.org/10.3109/19401736.2014.945566
Kim, H.T., Kim, J.S., Moore, M.J., Neubig, K.M., Williams, N.H., Whitten, W.M., Kim, J.H., 2015. Seven new complete plastome sequences reveal rampant independent loss of the ndh gene family across orchids and associated instability of the inverted repeat/small single-copy region boundaries. PLoS One 10. https://doi.org/10.1371/journal.pone.0142215
Kim, K.-J., Lee, H.-L., 2005. Widespread occurrence of small inversions in the chloroplast genomes of land plants. Mol. Cells 19, 104–113.
Kim, K.-J., Lee, H., 2004. Complete chloroplast genome sequences from Korean ginseng (Panax ginseng Nees) and comparative analysis of sequence evolution among 17 vascular plants. DNA Res. 11, 247–261. https://doi.org/10.1093/dnares/11.4.247
Kim, Y.-K., Kwak, M.H., Chung, M.G., Kim, H.-W., Jo, S., Sohn, J.-Y., Cheon, S.-H., Kim, K.-J., 2017. The complete plastome sequence of the endangered orchid Oberonia japonica (Orchidaceae). Mitochondrial DNA Part B 2, 711–713. https://doi.org/10.1080/23802359.2017.1390409
Kim, Y.-K., Jo, S., Cheon, S.-H., Joo, M.-J., Hong, J.-R., Kwak, M.H., Kim, K.-J., 2019. Extensive losses of photosynthesis genes in the plastome of a mycoheterotrophic orchid, Cyrtosia septentrionalis (Vanilloideae: Orchidaceae). Genome Biol. Evol. 11, 565–571. https://doi.org/10.1093/gbe/evz024
Kocyan, A., Vogel, E.F. d., Conti, E., Gravendeel, B., 2008. Molecular phylogeny of Aerides (Orchidaceae) based on one nuclear and two plastid markers: A step forward in understanding the evolution of the Aeridinae. Mol. Phylogenet. Evol. 48, 422–443. https://doi.org/10.1016/j.ympev.2008.02.017
Kurtz, S., 2001. REPuter: the manifold applications of repeat analysis on a genomic scale. Nucleic Acids Res. 29, 4633–4642. https://doi.org/10.1093/nar/29.22.4633
Lam, V.K.Y., Darby, H., Merckx, V.S.F.T., Lim, G., Yukawa, T., Neubig, K.M., Abbott, J.R., Beatty, G.E., Provan, J., Soto Gomez, M., Graham, S.W., 2018. Phylogenomic inference in extremis: a case study with mycoheterotroph plastomes. Am. J. Bot. 105, 480–494. https://doi.org/10.1002/ajb2.1070
Lanfear, R., Calcott, B., Ho, S.Y.W., Guindon, S., 2012. PartitionFinder: combined selection of partitioning schemes and substitution models for phylogenetic analyses. Mol. Biol. Evol. 29, 1695–1701. https://doi.org/10.1093/molbev/mss020
Lee, N.S., 2011. Illustrated flora of Korean orchids. Ewha Womans Univ. Press, Seoul. (in Korean).
Lee, N.S., Lee, W.B., Choi, B.H., Tae, K.H., 2007. Orchidaceae Juss., in: Park, C.W. (Ed.), The genera of vascular plants of Korea. Academic Publishing, Seoul.
Lin, C.S., Chen, J.J.W., Chiu, C.C., Hsiao, H.C.W., Yang, C.J., Jin, X.H., Leebens-Mack, J., de Pamphilis, C.W., Huang, Y.T., Yang, L.H., Chang, W.J., Kui, L., Wong, G.K.S., Hu, J.M., Wang, W., Shih, M.C., 2017. Concomitant loss of ndh complex-related genes within chloroplast and nuclear genomes in some orchids. Plant J. 90, 994–1006. https://doi.org/10.1111/tpj.13525
Lin, C.S., Chen, J.J.W., Huang, Y.T., Chan, M.T., Daniell, H., Chang, W.J., Hsu, C.T., Liao, D.C., Wu, F.H., Lin, S.Y., Liao, C.F., Deyholos, M.K., Wong, G.K.S., Albert, V.A., Chou, M.L., Chen, C.Y., Shih, M.C., 2015. The location and translocation of ndh genes of chloroplast origin in the Orchidaceae family. Sci. Rep. 5, 1–10. https://doi.org/10.1038/srep09040
Liu, Z.J., 2015. The genome sequence of the orchid Phalaenopsis equestris. Nat. Genet. 47, 65–72. https://doi.org/10.1038/ng.3149
Liu, Z.J., Chen, L.J., Chen, S.C., Cai, J., Tsai, W.C., Hsiao, Y.Y., Rao, W.H., Ma, X.Y., Zhang, G.Q., 2011. Paraholcoglossum and Tsiorchis, two new orchid genera established by molecular and morphological analyses of the Holcoglossum alliance. PLoS One 6. https://doi.org/10.1371/journal.pone.0024864
Logacheva, M.D., Schelkunov, M.I., Penin, A.A., 2011. Sequencing and analysis of plastid genome in mycoheterotrophic orchid Neottia nidus-avis. Genome Biol. Evol. 3, 1296–1303. https://doi.org/10.1093/gbe/evr102
Lohse, M., Drechsel, O., Bock, R., 2007. OrganellarGenomeDRAW (OGDRAW): a tool for the easy generation of high-quality custom graphical maps of plastid and mitochondrial genomes. Curr. Genet. 52, 267–274. https://doi.org/10.1007/s00294-007-0161-y
Lowe, T.M., Chan, P.P., 2016. tRNAscan-SE On-line: integrating search and context for analysis of transfer RNA genes. Nucleic Acids Res. 44, W54–W57. https://doi.org/10.1093/nar/gkw413
Mayer, C., 2010. Phobos Version 3.3.12. A tandem repeat search Progr. 20.
Miller, M.A., Schwartz, T., Pickett, B.E., He, S., Klem, E.B., Scheuermann, R.H., Passarotti, M., Kaufman, S., O'Leary, M.A., 2015. A RESTful API for access to phylogenetic tools via the CIPRES science gateway. Evol. Bioinforma. 11, EBO-S21501. https://doi.org/10.4137/EBO.S21501
Padolina, J., Linder, C.R., Simpson, B.B., 2005. A phylogeny of Phalaenopsis using multiple chloroplast markers. Selbyana 23–27.
Palmer, J.D., Thompson, W.F., 1982. Chloroplast DNA rearrangements are more frequent when a large inverted repeat sequence is lost. Cell 29, 537–550. https://doi.org/10.1016/0092-8674(82)90170-2
Pessoa, E.M., Viruel, J., Alves, M., Bogarín, D., Whitten, W.M., Chase, M.W., 2018. Evolutionary history and systematics of Campylocentrum (Orchidaceae: Vandeae: Angraecinae): a phylogenetic and biogeographical approach. Bot. J. Linn. Soc. 186, 158–178. https://doi.org/10.1093/botlinnean/box089
Posada, D., Crandall, K.A., 1998. Modeltest: testing the model of DNA substitution. Bioinformatics 14, 817–818. https://doi.org/10.1093/bioinformatics/14.9.817
Pridgeon, A.M., Cribb, P.J., Chase, M.W., Rasmussen, F.N., 2014. Genera Orchidacearum vol. 6, Epidendroideae, Part 3. Oxford University Press Inc., New York.
Ramírez, S.R., Gravendeel, B., Singer, R.B., Marshall, C.R., Pierce, N.E., 2007. Dating the origin of the Orchidaceae from a fossil orchid with its pollinator. Nature 448, 1042–1045. https://doi.org/10.1038/nature06039
Revell, L.J., 2012. phytools: An R package for phylogenetic comparative biology (and other things). Methods Ecol. Evol. 3, 217–223. https://doi.org/10.1111/j.2041-210X.2011.00169.x
Rogalski, M., Ruf, S., Bock, R., 2006. Tobacco plastid ribosomal protein s18 is essential for cell survival. Nucleic Acids Res. 34, 4537–4545. https://doi.org/10.1093/nar/gkl634
Rowley, G.D., 1982. Intergeneric hybrids in succulents. Natl. Cactus Succul. J. 37, 45–49.
Stamatakis, A., Hoover, P., Rougemont, J., 2008. A rapid bootstrap algorithm for the RAxML web servers. Syst. Biol. 57, 758–771. https://doi.org/10.1080/10635150802429642
Stöver, B.C., Müller, K.F., 2010. TreeGraph 2: combining and visualizing evidence from different phylogenetic analyses. BMC Bioinformatics 11, 7.
Swarts, N.D., Dixon, K.W., 2009. Terrestrial orchid conservation in the age of extinction. Ann. Bot. 104, 543–556. https://doi.org/10.1093/aob/mcp025
Tautz, D., 1989. Hypervariability of simple sequences as a general source for polymorphic DNA markers. Nucleic Acids Res. 17, 6463–6471.
Topik, H., Yukawa, T., Ito, M., 2005. Molecular phylogenetics of subtribe Aeridinae (Orchidaceae): Insights from plastid matK and nuclear ribosomal ITS sequences. J. Plant Res. 118, 271–284. https://doi.org/10.1007/s10265-005-0217-3
Tsai, C.C., Chiang, Y.C., Huang, S.C., Chen, C.H., Chou, C.H., 2010. Molecular phylogeny of Phalaenopsis Blume (Orchidaceae) on the basis of plastid and nuclear DNA. Plant Syst. Evol. 288, 77–98. https://doi.org/10.1007/s00606-010-0314-1
Wicke, S., Schneeweiss, G.M., dePamphilis, C.W., Müller, K.F., Quandt, D., 2011. The evolution of the plastid chromosome in land plants: gene content, gene order, gene function. Plant Mol. Biol. 76, 273–297. https://doi.org/10.1007/s11103-011-9762-4
Wolf, P.G., Rowe, C.A., Hasebe, M., 2004. High levels of RNA editing in a vascular plant chloroplast genome: Analysis of transcripts from the fern Adiantum capillus-veneris. Gene 339, 89–97. https://doi.org/10.1016/j.gene.2004.06.018
Yang, J.B., Li, D.Z., Li, H.T., 2014. Highly effective sequencing whole chloroplast genomes of angiosperms by nine novel universal primer pairs. Mol. Ecol. Resour. 14, 1024–1031. https://doi.org/10.1111/1755-0998.12251
Yang, J.B., Tang, M., Li, H.T., Zhang, Z.R., Li, D.Z., 2013. Complete chloroplast genome of the genus Cymbidium: lights into the species identification, phylogenetic implications and population genetic analysis. BMC Evol. Biol. 13. https://doi.org/10.1186/1471-2148-13-84
Yi, D.K., Kim, K.J., 2012. Complete chloroplast genome sequences of important oilseed crop Sesamum indicum L. PLoS One 7. https://doi.org/10.1371/journal.pone.0035872
Zou, L.H., Huang, J.X., Zhang, G.Q., Liu, Z.J., Zhuang, X.Y., 2015. A molecular phylogeny of Aeridinae (Orchidaceae: Epidendroideae) inferred from multiple nuclear and chloroplast regions. Mol. Phylogenet. Evol. 85, 247–254. https://doi.org/10.1016/j.ympev.2015.02.014
Zuker, M., 2003. Mfold web server for nucleic acid folding and hybridization prediction. Nucleic Acids Res. 31, 3406–3415. https://doi.org/10.1093/nar/gkg595
Figure and table captions
Table 1. NGS results for seven Aeridinae species.
Table 2. General features of Aeridinae plastome sequences (Vandeae, Orchidaceae).
Table 3. Large repeats shared within Aeridinae species.
Table 4. Comparison of divergence time estimation by taxon.
Fig. 1. Plastid circle map of seven Aeridinae species. Pseudogenes are marked with the Greek letter c. Fourteen small inversions are marked on the circle map with numbers.
Fig. 2. Stem-loop structure of fourteen small inversions across seven Aeridinae species. Major types are represented. Details of free energy, sequences, loop length, and subtypes are described in Supplementary Table S2.
Fig. 3. Small inversion trait changes plotted on the phylogenetic tree. Each species’ small inversion status is represented in the heatmap.
Fig. 4. Summary of the gene content of eleven ndh genes in 45 Orchidaceae species. Green-colored boxes represent active genes, light green-colored boxes indicate pseudogenes, and blank boxes denote loss of the gene.
Fig. 5. Comparison of the LSC, IR, and SSC border regions among ten Aeridinae species.
Fig. 6. Summary of simple sequence repeats (SSR) across the Aeridinae species. (a) Number of SSRs for each Aeridinae species by SSR unit size. (b) Number of SSRs for each Aeridinae species by location such as CDS, IGS, intron, IR, LSC, and SSC.
Fig. 7. A maximum likelihood tree inferred from 51 species. Seventy-nine CDS and four rRNA genes from chloroplast sequences were used. An alignment of 86,623 bp was analyzed with RAxML under the GTR substitution model (ML value: -346508.438747). MrBayes_CIPRES was used to construct the partitioned Bayesian tree. The obtained Bayesian inference and bootstrap values are marked above or under the tree nodes.
Fig. 8. A maximum likelihood tree inferred from the nuclear internal transcribed spacer (ITS) of Aeridinae species. A total of 396 sequences of Aeridinae, 10 sequences of Vandeae, and 16 sequences of Cymbidieae–Epidendreae were used to construct the data matrix. An alignment of 1,156 bp was analyzed with RAxML under the GTR substitution model (ML value: -20792.820555). Sequences from the present study are marked in red. Since the entire nrITS ML tree is too large to present in the manuscript, the parts of interest are shown here, and the entire nrITS ML tree is shown in Supplementary Fig. S1. (a) Genus Thrixspermum, part of the nrITS ML tree. (b) Genus Neofinetia, part of the nrITS ML tree. (c) Genus Gastrochilus, part of the nrITS ML tree. (d) Genus Pelatantheria, part of the nrITS ML tree.
Fig. 9. Time estimation tree obtained by BEAST. A total of 83 genes (79 CDS and four rRNA) were used to estimate divergence times. Based on the PartitionFinder results, 53 genes were assigned to the GTR estimated model, 22 genes to the HKY estimated model, three genes to the HKY all equal model, two genes to the TRN all equal model, and three genes to the TRN estimated model. Three fossil datasets, Asparagales (normal distribution, mean 105.3, sigma 8.0), Dendrobium (log-normal distribution, sigma 2.0, offset 23.2), and Goodyera (log-normal distribution, sigma 2.0, offset 15.0), were used to calibrate the nodes. The blue box at each node represents the 95% HPD.
Supplementary Table S1. Accession numbers of sequences used in the present study.
Supplementary Table S2. Description of fourteen small inversions among ten Aeridinae species.
Supplementary Table S3. General gene content of Aeridinae species. Each asterisk indicates the presence of an intron. The ndh gene family was truncated, deleted, or pseudogenized in the Aeridinae species individually: Gastrochilus japonicus and G. fuscopunctatus have ndhB and ndhG genes as pseudogenes; Neofinetia falcata, N. falcata ‘CheongSan’, and N. richardsiana have ndhB, ndhD, ndhE, and ndhG genes as pseudogenes; Pelatantheria scolopendrifolia has the ndhB gene as a pseudogene; Thrixspermum japonicum has ndhB and ycf1 genes as pseudogenes. Other ndh genes are not present in the chloroplast genome sequences.
Supplementary Table S4. Simple sequence repeat (SSR) distribution across the seven Aeridinae plastomes.
Supplementary Fig. S1. Aeridinae maximum likelihood tree of nuclear internal transcribed spacers.
Supplementary Fig. S2. Ancestral trait estimation of each small inversion in ten Aeridinae species.
Supplementary Fig. S3. Alignment of three Neofinetia species. All indels are scattered in intergenic space.
Table 1. NGS results for seven Aeridinae species.
Species                          NGS        Voucher specimen  Total raw reads  Trimmed reads  # aligned reads to cp  # aligned reads to nrDNA  Average trimmed read length (std dev)
Gastrochilus fuscopunctatus      MiSeq      2015-1265         8,793,118        7,635,916      422,385                6,458                     217.2 (83.7)
Gastrochilus japonicus           MiSeq      2015-1266         9,956,806        8,570,381      583,081                11,125                    239.4 (56.1)
Neofinetia falcata               HiSeq2000  2014-0944         20,417,108       20,417,108     489,466                26,071                    97.6 (13.9)
Neofinetia falcata ‘CheongSan’   HiSeq2000  2014-0010         19,734,280       19,734,280     514,996                18,039                    97.8 (13.9)
Neofinetia richardsiana          HiSeq2000  2014-0945         19,942,342       19,942,342     497,925                19,137                    97.0 (15.4)
Pelatantheria scolopendrifolia   MiSeq      2015-1268         11,868,580       10,587,690     255,395                13,618                    249.4 (49.3)
Thrixspermum japonicum           MiSeq      2015-1270         19,904,604       17,466,593     430,896                50,167                    248.3 (52.4)
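The read counts in Table 1 can be turned into two simple ratios per library: the share of raw reads kept after trimming and the share of trimmed reads that aligned to the plastome. The following is a minimal sketch using the values printed in the table; the function name `read_fractions` and the dictionary layout are illustrative assumptions, not part of the original analysis.

```python
# (total raw reads, trimmed reads, reads aligned to cp), copied from Table 1.
TABLE1 = {
    "Gastrochilus fuscopunctatus": (8_793_118, 7_635_916, 422_385),
    "Gastrochilus japonicus": (9_956_806, 8_570_381, 583_081),
    "Neofinetia falcata": (20_417_108, 20_417_108, 489_466),
    "Neofinetia falcata 'CheongSan'": (19_734_280, 19_734_280, 514_996),
    "Neofinetia richardsiana": (19_942_342, 19_942_342, 497_925),
    "Pelatantheria scolopendrifolia": (11_868_580, 10_587_690, 255_395),
    "Thrixspermum japonicum": (19_904_604, 17_466_593, 430_896),
}


def read_fractions(raw, trimmed, cp_aligned):
    """Return (share of raw reads kept after trimming,
    share of trimmed reads aligned to the plastome)."""
    return trimmed / raw, cp_aligned / trimmed


for species, counts in TABLE1.items():
    kept, cp = read_fractions(*counts)
    print(f"{species:32s} kept {kept:6.1%}   cp-aligned {cp:5.2%}")
```

For the three HiSeq2000 libraries the raw and trimmed counts are identical as printed, so the first ratio is exactly 1 for those rows.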
Table 2. General features of Aeridinae plastome sequences (Vandeae, Orchidaceae).
Scientific Name                          Accession Number  Genome Size (bp)  LSC     SSC     IR      GC%     Coverage of cp plastome  nrDNA size (bp)  Coverage of nrDNA
Gastrochilus fuscopunctatus              KX871233          146,183           83,125  11,146  25,956  36.80%  847x                     7,339            164.5
Gastrochilus japonicus                   KX871236          147,697           84,695  11,174  25,914  36.80%  1045x                    8,049            262.5
Neofinetia falcata                       KT726909          146,491           83,802  11,775  25,457  36.60%  347x                     6,456            332.2
Neofinetia falcata ‘CheongSan’           KT726908          146,498           83,809  11,775  25,457  36.60%  335x                     6,484            252.0
Neofinetia richardsiana                  KT726907          146,497           83,808  11,775  25,457  36.60%  337x                     6,276            271.0
Pelatantheria scolopendrifolia           KX871232          146,860           86,075  11,735  24,525  36.50%  501x                     7,797            360.8
Thrixspermum japonicum                   KX871234          149,220           85,301  11,546  26,187  36.10%  845x                     9,314            1133.6
Phalaenopsis aphrodite subsp. formosana  NC007499          148,964           85,957  11,543  25,732  36.70%  -
Phalaenopsis equestris                   NC017609          148,959           85,967  11,300  25,846  36.70%  -
Phalaenopsis ‘Tiny Star’                 NC025593          148,918           85,885  11,523  25,755  36.70%  -
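Because a quadripartite plastome consists of one LSC, one SSC, and two IR copies, the genome size in Table 2 should equal LSC + SSC + 2 × IR. The sketch below checks each row against the printed values; the helper name `quadripartite_gap` is an assumption introduced for illustration.

```python
# (genome size, LSC, SSC, IR) in bp, copied from Table 2.
TABLE2 = {
    "Gastrochilus fuscopunctatus": (146_183, 83_125, 11_146, 25_956),
    "Gastrochilus japonicus": (147_697, 84_695, 11_174, 25_914),
    "Neofinetia falcata": (146_491, 83_802, 11_775, 25_457),
    "Neofinetia falcata 'CheongSan'": (146_498, 83_809, 11_775, 25_457),
    "Neofinetia richardsiana": (146_497, 83_808, 11_775, 25_457),
    "Pelatantheria scolopendrifolia": (146_860, 86_075, 11_735, 24_525),
    "Thrixspermum japonicum": (149_220, 85_301, 11_546, 26_187),
    "Phalaenopsis aphrodite subsp. formosana": (148_964, 85_957, 11_543, 25_732),
    "Phalaenopsis equestris": (148_959, 85_967, 11_300, 25_846),
    "Phalaenopsis 'Tiny Star'": (148_918, 85_885, 11_523, 25_755),
}


def quadripartite_gap(total, lsc, ssc, ir):
    """Return LSC + SSC + 2*IR - total; 0 means the row is internally consistent."""
    return lsc + ssc + 2 * ir - total


gaps = {name: quadripartite_gap(*row) for name, row in TABLE2.items()}
for name, gap in gaps.items():
    print(f"{name:40s} {gap:+d} bp")
```

With the values as printed, nine of the ten rows sum exactly; the Thrixspermum japonicum row differs by 1 bp, which may reflect a rounding or transcription detail in the table rather than the assembly itself.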
Table 3. Large repeats shared within Aeridinae species.
Species
    Region                  Length (bp)  Sequences
G. fuscopunctatus, P. scolopendrifolia
    trnG(UCC)-trnfM(CAU)    30           ACTACTATGCATACTAGTATGCATAGTAGT
G. fuscopunctatus, T. japonicum
    petN-psbM               46           ATAGTGTGGTAGAAAGAGCTATATATAGCTCTTTCTACCACACTAT
G. japonicus, G. fuscopunctatus
    clpP-psbB               26           CTATATATTCTATATAGAATATATAG
    accD                    31           TTGTACGGAAAGTACAAGTAGTATTGAAAAT
    atpF intron             24           TTTGTTCCTATTTCTACTATAGAA
    atpH-atpI               56           ATCGAAGTAGTTCTGACAATTCAGTAATATTACTGAATTGTCAGAACTACTTCGAT
    psbE-petL               26           TATGATTTCTTTCTCCTCCCTCCTGT
    rps15-ycf1              38           TATGTTTTGTATATTTTGATCAAAATATACAAAACATA
    rps19-psbA              25           AAGATAGCAATCCCCCAATATCTTG
P. aphrodite, P. ‘Tiny Star’
N. falcata, N. richardsiana, P. aphrodite, P. ‘Tiny Star’, P. equestris
N. falcata ‘CheongSan’, N. falcata, N. richardsiana
    rps8-rpl14              38           TATAGATGAAAATAGGATATATCCTATTTTCATCTATA
    trnG(GCC)-trnfM(CAU)    28           CTACTATGCATACTAGTATGCATAGTAG
    ycf2                    34           AAGTCACTTCGTTTCTTTTTGTCCAAGTCACTTC
    accD                    24           ATAGACCCCATTGAATTTCATTCA
    accD-psaI               24           TTTCTATCTTTACCTTTCAAAACA
    ycf1                    24           GGAAAAAAAAGATCTTTTTTTTCC
    ccsA                    30           TTTCAAAAAAAAATCGATTTTTTTTTGAAA
    clpP-psbB               25           ATTATATATTATTTATTATATATTA
    clpP-psbB               27           TTTATATAATATATATATAATATATTT
    clpP-psbB               30           ATTTAATATATTATTTAATATTATTTAATA
    clpP-psbB               52           TATTATATATTATTTATTATATATTAATTATATATTATTTATTATATATTAT
    petN-psbM               36           TCTTATATATCTTATATATATATAAGATATATAAGA
    psbZ-trnG(GCC)          28           ACGCATATATGATATATCATATATGCGT
    trnL(UAA) intron        27           TAATATTAATATGAATATGAGTAATAT
N. falcata ‘CheongSan’, N. falcata, N. richardsiana, P. scolopendrifolia, N. falcata ‘CheongSan’, N. falcata, N. richardsiana, T. japonicum, G. japonicus, G. fuscopunctatus, N. falcata ‘CheongSan’, P. scolopendrifolia, G. japonicus, G. fuscopunctatus, T. japonicum, P. scolopendrifolia, N. falcata ‘CheongSan’, N. falcata, N. richardsiana, T. japonicum, P. aphrodite subsp. formosana, P. ‘Tiny Star’, P. equestris
    ycf2                    45           TCTTTTTGTCCAAGTCACTTCGTTTCTTTTTGTCCAAGTCACTTC
    rps16 intron            34           TTTCTATTTCTATCTATATAGATAGAAATAGAAA
    ycf2                    34           GAAGTGACTTGGACAAAAAGAAACGAAGTGACTT
    trnH-psbA               29           AAGATAGCAATCCCCCAATATCTTGTTCT
    trnS(GCU)               26           CGGAGAGAGAGGGATTCGAACCCTCG
    ycf2                    34           AAGTCACTTCGTTTCTTTTTGTCCAAGTCACTTC
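Many of the repeat sequences in Table 3 read the same on both strands, and some entries are reverse complements of one another. The sketch below shows one way to check this with plain Python string operations; the function names are illustrative assumptions, and the three sequences are copied from the table.

```python
# Translation table for complementing unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an uppercase A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(seq: str) -> bool:
    """True if the sequence equals its own reverse complement."""
    return seq == reverse_complement(seq)


# Sequences copied from Table 3.
trnG_trnfM_30 = "ACTACTATGCATACTAGTATGCATAGTAGT"
accD_31 = "TTGTACGGAAAGTACAAGTAGTATTGAAAAT"
ycf2_34 = "AAGTCACTTCGTTTCTTTTTGTCCAAGTCACTTC"

print(is_palindromic(trnG_trnfM_30))   # palindromic repeat
print(is_palindromic(accD_31))         # odd length, so it cannot be palindromic
print(reverse_complement(ycf2_34))     # matches the other 34-bp ycf2 entry
```

An odd-length sequence can never be its own reverse complement because the middle base would have to pair with itself, which is why the 31-bp accD repeat fails the check.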
Table 4. Comparison of divergence time estimation by taxon.
Node                               This study (83 genes, stem age)  Gustafsson et al., 2010 (two genes, stem age)  Givnish et al., 2015 (three genes, stem age)
1. Orchidaceae                     101.82 (88.03-117.00)            104                                            111.38
2. Apostasioideae                  68.93 (55.56-85.54)              77                                             89.46
3. Vanilloideae                    59.33 (44.66-73.73)              69                                             83.6
4. Cypripedioideae                 52.92 (42.87-66.05)              71                                             76.43
5. Orchidoideae                    44.65 (36.22-55.76)              61                                             63.99
6. Neottieae                       39.75 (32.05-49.48)              49                                             48.06
7. Vandeae                         25.31 (21.17-31.05)              35                                             32.67
8. Phalaenopsis – Other Aeridinae  13.75 (9.21-18.33)               N/A                                            9.54
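To compare the stem ages in Table 4 node by node, one can subtract the estimate of this study from each earlier estimate. The sketch below assumes that, in each pair of published values, the first belongs to Gustafsson et al. (2010) and the second to Givnish et al. (2015); the helper `younger_by` and the data layout are hypothetical names introduced for illustration.

```python
# node -> (this study, Gustafsson et al. 2010, Givnish et al. 2015); None marks N/A.
# Column assignment of the two published estimates is an assumption (see text).
STEM_AGES = {
    "Orchidaceae": (101.82, 104, 111.38),
    "Apostasioideae": (68.93, 77, 89.46),
    "Vanilloideae": (59.33, 69, 83.6),
    "Cypripedioideae": (52.92, 71, 76.43),
    "Orchidoideae": (44.65, 61, 63.99),
    "Neottieae": (39.75, 49, 48.06),
    "Vandeae": (25.31, 35, 32.67),
    "Phalaenopsis - Other Aeridinae": (13.75, None, 9.54),
}


def younger_by(this_study, previous):
    """Return previous - this_study (positive means this study is younger),
    or None when the earlier study reports no value."""
    if previous is None:
        return None
    return round(previous - this_study, 2)


for node, (this, gustafsson, givnish) in STEM_AGES.items():
    print(f"{node:32s} vs 2010: {younger_by(this, gustafsson)}   "
          f"vs 2015: {younger_by(this, givnish)}")
```

A positive difference means the present estimate is more recent than the earlier one; a negative difference means it is older.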
Graphical abstract
Highlights
1. Plastomes of seven species within Aeridinae (Orchidaceae) were completely decoded.
2. Fourteen small inversions were found in Orchidaceae for the first time.
3. All 11 ndh genes in the Aeridinae plastome were lost or pseudogenized.
4. Phylogenetic relationships of Aeridinae were discussed using plastome data and 422 nrITS sequences.
5. Divergence times for major lineages of Orchidaceae estimated from whole plastome data were 5–10 Mya more recent than in previous studies, which used only two or three genes.
Author Contributions
K.-J.K. and M.K. designed the research; K.-J.K., Y.-K.K., S.J., S.-H.C., and M.K. collected research materials; Y.-K.K. and S.J. performed the research; Y.-K.K., S.J., and S.-H.C. analyzed the data and deposited them in the NCBI database; K.-J.K. and Y.-K.K. wrote the manuscript; and K.-J.K. and Y.-D.K. secured research funds.